Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to regexp vs string type in pattern #819

Closed
FDiskas opened this issue Apr 1, 2024 · 4 comments
Closed

Migrate to regexp vs string type in pattern #819

FDiskas opened this issue Apr 1, 2024 · 4 comments

Comments

@FDiskas
Copy link

FDiskas commented Apr 1, 2024

As a developer, I would like to use the real type in pattern - RegExp

For me, it is the same as

{
  "foo": "true"
}

instead of

{
  "foo": true
}

Migrate the pattern value to RegExp type

Confusions:
In case it is a string - the developer should add additional escape characters and also can not pass regexp flags

const regex = /^\w(?:[\w-]{0,61}\w)?$/igm;
const regex = new RegExp('^\\w(?:[\\w-]{0,61}\\w)?$', 'igm')

image

But if we change this to RegeXp instead of string - IDE tools and plugins will act accordingly because they will recognize this as regexp
image

@FDiskas FDiskas changed the title Migrate to regexp or string type in pattern Migrate to regexp vs string type in pattern Apr 1, 2024
@ehaynes99
Copy link

ehaynes99 commented Apr 3, 2024

It's not really clear to me what the RegExp type is supposed to be... Being part of the extended types, I would expect it to represent a field of type RegExp, but then if you look at the type check for it, it validates whether an input string matches the pattern, i.e. functionally equivalent to Type.String({ pattern })

import { inspect } from 'node:util'
import { type Static, Type } from '@sinclair/typebox'
import { TypeCompiler } from '@sinclair/typebox/compiler'

const Schema = Type.RegExp(/^[a-z]$/i)

// infers: type Schema = string
type Schema = Static<typeof Schema>

const typeCheck = TypeCompiler.Compile(Schema)

console.log(inspect(Schema))
console.log()
console.log(typeCheck.Code())

outputs:

{
  type: 'RegExp',
  source: '^[a-z]$',
  flags: 'i',
  [Symbol(TypeBox.Kind)]: 'RegExp'
}

const local_0 = /^[a-z]$/i;
return function check(value) {
  return (
    (typeof value === 'string') &&
    local_0.test(value)
  )
}

I would expect the opposite of all of these:

console.log(typeCheck.Check('abCD')) // true
console.log(Value.Check(Schema, 'abCD')) // true

console.log(typeCheck.Check(/^[a-z]+$/i)) // false
console.log(Value.Check(Schema, /^[a-z]+$/i)) // false

@sinclairzx81
Copy link
Owner

sinclairzx81 commented Apr 4, 2024

@FDiskas @ehaynes99 Hi, Apologies for delay in reply (have been recovering from a nasty flu :( )


@FDiskas

As a developer, I would like to use the real type in pattern - RegExp

So unfortunately, I can't really introduce this suggestion as JavaScript ECMA262 regular expressions are far more capable than what is possible to encode in JsonSchema string patterns. Some relevant documentation below with appropriate excerpt.

https://json-schema.org/understanding-json-schema/reference/regular_expressions

The pattern and patternProperties keywords use regular expressions to express constraints. The regular expression syntax used is from JavaScript (ECMA 262, specifically). However, that complete syntax is not widely supported, therefore it is recommended that you stick to the subset of that syntax described below.

So, because of the limitations here, the Type.RegExp was introduced (as a non-spec JavaScript type) that supports the full ECMA262 syntax + Unicode + the /gim modifiers. You should consider using this type for regular expressions that step outside the bounds of what can be appropriately represented in Json Schema. Alternatively, a string format may be a better fit as that allows you to code the format as a string identifier (meaning a reciprocal user of the schema would need to implement the format accordingly)


@ehaynes99

It's not really clear to me what the RegExp type is supposed to be... Being part of the extended types, I would expect it to represent a field of type RegExp, but then if you look at the type check for it, it validates whether an input string matches the pattern, i.e. functionally equivalent to Type.String({ pattern })

You're right the RegExp type is a bit unusual. I'd drafted a little bit of information on the 0.32.0 change log about it here which makes note of the inferred type disparity when compared to the other non-standard types.

Conceptually, the RegExp type is thought of as a extremely wide TemplateLiteral type whose expressions are non-visible to the type system. In this respect, the only possible type to infer would be string. The naming of RegExp (vs RegExpTemplate or similar) is a bit confusing I agree (particularly when contrasted with how other types are named + inferred), however I've gone with the name as it's generally easy to remember and generally signals intent. I am open to other naming schemes however.


@ehaynes99 @FDiskas

Just on this, I am currently considering including a new Type.Refine() to the library. This has been implemented on the refine branch and would permit RegExp implementations like the following....

const T = Type.Refine(Type.String()).Check(value => /^[a-z]*$/i.test(value)).Done()

console.log(Value.Check(T, 'abcd')) // true

This implementation is pending review and feedback, however it's introduction may enable TypeBox to potentially drop ALL extended JavaScript types in favor of refinements, as well as permit additional user defined schematics via a more ergonomic interface (better than TypeRegistry / FormatRegistry)

// user defined byte schema + check logic
const UnsafeByte = Type.Unsafe<number>({ type: 'byte' })

const Byte = Type.Refine(UnsafeByte)
  .Check(value => typeof value === 'number')
  .Check(value => !isNaN(value))
  .Check(value => value >= 0)
  .Check(value => value < 256)
  .Done()

console.log(Value.Check(Byte, 0))     // false
console.log(Value.Check(Byte, 255))   // false
console.log(Value.Check(Byte, -1))    // true
console.log(Value.Check(Byte, NaN))   // true

Open to thoughts and feedback
S

@ehaynes99
Copy link

Had too much to say about that, so created a new issue.

@sinclairzx81
Copy link
Owner

@FDiskas @ehaynes99 Heya

Might close up this issue and track things over at #821. As noted above, the pattern must be string (as that's how Json Schema encodes regular expressions) and the general recommendation will be to use Type.RegExp over String pattern. The refinement API might provide a few more options to express unicode / expression modifier options for string validation, so can track those details over at the other issue.

Cheers!
S

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants