
feedback on Refinement API #821

Open

ehaynes99 opened this issue Apr 9, 2024 · 7 comments

@ehaynes99

You asked for feedback here, but this is too much for a comment thread, so starting a new issue.

I definitely like the idea, but I'm not sure that it quite captures what such an API needs. Starting with the example:

const UnsafeByte = Type.Unsafe<number>({ type: 'byte' })

const Byte = Type.Refine(UnsafeByte)
  .Check(value => typeof value === 'number')
  .Check(value => !isNaN(value))
  .Check(value => value >= 0)
  .Check(value => value < 256)
  .Done()

This compiles into a check function like:

return function check(value) {
  return (
    true && // presumably the "check" for `Type.Unsafe`
    refine(0, value) &&
    refine(1, value) &&
    refine(2, value) &&
    refine(3, value)
  )
}

Each of the refinement steps has the type of:

(value: Static<T>) => boolean

This is fundamentally not correct. This means that the first check has the signature:

(value: number) => typeof value === 'number'

The actual type of value here is unknown. The type system doesn't force users to deal with the initial conversion at all. Consider this simpler example:

const NumericString = Type.Refine(Type.Unsafe<string>({ type: 'string' }))
  .Check((value) => !!value.match(/^\d+(\.\d+)?$/))
  .Done()

// TypeError: value.match is not a function
Value.Check(NumericString, 123)

Now we could solve this by starting with Type.String(), but regardless, I think this misses some potential.

In functional programming, the term "refinement" has a common meaning of "function that narrows". Consider the signature:

type Refinement<T, U extends T = T> = (value: T) => value is U

Such a function either returns false if the value is not a U, or true, which narrows the type. U defaults to T, so we can have a check that doesn't narrow the TS type, but we're able to if we want, and I posit that we HAVE to at least once to transform the unknown into the type we want. However, I would also say that we WANT to, because it would allow narrowing to branded types. This would allow TypeBox to express constraints not possible within the TypeScript type system.

Branded types can be implemented by simply declaring a property of a symbol type not exposed to the outside world, thus ensuring that we've validated the type to end up with an instance.

Personally, I prefer a functional style. Unfortunately, TypeScript can't express a recursive definition with "type shifts" between steps, so we have to use the "accordion hell" overload pattern. Still, it's in a library and the consumers don't have to care about it, so it's a bit of tedium once and then it's over (I find it easier to define the widest one first, then copy and delete one. It's a lot less error prone than hand-rolling each expansion).

This doesn't handle the error messages, though. We could make each step a type with both the function and the error message, but it could also use your function chaining style with a bit of work. I can spend some more time on putting together a PR, but not until next week (I know I promised you the fastify update a while ago too, but I'm really under the gun to convert a huge part of our system to TypeBox... hopefully there by Friday).
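For example (purely illustrative; RefinementStep is a hypothetical name), each step could be a small record pairing the predicate with its message:

type RefinementStep<T, U extends T = T> = {
  check: (value: T) => value is U
  message: string
}

const nonEmptyString: RefinementStep<unknown, string> = {
  check: (value): value is string => typeof value === 'string' && value.length > 0,
  message: 'must be a non-empty string',
}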

Playground link

export type RefineFn = {
  <A>(a: (value: unknown) => value is A): (value: unknown) => value is A
  <A, B extends A = A>(a: (value: unknown) => value is A, b: (value: A) => value is B): (value: unknown) => value is B
  <A, B extends A = A, C extends B = B>(a: (value: unknown) => value is A, b: (value: A) => value is B, c: (value: B) => value is C): (value: unknown) => value is C
  <A, B extends A = A, C extends B = B, D extends C = C>(a: (value: unknown) => value is A, b: (value: A) => value is B, c: (value: B) => value is C, d: (value: C) => value is D): (value: unknown) => value is D
  <A, B extends A = A, C extends B = B, D extends C = C, E extends D = D>(a: (value: unknown) => value is A, b: (value: A) => value is B, c: (value: B) => value is C, d: (value: C) => value is D, e: (value: D) => value is E): (value: unknown) => value is E
  <A, B extends A = A, C extends B = B, D extends C = C, E extends D = D, F extends E = E>(a: (value: unknown) => value is A, b: (value: A) => value is B, c: (value: B) => value is C, d: (value: C) => value is D, e: (value: D) => value is E, f: (value: E) => value is F): (value: unknown) => value is F
}

export const Refine: RefineFn = (...fns: ((value: unknown) => boolean)[]): any => {
  return (value: unknown) => fns.every(fn => fn(value))
}

declare const Brand: unique symbol

export type BrandedType<B extends string, T> = T & { [Brand]: B }

export type Email = BrandedType<'Email', string>

export const isEmail = Refine(
  // (value: unknown): value is string
  (value): value is string => typeof value === 'string',
  // (value: string): value is Email
  (value): value is Email => !!value.match(/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/)
)

const sendEmail = (email: Email) => {
  console.log('sending email to:', email)
}

const test = (value: unknown) => {
  if (isEmail(value)) {
    sendEmail(value)
  } else {
    console.log('not an email:', value)
  }
}

test('sdfsdfsdf') // not an email: sdfsdfsdf
test('someone@example.com') // sending email to: someone@example.com

// typescript: Argument of type 'string' is not assignable to parameter of type 'Email'.
//   Type 'string' is not assignable to type '{ [Brand]: "Email"; }'. [2345]
sendEmail('sdfsdfsdf')
@sinclairzx81
Owner

@ehaynes99 Hi, thanks for the feedback and again apologies for the delay in reply (have had some downtime recently and been trying to catch up on these issues)


Such a function either returns false if T is not a U, or true, which narrows the type. U defaults to T, so we can have a check that doesn't narrow the TS type, but we're able to if we want, and I posit that we HAVE to at least once to transform the unknown to the type we want.

Yeah, I largely agree here that it should be possible to narrow down from a wider type (say unknown) into a constrained type derived from the implementation, but I'm not sure TypeScript provides the necessary facilities for this today. There is, however, the following PR which may provide some future inroads.

microsoft/TypeScript#57465

I had a quick poke around on TS 5.5 (nightly) and was able to get chained predicate narrowing working "sort of", but the inference isn't quite there yet (and I'm not sure if TS will provide support for more complex narrowing expressions as per the property type checks)

TypeScript Link Here

import { Type, Static, TSchema } from '@sinclair/typebox'

// ----------------------------------------------------------------------------
// TRefined<T>
// ----------------------------------------------------------------------------
export interface TRefined<T> extends TSchema {
  static: T
}
declare class RefineBuilder<T extends TSchema, S = T['static']> {
  constructor(schema: T)
  Check<U extends S>(refine: (value: S) => value is U): RefineBuilder<T, U>
  Done(): TRefined<S>
}
declare function Refine<T extends TSchema>(schema: T): RefineBuilder<T>

// ----------------------------------------------------------------------------
// Usage
// ----------------------------------------------------------------------------
type A = Static<typeof A>
const A = Refine(Type.Unknown()).Check(value => typeof value === 'number').Done()

type B = Static<typeof B>
const B = Refine(Type.Unknown())
  .Check(value => typeof value === 'object')
  .Check(value => value !== null)
  .Check(value => 'x' in value)
  .Check(value => 'y' in value)
  .Check(value => 'z' in value)
  // .Check(value => typeof value.x === 'number') // these don't work :(
  // .Check(value => typeof value.y === 'number')
  // .Check(value => typeof value.z === 'number')
  .Done()
 

Will keep an eye on things rolling through TS side with a focus on being able to express chained predicates (specifically sub property checks + binary expression checks)


This doesn't handle the error messages, though. We could make each step a type with both the function and the error message, but it could also use your function chaining style with a bit of work.

The current implementation of Refine does support error messages in the following way (where you can pass a string that is returned for failing checks). However, I don't know if I'm entirely happy with this design as I tend to think it's better to be less dependent on textual descriptions of errors and more centered on error codes (this specifically for internationalization of error messages), it's just that I haven't worked out a good approach for this.

const T = Type.Object({                              
  x: Type.Number(),
  y: Type.Number()
})

const S = Type.Refine(T)
  .Check(value => value.x === value.y, 'x must equal y')
  .Done()

const R = Value.Check(S, { x: 0, y: 1 })             // const R = false

const E = [...Value.Errors(S, { x: 0, y: 1 })]       // const E = [{
                                                     //   message: 'x must equal y',
                                                     //   value: { x: 0, y: 1 },
                                                     //   ...
                                                     //  }]

Let's keep this discussion open for a bit. It would be good to have an open dialog here to discuss and document some of the rationales behind the Refine type (and how the type itself might become a pivotal type in future iterations of TypeBox). With the combination of Refine + TUnsafe, it would be technically feasible to implement both schematics + assertion logic under refinements (where TypeBox would use Refine internally for all its standard types). However, such an approach would be a fundamental shift in how TypeBox operates. That said, the feature would enable alternative Check operations outside the TypeBox compiler / assertion infrastructure, which may be helpful for users looking to express alternative schematics to Json Schema (like Json Type Definition). Consider:

import { Type, TSchema, RefineKind, Refinement } from '@sinclair/typebox'

const UnsafeByte = Type.Unsafe<number>({ type: 'byte' })

const Byte = Type.Refine(UnsafeByte)
  .Check(value => typeof value === 'number')
  .Check(value => !isNaN(value))
  .Check(value => value >= 0)
  .Check(value => value < 256)
  .Done()

// Check without Value
function Check(schema: TSchema, value: unknown): boolean {
  const refinements: Refinement[] = (schema as any)[RefineKind] ?? []
  return refinements.every(refinement => refinement.check(value))
}
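Usage would then be as simple as (continuing the hypothetical RefineKind / Refinement API above):

Check(Byte, 128)    // true
Check(Byte, 512)    // false (fails value < 256)
Check(Byte, 'byte') // false (fails the typeof check)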

Open to thoughts!
S

@ehaynes99
Author

ehaynes99 commented Apr 11, 2024

The current implementation of Refine does support error messages

Oh, right, I just meant that my pipeline-style example did not leave a place to set the error message.

Interesting about that inference PR, but that's a syntactical convenience. With that example, you need to do both narrowing and widening, but it's still possible, just verbose:

// inferred type:
// type B = {
//   x: number
//   y: number
//   z: number
// }
type B = Static<typeof B>
const B = Refine(Type.Unknown())
  .Check((value): value is object => typeof value === 'object')
  .Check((value): value is object => value !== null)
  .Check((value): value is { x: unknown } => 'x' in value)
  .Check((value): value is { x: unknown; y: unknown } => 'y' in value)
  .Check((value): value is { x: unknown; y: unknown; z: unknown } => 'z' in value)
  .Check((value): value is { x: number; y: unknown; z: unknown } => typeof value.x === 'number')
  .Check((value): value is { x: number; y: number; z: unknown } => typeof value.y === 'number')
  .Check((value): value is { x: number; y: number; z: number } => typeof value.z === 'number')
  .Done()

It's worth noting that by the last step, we have explicitly defined the static type of the schema, but I think that's actually what we want. We are defining types that are not statically inferrable. The type guards are telling the compiler what the type is, not the other way around.

We can, of course, be more efficient, though. Even with the inference change in TS, I'm not sure which interpretation of typeof value === 'object' it would use, so you might need to specify which one you want anyway:

type B = {
  x: number
  y: number
  z: number
}

const B = Refine(Type.Unknown())
  .Check((value): value is NonNullable<unknown> => value != null)
  .Check((value): value is Record<string, unknown> => typeof value === 'object')
  .Check(
    (value): value is B =>
      typeof value.x === 'number' &&
      typeof value.y === 'number' &&
      typeof value.z === 'number',
  )
  .Done()

@jeremyjacob

jeremyjacob commented May 14, 2024

I was surprised to see the way typebox does this with the TypeRegistry. That makes ad-hoc refinements much more verbose and limits the flexibility for custom types. I'm migrating a project from Zod and have a couple of types that take in arguments that are used in refinements and transformations. That doesn't seem to be possible with the current API.

Is there anything we can do to help this effort? Refinements would make typebox the #1 validation library IMO.

Edit:
Just wanted to speak to your question about the error API. Perhaps something like Zod's superRefine would be more flexible, where rich and dynamic errors can be added by calling a method that's passed to the validation closure (see the sketch below). Dynamic errors are great when we can give the user more information on what went wrong based on the input provided.
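For instance (a purely hypothetical sketch; the ctx parameter is not part of any current or proposed TypeBox API):

const Range = Type.Refine(Type.Object({ start: Type.Number(), end: Type.Number() }))
  .Check((value, ctx) => {
    // report a dynamic, input-specific error through the collector
    if (value.start > value.end) {
      ctx.error(`start (${value.start}) exceeds end (${value.end})`)
      return false
    }
    return true
  })
  .Done()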

@sinclairzx81
Owner

@jeremyjacob @ehaynes99 Hi

Sorry for the delay in response on this issue. I have been giving this feature a LOT of thought and unfortunately, I don't think I'm going to go ahead with it at this time. I'll explain my reasoning in a subsequent comment, but for now, I can suggest creating the Refine type by way of the TypeRegistry.

TypeRegistry: Refine + Errors

The following implements Refine via the TypeRegistry as well as sets up custom Error messages.

import { SetErrorFunction, DefaultErrorFunction } from '@sinclair/typebox/errors'
import { Static, Kind, TSchema, SchemaOptions, Type, TypeRegistry } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

// ------------------------------------------------------------------
// Errors
// ------------------------------------------------------------------
// Return the schema's errorMessage (if present) in place of the default error text
SetErrorFunction(param => 'errorMessage' in param.schema ? param.schema.errorMessage : DefaultErrorFunction(param))
// ------------------------------------------------------------------
// Refine
// ------------------------------------------------------------------
TypeRegistry.Set<TRefine>('Refine', (schema, value) => schema.func(value))

export interface TRefine<T = unknown> extends TSchema {
  [Kind]: 'Refine'
  static: T,
  func: (value: unknown) => boolean
}
export function Refine<T>(func: (value: T) => boolean, options: SchemaOptions = {}): TRefine<T> {
  return { ...options, [Kind]: 'Refine', func } as never
}

// ------------------------------------------------------------------
// Example
// ------------------------------------------------------------------
type T = Static<typeof T>

const T = Type.Object({
  x: Refine<number>(value => typeof value === 'number'),
  y: Refine<string>(value => typeof value === 'string'),
  z: Refine<boolean>(value => typeof value === 'boolean', { errorMessage: 'Expected boolean' })
})

const V = { x: 1, y: 'hello', z: 'not a boolean' }

console.log(Value.Check(T, V)) // false

console.log(Value.Errors(T, V).First()) // { message: 'Expected boolean', ... }

I was surprised to see the way typebox does this with the TypeRegistry. That makes ad-hoc refinements much more verbose and limits the flexibility for custom types. I'm migrating a project from Zod and have a couple of types that take in arguments that are used in refinements and transformations.

Note that you can pass parameters by embedding them in the schema. The Refine function above embeds the func which is received by the TypeRegistry.
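For example (a sketch building on the Refine function above; InRange is just an illustrative name):

// the min/max arguments are captured by the embedded func
function InRange(min: number, max: number) {
  return Refine<number>(
    value => typeof value === 'number' && value >= min && value <= max,
    { errorMessage: `Expected number between ${min} and ${max}` }
  )
}

const Port = InRange(0, 65535)

console.log(Value.Check(Port, 8080))  // true
console.log(Value.Check(Port, 70000)) // false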


Will post a follow-up response shortly as to why I'm going to hold off on this feature.
Cheers
S

@sinclairzx81
Owner

sinclairzx81 commented May 14, 2024

@ehaynes99

Hesitation to include Refine

So, have given this feature a lot of thought; unfortunately, I'm not keen to pursue it at this time.

My main reservation is that I don't like the idea of adding validation functions to schematics, as functions are non-serializable (and thus non-portable). A core premise of TypeBox is that it should only construct schematic / metadata, with that metadata interpreted (or compiled) by something that understands it (i.e. a Json Schema validator). The problem with functions is that you cannot pass them to a remote system and expect that system to apply the same validation rules. I feel the inclusion of Refine breaks this core premise and doesn't address the issues as to why devs feel the need to use Refine in the first place (I'll talk about this a bit later)

Consider the Byte type mentioned earlier (with and without Refine). Note that while Refine can be used to check the value, using TypeBox's built in types achieves the same result while benefitting from serializable / reflectable schematics.

// You could do this....
const Byte = Type.Refine(UnsafeByte)
  .Check(value => typeof value === 'number')
  .Check(value => !isNaN(value))
  .Check(value => value >= 0)
  .Check(value => value < 256)
  .Done()
  
// .. But why not?
const Byte = Type.Intersect([
  Type.Number(),
  Type.Number({ minimum: 0 }),
  Type.Number({ exclusiveMaximum: 256 })
])

// ... Or just?
const Byte = Type.Number({ minimum: 0, exclusiveMaximum: 256 })
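The schematic form also round-trips through JSON, which a function-carrying schema never can (a sketch; exact key order may vary):

// Symbol keys such as Kind are dropped by JSON.stringify, leaving plain Json Schema
const json = JSON.stringify(Byte)  // '{"minimum":0,"exclusiveMaximum":256,"type":"number"}'
const shared = JSON.parse(json)    // consumable by any Json Schema validator (JS, C#, ...)

JSON.stringify({ check: (value: unknown) => true }) // '{}' - the function is silently dropped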

Some History

As mentioned, TypeBox was originally built on the premise that schematics "should" encode all the information required to validate a value, and that the schematics should be serializable and reflectable such that a remote system (i.e. C#) could read those schematics and generate the same assertion logic. Again, the problem with Refine is that JavaScript functions do not serialize, and because of this, they cannot be shared with remote systems needing the same validation rules.

With this said, I do appreciate that Refine is very ergonomic and able to reach places the TypeBox type system can't (forcing usage of the TypeRegistry), but feel it would be better to address (or at least explore) underlying inadequacies in the current schematics that require users to fall back to writing non-serializable functions.

JavaScript Schema (Superset of Json Schema)

Rather than adding Refine (which can be implemented by way of the current TypeRegistry), I am instead quite keen to explore additional extended schematics that enable TypeBox to encode more sophisticated validation rules. Unfortunately, I don't have all the answers as to what such extended schematics should look like (maybe TypeBox should just embed ESTree AST??), but at a minimum, the goal would be to define a superset of Json Schema that enables validators to encode arbitrary assertion logic to check any JavaScript value.

Perhaps something along the lines of the following (as one possible example)

// We want to validate this Class + method return value
class Add {
  constructor(private a: number, private b: number) {}
  evaluate(): number { return this.a + this.b }
}

// Perhaps this could work ??
const T = Type.ConstructorCall([1, 2], Type.Object({ // const T = {
  evaluate: Type.FunctionCall([], Type.Literal(3))   //   constructorCall: {
}))                                                  //     parameters: [1, 2],
                                                     //     returns: {
                                                     //       type: 'object',
                                                     //       required: ['evaluate'],
                                                     //       properties: {
                                                     //         evaluate: {
                                                     //           functionCall: {
                                                     //             parameters: [],
                                                     //             returns: {
                                                     //               const: 3
                                                     //             }
                                                     //           }
                                                     //         }
                                                     //       }
                                                     //     }
                                                     // }

Value.Check(T, Add)                                  // const instance = new value(1, 2)
                                                     // const result = typeof instance['evaluate'] === 'function' &&
                                                     //                instance['evaluate'].apply(instance, []) === 3

So, I haven't ruled out Refine as a future type, it's just not something I'm prepared to take on at this time. I think Refine would be very useful to perform auxiliary checks on values where the check logic is dependent on outside things (like checking a value is an instance of a MongoID), it's just that I'd prefer to seek out ways to encode such checks in the schematic metadata rather than turn to functions to perform the check.

With all this said, I think identifying cases where users need to resort to the TypeRegistry, then exploring ways of expressing that same checking logic as an encodable schematic, will be a good step towards moving TypeBox in the direction it needs to go.

Hope this brings some deeper insights into the decision not to include Refine, will leave this issue open for comment.

Cheers
S

@jeremyjacob

Thank you very much for the in-depth explanation @sinclairzx81. I'm looking at my code now and one of the places I'm using Refine can be replaced with a better type. The other check could be moved out of the schema.

Ultimately, what you're saying makes a lot of sense. It's a great goal to keep the schema as portable as possible.

Out of curiosity have you thought about Transform in this context? Even if the input/output types are serializable, the function used to transform the values is not.

Refine seems to be valuable when it is not being used to enforce basic type constraints that would otherwise be possible to express in JSON Schema. Until it is revisited, the Refine you provided may be sufficient for those use cases.

@sinclairzx81
Owner

sinclairzx81 commented May 15, 2024

@jeremyjacob Hi,

Out of curiosity have you thought about Transform in this context? Even if the input/output types are serializable, the function used to transform the values is not.

Yes, have considered this. There was some apprehension about embedding Transform codec functions on the TB schematics as well; however, they were included on the basis that codecs do not (or at least should not) change the way values get validated. The schema is still the source of truth; the codec can be viewed as additional "application level processing" performed on a value before and after validation.

Rx:  [Input] -> [Validate] -> [Decode]   -> [Output]
Tx:  [Input] -> [Encode]   -> [Validate] -> [Output]
//                 ^
//                 |
//                 these phases are swapped on encode and decode; validation
//                 is treated as a distinct phase from the encode/decode phase.

Note that "application level processing" should be interpreted as additional processing a system (be it JS, C#) might do on a value, but is unrelated to validation of that value. Systems may interpret validated values in different ways (i.e. mapping iso8601 strings to System.DateTime or globalThis.Date respectively). Transforms fit into this category but in TypeBox are specific to JS only.

TypeBox interprets Decode/Encode errors as application level errors (separate from validation errors). In the case of Decode, if a value passes validation, the Decode function should ALWAYS succeed (as the value has been checked prior). Similarly, if an Encode function transforms a value, the resulting value should match the expected schematic. In both cases, failure to transform should be interpreted as an application level exception (a bug, basically)

In saying this, I have noticed that in the wild, a few users do attempt to use Transforms as a kind of secondary validation (where errors get thrown within the Decode and Encode functions on check failure). While this is of course possible, it's very much against the design intent for the Transform feature. I have actually been considering changing Encode and Decode to remove the implicit Check and having both functions return unknown. This would bring Encode and Decode somewhat in line with Convert, Clean, and Default, in that the unknown return type encourages careful treatment of the returned value.

const T = Type.Transform(Type.Number())
  .Decode(value => value.toString())
  .Encode(value => parseFloat(value))

// ... possible future change (implicates transforming parse pipelines)

export function Parse<T extends TSchema, R = StaticDecode<T>>(schema: T, value: unknown): R {
  const cleaned   = Value.Clean(schema, value)       // cleaned   is unknown
  const converted = Value.Convert(schema, cleaned)   // converted is unknown
  const defaulted = Value.Default(schema, converted) // defaulted is unknown

  // The change would require a user to explicitly call Check() on the value prior to calling Decode(). The
  // Decode function may be updated such that the 'value' parameter must match a Static<T>. The return
  // value would be updated to 'unknown' as it is not possible to statically assert a StaticDecode<T>
  // due to external runtime processing of the value.
  if(Value.Check(schema, defaulted)) {
    const decoded = Value.Decode(schema, defaulted)  // decoded is unknown
    return decoded as R                              // Just trust decoded is R (unsafe)
  } else {
    throw Value.Errors(schema, defaulted).First()    // validation error
  }
}
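Usage of the above would look something like:

const N = Type.Transform(Type.Number())
  .Decode(value => value.toString())
  .Encode(value => parseFloat(value))

const S = Parse(N, 42) // S is string ('42') - cleaned, converted, defaulted, checked, then decoded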

I don't anticipate this change happening for a while, but do feel it would draw a greater distinction between the Validation and Transform phases (as well as provide a bit more control for framework integrators)

Hope this brings some insights!
S
