Optimisations to inference in 0.32 seem to cause ts(2589) with composite types. #825

Open
Gnuxie opened this issue Apr 10, 2024 · 16 comments


@Gnuxie

Gnuxie commented Apr 10, 2024

Context: Gnuxie/matrix-protection-suite#26

I hit the limit about here in this file, where we make extensive use of composite types to try to keep the schema modular. Granted, this could be a misuse of the composite type, and I'm not sure where to go from here. FWIW these schemas were all working as of 0.31.15.
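
Roughly, the pattern looks like this (a simplified placeholder sketch rather than the actual matrix-protection-suite schemas, and not intended as a reproduction of the error):

import { Type, Static } from '@sinclair/typebox'

// Placeholder sketch of layered Composite usage: each layer builds on the last
const EventBase = Type.Object({ event_id: Type.String(), sender: Type.String() })
const Timestamped = Type.Object({ origin_server_ts: Type.Number() })
const RoomEvent = Type.Composite([EventBase, Timestamped])
const StateEvent = Type.Composite([RoomEvent, Type.Object({ state_key: Type.String() })])

type StateEvent = Static<typeof StateEvent>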

I couldn't see anything specifically from #689 that would be causing this, but I could be missing something.

@sinclairzx81
Owner

@Gnuxie Hi,

I couldn't see anything specifically from #689 that would be causing this, but I could be missing something.

There have been some recent updates made to Composite to further replicate type evaluation in TypeScript (that is to say, TypeBox is trying to compute the intersection of all types passed to Composite such that the output schema / type is an evaluated object type and not an allOf representation as would be the case for Intersect). However, the Composite evaluation is an extremely expensive thing to compute, and I think when tagged with ReturnType inference (instantiation expressions as per your implementation), this is likely pushing out the extents of the type system.

The recent updates for Composite were to do with Union evaluation. You can read about some of the context here: #789. I believe these updates went out in 0.32.16, so if you can try 0.32.15 and just let me know if you still see the same issue, I may reinvestigate using the matrix-protection types as a basis for testing.


Workaround

Granted, this could be a misuse of the composite type, and I'm not sure where to go from here

So, much of the issue here would be to do with TypeBox doing deep type evaluation on the Composite types. If it's reasonable to do so, you can swap from Composite to Intersect. The Intersect type should have the same assertion characteristics as Composite, but the schematics are a bit more "extreme" as each Intersect type is expressed via the JSON Schema allOf keyword.
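
To make the difference concrete, here's a minimal sketch of the schematics each produces (small placeholder objects rather than the matrix-protection types):

import { Type } from '@sinclair/typebox'

const A = Type.Object({ a: Type.Number() })
const B = Type.Object({ b: Type.Number() })

// Composite evaluates the members into a single flattened object schema, roughly
// { type: 'object', properties: { a: { ... }, b: { ... } }, required: ['a', 'b'] }
const C = Type.Composite([A, B])

// Intersect keeps each member intact under the allOf keyword, roughly
// { allOf: [ { type: 'object', ... }, { type: 'object', ... } ] }
const I = Type.Intersect([A, B])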

Here's a working version of the matrix-protection types using Intersect instead of Composite.

TypeScript Link Here

If you just want to test version 0.32.15 and let me know how that helps solve things, that would be great. I don't know if I'm going to be able to revert Composite back to a previous version (as the current implementation is closer to the envisioned computed type), but I do think there's potential to further optimize the type for the implementation you've provided. There is some pending work to do with respect to deep type evaluation (in which Composite would be involved), but I don't envision optimizations making it out until TypeBox reaches 0.33.x (which is a fair way off).

For now, give Intersect a try (which will keep you on the latest versions) and just let me know if you still run into issues on 0.32.15; I'll do some digging based on your feedback.

Cheers!
S

@Gnuxie
Author

Gnuxie commented Apr 11, 2024

Hi, thanks for the speedy response. I'll see what happens in 0.32.15 and report back.

Workaround

For now though, I want to add some context for why I originally moved away from Intersect, and there are two reasons.

The first is that it's less helpful dealing with the TypeScript intersection type when trying to debug an issue, since obviously they can become so massive. It can be very difficult to see what is wrong from the output TypeScript gives when working with types that I believe to be compatible. The second is that VSCode will also expand type references into a massive type literal when using its feature to add missing methods while implementing an interface, and I would get annoyed trying to delete them. Which, yes, is probably a silly reason 😓.

e.g.

interface Timeline {
    addEvent(event: RoomEvent): void;
}
class StandardTimeline implements Timeline {
    addEvent(event: { ... }
        & { .... }
        & { .... }
    ) {
        throw new TypeError("Unimplemented");
    }
}

However, the Composite evaluation is an extremely expensive thing to compute, and I think when tagged with ReturnType inference (instantiation expressions as per your implementation), this is likely pushing out the extents of the type system.

I see. I don't really understand how the type system works, but I trust you. I'll follow up shortly.

@Gnuxie
Author

Gnuxie commented Apr 11, 2024

The problem seems to have been introduced in 0.32.0.

@dwickern
Contributor

I'm seeing this issue starting in 0.32.9

@sinclairzx81
Owner

@dwickern Heya,

Hmmm, not sure if I'll be able to revert the recent Composite functionality; I'll have to try and optimize around it. Would it be possible for you to share a repro of your type setup? (Just wanting to take a look at some of the downstream use cases where Composite is popping out.) Any info / reproduction you can provide will be helpful.

@Gnuxie
Author

Gnuxie commented Apr 18, 2024

Just an update before anything goes further: I suspect that upgrading TypeScript can trigger the same problem in 0.31.*, but I haven't investigated yet.

@sinclairzx81
Owner

@Gnuxie Hi,

So, I've dug a bit deeper into this issue using this code as a basis. I do think TypeBox has introduced a deep inference issue on 0.32.0, which I'm going to try and resolve through optimization; however, resolving it is difficult as the inference is already quite optimized.

Currently, 0.32.0 inference / evaluation of Composite is correct and in line with TS evaluation; it's just that in order to replicate TS evaluation, the work TB needs to do is very top heavy (with this issue specifically seeming to only pop out on deep instantiation expressions via ReturnType, although I've been able to push it out in a couple of other ways).

I'll spend some more time today looking for a temporary solution; however, I don't anticipate a fix will be possible on 0.32.0 as the inference infrastructure may require some refactoring (which would be more indicative of a 0.33.0 minor revision). As mentioned though, you can swap from Composite to Intersect (which will produce the same inference (non-normalized) and a non-evaluated allOf schema representation). This may serve as a temporary workaround.

Will keep you posted
S

@sinclairzx81
Owner

@Gnuxie Hi,

So I have investigated this issue further, and unfortunately I don't think I'm going to be able to resolve this on 0.32.x. As it stands, things are already as optimized as I can get them using the current inference strategy, and I think to go further, I'll need to investigate an alternative strategy, which will take time (and a revision bump).

As it stands, solving this issue on the current revision would involve removing inference logic, which would make Composite "less correct"; it's just that correctness in this case has come at an instantiation depth cost... which is unfortunate.


Revisiting Intersect (Workaround)

The first is that it's less helpful dealing with the TypeScript intersection type when trying to debug an issue, since obviously they can become so massive. It can be very difficult to see what is wrong from the output TypeScript gives when working with types that I believe to be compatible. The second is that VSCode will also expand type references into a massive type literal when using its feature to add missing methods while implementing an interface, and I would get annoyed trying to delete them. Which, yes, is probably a silly reason 😓.

I do think it's reasonable to want presentable types! However, I would encourage you to revisit Intersect as a workaround, as there are options available to you to leverage Intersect while still obtaining flattened types. TypeBox does export an Evaluate utility type which performs a static type-level evaluation (producing the same inference as Composite) on intersected and unioned types.

import { Evaluate } from '@sinclair/typebox'

type T = Evaluate<{   // type T = { a: number, b: number }
  a: number
} & {
  b: number
}>
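
For example, combined with Intersect and Static the pattern is roughly as follows (a small sketch, not the updated reference code itself):

import { Type, Static, Evaluate } from '@sinclair/typebox'

const I = Type.Intersect([
  Type.Object({ a: Type.Number() }),
  Type.Object({ b: Type.Number() })
])

// Static alone infers the raw intersection { a: number } & { b: number };
// Evaluate flattens it into a single presentable object type
type I = Evaluate<Static<typeof I>>   // type I = { a: number, b: number }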

The following is an update to the provided reference issue code using Intersect + Evaluate to flatten out the types.

src/MatrixTypes/Events.ts (implemented with Intersect + Evaluate)

This is probably the best option I can advise at this time until I get a chance to really dig in and restructure some of the internals to support deeper inference. The above would keep you on the newer versions of TB. The only other thing I can think of would be to keep to pre-0.32.x revisions until this issue is resolved on 0.33.x. In terms of timing, I'll be aiming for a minor revision sometime around late June (tentative).

Hope this helps in the interim.
S

@Gnuxie
Author

Gnuxie commented Apr 20, 2024

@sinclairzx81 Thank you so much for the follow up, that's ok! The Intersect + Evaluate solution looks perfect to me, and while I'm not in a hurry to upgrade to 0.32.x, I will incorporate it when I can.

Gnuxie added a commit to Gnuxie/matrix-protection-suite that referenced this issue Apr 22, 2024
sinclairzx81/typebox#825
Co-authored-by: Haydn Paterson (sinclair) <haydn.developer@gmail.com>
@Gnuxie
Author

Gnuxie commented Apr 22, 2024

While not an issue, just an observation for anyone who might be following along: a side effect of moving from Composite to Intersect is that you can get duplicate errors from Value.Errors when the intersect has duplicate schemas for some properties.
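
For example, something like the following shows the duplication I mean (a rough sketch; the exact error output may vary):

import { Type } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

// Two intersected members both describe the property 'a'
const I = Type.Intersect([
  Type.Object({ a: Type.Number() }),
  Type.Object({ a: Type.Number(), b: Type.Number() })
])

// 'a' fails against both allOf members, so it is reported more than once
const errors = [...Value.Errors(I, { a: 'not a number', b: 1 })]
console.log(errors.map(error => error.path)) // includes '/a' twice (exact output may vary)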

Gnuxie added a commit to Gnuxie/matrix-protection-suite that referenced this issue Apr 22, 2024
sinclairzx81/typebox#825
Co-authored-by: Haydn Paterson (sinclair) <haydn.developer@gmail.com>
Gnuxie added a commit to Gnuxie/matrix-protection-suite that referenced this issue Apr 22, 2024
sinclairzx81/typebox#825
Co-authored-by: Haydn Paterson (sinclair) <haydn.developer@gmail.com>
@Gnuxie
Author

Gnuxie commented May 4, 2024

@sinclairzx81 do you expect there to be any performance implications from using schemas with Value.Decode that have changed from using Composite to Intersect?

@sinclairzx81
Owner

sinclairzx81 commented May 4, 2024

@Gnuxie Hi,

I have just measured the Decode performance for Intersect, and unfortunately you can expect a bit of a performance hit there. The following runs 100K Decode operations across Intersect and Composite (where Composite is just an Object). The Intersect performance is slower than I'd expect it to be... test code below.

import { Type } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

const I = Type.Intersect([
  Type.Object({ a: Type.Number(), b: Type.Number(), c: Type.Number() }),
  Type.Object({ d: Type.Number(), e: Type.Number(), f: Type.Number() }),
  Type.Object({ g: Type.Number(), h: Type.Number(), i: Type.Number() })
])
const C = Type.Composite([
  Type.Object({ a: Type.Number(), b: Type.Number(), c: Type.Number() }),
  Type.Object({ d: Type.Number(), e: Type.Number(), f: Type.Number() }),
  Type.Object({ g: Type.Number(), h: Type.Number(), i: Type.Number() })
])

const V = Value.Create(I)

console.time('intersect')
for(let i = 0; i < 100_000; i++) Value.Decode(I, V)
console.timeEnd('intersect')

console.time('composite')
for(let i = 0; i < 100_000; i++) Value.Decode(C, V)
console.timeEnd('composite')


// intersect: 3.425s     - 3 seconds
// composite: 139.097ms  - 0.15 seconds

I do think performance will vary depending on the number of intersected Objects and the number of properties per Object. The main bottleneck in TypeBox appears to be an internal KeyOf operation which reaches into the schematics to pull back a set of known keys from the Intersect.

Internally the operation is performing the following...

type T = (
  ({ a: 1 } & { b: 1 } & { c: 1 }) & 
  ({ d: 1 } & { e: 1 } & { f: 1 }) & 
  ({ g: 1 } & { h: 1 } & { i: 1 })
)

type K = keyof T // "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i"
                 //
                 // ^ this operation is slow

The internal code to perform this operation is as follows.

import { KeyOfPropertyKeys, Type } from '@sinclair/typebox'

const I = Type.Intersect([
  Type.Object({ a: Type.Number(), b: Type.Number(), c: Type.Number() }),
  Type.Object({ d: Type.Number(), e: Type.Number(), f: Type.Number() }),
  Type.Object({ g: Type.Number(), h: Type.Number(), i: Type.Number() })
])

const K = KeyOfPropertyKeys(I) // const K = [
                               //   'a', 'b', 'c',
                               //   'd', 'e', 'f',
                               //   'g', 'h', 'i'
                               // ] // need to optimize this.

I may take a deeper look to see if there's something that can be done to optimize things here. I do note the Decode is using the KeyOf operator from Type.*, and Type.* may perform interior Clone operations to ensure type immutability. There may be potential to implement a fast path key resolver which avoids these clones, which might yield faster results.
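
To sketch the rough idea (purely illustrative; this is not the actual TypeBox internals, and FastIntersectKeys is a hypothetical name):

import { TIntersect, TObject } from '@sinclair/typebox'

// Hypothetical fast path: collect known keys directly from the allOf members
// without cloning any schemas (illustrative only)
function FastIntersectKeys(schema: TIntersect): string[] {
  const keys = new Set<string>()
  for (const member of schema.allOf) {
    if ('properties' in member) {
      for (const key of Object.keys((member as TObject).properties)) keys.add(key)
    }
  }
  return [...keys]
}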

Just make sure you're on the latest versions of TB as there have been other recent optimizations done to both Decode and Encode. You will want to be on version 0.32.25+. The recent optimization PR can be found at #849.

Let me know how you go.
Cheers
S

@sinclairzx81
Owner

@Gnuxie Hiya,

I've pushed an update on 0.32.29 to optimize Intersect Encode and Decode. The bottleneck turned out to be a combination of both KeyOf and Index resolution. The update adds a faster path for these and gets around 2x the performance. Test code below.

import { Type } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

const N = Type.Transform(Type.Number())
  .Decode(value => value)
  .Encode(value => value)

const I = Type.Intersect([
  Type.Object({ a: N, b: N, c: N }),
  Type.Object({ d: N, e: N, f: N }),
  Type.Object({ g: N, h: N, i: N })
])

const V = Value.Create(I)
const D = Value.Decode(I, V)
const L = 100_000

console.time('intersect-decode')
for(let i = 0; i < L; i++) Value.Decode(I, V)
console.timeEnd('intersect-decode')

console.time('intersect-encode')
for(let i = 0; i < L; i++) Value.Encode(I, D)
console.timeEnd('intersect-encode')

// before: 0.32.28
//
// intersect-decode: 3.541s
// intersect-encode: 3.483s
//
// after: 0.32.29
//
// intersect-decode: 1.509s
// intersect-encode: 1.369s

Additionally, I've added an optimization to skip running Encode/Decode passes if the schema does not contain a Transform. So avoiding using Transforms with Decode will get you another 2x performance bump. Test code below.

import { Value } from '@sinclair/typebox/value'
import { Type } from '@sinclair/typebox'

const WithoutTransform = Type.Object({
  x: Type.Number(),
  y: Type.Number(),
  z: Type.Number()
})

const WithTransform = Type.Mapped(Type.KeyOf(WithoutTransform), K => 
  Type.Transform(Type.Number())
    .Decode(value => value)
    .Encode(value => value))

const V = Value.Create(WithTransform)
const L = 1_000_000

console.time('WithoutTransform')
for(let i = 0; i < L; i++) Value.Decode(WithoutTransform, V)
console.timeEnd('WithoutTransform')

console.time('WithTransform')
for(let i = 0; i < L; i++) Value.Decode(WithTransform, V)
console.timeEnd('WithTransform')

// WithoutTransform: 316.402ms
// WithTransform: 657.822ms

Just keep in mind, Transforms do have compute overhead, so the more you can avoid using them, the faster things will go. You can still use Decode without Transforms and get pretty good performance (as above), but for the best possible performance, validation pipelines should rely on Check only (and just reject invalid data).
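
As a rough sketch of what a Check-only pipeline can look like (parse here is just an illustrative helper, not a TypeBox export):

import { Type, Static } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

const T = Type.Object({ x: Type.Number() })

// Validate with Check, and only materialize errors when validation fails;
// no Transform codecs are ever run
function parse(value: unknown): Static<typeof T> {
  if (Value.Check(T, value)) return value
  const details = [...Value.Errors(T, value)].map(error => `${error.path} ${error.message}`)
  throw new Error(details.join(', '))
}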

PR for the above optimizations can be found here.

Cheers

@Gnuxie
Author

Gnuxie commented May 5, 2024

Hey @sinclairzx81, thanks for doing a deep dive on this one, you're a real one. For some further context, I updated TypeBox to 0.32.22 within matrix-protection-suite as advised, but I also changed all occurrences of Composite to use Intersect. This commit can be found here. We use a wrapper around Value to call Value.Compile on the schema and cache those (which may be unnecessary?). Regardless, after updating TypeBox and making the change to Intersect, we experienced a major performance regression in Draupnir (the primary dependent of matrix-protection-suite). Basically we use the schema and our Value.Decode wrapper to decode thousands of objects fresh from JSON.parse when Draupnir starts. The difference in startup time was ~10 seconds to just over a minute, and I haven't been able to profile this specifically, so this is loose at best (I am working separately to get into a position where I can reliably diagnose regressions like this in the future).

For next steps, I will update to 0.32.29 and let you know how that goes. I also acknowledge that you say versions below 0.32.25 are expected to take a penalty, so fingers crossed we should see an improvement.

Just keep in mind, Transforms do have compute overhead, so the more you can avoid using them, the faster things will go. You can still use Decode without Transforms and get pretty good performance (as above), but for the best possible performance, validation pipelines should rely on Check only (and just reject invalid data).

Yeah this is a good call. We only use transforms to "safely" trick TypeScript into type-asserting custom string types, and on top of that there's no reason why we can't call Value.Errors after Value.Check has failed to get detailed information about the validation failure (we do that anyway if Value.Decode fails). So I will see about refactoring to use Value.Check instead.

Thank you again for all of your work and insight.

@sinclairzx81
Owner

sinclairzx81 commented May 5, 2024

@Gnuxie Hi, thanks for the follow up response...


For some further context, I updated TypeBox to 0.32.22 within matrix-protection-suite as advised, but I also changed all occurrences of Composite to use Intersect. This commit can be found here.

In terms of application startup, using Intersect should technically be faster (as Intersect types don't need to be computed), but as noted, Intersect decoding performance will be a bit lower due to internal KeyOf / Index operations performed on Decode that are specific to Intersect. I suppose if you're running Decode ops as a prerequisite to startup, that could be a bottleneck...

Looking at the commit though, everything seems fine; I wouldn't initially expect these updates to jump from ~10 seconds to 1 minute+. However, I've run a quick performance benchmark across Composite and Intersect for Compile, Check, and Decode.

The following are the deltas for 100K iterations for each operation

Benchmark Reproduction Code
import { TypeCompiler } from '@sinclair/typebox/compiler'
import { Type, TSchema } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

function Benchmark(name: string, iterations: number, schema: TSchema) {
  const check = TypeCompiler.Compile(schema)
  const value = Value.Create(schema)
  console.log(name)
  console.time(`check`)
  for(let i = 0; i < iterations; i++) Value.Check(schema, value)
  console.timeEnd(`check`)
  console.time(`decode`)
  for(let i = 0; i < iterations; i++) Value.Decode(schema, value)
  console.timeEnd(`decode`)
  console.time(`compile`)
  for(let i = 0; i < iterations; i++) TypeCompiler.Compile(schema)
  console.timeEnd(`compile`)
  console.time(`compile-check`)
  for(let i = 0; i < iterations; i++) check.Check(value)
  console.timeEnd(`compile-check`)
  console.time(`compile-decode`)
  for(let i = 0; i < iterations; i++) check.Decode(value)
  console.timeEnd(`compile-decode`)
  console.log()
}

const N = Type.Transform(Type.Number())
  .Decode(value => value)
  .Encode(value => value)

const I = Type.Intersect([
  Type.Object({ a: N, b: N, c: N }),
  Type.Object({ d: N, e: N, f: N }),
  Type.Object({ g: N, h: N, i: N })
])
const C = Type.Composite([
  Type.Object({ a: N, b: N, c: N }),
  Type.Object({ d: N, e: N, f: N }),
  Type.Object({ g: N, h: N, i: N })
])

const iterations = 100_000
Benchmark('intersect', iterations, I)
Benchmark('composite', iterations, C)
// intersect
// check: 55.016ms
// decode: 1.581s
// compile: 1.146s
// compile-check: 3.56ms
// compile-decode: 1.510s   // 15x slower (10 seconds -> 1 minute) ??

// composite
// check: 49.852ms
// decode: 152.714ms
// compile: 1.053s
// compile-check: 2.77ms
// compile-decode: 83.66ms

I do note the big jump from 83 ms (Composite) to 1.5 seconds (Intersect) for compile-decode (15x slower). This would be representative of the KeyOf / Index operations mentioned earlier. I don't think I'll be able to optimize the Intersect Decode beyond where it currently is (on this revision), so if you need maximum performance, it may be good to explore ways to remove embedded Transforms (i.e. the custom string types), as running Decode for them may be prohibitively expensive in the case of Intersect.

We use a wrapper around Value to call Value.Compile on the schema and cache those (which may be unnecessary?). Regardless, after updating TypeBox and making the change to Intersect, we experienced a major performance regression in Draupnir (the primary dependent of matrix-protection-suite). Basically we use the schema and our Value.Decode wrapper to decode thousands of objects fresh from JSON.parse when Draupnir starts.

I think the wrapper should be fine, although caching compiled types can run the risk of cache misses (where the JS reference to the same schema changes, i.e. on clone), which might be a consideration if you don't know how Draupnir is passing types. Looking at the benchmark deltas though, even for 100K iterations (no cache), Compile + Decode should only take a couple of seconds.
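
To illustrate the cache-miss concern (CompileCached is a hypothetical wrapper, not your actual implementation):

import { TypeCompiler, TypeCheck } from '@sinclair/typebox/compiler'
import { TSchema } from '@sinclair/typebox'

// Cache keyed by schema object identity: a cloned schema is a different
// Map key and will therefore miss the cache and recompile
const cache = new Map<TSchema, TypeCheck<any>>()

function CompileCached<T extends TSchema>(schema: T): TypeCheck<T> {
  let compiled = cache.get(schema)
  if (compiled === undefined) {
    compiled = TypeCompiler.Compile(schema)
    cache.set(schema, compiled)
  }
  return compiled
}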

Yeah this is a good call. We only use transforms to "safely" trick TypeScript into type-asserting custom string types, and on top of that there's no reason why we can't call Value.Errors after Value.Check has failed to get detailed information about the validation failure (we do that anyway if Value.Decode fails). So I will see about refactoring to use Value.Check instead.

Looking at the implementation of StringUserID, it might be better to use the FormatRegistry (Check) + Unsafe (Brand). The following reimplements this type using the FormatRegistry, where the branded type is handled by wrapping a String in Unsafe. This should be maximally fast as it avoids Transforms entirely.

import { Type, FormatRegistry } from '@sinclair/typebox'

const UserIDSecret = Symbol('StringUserID')

// ... StringUserID 

FormatRegistry.Set('StringUserId', string => /^@[\S^:]*:\S*$/.test(string))

type StringUserID = string & { [UserIDSecret]: true }

const StringUserID = () => Type.Unsafe<StringUserID>(Type.String({ format: 'StringUserId' }))

// ...

import { Value } from '@sinclair/typebox/value'

Value.Check(StringUserID(), '@1234:5678') // true

Value.Check(StringUserID(), '@12345678') // false

Replacing all such String instances with the above would give the fastest possible Decode (as the Transforms would never be run). If you're using Transforms for similar checks at the type level, you can implement a similar strategy using the TypeRegistry. You can find a bit of information on these registries at the link below.

https://github.com/sinclairzx81/typebox#typeregistry
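
As a rough illustration of the TypeRegistry route (a hypothetical StringRoomID type, not taken from the linked documentation):

import { Type, TypeRegistry, Kind } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

// Hypothetical custom type registered on the TypeRegistry (illustrative only)
TypeRegistry.Set('StringRoomID', (_schema, value) =>
  typeof value === 'string' && /^!\S+:\S+$/.test(value))

type StringRoomID = string & { readonly __brand: 'StringRoomID' }

const StringRoomID = () => Type.Unsafe<StringRoomID>({ [Kind]: 'StringRoomID' })

Value.Check(StringRoomID(), '!room:example.org') // true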

Let me know if this helps
Cheers
S

Gnuxie added a commit to Gnuxie/matrix-protection-suite that referenced this issue May 6, 2024
Gnuxie added a commit to Gnuxie/matrix-protection-suite that referenced this issue May 6, 2024
@Gnuxie
Author

Gnuxie commented May 6, 2024

Hi again, I updated to 0.32.29 and followed your advice to change the various Transforms to use the FormatRegistry and Type.Unsafe instead.

That seems to have gotten us roughly back to the same startup time we had before this saga began 🎉.

Thanks once again for all of your help!
