Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize enums with a single value #663

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bmeck
Copy link

@bmeck bmeck commented Nov 11, 2023

Checklist

Before and after log:

bmeck@Bradleys-MBP fast-json-stringify % npm run bench # before

> fast-json-stringify@5.9.1 bench
> node ./benchmark/bench.js # before

single enum string....................................... x 33,928,717 ops/sec ±0.32% (196 runs sampled)
const string............................................. x 1,034,264,612 ops/sec ±0.15% (196 runs sampled)
^C
bmeck@Bradleys-MBP fast-json-stringify % npm run bench # after

> fast-json-stringify@5.9.1 bench
> node ./benchmark/bench.js # after

single enum string....................................... x 1,036,391,465 ops/sec ±0.10% (196 runs sampled)
const string............................................. x 1,031,954,256 ops/sec ±0.24% (195 runs sampled)
^C

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Good spot!

Copy link
Member

@ivan-tymoshenko ivan-tymoshenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will create a difference between the logic of: enum with one value and enum with multiple values. You need to check that data value is equal to schema one before inserting it.

@ivan-tymoshenko
Copy link
Member

You can read this thread for more details.
#532 (reply in thread)

@bmeck
Copy link
Author

bmeck commented Nov 11, 2023

@ivan-tymoshenko this seems a bit odd since you don't do similar checks for things like const, is there a clear difference here with const? I'm not sure how to sanely validate against a single value at a glance in the code here. lib/schema-validator doesn't seem to have a clear way to check against a deep property. I'm fine with this staying blocked however since I was just driving by and was surprised at a massive sudden drop in speed on me poking with different schemas.

@ivan-tymoshenko
Copy link
Member

The problem is how the serialization function would behave in case when it receives a value that is not in the enum.

const schema = { type: 'string', enum: ['foo', 'bar'] }
const serialize = fjs(schema)
serialize('baz') // should return baz IIRC
const schema = { type: 'string', enum: ['foo'] }
const serialize = fjs(schema)
serialize('baz') // will return 'foo'

IMO this is confusing. I agree that situation with const looks very close. Unfortunatelly this library doesn't have a crear answer to this question. You can read the thread I sent you. It's exactly about this problem.

You still will have some performance boost if you add a sanity check.

@ivan-tymoshenko
Copy link
Member

ivan-tymoshenko commented Nov 11, 2023

My few recomendations on adding enum support:

  • I would add it for strings only
  • if received value is in an enum, it should insert it, if value is not in the enum it should sanitize received string (default bahaviour)

This will give you the performance boost, because you will avoid the string sanitizing in a hot path.

@bmeck
Copy link
Author

bmeck commented Nov 11, 2023

@ivan-tymoshenko I did read that thread but it didn't enlighten me; per https://json-schema.org/draft/2020-12/json-schema-validation#section-6.1.3

6.1.3. const
The value of this keyword MAY be of any type, including null.

Use of this keyword is functionally equivalent to an "enum" (Section 6.1.2) with a single value.

@ivan-tymoshenko
Copy link
Member

Can you tell me what this library should do if it receives a value that is not in the enum?

@bmeck
Copy link
Author

bmeck commented Nov 12, 2023

@ivan-tymoshenko Per the specification, having a value outside of the single value of the enum means that the value passed in isn't valid and is seemingly subject to the README:

While the schema is currently validated for any developer errors, there is no guarantee that supplying user-generated schema could not expose your application to remote attacks.

Users are responsible for sending trusted data. fast-json-stringify guarantees that you will get a valid output only if your input matches the schema or can be coerced to the schema. If your input doesn't match the schema, you will get undefined behavior.

Additionally from that thread linked:

fast-json-stringify guarantees that you will get a valid output only if your input matches the schema or can be coerced to the schema. If your input doesn't match the schema, you will get undefined behavior.

So sending values outside of those explicit values (including null) would be invalid per the schema provided. So the suggestion above of:

if received value is in an enum, it should insert it, if value is not in the enum it should sanitize received string (default bahaviour)

Would actually produce an invalid value for the schema and should not be done.

Since these are values are outside of valid values for the schema it seems likely to be up to implementer desires according to the README and thread linked; I don't think checking the value is needed given these points existing.

I'm not a maintainer so it is hard to say what is desired by maintainers, but that the spec call out of equivalence isn't valid here is surprising. My only comment is that likely const and single value enum should properly align. Personally, I'd rather expect every value outside of the schema to simply be allowed undefined behavior. The README however does list a few cases where things are explicitly validated and thrown and I'm less clear on motivations for those so once again I would struggle to say when things should be checked. Having only enum check while const does not invalidates none of the README guarantees.

Calling out and validating enums is certainly possible to hot path outside of single values with some steps like you suggest above but seems out of my general cause of making this PR which was one of surprise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants