New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: parse yaml with custom nom-based parser written in rust #4807
Draft
paul-soporan
wants to merge
94
commits into
master
Choose a base branch
from
paul/feat/yaml-parser-via-nom
base: master
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A few more notes:
|
Update: I changed the parser to work on For a single lockfile parse it's about 30% faster than The bundle size also went down from |
Impressive @paul-soporan 👀 |
paul-soporan
added
infra: pending update
A bot will merge master into this PR
and removed
infra: pending update
A bot will merge master into this PR
labels
Jul 3, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What's the problem this PR addresses?
This PR is part of the effort of solving #1463 and creating a better successor to https://github.com/paul-soporan/enhanced-yaml.
This PR only reimplements the parser. Preserving comments and styling will be a follow-up.
Currently. we parse:
js-yaml
Originally, everything was parsed using the custom peg-based parser, but it was very slow, so parsing via
js-yaml
was introduced in #183.Unfortunately,
js-yaml
is a very complex project that includes some features that we don't intend to use / support in the YAML files we handle, meaning that it would be very complicated to fork to support our needs, which currently are:yarn
invocationjs-yaml
has unmatched parsing speed in the JS ecosystem, no other package I could find even came closeHow did you fix it?
Before implementing the current parser, I did a quick test of existing parsers to see if I could reuse any of them.
This is a quick comparison of 100 lockfile parses, compared to
js-yaml
:Because of this, I decided to take a look at the
nom
Rust crate, a parser combinator framework that makes it very easy to implement parsers in a similar way to Peg.To quote @merceyz,
nom
is a "performant, maintained, and compile-time checked "alternative" to Peg.js".nom
has the advantage of being a zero-cost abstraction implemented in a language full of zero-cost abstractions, which makes it incredibly fast and able to compete even with handwritten parsers.At the moment, the parser implementation mostly matches the PegJS one (but without the legacy lockfie parsing).
After optimizing the compiled WASM binary as much as possible, I've arrived at the following numbers:
Raw benchmarks
See #4807 (comment) for updated numbers.
Single lockfile parse
nom
:37ms
(winner)js-yaml@3 w/ FAILSAFE_SCHEMA
: `39msjs-yaml@4 w/ FAILSAFE_SCHEMA
: `46msnom
manages to beatjs-yaml
in this case even with the WASM boundary overhead.10 lockfile parses
nom
:309ms
js-yaml@3 w/ FAILSAFE_SCHEMA
:245ms
(winner)js-yaml@4 w/ FAILSAFE_SCHEMA
:304ms
Here
nom
matchesjs-yaml@4
(which has apparently regressed quite a lot) and is not much slower thanjs-yaml@3
(only1.26x
1.06x
slower at parsing 10 lockfiles, which isn't even something that happens in practice).Bundle size
Doesn't regress too much:
2.68 MB
->2.72 MB
2.71 MB
. The binary is also mostly glue code and other stuff that can be shared between multiple parsers, which means that once we reimplement the other parsers too, the delta will be even smaller.Should also be noted that I'm optimizing everything for speed (
-O3
inrustc
and-O4
inwasm-opt
).Boot time
WASM files are generally loaded very fast, so it shouldn't affect boot time. Also, once we reimplement all parsers using
nom
, they'll all be loaded together so it should bring the load time down since currently we also need to load the remaining peg parsers too and they don't share any code.Concrete benchmark
Here I'm testing how much it takes for
yarn exec echo foo
to execute. This takes into account:Before:
After:
Result: Unchanged.
Conclusion
As you can see,
nom
is amazing and I intend to reimplement both the existing shell parser and theresolutions
one too if everything goes according to plan.What's left to do
[foo, bar]
){ foo: bar }
)foo # Aside
commentsChecklist