Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pooled version of lexer.Upgrade #289

Closed
wants to merge 3 commits into from

Conversation

klondikedragon
Copy link
Contributor

As observed in #213 (comment), can result in 10% - 30% overall lexing/parsing speedup in cases where needing to lex/parse thousands of documents of a similar size all in a row.

Added a microbenchmark as well showing >4x speedup of Upgrade operation itself if used repeatedly for docs of a similar size.

Can result in 10% - 30% speedup in cases where needing to lex/parse
thousands of documents of a similar size all in a row.
lexer/peek.go Outdated Show resolved Hide resolved
@petee-d
Copy link
Contributor

petee-d commented Dec 10, 2022

Nice. To grant the performance improvement without users having to work with the PeekingLexer directly, you could use it in participle.Parser.parse (the PeekingLexer is only needed during the parse call, PutBackPooledPeekingLexer can be called in a defer). But there is a catch...

We need to make sure the resulting grammar doesn't point to the token slice (otherwise that would be overwritten if the underlying array is reused), which to my (limited) knowledge can only happen in participle.strct.maybeInjectTokens - that method would have to create a copy of the tokens it receives as the arguments. Doing so could make it slower to parse grammars that use Tokens []lexer.Token fields heavily. But as that's more of an edge case, pooling the lexer should still be a net positive for the vast majority of grammars.

@klondikedragon
Copy link
Contributor Author

Perhaps that would be a natural follow-on PR where a participle Option could be added to activate (is disactivate if it's on by default) automatic lexer pooling in participle.Parser.parse? I'm not sure how changing that might intersect with the pending generated parser changes, so I'd prefer to tackle that after those changes are merged eventually. (Does the generated parser create refs to tokens?) In the short-term, I've added a comment to the documentation about considering the timing of calling PutBackPooledPeekingLexer until the caller is finished with the results of the parser, etc. What do you think?

@alecthomas
Copy link
Owner

I agree with @petee-d that this should be a tied to a parser option and used automatically if enabled. Let's wait for the generated code to get merged then merge this, with that additional functionality.

@alecthomas alecthomas closed this Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants