Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expression Parse Bomb #74

Open
poteat opened this issue Aug 9, 2021 · 5 comments
Open

Expression Parse Bomb #74

poteat opened this issue Aug 9, 2021 · 5 comments

Comments

@poteat
Copy link

poteat commented Aug 9, 2021

The recommended code in the Cookbook for parsing arithmetic expressions introduces an exponential-complexity parse bomb, which I have found does actually become extremely relevant when using Arcsecond for parsing realistic expressions.

Demonstration code (converted to Typescript)

Suffice to say, the example input "9 + ((((((((((5)))))))))) - 4 * 4 / 3" takes twenty seconds on my machine. A contrived example, but again, I have seen unbounded parsing time for reasonable inputs as well.

I have not yet been unable to wrap my head around where the inefficiency is coming from, or how to structure the parser to avoid the bomb. I really enjoy using this flavor of parser, and would be disappointed at having to write a parser manually to avoid this.

If I find out how to fix this, I'll update here so that we might update the Cookbook. Thanks again Francis for putting this library together - I'm super excited about your rewrite in Typescript.

@poteat
Copy link
Author

poteat commented Aug 9, 2021

@IMax153 I'm aware you're the original source of the example Cookbook code - I'll take a look myself in your upstream work, but did you ever find a solution for this?

@poteat
Copy link
Author

poteat commented Aug 9, 2021

Update: It seems this is likely a fundamental issue with parser combinators that is resolved by using packrat parsing - a potential fix is quite straight-forward, inside of the Parser constructor:

this.p = memoize(p, (state) => JSON.stringify(state));

@IMax153
Copy link
Contributor

IMax153 commented Aug 9, 2021

@poteat - if I remember correctly, I did not guard well against infinite recursion, which is likely the problem here.

@poteat
Copy link
Author

poteat commented Aug 9, 2021

@IMax153 Well, that may be true, although I never ran into any infinite recursion issues with your example code. I am confident that your code is very correct actually, for this reason:

I implemented packrat parsing on our downstream fork, and it leads to sizeable performance improvements, and makes parse time linear with respect to input size.

Volley-Inc@73d60d1

We see orders-of-magnitude improvement actually - it would be really cool if a similar fix was introduced in the base package. I'll wait for a maintainer to vet the idea though before I put a PR together.

@francisrstokes
Copy link
Owner

Ahh this is really interesting @poteat! Having just read a bit about packrat parsing, the optimisation seems like a definite win. Took a quick look at memoize-clear - it's only about 40 lines and not doing anything wildly complicated. In the interest of keeping arcsecond dependency free, I think it would be good to have our own implementation.

@poteat poteat mentioned this issue Aug 10, 2021
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants