Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect super-linear worst-case runtime #159

Open
RunDevelopment opened this issue Apr 19, 2021 · 0 comments
Open

Detect super-linear worst-case runtime #159

RunDevelopment opened this issue Apr 19, 2021 · 0 comments
Assignees
Labels
enhancement New feature or request new rule

Comments

@RunDevelopment
Copy link
Collaborator

RunDevelopment commented Apr 19, 2021

I think we should add a rule that reports cases for exponential backtracking and polynomial backtracking.

However, there are two problems standing in our way:

Performance

Detecting exponential backtracking is computationally expensive.

Static analysis methods might only take a few milliseconds per regex but this will quickly add up for a large codebase (e.g. Prism has about 2k regexes). Fuzzers will usually take about one second or more per regex and are completely impractical for our purposes.

Solutions

  • Since the detection itself is deterministic, a simple cache might largely solve the performance problem.

    However, I am not aware of ESLint providing such a feature.

Detection

I am not aware of any existing detector implementation in JavaScript (time of writing: 2021-04-19).

safe-regex isn't practical. It implements a detection method that simply doesn't work. Star height has little to do with exponential runtime. I.e. (a|a)*b has a star height of 1 and the (a|a)* will cause exponential backtracking, and (ab+)*c has a star height of 2 without any form of exponential backtracking.

Solutions

  • Right now, we could use my library scslre to detect at least some polynomial and exponential backtracking quickly. scslre even offers fixes for simple cases.

  • There is also this method I implemented for Prism. This method is currently used in PrismJS and Highlight.JS. The upside of this method is that it is quick. It can analyze the >2.5k regexes of Prism in less than 5 seconds. The downside of this method is that there are false negatives and it cannot offer fixes. However, the positives it does find are mostly true positives.

    I should note that half of this method is already partially implemented by the no-dupe-disjunctions rule. If we find two alternatives that are not disjoint and both are children of a star quantifier, then we can reasonably assume that this will cause exponential backtracking (e.g. (a|a)*). Changing the no-dupe-disjunctions to add this information should be fairly easy.

  • Given that refa recently added ENFAs (not published yet), someone could probably implement a detection method like RXXR fairly easily.

  • I am currently writing my Bachelor thesis on a new static analysis method for exponential and polynomial backtracking. Part of this will be a JS library implementing my new detection method. We could also use this when it's ready (probably in 3-4 months or so).


Related:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request new rule
Projects
None yet
Development

No branches or pull requests

1 participant