Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Visitor: improve usability #20

Open
conartist6 opened this issue Jan 10, 2021 · 6 comments
Open

Visitor: improve usability #20

conartist6 opened this issue Jan 10, 2021 · 6 comments

Comments

@conartist6
Copy link

conartist6 commented Jan 10, 2021

The current visitor has some problems. The biggest is that there's no way to visit every expression, where an expression might be the content of a group or of a lookahead or lookbehind. All expressions have common needs -- in particular the alternatives need to be evaluated to know if the expression matches, after which different actions are appropriate depending on the specific type. The worst offender is the assertion type, because it may or may not be an expression at all, depending on whether the kind is lookahead or lookbehind. This may be fine for a backtracking engine since these can evaluate the assertion inside their own block scope, but it makes life fabulously difficult for non-backtracking engines which must always track the states until the input reveals whether the subexpression matches or fails.

@conartist6
Copy link
Author

conartist6 commented Jan 10, 2021

OK I was able to get much better results by writing my own visitor. I tried a couple variations, but what ended up being best was going all the way back to the simplest basics where a visitor for a type is responsible for propagating the visit forwards itself. This had huge advantages:

  • I was able to easily abstract out some code to deal with alternatives.
  • I could prevent traversal into structures I consider atomic, i.e. those that try to match against an input character (like character classes). Characters are sometimes atomic (so I must have a visitor for them) but inside character classes they are not. Avoiding traversing into the character class allowed me to avoid a complex conditional to determine whether a character was atomic.
  • I was able to make visit return a value, which is super useful. Now instead of maintaining structure outside the traversal I can use pure functional composition. This greatly simplified the implementation of repetition, and made almost all the code more readable.
  • Very fast. There are no conditionals at all left and no more wrapper function calls. Many fewer lines of code.

I do pay a bit of a cost of course as I have to maintain the code that propagates the traversal forwards, but having the return values ensures it's pretty obvious if something is missing or incorrect.

Here is my traversal code.

@MichaelDeBoey
Copy link

Hi @conartist6!

Since this repo is unmaintained, you might want to re-open this issue in the @eslint-community fork https://github.com/eslint-community/regexpp

For more info about why we created this organization, you can read https://eslint.org/blog/2023/03/announcing-eslint-community-org

@conartist6
Copy link
Author

@MichaelDeBoey Cool, I should probably upgrade my projects to use the one that you're maintaining then!

@MichaelDeBoey
Copy link

@conartist6 It's indeed advised to update your projects to use the community fork instead of using the unmaintained original project

@conartist6
Copy link
Author

@MichaelDeBoey Ah thanks! I actually went the other way and wrote my own regex parser, uh, and also my own AST data structure and my own parser framework

@MichaelDeBoey
Copy link

@conartist6 That's another possibility of course 🙈

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants