Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anchor (^$) assertion support #11

Open
albert-sun opened this issue Oct 23, 2020 · 1 comment
Open

Anchor (^$) assertion support #11

albert-sun opened this issue Oct 23, 2020 · 1 comment

Comments

@albert-sun
Copy link
Collaborator

I'm using your package as part of research and want to collaborate on adding anchor assertion support. From a group of ~835k regexes, approximately 53% were supported for conversion without anchor support and approximately 93% were supported for conversion when excluding those with anchors. Any thoughts on how I would go about implementing anchor (^$) support?

Cheers and great work on the package.

@albert-sun albert-sun changed the title Anchor assertion support Anchor (^$) assertion support Oct 23, 2020
@RunDevelopment
Copy link
Owner

I'm currently looking into implementing support for lookarounds via RE AST transformations (#9). When finished, refa will have full support for all JS assertions (^$\b\B) as well as limited support for general lookarounds.

The main problem is that there is no NFA that can express the regex /^foo$/, so I'll use a little trick. The idea is that I reserve one special character to represent the "outside" of the string - let's call this character %. With this trick, I can rewrite the regex: /^foo$/ -> /(?<=%)foo(?=%)/.
Since we might end up with lookarounds that go beyond the consumed characters of the string, the transformation will have two option: Either it will make those lookarounds a part of the pattern (/(?<=%)foo(?=%)/ -> /%foo%/) or it will remove them (/(?<=%)foo(?=%)/ -> /foo/).

Since these two transformation steps are independent of each other, you will also be able to just remove all assertions that go beyond. That alone might be enough for your use case.


Also, may I ask what regex set you use?
I currently test refa using Prism's 2.5k regexes but it would be nice to have more to test with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants