Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matching ( and ) in a regexp matcher requires escaping #177

Open
lucacasonato opened this issue Jul 10, 2023 · 4 comments
Open

Matching ( and ) in a regexp matcher requires escaping #177

lucacasonato opened this issue Jul 10, 2023 · 4 comments

Comments

@lucacasonato
Copy link
Member

lucacasonato commented Jul 10, 2023

Fails, even though it's valid:

const pattern = new URLPattern({ pathname: '/([()])' });
Uncaught TypeError: tokenizer error: invalid regex: nested groups must start with ? (at char 1)

OK:

const pattern = new URLPattern({ pathname: '/([\\(\\)])' });

This is because while tokenizing the pattern, we think the second ( is a nested group rather than just a char in a regexp character class.

Fixing this will make the tokenizer more complicated. Is it worth it?

@lucacasonato
Copy link
Member Author

lucacasonato commented Jul 10, 2023

Actually I think we need to fix this for sure. Consider these two patterns:

// valid regexp, but pattern tokenizer thinks there is a nested group that isn't closed
const pattern = new URLPattern({ pathname: '/([(?])' });

// valid regexp group, valid urlpattern, except that this throws because "Invalid regular expression: /^(?:/([))\]\)$/u: Unterminated character class" 
const pattern = new URLPattern({ pathname: '/([)])' });

@wanderview
Copy link
Member

Yea, seems like a real problem to me. I'm not sure when I will have the bandwidth to look at this, though. Do you have a proposed fix?

@lucacasonato
Copy link
Member Author

I think we have to keep track of [ and ] in the tokenizer while parsing regexp tokens, and ignore all ( and ) while between a [ and ] in that regexp. This can be a simple boolean as character classes can't be nested (no need to keep track of depth).

@jeremyroman
Copy link
Collaborator

character classes can't be nested (no need to keep track of depth).

Is this true? Given #178 we'll support syntax like [\d--[07]] (every digit except 0 and 7) as part of the UnicodeSets regexp mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants