Matching `(` and `)` in a regexp matcher requires escaping #177

lucacasonato · 2023-07-10T20:29:35Z

Fails, even though it's valid:

const pattern = new URLPattern({ pathname: '/([()])' });
Uncaught TypeError: tokenizer error: invalid regex: nested groups must start with ? (at char 1)

OK:

const pattern = new URLPattern({ pathname: '/([\\(\\)])' });

This is because, while tokenizing the pattern, the tokenizer treats the second ( as the start of a nested group rather than as a literal character inside a regexp character class.

Fixing this will make the tokenizer more complicated. Is it worth it?


lucacasonato · 2023-07-10T20:34:53Z

Actually I think we need to fix this for sure. Consider these two patterns:

// valid regexp, but pattern tokenizer thinks there is a nested group that isn't closed
const pattern = new URLPattern({ pathname: '/([(?])' });

// valid regexp group, valid urlpattern, except that this throws:
// "Invalid regular expression: /^(?:/([))\]\)$/u: Unterminated character class"
const pattern = new URLPattern({ pathname: '/([)])' });
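For reference, a quick check that the three character classes from this thread are valid on their own when handed straight to `RegExp`. This is only a sketch: `compilePathnameGroup` is a hypothetical helper, and the anchoring and `u` flag are assumptions loosely based on the error message above, not the actual URLPattern code path.

```javascript
// Hypothetical helper (not part of URLPattern): compiles a pathname regexp
// containing a single capture group with the given character class.
// The ^...$ anchoring and the `u` flag are assumptions for illustration.
function compilePathnameGroup(classSource) {
  return new RegExp(`^/(${classSource})$`, 'u');
}

// Inside a character class, "(", ")" and "?" are plain literal characters,
// so none of these constructors throw.
const both = compilePathnameGroup('[()]');
const openOrQuestion = compilePathnameGroup('[(?]');
const closeOnly = compilePathnameGroup('[)]');

console.log(both.test('/('), both.test('/)'));
console.log(openOrQuestion.test('/?'), openOrQuestion.test('/)'));
console.log(closeOnly.test('/)'));
```

So the regexp engine accepts all three; the failures come from how the pattern string is tokenized before the regexp is built.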

wanderview · 2023-07-10T20:58:34Z

Yea, seems like a real problem to me. I'm not sure when I will have the bandwidth to look at this, though. Do you have a proposed fix?

lucacasonato · 2023-07-10T21:12:24Z

I think we have to keep track of [ and ] in the tokenizer while parsing regexp tokens, and ignore all ( and ) while between a [ and ] in that regexp. This can be a simple boolean as character classes can't be nested (no need to keep track of depth).
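A minimal sketch of that idea, assuming a standalone scanner rather than the real tokenizer: `scanRegexpGroup` and its error messages are hypothetical names for illustration. It tracks a single `inClass` boolean and only interprets ( and ) while outside a character class.

```javascript
// Hypothetical helper, not the actual tokenizer code.
// Given the index of the "(" that opens a regexp group, returns the regexp
// source between that "(" and its matching ")".
function scanRegexpGroup(pattern, start) {
  if (pattern[start] !== '(') {
    throw new TypeError(`expected "(" at char ${start}`);
  }
  let depth = 1;
  let inClass = false; // true while between "[" and "]"
  let i = start + 1;
  while (i < pattern.length) {
    const ch = pattern[i];
    if (ch === '\\') {
      // Skip the backslash and the character it escapes.
      i += 2;
      continue;
    }
    if (inClass) {
      // Inside a class only "]" is significant; "(" and ")" are literals.
      if (ch === ']') inClass = false;
    } else if (ch === '[') {
      inClass = true;
    } else if (ch === '(') {
      if (pattern[i + 1] !== '?') {
        throw new TypeError(`nested groups must start with ? (at char ${i})`);
      }
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth === 0) return pattern.slice(start + 1, i);
    }
    i++;
  }
  throw new TypeError(`unterminated regexp group (at char ${start})`);
}

console.log(scanRegexpGroup('/([()])', 1)); // prints "[()]"
console.log(scanRegexpGroup('/([(?])', 1)); // prints "[(?]"
console.log(scanRegexpGroup('/([)])', 1));  // prints "[)]"
```

This only covers finding the end of the group. It does not address how the extracted source is later turned into the final regular expression.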

jeremyroman · 2023-09-13T14:08:04Z

> character classes can't be nested (no need to keep track of depth).

Is this true? Given #178 we'll support syntax like [\d--[07]] (every digit except 0 and 7) as part of the UnicodeSets regexp mode.
