More consistent and robust segment wildcard generation #207

Open
rubycon opened this issue Jan 19, 2024 · 0 comments

rubycon commented Jan 19, 2024

What is the issue with the URL Pattern Standard?

The generate a segment wildcard regexp steps produce a regular expression that is:

  • Not internally consistent with the full wildcard in its handling of newlines;
  • Reliant on an obscure regex feature: the inverted empty character class (empty character class complement);
  • Tricky for some current implementations of the RegExp v flag.

The proposed change should make the segment wildcard more consistent with the full wildcard and, in passing, more forgiving of buggy RegExp implementations.

When processing the regex pattern for most parts of a URL (all except host and path), the generate a segment wildcard regexp algorithm is called with the default options, for which the delimiter code point is the empty string.

The generated regex string is then [^]+?, an inverted empty character class with lazy matching. It matches every character, including newlines, which differs slightly from the full wildcard (which matches every character except newlines).
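The difference described above can be reproduced with a minimal sketch. The function names `escapeRegexpString` and `generateSegmentWildcardRegexp` are hypothetical stand-ins for the spec steps, the escaping is a simplified approximation, and the regexes are compiled without the v flag so that only the newline behavior is being observed.

```javascript
// Hypothetical helper approximating the spec's "escape a regexp string" step.
function escapeRegexpString(input) {
  return input.replace(/[.+*?^${}()[\]|\/\\]/g, "\\$&");
}

// Sketch of the current "generate a segment wildcard regexp" steps.
// An empty delimiter code point is modeled as the empty string.
function generateSegmentWildcardRegexp(delimiterCodePoint = "") {
  let result = "[^";
  result += escapeRegexpString(delimiterCodePoint);
  result += "]+?";
  return result;
}

const segmentWildcard = generateSegmentWildcardRegexp(); // "[^]+?"
const fullWildcard = ".*";

// Compiled without the v flag on purpose (an assumption of this sketch).
const segmentMatchesNewline = new RegExp(`^(${segmentWildcard})$`).test("foo\nbar");
const fullMatchesNewline = new RegExp(`^(${fullWildcard})$`).test("foo\nbar");

console.log(segmentWildcard, segmentMatchesNewline); // the segment wildcard accepts the newline
console.log(fullWildcard, fullMatchesNewline); // the full wildcard rejects it
```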

Combined with the v flag required by the spec, however, it works differently: the regex tries to match a complement class instead of inverting the match. The two should be equivalent for an empty class, but some current implementations do not seem to handle this well. Testing the generated regex /^([^]+?)$/v.test("foobar") with current RegExp implementations:

  • Chrome 122 (v8 12.2.219) => match
  • Deno 1.39.4 (v8 12.0.267.8) => match
  • Node 20.11 (v8 11.3.244.8) => no match
  • Firefox (122) => no match
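To repeat this check on another engine, a small probe along these lines could be used. The function name `probeSegmentWildcard` is hypothetical, and the try/catch is there because engines without v flag support throw a SyntaxError when the regex is constructed; the result depends on the engine, so no expected output is given.

```javascript
// Hypothetical probe: compile the pattern with the v flag and report the outcome.
function probeSegmentWildcard(source, input) {
  let regexp;
  try {
    regexp = new RegExp(`^(${source})$`, "v");
  } catch (error) {
    // Thrown when the engine rejects the v flag or the pattern itself.
    return "syntax error";
  }
  return regexp.test(input) ? "match" : "no match";
}

// Current segment wildcard with an empty delimiter code point.
console.log(probeSegmentWildcard("[^]+?", "foobar"));
```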

The following simple change would avoid the empty character class altogether and remove the newline inconsistency.

In generate a segment wildcard regexp, replace

  1. Append "]+?" to the end of result.

with

  1. Append "\n\r]+?" to the end of result.

It ensures the character class is never empty.
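As a rough sketch of the proposed change, the steps could look like the following. The names `escapeRegexpString` and `generateSegmentWildcardRegexpProposed` are hypothetical, the escaping is a simplified approximation of the spec's, and the regex is compiled without the v flag so the sketch also runs on engines that lack it.

```javascript
// Hypothetical helper approximating the spec's "escape a regexp string" step.
function escapeRegexpString(input) {
  return input.replace(/[.+*?^${}()[\]|\/\\]/g, "\\$&");
}

// Sketch of the proposed steps: "\n\r" is always appended inside the class,
// so the class is never empty, even with an empty delimiter code point.
function generateSegmentWildcardRegexpProposed(delimiterCodePoint = "") {
  let result = "[^";
  result += escapeRegexpString(delimiterCodePoint);
  result += "\\n\\r]+?";
  return result;
}

const proposed = generateSegmentWildcardRegexpProposed(); // "[^\n\r]+?" as regex source
const proposedRegexp = new RegExp(`^(${proposed})$`);

console.log(proposedRegexp.test("foobar")); // true
console.log(proposedRegexp.test("foo\nbar")); // false, same as the full wildcard
```

Note that `.` also excludes U+2028 and U+2029, which this class would still match, so the alignment with the full wildcard covers \n and \r only.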
