Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional groups with a leading slash do not match correctly #149

Open
posva opened this issue Nov 10, 2021 · 15 comments
Open

Optional groups with a leading slash do not match correctly #149

posva opened this issue Nov 10, 2021 · 15 comments

Comments

@posva
Copy link

posva commented Nov 10, 2021

When creating an optional group with ?, it seems like they are not being matched correctly. Here is the test case:

function matchParams(path, pathToTest, groups) {
  const pattern = new URLPattern(path, 'http://e.e')

  const match = pattern.exec(pathToTest, 'http://e.e')
  console.assert(
    (!groups && !match) ||
      JSON.stringify(match?.pathname.groups) === JSON.stringify(groups),
    `Got ${JSON.stringify(match?.pathname.groups)} instead of ${JSON.stringify(groups)}`
  )
}

matchParams('/a/:p?-b', '/a-b', null) // fails because it matches
matchParams('/a/:p?-b', '/a/-b', {}) // fail because it doesn't match
matchParams('/a/:p?-b', '/a/e-b', { p: 'e' }) // matches and pass

All matchParams() call should work but the first 2 ones fail.

@wanderview
Copy link
Member

This is working as intended. For pathname patterns the path-to-regexp prefix option is effectively set to /. This means that a / preceding a group is considered to be a prefix of that group and modifiers apply to it automatically.

This is mainly done so patterns like /foo/:hmm?/bar will match both /foo/baz/bar and /foo/bar. Without the prefix behavior the second would not match because it would want /foo//bar. Repeating modifiers like * and + have a similar issue.

The pattern author can opt out of this behavior by enclosing the group in an explicit grouping using curly braces. So you can write your first pattern as /a/{:p}?-b to get the expected behavior.

Does that make sense?

@posva
Copy link
Author

posva commented Nov 10, 2021

It's possible to handle both scenarios, it's just that path-to-regexp doesn't.

The pattern author can opt out of this behavior by enclosing the group in an explicit grouping using curly braces. So you can write your first pattern as /a/{:p}?-b to get the expected behavior.

It's great there is a way to get this working but it's definitely not intuitive to have this:

  • /a/:p?-b to match /a-b, it should instead match /a/-b

My reasoning is:

  • users expect to be able to put the optional group anywhere and they don't expect /a-:id?-b to work differently from /:id?-b
  • It's not intuitive because the / character is there, as a static character: /a/:p?-b looks more like /a/-b than /a-b and the current behavior, which is IMO not expected, is less common than the proposed one.

@wanderview
Copy link
Member

I think it comes down to which is the more common scenario we optimize for:

  1. An in-fix optional group where we want to avoid doubled-slashes like /foo/:hmm?/bar
  2. Or an optional group where there is trailing content not delimited by a slash like /foo/:hmm?bar

My impression was in pathnames situation (1) is more common and that is why path-to-regexp has that default. This is the first feedback I've heard to contrary.

Unfortunately, I think we would need a pretty big indicator that this is a problem for many devs in order to change the API at this point. It would have been easier to consider if we had this feedback prior to launching.

@posva
Copy link
Author

posva commented Nov 10, 2021

But I'm saying you can have both. You can take it from the other perspective: building URLs from a pattern:

  • Given a pattern of /foo/:id?/bar, pattern.build() -> /foo/bar (Which is the current behavior as I understood)
  • Given a pattern of /foo/:id-bar, pattern.build() -> /foo/-bar (Currently not the case)

The idea is to avoid double / as you said.

We have both in vue router's parser. path-to-regexp limitations should (if possible) not be a limitation for URLPattern.

It doesn't look consistent to have:

matchParams('/a/:p?-b', '/a-b', null) // fails because it matches
matchParams('/a/:p?-b', '/a/e-b', { p: 'e' }) // matches and pass

Unfortunately, I think we would need a pretty big indicator that this is a problem for many devs in order to change the API at this point. It would have been easier to consider if we had this feedback prior to launching.

What do you mean by launching, I thought this was still in the feedback process as it looks like it's just Chrome so far...

@posva
Copy link
Author

posva commented Nov 10, 2021

I also wanted to note I can see how the consistency here is subjective...
Well, at least I'm glad it's possible to use {:id}? to make it work. I think I need to give myself more time of seeing it from the other side.

If {} are always used when groups are inside the /, then it's not that big of a deal I think.

@wanderview
Copy link
Member

wanderview commented Nov 10, 2021

But I'm saying you can have both.

This would be a pretty big processing model change. There is no concept in the spec of looking at future characters to change the behavior of the current grouping. Also, I think it would make a somewhat magical feature and make it even more magical which I think would be bad for user understanding. It would also be a big deviation from path-to-regexp upstream.

If we were to change something I think it would be more likely that we would drop the / automatic prefix option in pathname and you would always have to use { } if you want prefix behavior. So you would have to write /foo{/:hmm}?/bar, etc.

What do you mean by launching, I thought this was still in the feedback process as it looks like it's just Chrome so far...

Once we hit stable chrome we need to be careful about any breaking changes and this would definitely be a breaking change. Usage is still low, so its possible, but I also don't want to thrash the API around either. That's why I said I would need more feedback from many developers, etc, in order to justify making a breaking change.

Edit: Its also implemented in deno so it would require them changing as well, etc.

@posva
Copy link
Author

posva commented Nov 10, 2021

If we were to change something I think it would be more likely that we would drop the / automatic prefix option in pathname and you would always have to use { } if you want prefix behavior. So you would have to write /foo{/:hmm}?/bar, etc.

Yeah, that makes sense. I also didn't know this could change the processing model. I don't know what would be better when it comes to the prefix behavior. I personally never liked it in path-to-regexp because it created unexpected scenarios when the prefix was not / but that doesn't seem to be a problem with URLPattern.

Once we hit stable chrome we need to be careful about any breaking changes and this would definitely be a breaking change. Usage is still low, so its possible, but I also don't want to thrash the API around either. That's why I said I would need more feedback from many developers, etc, in order to justify making a breaking change.

But this is still an experimental API, isn't it? And as such, it would be fine if it changes. It would be very problematic for the web platform if Chrome just pushes a new API and then forces other to adopt it. It doesn't seem to have been accepted yet.

Edit: Its also implemented in deno so it would require them changing as well, etc.

I would say Deno is an exception as they implement new APIs to be more appealing than its competitor, node 😄 They are mainly focused on developers: as a Developer, I can choose if my code runs on Node or Deno while browsers are focused on Users and as a Developer, I cannot choose the browser my user is browsing my code with.

@wanderview
Copy link
Member

But this is still an experimental API, isn't it? And as such, it would be fine if it changes.

Every change is a tradeoff and requires balancing different stakeholder impact. In this case I see the pro/con list of making the change as:

Pro:

  1. One developer indicates the current behavior is unexpected (although they can achieve expected behavior with a small syntax change)

Con:

  1. Potentially breaking sites out from under users and developers.
  2. Potentially violating expectations from large numbers of developers used to working with default behavior in path-to-regexp over a number of years.

In general we prioritize user impact greater than web developer impact. And we prioritize web developer impact over browser engineer impact. So Con(1) is a high priority because it risks breaking sites for users. We can mitigate this by measuring usage to see if its safe to go ahead, but we can't ignore it just because its "experimental". (And usage is low, but non-zero right now.)

But beyond that its not clear to me that Pro(1) outweighs Con(2). The fact that a popular pattern matching library has had this default for a long time seems significant to me. While I understand the behavior is not what you expect, it seems probable that other developers might disagree.

If we hear from many other developers that this behavior is bad and/or confusing then maybe the balance starts to tip the other way. Then we can see if usage is still low enough to risk a breaking change.

So I think maybe we should leave this issue open to see if other folks chime in over time.

@posva
Copy link
Author

posva commented Nov 10, 2021

It's worth saying that MDN docs states the opposite about Experimental APIs : "Expect behavior to change in the future".

It's also worth noting that path to regexp is not the only lib to parse URLs, it's a great source of inspiration though but it's also a bias. For instance: given a pattern of /foo/:id-bar, pattern.build() should yield /foo/-bar not /foo-bar

@lucacasonato
Copy link
Member

lucacasonato commented Dec 14, 2021

It's worth saying that MDN docs states the opposite about Experimental APIs : "Expect behavior to change in the future".

@posva The reasoning here is not that it is impossible to change existing APIs. It is just not easy, and not something we can just do on a whim. Changing APIs in even minor ways can have huge impacts on users browsing the Web, as site will just break under them.

The point here is that the risk of breakage here is greater than the benefit of this change.

It's also worth noting that path to regexp is not the only lib to parse URLs, it's a great source of inspiration though but it's also a bias.

path-to-regexp is the de-facto standard for URL matching in JavaScript. This alone does not make a good API, but it makes for a familiar API.

For instance: given a pattern of /foo/:id-bar, pattern.build() should yield /foo/-bar not /foo-bar

Not per se. The way the pattern matching works today, this should yield /foo-bar. If you want it to output /foo/-bar, the pattern should be /foo/{:id}-bar.

@HansBrende
Copy link

HansBrende commented Jan 29, 2022

@wanderview @lucacasonato @posva so... did I get this right?

Does it match? /foo /foo/ /foo/bar/ /foo//bar Is this pattern syntactically desirable? Is this pattern semantically useful?
/foo/:rest* Yes No
/foo/* Yes Usually not
/foo/*? No Yes

@posva
Copy link
Author

posva commented Jan 29, 2022

To me that depends if the pattern in set to be strict about trailing slashes or not

@HansBrende
Copy link

HansBrende commented Jan 29, 2022

To add my own 2 cents to this issue... it is pretty clear to me that the current behavior of including the leading slash in the optional group only makes sense when there is one single variable (or wildcard) inside the path segment (and in fact, in this one particular case, it makes much more sense than not doing that). But in all other cases, objectively speaking, it makes little to no sense at all.

As @posva mentioned above, you CAN have the best of both worlds by checking for this one condition. I don't see why it would matter (too much) if it causes a processing model change. I thought the main concern here was backwards compatibility. If a portion of the processing model changes, that sucks, but... you can remain backwards compatible in 99.99999% of cases while eliminating the WTFs! So isn't that a win?

@HansBrende
Copy link

HansBrende commented Jan 30, 2022

@wanderview as regards "pretty big processing model change", can you elaborate more on why that is necessarily the case? Are you saying that adding a +1 lookahead to the processing model would be unmanageably difficult? (tryConsume already has this ability! Only difference is it increments the index whereas we'd simply need the ability to "peek" at the next token.)

It seems to me this could be patched into path-to-regexp (for example) with a 2-line code tweak, simply by replacing line 178:

if (prefixes.indexOf(prefix) === -1) { ...

with:

const suffix = ['END', 'CHAR'].includes(tokens[i].type) ? tokens[i].value : null!;

if (prefixes.indexOf(prefix) === -1 || prefixes.indexOf(suffix) === -1) { ...

(Of course, you'd also have to move the tryConsume("MODIFIER") || "" line up to be above the new logic.)

Test results of my teeny tiny tweak:

pathToRegexpTweaked('/a/:p?-b', undefined, {strict: true}).exec('/a-b') // === null ✅
pathToRegexpTweaked('/a/:p?-b', undefined, {strict: true}).exec('/a/-b')![1] // === undefined ✅
pathToRegexpTweaked('/a/:p?-b', undefined, {strict: true}).exec('/a/e-b')![1] // === "e" ✅
pathToRegexpTweaked('/a/:p?/b', undefined, {strict: true}).exec('/a//b') // === null ✅
pathToRegexpTweaked('/a/:p?/b', undefined, {strict: true}).exec('/a/b')![1] // === undefined ✅
pathToRegexpTweaked('/a/:p?/b', undefined, {strict: true}).exec('/a/e/b')![1] // === "e" ✅
pathToRegexpTweaked('/a/:p?', undefined, {strict: true}).exec('/a/') // === null ✅
pathToRegexpTweaked('/a/:p?', undefined, {strict: true}).exec('/a')![1] // === undefined ✅
pathToRegexpTweaked('/a/:p?', undefined, {strict: true}).exec('/a/e')![1] // === "e" ✅

Re: "I think it would make a somewhat magical feature..."

As the above code snippet shows, it would be no more magical than the current implementation (in fact, it would make the existing logic roughly symmetrical.) And it is not so much a new "feature" as it is a reduction of an existing magical feature in cases where it is too magical.

@wanderview
Copy link
Member

Yes, it is possible to change things to make this kind of behavior work. I wasn't trying to say its impossible. But it does seem like a big conceptual change to start altering behavior based on trailing characters after a grouping. That concept doesn't exist in the model today. And introducing it does seem like it would be more complex and magical to me.

So I think there would be a risk of making things worse (more complex, harder to understand, etc) by doing something like this.

On the other hand, authors can still achieve the desired effect by being explicit in their patterns. If they want your behavior for /a/:p?-b, they can write /a/{:p}?-b. This explicitly captures intent without adding more implicit rule complexity. (I think your chart above misses this style of writing an explicit pattern group.)

Add in that it would be a backward incompatible change and the tradeoffs point towards not making this kind of change right now. If there is a huge outpouring of developer sentiment that it needs to change, though, it would be something we could consider.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants