Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Named vs. Anonymous wildcard groups have inconsistent and confusing behavior #163

Closed
HansBrende opened this issue Jan 29, 2022 · 12 comments
Closed

Comments

@HansBrende
Copy link

HansBrende commented Jan 29, 2022

As a user of this API, I would expect /foo/* and /foo/:remaining_path* to be synonymous, with the sole difference that I am naming my capturing group. But that is not the case, as shown by the following table:

CURRENT MATCHING BEHAVIOR /foo /foo/ /foo/bar/ /foo//bar
/foo/:rest*
/foo/*

These exactly flip-flopped semantics are likely to go unnoticed and unrealized by a developer, who in all likelihood intended the semantics of /foo/*?.

I would argue that an optional qualifier after a wildcard should be redundant, by virtue of its being a wildcard (and to add to the confusion, it is exactly the same syntax as the non-greedy quantifier *?). In the VAST majority of cases, the wildcard is prefixed by a forward-slash simply to avoid a false-positive match from /foobar, not to mandate a forward-slash per se! And therefore, I'd expect the match table to look like the following, where named and anonymous wildcard groups have consistent semantics:

Proposed Behavior 1 /foo /foo/ /foo/bar/ /foo//bar
/foo/:rest*
/foo/*
Match Result undefined "" "bar/" "/bar"

The above table will match developers' expectations in 99.999% of cases (and, if there is ever a case where a developer does want a trailing slash to match, but simultaneously does not want the base path to match (very odd use-case), they could simply do /foo/(.*) or /foo/*+ or /foo/{*}. If there is ever a case where a developer wants an arbitrary path to match only if it contains no empty path segments (another < 0.001% use-case), they could simply do /foo{/:items}*).

Note: with the above matching table, you'd get the added bonus of staying consistent with react-router's behavior.

OR, if you really want to preserve the necessity of the optional qualifier on top of the wildcard, then I would expect that you'd at least make the behavior consistent between named and anonymous wildcard groups, like so:

Proposed Behavior 2 /foo /foo/ /foo/bar/ /foo//bar
/foo/:rest*
/foo/*
Match Result "" "bar/" "/bar"
/foo/:rest*?
/foo/*?
Match Result undefined "" "bar/" "/bar"

I am asking you to honestly consider: does the current matching behavior presented in the first table make sense for most use cases?

See this comment for a concrete details on how both proposals could be implemented.

See this PR for an actual implementation in path-to-regexp. (Surprisingly simple: only 3 existing lines of code tweaked!)

@wanderview
Copy link
Member

A wildcard * on its own is shorthand for (.*). If you want this behavior with a named group you can write :foo(.*).

The default behavior for a named group without a custom regexp is to match until the end of the segment. This is equivalent to the more complex ([^/]+?) regexp.

Your examples above also include * "zero-or-more" modifiers which uses the same character as the wildcard.

I agree that the situation is not as clear as I would like, but there are competing tradeoffs. In this case backward compatibility with both path-to-regexp (versions with and without wildcards) and wildcard usage in other web APIs.

As to your proposal, it is something we could consider if there is a lot of developer feedback that its needed. But, its a backward incompatible change so we can't make it lightly.

Finally, if you don't like using /*? to get an optional leading slash because its similar to the non-greedy regexp modifier, you can use /** instead.

@domenic
Copy link
Member

domenic commented Jan 31, 2022

I wonder if a clear summary of those rules somewhere would help. E.g. * = (.*), :foo = ([^/]+?), how to use /*?. Some sort of reference table full of examples and all the possible syntaxes.

@HansBrende
Copy link
Author

HansBrende commented Jan 31, 2022

@domenic

I wonder if a clear summary of those rules somewhere would help. E.g. * = (.*)

Your very first example here illustrates exactly the problem with current behavior. I would LOVE it if * simply means (.*). If that were the case, then I wouldn't have an issue to file! My job would be done here. My entire recommendation (well, my 2nd proposal at least) can be boiled down to: make * mean (.*)!

But the fact of the matter is, * is only equivalent to (.*) if it is not "named".

If it is named, then * (starting from the prior leading slash) actually means ([^/]|/(?=[^/]))* (exactly the same as (.*) except that every forward slash must be followed by a non-forward slash character). This is the part that makes no sense and is completely unexpected. In fact, entire libraries have been built around this incorrect assumption:

Screen Shot 2022-01-31 at 2 14 36 PM

@domenic
Copy link
Member

domenic commented Jan 31, 2022

Yes, I specifically meant *, not :foo*.

@wanderview
Copy link
Member

wanderview commented Jan 31, 2022

Yes, its unfortunate that * is both a shortcut for the (.*) group and a modifier. As I tried to express before, though, we felt we needed to do that to conform to previous conventions for both wildcards and modifiers. The alternatives would be to drop * in one of the contexts:

  • Not using * as a modifier would be break with a long standing convention in path-to-regexp and regexps.
  • Not use * for a wildcard group would break longstanding convention for glob-style syntax used in numerous systems. The current published version of path-to-regexp chose to do this, but we saw evidence that people missed the feature (like react router specifically holding at the old version with wildcard support).

Neither of these options is desirable, which is how we ended up with the current situation.

I take @domenic's point, however, that better documentation distinguishing between * as a group and * as a modifier would be helpful to avoid this confusion.

For the case in your screenshot in #163 (comment) you can achieve the desired affect by explicitly writing out the wildcard regexp for the named group; /users/:rest(.*).

@HansBrende
Copy link
Author

@wanderview it is possible to fix the problem without dropping one of the two behaviors. Keep in mind one important observation: the two sets of semantics align perfectly in 99.999% of cases. It is only when a path contains a forward slash not followed by a non-forward slash that compatibility falls apart.

So, if you were to simply say that, in absence of a user-defined regexp, the * modifier always means (.*)... then you'd have a perfectly consistent system.

re: the library in question (wouter): one actually cannot do that in its current state because it only supports a subset of path-to-regexp. (A subset which does not allow user-defined regexps). Which is all great and fine except in this one edge case. But that's not your problem to solve, of course. I was just mentioning it to give a real-world example of the confusion this behavior can cause, in a prominent routing library with tens of thousands of users.

@wanderview
Copy link
Member

So, if you were to simply say that, in absence of a user-defined regexp, the * modifier always means (.*)... then you'd have a perfectly consistent system.

That might make sense if / was the only possible prefix value, but you can have different prefix values. If the prefix is not / then (.*) and :foo* are not equivalent at all. Consider a pattern like /product{/part/:value}*. This would match /product/part/foo/part/bar, but not /product/part/foo/bar. Your proposed change would break that behavior and cause the second one to match.

And again, I'm hesitant to change the meaning of characters based on complex conditional rules. Its already confusing enough that * can be a group or a modifier, I hesitate to make the rules about when its one or the other dependent on trailing characters, what the prefix is, etc.

I hope, though, that it is possible for the behavior you want to be expressed in URLPattern even if its not as convenient as you would like.

@HansBrende
Copy link
Author

@wanderview true, but the open-close brace syntax effectively create a regexp non-capturing group, so I wouldn't consider that to count as "in absence of a user-defined regexp".

@wanderview
Copy link
Member

@wanderview true, but the open-close brace syntax effectively create a regexp non-capturing group, so I wouldn't consider that to count as "in absence of a user-defined regexp".

Its not a regexp. Braces must conform to { <prefix> <group> <suffix> } <modifier> syntax. The prefix and suffix parts can only be plain text. The group part is where a custom regexp would go and it would look the same as if it was outside braces; :foo(blarg).

But I think this suggests a misunderstanding of the processing model which again points to a need for better documentation.

@HansBrende
Copy link
Author

HansBrende commented Jan 31, 2022

@wanderview what I mean is not that the braces are regexps in and of themselves, but rather that you are requesting a non-capturing regexp group to be creat... EDIT: nevermind, just see below comment ⬇

@HansBrende
Copy link
Author

HansBrende commented Jan 31, 2022

@wanderview for the sake of staying on the same page (as I feel like we've been sliding off the same page)... in concrete terms, here is the change I am suggesting. Simply the removal of 1 word on line 189 of your current PR:

Simply replace:

if (!name && !pattern && tryConsume("ASTERISK")) {

with:

if (!pattern && tryConsume("ASTERISK")) {

and you have unified behavior. To check the concerns you brought up:

pathToRegexp('/product{/part/:value}*', undefined, {strict: true}).exec('/product/part/foo/part/bar')![1] 
// prints: foo/part/bar  ✅ 

pathToRegexp('/product{/part/:value}*', undefined, {strict: true}).exec('/product/part/foo/bar')
// prints: null  ✅ 

And finally (real page by the way):

pathToRegexp('/en.wikipedia.org/:rest*', undefined, {strict: true}).exec('/en.wikipedia.org/wiki///')![1]
// prints: wiki///   🎉 

But! Just remember that the above solution is for my 2nd, less-preferred-but-grudgingly-accepting proposal. My first (preferred) proposal would be basically the same as the above but tweaking it as follows:

let existingModifier;
if (!pattern && (existingModifier = tryConsume("ASTERISK"))) { ...

// later, simply replace:
modifier: tryConsumeModifier() || ""
// with:
modifier: tryConsumeModifier() || existingModifier || ""

@wanderview
Copy link
Member

I don't think we're changing this. Thank you for the feedback, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants