New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add *consistent* wildcard support #270
base: master
Are you sure you want to change the base?
Conversation
…ards to match paths with empty path segments
Let me fix the codebase, it's my bad for merging a dependabot PR (which I also had to disable due to sheer volume of noise) which bumped prettier without the concurrent re-formatting. I'll fix that now and you can rebase the changes. Aside from that, I'm less sure about the actual behavior of |
The other thing to keep in mind with Overall the most unfortunate decision in consistent behavior has been the inclusion of magic prefixes which makes all this behavior harder to follow. If it was explicit the confusion would not exist, e.g. |
Also forgot to mention in my initial review but the summary table you created is really awesome regardless of the actual output of the PR 💯 |
@blakeembrey thanks for the responses and follow-up questions.
No, I would say definitely not. In the absence of a wildcard character, a variable should always have a non-empty value, IMO. The reason it makes sense to me to allow empty path segments when a wildcard is specified is that in this case, the intention is clear: devs want to match for anything. So any valid path should match... regardless of whether or not it contains empty path segments. (This is particularly useful, fyi, in routing libraries. See, e.g., react-router or wouter for examples of how they are using the same syntax. It is quite unexpected behavior to specify a 404 for anything that doesn't match the previous routes, only to have a blank page instead of your 404 page show up because the actual url typed in by the user happened to contain an empty path segment.)
But in this case my PR is consistent with v2 thru v5, right? So wouldn't that be fixing a regression?
True, however... the behavior of the WICG PR will do a similar thing by allowing EDIT: and honestly I'm not sure the subdomain issue is much of a problem in the first place, because even though EDIT Thoughts? |
Isn’t that why you’d want the asterisk or (.*) group to make it clear? I don’t completely disagree, but it does make for some changes in other places. E.g. does path matching that makes arrays today leave in the empty holes or remove them? Does the optional delimiter also allow empty? |
Also something worth keeping in mind is whether we actually care about all these edge cases. In theory if you’re allowing repeated delimiters in one place shouldn’t we just allow it everywhere? But that makes for a more complex regex. And I’ve been of the opinion that the simple solution there is to fix it by normalizing paths at the router level before path matching. Arguably even the trailing slash should be at the router level to normalize it first. |
This could be done, however, there are a couple reasons why this would pose a major burden: (1) Most front-end routing libraries (e.g. wouter, react-router) only expose a subset of the path-to-regexp syntax, and that subset does not include custom regexps at all! 99.999% of the routing architecture use-case can be accomplished without custom regexps, so it's a bit awkward to require one solely for the most common use-case in routing.
This is the part I don't really care about too much since I'm doing my own splitting logic when necessary... and only using the path-to-regexp part of the library (usually splitting is not necessary at all since the remaining path represented by the wildcard is either delegated to a subrouter, or sent to a 404 page). I do believe that the question of "how regexp match results are split" should be an entirely separate & independent question from the standardization of the actual generated regexp. That being said, it might make sense to leave in the holes here. Or take them out, either way. My gut tells me to leave them in, that way the original path can be reconstructed by joining the match with the "/" delimiter, rather than being lossy. Since the dev did not specify an overriding pattern, we're simply making the default pattern be
No. The use-case for the optional quantifier is pretty much 100% across the board either an optional path segment or an optional affix of a single path segment. Therefore, the optional quantifier should not imply a different default pattern than that of a mandatory path segment variable. The use-cases where a dev would want to match a variable or empty string in the former case are basically non-existent, since nobody writes their routes that way (for a very good reason). And if they did, they could simply use 2 patterns: 1 for the non-empty case (e.g.
That's not a robust solution though, because sometimes empty path segments actually mean something. (E.g. https://en.wikipedia.org/wiki///). If you normalize prior to putting the URL through the router, then you are arbitrarily deleting information from the URL that might indeed be used by subrouters. (If only to simply display the correct error message to the user, e.g. "404 path suffix foo////bar was not found in the directory!") This poses a particular problem for generic routing libraries, who cannot make the assumption that the user's routes can be normalized without changing the semantics. |
@blakeembrey P.S. -- if you get a chance and haven't already, please do check out the original issue I filed in URLPattern: whatwg/urlpattern#163. In that issue, I go a lot more in-depth into all the pros & cons that I see here, as well as listing out tables of possible different behaviors. (The behavior I've implemented in this PR is equivalent to "Proposed Behavior 1" in that issue.) |
I realize that there is already a PR to add wildcard support. However, that PR is going to add some pretty big inconsistencies between named and anonymous wildcard groups, in a very confusing way. See: whatwg/urlpattern#163
The response over there so far has been "too bad, so sad, who cares."
So I'm taking the initiative to create this PR to at least demonstrate that there is a way forward with wildcard support that actually makes sense.
Why do I care? Because once URLPattern comes out of experimental status, we will be stuck with the choices we make today, FOREVER. If the
path-to-regexp
library itself chooses the right way forward, then maybe URLPattern will follow suit, since it seems to care a lot about compatibility with this library.How did I go about making the two behaviors compatible with each other? Simply by not disallowing empty path segments in named wildcard groups, and not requiring a trailing slash in anonymous wildcard groups.
Hopefully my efforts here at least give some people food for thought.
P.S. the pre-commit hook that added 600 lines to this PR is really annoying.
P.P.S. Fixes #214. Also fixes or re-fixes the previously closed issues #228, #212, #196, #194 (for the case where it makes sense), #103, #87, and #37.
Detailed breakdown:
/files/:path*\.:ext*
/:everything*
abc/:everything*
#/*
*
/foo/:bar*
/entity/:id/*
/entity/:id/*
/test/*
/some-path/*
(✅ = match; ❌ = no match; 💀 = parse error; all tests run in strict mode)