Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add URLPatternList #166

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

lucacasonato
Copy link
Member

@lucacasonato lucacasonato commented May 3, 2022

Closes #30

Mini-explainer:

This adds a new URLPatternList interface that is can be used to match many URLPatterns at once. This new interface can be used to easily create URL routers with a large number of patterns, without compromising on performance.

An example of how a user could use this API:

const patterns = [
  new URLPattern("https://example.com/"),
  new URLPattern("https://example.com/about"),
  new URLPattern("https://example.com/blog/:slug"),
  new URLPattern("https://example.com/blog/:slug/comments"),
];

const list = new URLPatternList(patterns);

let res;
res = list.exec("https://example.com/about");
patterns.indexOf(res.pattern); // 1

res = list.exec("https://example.com/blog/abc");
patterns.indexOf(res.pattern); // 2
res.pathname.groups; // { slug: "abc" }

list.test("https://example.com/imprint"); // false
list.test("https://example.com/about"); // true
list.test("https://example.com/blog/abc/comments"); // true

The main motivation for this addition is the performance improvements this can unlock compared to a naive matching algorithm based on the existing URLPattern interface. Existing user-land router implementation using URLPattern slow down linearly with the addition of patterns. URLPatternList is designed to be able to deliver sub linear performance characteristics for routers of any size.


TODOs

  • Consensus on sort order
  • Consensus on dynamic add and remove of patterns
  • Check that an optimized implementation is possible (implement in Deno)
  • Write web platform tests

Preview | Diff

This commit adds support for a URLPatternList class.
Copy link
Member

@wanderview wanderview left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this!

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
<xmp class="idl">
[Exposed=(Window,Worker)]
interface URLPatternList {
constructor(sequence<URLPatternListEntry> entries);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of the feedback in #30 suggested it would be nice to be able to dynamically mutate the list. So add new entries to the list and possibly delete entries from the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I didn't want to add this right away until we figure out the internal sort order a little better.

@wanderview
Copy link
Member

Also:

check that a radix tree implementation is possible (implement in Deno)

I doubt the API shape here will really dictate whether a radix tree implementation is possible or not. Instead I expect the path-to-regexp style pattern matching will impose some constraints which make a pure radix solution difficult.

That does not mean optimizations will not be possible, though. I think there are clear optimizations a list object like this can enable; e.g. log(n) searching of a fixed string prefix in the patterns.

I have also theorized it might be possible to construct an optimization where the patterns are deconstructed into automata similar to regex compilation. Then something akin to a radix tree could be built from the automata. We could then evaluate the automata against the input in O(n) time where n is the input length.

@lucacasonato
Copy link
Member Author

Yeah, that makes total sense. My first pass was going to just deal with speeding up fixed text prefix matches (actually also for regular URLPattern).

I have also theorized it might be possible to construct an optimization where the patterns are deconstructed into automata similar to regex compilation. Then something akin to a radix tree could be built from the automata. We could then evaluate the automata against the input in O(n) time where n is the input length.

I was also thinking about that today! :)

I was actually planning to do exactly this for a v2 of the URLPattern(List) implementation in Deno. The way I envision it is to put another intermediate layer in between the "parts" and the "regexp" representation of a component. This intermediate format would be a layered "matcher" stack that is flat for URLPattern, and a proper tree for URLPatternList. You can then walk through this tree similar to a radix tree (except that some amount of backtracing is likely necessary). Each node in the tree would be some specific match construct. For example the simplest node would just be a literal string match. A more complex "optional" node would have an inner matcher that it would attempt to match before passing off to the next matcher in the stack. There would be a "regexp" node that just delegates a given match to the ECMA regexp engine (for user provided regexps).

What do you think about including this intermediate format in the spec? Doing that would allow us to make sure future spec additions / changes don't break non-regexp matcher algorithms. The spec would still only specify the naive regexp based matcher, but it'd be much easier for implementations to implement alternative faster matchers based on this intermediate format.

@wanderview
Copy link
Member

What do you think about including this intermediate format in the spec?

I don't think we should put any intermediate representation in the spec as it then makes future optimizations more difficult in the implementation. I think we want the flexibility to change the intermediate representation in the future. Also, none of this stuff should be observable to sites.

@wanderview
Copy link
Member

Also, when this is further along it would be nice to include a "mini explainer" in the pull request description. Something like whatwg/dom#1032. And submit to TAG review.

Besides helping to get feedback on the proposal this would also make it easier for chromium to adopt the API more quickly.

@lucacasonato
Copy link
Member Author

@wanderview I simplified the proposed API. There is no more key anymore. Instead, the exec method now returns a pattern field in the result object that users can use to figure out which pattern matched. To do attach a key (or any metadata) to a pattern, users can now use a Map:

const patterns: Array<[URLPattern, string]>;

const keys = new Map(patterns);
const list = new URLPatternList(patterns.map(r => r[0]));

const res = list.exec(...)
if (res !== null) console.log("Matched", map.get(res.pattern));

@wanderview
Copy link
Member

I think @domenic had previously advocated the map approach to me. If this is ergonomic enough for developers, that works for me.

@lucacasonato
Copy link
Member Author

lucacasonato commented May 5, 2022

I have added a mini explainer to the PR description.

@wanderview
Copy link
Member

@lucacasonato What is the status of this? How did your prototyping in deno go? I'm thinking of prototyping this in chromium using re2's set implementation.

@wanderview
Copy link
Member

I was also wondering if we should rename this to URLPatternSet. My impression is we don't want to support duplicate entries.

@wanderview
Copy link
Member

I filed an intent to prototype for chromium:

https://groups.google.com/a/chromium.org/g/blink-dev/c/QrPrveVyFnA/m/PkaPKfJ4AwAJ

@wanderview
Copy link
Member

wanderview commented Oct 20, 2022

@domenic Do you think we should include setlike in the webidl here? (Again, I think we should probably make this URLPatternSet.)

Edit: Note, the URLPatternSet will be ordered, but not insertion order. It will instead be sorted based on a spec defined algorithm.

@wanderview
Copy link
Member

Also, I wonder if the Map oriented approach to associating data with entries will make it hard to have a constructor that takes dictionaries instead of full URLPatterns. Not sure if the dictionaries shortcut is worth optimizing for at the cost of added complexity elsewhere.

@domenic
Copy link
Member

domenic commented Oct 21, 2022

I've always been -1 on the map-oriented approach, myself.

Regarding setlike APIs, I think the main question is whether you want people to introspect and possibly mutate these things after creation. The spec PR here gives an immutable, non-introspectable object, and that seems to accomplish the main use cases. On the other hand, add introspection APIs from setlike (entries, forEach, has, keys, size, values) doesn't seem to hurt. And adding add, clear, and delete also doesn't seem that bad.

Note that if you're not using insertion order, you'll need to override the implementation of add(), in order to put the given value in the right place in the set entries.

@wanderview
Copy link
Member

I've always been -1 on the map-oriented approach, myself.

Really? I thought you suggested it to me before. I must have been confused. Should we use a maplike API instead?

While read-only is all we technically need, web developers have said making it mutable would be nice. Maybe starting read-only would be safest and add mutators later if needed. (For example I'm unsure if putting a duplicate entry in should overwrite or fail.)

@domenic
Copy link
Member

domenic commented Oct 21, 2022

I'm sorry, I misunderstood what you meant. I was -1 on changing the API of URLPatternList to be map-oriented. I am +1 on developers using maps separately as a side table.

@wanderview
Copy link
Member

wanderview commented Oct 21, 2022

Ok, I recall you were positive on passing in an array of init dictionaries to the constructor. But that doesn't seem to work with the external map approach. Does that seem ok to you?

@wanderview
Copy link
Member

To clarify, if you pass in an init dictionary and the constructor makes the URLPattern for you, then you don't have the pattern as a key in your Map.

@domenic
Copy link
Member

domenic commented Oct 21, 2022

Yeah, that seems OK to me. I think people who want to associate data that way can use explicit URLPattern objects.

};

dictionary URLPatternListResult : URLPatternResult {
URLPattern pattern;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how it works internally but shouldn't it contain the match result as well? Since the point is to pick a pattern to match and it's likely doing some work behind the scenes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does — the : URLPatternResult part means this dictionary extends URLPatternResult.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah i didn’t see it. (Not familiar with this spec language 😂)

<div algorithm>
The <dfn constructor for=URLPatternList lt="URLPatternList(patterns)">new URLPatternList(|patterns|)</dfn> constructor steps are:

1. [=list/For each=] |pattern| in |patterns|, run the following steps:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this imply that the pattern list is traversed as an array and is potentially slower? #30 (comment)

Copy link

@bathos bathos Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iteration steps here and elsewhere don’t appear to be observable, so engines can potentially optimize them however they see fit, right? Spec steps usually aim for the simplest and clearest explanation of observable behaviors, not the most efficient ones (because observable behaviors are the only things they can actually specify, further complexity is noise).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation!

@TomokiMiyauci
Copy link

Does this only do stateless longest matching?

Should it be possible to resume pattern matching from a specified pattern (lastPattern), like RegExp's lastIndex?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

consider exposing URLPatternList
6 participants