-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do you know how nwmatcher is so fast? #38
Comments
Does nwmatcher mean nwsapi? I think it's fast because it doesn't generate an AST from the given CSS selector string, but directly applies the regular expression to the string. Another reason for the speed may be that the |
Sorry, yes, I think that nwmatcher was the old name.
When looking at profiles, I don't think the AST building showed up as a cost? Or maybe I don't know which parts of dom-selector are the AST building parts. The hot code was in
I feel like reDOS is a fake vulnerability made up to generate work for security scanning companies and lint plugin writers... for jsdom at least it is not in our threat model. Complaining that someone might be able to make your code slow via a regexp is not a good reason to use something slower than regexp :)
I think the input strings are being generated via nwsapi itself, so it should be fine. Overall, looking at the profiles for nwsapi vs. dom-selector, I am just struck by how much more work dom-selector does and how different the call stacks look. I think it might be educational for you to focus on one case and just look at exactly how nwsapi does it and see if it's possible to adopt that approach to dom-selector. It looks to me like dom-selector is crawling the DOM tree a lot more, maybe? |
And the reason why dom-selector is slow is that:
|
True, but testing each AST against node takes time... |
Oh, wow. There must be a better way to do this, right? Like, by parsing out the selector ahead of time, you should be able to figure out the 5 checks you need to apply. Then, you can loop through each node once, and apply all 5 checks at once. I think this is by far the most promising option.
Why do you need to convert it to an array? Can't you operate directly on the HTMLCollection? Sure, HTMLCollections are not as nice as arrays, but it doesn't seem worth paying the cost. |
I'm converting to array to simplify the iteration in the next step of processing. But:
Maybe it would be better to do above, I'll take the bench and compare. |
Hmm... not a big difference. Above: converting HTML collection benchmark @asamuzakjp/dom-selector v2.0.3-a.1
--
--
--
--
--
--
--
benchmark sizzle-speed @asamuzakjp/dom-selector v2.0.3-a.1
--
|
How about use nwsapi only if selectors are supported and use dom-selector for the rest? // Filter selectors that nwsapi does not support or cannot handle correctly:
// namespaced selector e.g. ns|E, pseudo-element e.g. ::slotted,
// :focus, :indeterminate, :placeholder-shown, :scope,
// :dir(), :lang(), :has(), :is(), :where(),
// :not(complex selector), :not(:pseudo-class), :not(attr)
// :nth-child(an+b of selector), :nth-last-child(an+b of selector),
// :host, :host(), :host-context(),
// [attr i], [attr s]
if (!/\||:(?::|focus|indeterminate|placeholder|scope|dir|lang|has|is|where|not\(\s*(?:\*|[\.#]?[\da-z-_]+(?:[\.#][\da-z-_]+)*)(?:\s*,\s*(?:\*|[\.#]?[\da-z-_]+(?:[\.#][\da-z-_]+)*))*\s*\)|nth-(?:last-)?child\(.+\sof.+\)|host)|\s[is]\s*\]/i.test(selector)) {
// nwsapi
} else {
// dom-selector
}; |
I'm not very excited about that approach; it seems likely fragile where developers might end up switching between the engines non-transparently. I'm hopeful we can have a selector engine in jsdom which uses a fast-by-design architecture. Did you make any progress on the approach I suggested above? Or is it not good?
|
I do not think it's fragile.
It's running in a loop, but it's set to fail fast, so as much optimization as possible has already been done. |
It's possible nwsapi cannot support selectors level 4. But it's clear that it's possible to create fast software that does, since we have 3 browser engines all implementing the spec. So I am not willing to incorporate software with fundamentally slow architecture, when we know that fast is possible. |
Got it. |
I would like to leave a feedback. The current issue that jsdom wants to solve is:
Right? And what you can use now as tools are:
There are currently no tool that can support selector level 4 with high performance (at least, none have been found). I don't care whether dom-selector will be included or not. |
This is not really accurate. jsdom is a volunteer project, with many competing priorities. Given limited maintainer bandwidth, one of the main goals is minimizing maintenance burden, via solid architecture. A big part of this is minimizing dependencies. Having two selector engines instead of one is problematic from this point of view. Having a selector engine with a bad-performance-by-design architecture, in particular, is going to be a significant burden going forward for us. (As we've seen from the time spent churning on jsdom 23.2.0 to 24.0.0.) This is true even if that engine only kicks in sometimes. Actually, that's especially true: it would mean the maintainers need to know exactly what causes the different engines to kick in, e.g. to triage whether a given bug report needs to be reported against selector engine 1 vs. 2, or maybe versus the logic that determines which selector engine to dispatch to. We'd rather maintain tight control over our dependencies, ensuring that they're all designed to work well with jsdom, including from a performance perspective. Maintainability is more important than features. That's why I opened this issue, to encourage you to adopt some of the techniques nwsapi (or browsers themselves) use. I suspect that if you look at exactly how nwsapi accomplishes its speed, e.g. by tracing through all code paths taken for the simpler selectors, you can find ways to incorporate that into dom-selector's architecture. This seems especially likely to me since per the performance flamegraphs discussed above, the parsing is not a bottleneck; it's all about faster DOM retrieval and traversal techniques. |
I don't agree with some things you say, but I understand. As an alternative, I decided to include nwsapi in dom-selector. It's up to you whether you want to add dom-selector again to jsdom. |
nwsapi is not good at handling child and/or descendant selectors such as It is also noticeable in the sizzle benchmark. Fixed above. |
I have a small idea for performance work, which may be good or may be bad. I wanted to share it with you in case it was helpful.
Basically, if you take a benchmark where nwmatcher is fast, have you looked at how it does its work? And how that work compares to dom-selector? Can dom-selector switch to nwmatcher's strategy? Doing so might require ugly special cases or fast paths, but maybe it is worth it.
Feel free to close this if it is not helpful.
The text was updated successfully, but these errors were encountered: