Fast parsing and selector query #97
How are you doing your measurements? AngleSharp is larger and its code takes longer to JIT, so using NGEN or some other technique will certainly reduce the first-time overhead. It could also be that the selector query takes longer in AngleSharp v0.8.5, since the CSS parser got (a lot) slower (maybe even by an order of magnitude). It was expected to be slower, but not that severely, so optimizations will happen here. This may also affect the creation of the … What CsQuery currently does (and what AngleSharp will do in the future) is use some sophisticated hashing to make queries even faster. So it could also be that the case you are looking at (most probably a [very] large page) is a perfect example of where this hashing is beneficial. Hope this helps a bit.
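To illustrate the general idea of index-based selector queries mentioned above, here is a toy sketch in Python. It is not CsQuery's or AngleSharp's actual implementation: while parsing, each element is recorded in hash maps keyed by tag name and by attribute name, so a selector such as `a[href]` becomes a set intersection instead of a walk over the whole tree. The names `IndexingParser` and `query_tag_with_attr` are hypothetical and only cover this one selector shape.

```python
from collections import defaultdict
from html.parser import HTMLParser


class IndexingParser(HTMLParser):
    """Hypothetical toy parser: gives each start tag an id and indexes it by tag and attribute name."""

    def __init__(self):
        super().__init__()
        self.elements = []               # element id -> (tag, attrs dict)
        self.by_tag = defaultdict(set)   # tag name -> set of element ids
        self.by_attr = defaultdict(set)  # attribute name -> set of element ids

    def handle_starttag(self, tag, attrs):
        eid = len(self.elements)
        self.elements.append((tag, dict(attrs)))
        self.by_tag[tag].add(eid)
        for name, _ in attrs:
            self.by_attr[name].add(eid)


def query_tag_with_attr(index, tag, attr):
    """Answer a selector like 'a[href]' by intersecting two precomputed sets."""
    return sorted(index.by_tag.get(tag, set()) & index.by_attr.get(attr, set()))


if __name__ == "__main__":
    p = IndexingParser()
    p.feed('<div><p><a href="/x">x</a><a name="y">y</a></p></div>')
    p.close()
    print(query_tag_with_attr(p, "a", "href"))  # prints [2]
```

The lookup cost depends on the size of the matching sets rather than on the size of the document, which is why such an index tends to pay off on very large pages; the trade-off is extra memory and extra work at parse time.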
Thanks for sharing your thoughts. I used System.Diagnostics.Stopwatch (a high-resolution timer) to measure processing time and GC.GetTotalMemory to measure memory usage. The webpages were downloaded with System.Net.WebClient rather than the libraries' built-in methods, and the download was not part of the benchmark. AngleSharp (HtmlParser and QuerySelectorAll) took longer and used more memory than CsQuery (with Simple Index) on the following test webpages (a mix of small and large): http://www.amazon.com (370 KB). Separate tests were run with two arbitrary selector queries ('a[href]' and 'div > p > a') for each webpage. I think AngleSharp is promising, and it's good to see it being continuously improved. Although CsQuery is no longer maintained (ref: jamietre/CsQuery#173), it seems to perform better. I'm looking forward to further improvements to AngleSharp's CSS parser, after which we'll run our tests again to evaluate it for use in our product.
So your test is basically a combination of parsing and querying? I think this is an interesting and practical scenario. It will be taken into account and used for further improvements.
Hm, I still think the code you are using is not JIT-preprocessed. Also, do you use warm-up iterations or multiple runs to exclude or minimize the effect of outliers? I am just asking because I set up a combined test and get different results overall. What is definitely true, however, is that very large sites (especially the single-page version of the HTML5 spec) perform better with CsQuery. Right now it seems that for such a large page the parser in CsQuery performs much better (so the selectors may not even play such a critical role here). I will definitely investigate this and try to come up with an improved version.
Yes, my test code is a combination of parsing and querying, but I measure and compare the time taken for each separately (not as a single task). The larger time differences occur mostly during parsing; differences in querying are comparatively small and often negligible.
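A minimal sketch of the methodology discussed above (untimed warm-up iterations, the median over multiple runs, and parsing and querying timed separately), written in Python for illustration only. The `parse` and `query` callables are hypothetical placeholders for whichever library is being measured. A .NET harness would have the same structure using System.Diagnostics.Stopwatch, where the warm-up runs would also absorb JIT compilation.

```python
import statistics
import time
from typing import Any, Callable, Tuple


def bench(fn: Callable[[], Any], warmup: int = 3, runs: int = 10) -> float:
    """Median wall-clock seconds of fn() over `runs` timed calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as caches and lazy initialization
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to outliers (e.g. a GC pause) than the mean.
    return statistics.median(samples)


def bench_parse_and_query(parse: Callable[[str], Any],
                          query: Callable[[Any], Any],
                          html: str,
                          warmup: int = 3,
                          runs: int = 10) -> Tuple[float, float]:
    """Time parsing and querying separately rather than as one combined task."""
    parse_time = bench(lambda: parse(html), warmup, runs)
    doc = parse(html)  # parse once, so the query timing excludes parsing
    query_time = bench(lambda: query(doc), warmup, runs)
    return parse_time, query_time
```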
Hi,
I'm trying the following code, but in some cases either the parsing or the selector query (or both) is slower than in CsQuery:
What configuration or other optimizations would result in the fastest parsing and selector query in AngleSharp? If it matters, I don't want to parse the CSS or JS, just the HTML.