Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Url class performance #575

Open
georgiosd opened this issue Jun 16, 2017 · 8 comments
Open

Url class performance #575

georgiosd opened this issue Jun 16, 2017 · 8 comments

Comments

@georgiosd
Copy link
Contributor

Hey @FlorianRappl

I was profiling some of my code yesterday and found that Url is pretty slow - in 25,000 requests that take around 45sec, 8s was spent in Url. Unfortunately it didn't give me the stack trace of what was calling the constructor (I was only constructing the url myself once).

How come you're not using Uri?

@FlorianRappl
Copy link
Contributor

Uri has a couple of issues. For me the most concerning ones have been that it does not fully adhere to the WHATWG url spec. and that it essentially has no tolerance (i.e., if something is invalid it will just throw - that is not the behavior as seen in the browser).

I think the parser itself could be a little bit faster, but I would first see a straight comparison of Uri to Url (especially with some use cases, e.g., tolerances and relative URLs).

@georgiosd
Copy link
Contributor Author

I see. Couldn't we build on those tolerances with some extensions?

@FlorianRappl
Copy link
Contributor

Unfortunately not, but feel free to look for a way.

@georgiosd
Copy link
Contributor Author

@FlorianRappl if you can describe 1-2 use cases that wouldn't work with Uri, I can sleep on it.

@FlorianRappl
Copy link
Contributor

FlorianRappl commented Jun 26, 2017

Just run the unit tests of Url against Uri.

Bringing some failing examples:

http:\\www.google.com\foo
file:/example.com/
http:@www.example.com
http::b@www.example.com
http://%30%78%63%30%2e%30%32%35%30.01%2e

If you think these URLs are weird - yes of course they are. As explained it is not about the standard URLs, but rather about the goofy ones. They are interpreted correctly in the browser - and in AngleSharp. They lead to errors in Uri though.

@georgiosd
Copy link
Contributor Author

Hm... what if Url used Uri to do the parsing, and if that throws, it fallsback to its own? These cases should be the minority, right?

@FlorianRappl
Copy link
Contributor

FlorianRappl commented Jun 27, 2017

I think this would be way slower. Have you ever compared how long Uri would take? Did you have a look at the parsing of Url? It is actually straight forward and O(N).

Also you would spawn 2 object instances instead of one. So this would increase memory even more.

@FlorianRappl
Copy link
Contributor

Made a very simple benchmark consisting of the following (happy) URLs:

var urls = new []
{
	"http://localhost:8080/?mytest",
	"https://www.google.de/mypath",
	"http://www.google.de",
	"http://example.org/foo/bar",
	"http://example-domain.com/image.jpg",
	"https://github.com/FlorianRappl/AngleSharp",
	"http://www.google.de/some-path?a=b#header",
	"http://test/?hi—there",
	"http://example_domain.com/image.jpg",
	"https://loony_picture.dirty.ru/",
	"http://localhost:12345/account/login",
	"http://www.google.de/some-path#"
};
	
Benchmark(urls, url => new Url(url));
Benchmark(urls, url => new Uri(url));

Uri seems to be between 5 and 10 times faster. I still don't think that Url is slow (or taking the huge amount of time as stated), but maybe we can squeeze out a bit more performance of it.

@FlorianRappl FlorianRappl added this to the v1.0 milestone Jan 4, 2018
@FlorianRappl FlorianRappl modified the milestones: v0.15, v1.0 Apr 22, 2021
@FlorianRappl FlorianRappl modified the milestones: v1.0, vNext Jan 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants