
improve performance #214

Merged: 14 commits merged into fastify:master on Aug 3, 2022

Conversation

@Uzlopak (Contributor) commented Jul 13, 2022

Will provide benchmarks later if requested, but it is about 10% faster (before: 2,600k requests/s; after: 2,800k requests/s).
It would be nice if you could give me feedback on the code.

[Resolved review comments on test/vary.test.js and vary.js]
Uzlopak and others added 2 commits July 14, 2022 12:03
Co-authored-by: Matteo Collina <matteo.collina@gmail.com>
@Uzlopak (Contributor, Author) commented Jul 18, 2022

@mcollina
Please take a look

@Uzlopak (Contributor, Author) commented Jul 18, 2022

I focused on what happens when we parse the vary header again and again. I assume that the vary header is generated deterministically, so I use caching; even with more than one value for vary it stays fast. On my machine, hitting it with autocannon used to drop it from 2.6M requests to 0.17M requests. Now it only falls from 2.6M requests to about 2M requests.
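The idea, as a minimal sketch (a plain Map stands in for the cache here; the merged code differs):

```js
// Cache the parsed field list keyed on the raw header string: the server
// emits a small, deterministic set of Vary values, so cache hits dominate.
const seen = new Map()

function parseVary (header) {
  let fields = seen.get(header)
  if (fields === undefined) {
    fields = header.split(',').map(f => f.trim().toLowerCase())
    seen.set(header, fields) // unbounded here; the PR bounds it with an LRU
  }
  return fields
}
```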

[Resolved review comment on vary.js]
@Uzlopak (Contributor, Author) commented Jul 18, 2022

@mcollina

I am very proud of this now. It's simple, it's fast, it's sexy. :D

My autocannon benchmarks show about 2.8M requests with no vary header manipulation, 2.6M requests with one vary header manipulation, 2.4M with two, 2.2M with three, and so on.
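For reference, throughput numbers like these come from pointing autocannon at a running server; a sketch of such a run (the URL and flags are my assumptions, not the exact command used here):

```sh
# 100 concurrent connections for 10 seconds against a local Fastify server
npx autocannon -c 100 -d 10 http://localhost:3000
```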

@Uzlopak (Contributor, Author) commented Jul 19, 2022

@mcollina
@jsumners
@Eomm

Does it make sense to create a new package @fastify/vary so this code can also be used in @fastify/sensible?

@mcollina (Member) left a comment

lgtm

@mcollina (Member)

why do we need vary in sensible?

@Uzlopak (Contributor, Author) commented Jul 20, 2022

@mcollina

Sensible also uses vary.

https://github.com/fastify/fastify-sensible/blob/92139a8433b376294f6a3eee51ab38cdb3b3c872/package.json#L39

@mcollina (Member)

It's just exposed to the end users. The API of what was added here isn't the same as the vary module's, so it's not a 100% swap.

[Resolved review comment on vary.js]
add benchmarks
@Uzlopak (Contributor, Author) commented Jul 20, 2022

I added benchmarks:

vary - field to * x 11,460,799 ops/sec ±0.65% (190 runs sampled)
vary - * to field x 8,176,435 ops/sec ±0.92% (187 runs sampled)
vary - field to empty x 6,908,442 ops/sec ±0.77% (188 runs sampled)
vary - fields string to empty x 8,653,009 ops/sec ±0.60% (189 runs sampled)
vary - field to fields x 3,660,382 ops/sec ±0.69% (189 runs sampled)

cors - field to * x 76,832,035 ops/sec ±1.12% (187 runs sampled)
cors - * to field x 34,669,727 ops/sec ±0.95% (186 runs sampled)
cors - field to empty x 27,483,166 ops/sec ±0.51% (191 runs sampled)
cors - fields string to empty x 33,894,136 ops/sec ±1.82% (179 runs sampled)
cors - field to fields x 334,247 ops/sec ±0.19% (194 runs sampled)
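The output format above matches the benchmark package; a minimal sketch of a suite that prints lines in this shape (the case body is a placeholder, not the PR's actual benchmark file):

```js
const Benchmark = require('benchmark')

new Benchmark.Suite()
  .add('cors - field to *', function () {
    // placeholder: invoke the vary-header manipulation under test here
  })
  .on('cycle', function (event) {
    // prints e.g. "cors - field to * x 76,832,035 ops/sec ±1.12% (187 runs sampled)"
    console.log(String(event.target))
  })
  .run()
```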

@mcollina (Member) left a comment

lgtm

@mcollina mcollina requested a review from Eomm July 20, 2022 10:15
@Uzlopak (Contributor, Author) commented Jul 20, 2022

While running the benchmarks, I realized that the case cors - field to fields is 10 times slower. I will investigate this further.

Sorry :(

@climba03003 (Member) left a comment

> While running the benchmarks, I realized that the case cors - field to fields is 10 times slower. I will investigate this further.

The Request Changes here is blocking this from landing, based on the message above.
Let me know when the slow case is fixed and I will dismiss the Request Changes.

@Uzlopak (Contributor, Author) commented Jul 20, 2022

It kind of feels strange to use caching that much. I also use mnemonist because it is obviously faster. I would have preferred the regex, but well... anyway.

vary - field to undefined x 7,106,800 ops/sec ±1.04% (188 runs sampled)
vary - field to * x 11,269,425 ops/sec ±1.12% (192 runs sampled)
vary - * to field x 8,844,989 ops/sec ±0.79% (191 runs sampled)
vary - field to empty x 7,321,003 ops/sec ±1.03% (188 runs sampled)
vary - fields string to empty x 8,739,100 ops/sec ±1.11% (191 runs sampled)
vary - field to fields x 4,053,665 ops/sec ±0.46% (192 runs sampled)
cors - field to undefined x 79,317,181 ops/sec ±1.11% (189 runs sampled)
cors - field to * x 65,132,203 ops/sec ±1.17% (187 runs sampled)
cors - * to field x 40,600,000 ops/sec ±1.87% (175 runs sampled)
cors - field to empty x 37,364,825 ops/sec ±0.92% (188 runs sampled)
cors - fields string to empty x 40,388,831 ops/sec ±1.45% (176 runs sampled)
cors - field to fields x 18,774,807 ops/sec ±0.70% (184 runs sampled)
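For context, mnemonist's LRU is a fixed-capacity cache, which also bounds how much memory the caching can ever use; a small usage sketch (the capacity of 1000 is an assumption, see the cacheSize discussion below):

```js
const LRUCache = require('mnemonist/lru-cache')

// At most 1000 distinct Vary strings are kept; once the cache is full,
// the least recently used entry is evicted, so memory stays flat.
const cache = new LRUCache(1000)

cache.set('accept-encoding', ['accept-encoding'])
cache.get('accept-encoding') // -> ['accept-encoding']
cache.get('origin')          // -> undefined on a miss
```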

package.json Outdated
@@ -45,7 +45,7 @@
   },
   "dependencies": {
     "fastify-plugin": "^4.0.0",
-    "vary": "^1.1.2"
+    "mnemonist": "^0.39.2"
A Member left a comment

This seems like a really heavy dependency. It also isn't at a stable release number.

@Uzlopak (Contributor, Author) commented Jul 21, 2022

@jsumners

tiny-lru is slower than mnemonist

vary:

vary - field to undefined x 7,098,732 ops/sec ±0.75% (190 runs sampled)
vary - field to * x 11,084,521 ops/sec ±0.93% (191 runs sampled)
vary - * to field x 7,882,434 ops/sec ±0.69% (188 runs sampled)
vary - field to empty x 6,949,372 ops/sec ±0.77% (189 runs sampled)
vary - fields string to empty x 8,385,889 ops/sec ±0.70% (189 runs sampled)
vary - field to fields x 3,548,317 ops/sec ±0.71% (191 runs sampled)

mnemonist:

cors - field to undefined x 108,076,176 ops/sec ±2.34% (167 runs sampled)
cors - field to * x 87,009,887 ops/sec ±1.46% (175 runs sampled)
cors - * to field x 79,040,526 ops/sec ±0.40% (193 runs sampled)
cors - field to empty x 76,961,478 ops/sec ±0.53% (191 runs sampled)
cors - fields string to empty x 77,507,047 ops/sec ±0.46% (191 runs sampled)
cors - field to fields x 62,389,119 ops/sec ±1.21% (176 runs sampled)

tiny-lru:

cors - field to undefined x 106,751,509 ops/sec ±1.82% (171 runs sampled)
cors - field to * x 88,892,481 ops/sec ±1.48% (178 runs sampled)
cors - * to field x 78,700,246 ops/sec ±0.46% (191 runs sampled)
cors - field to empty x 77,239,556 ops/sec ±0.52% (189 runs sampled)
cors - fields string to empty x 76,970,667 ops/sec ±0.61% (191 runs sampled)
cors - field to fields x 27,534,170 ops/sec ±0.55% (191 runs sampled)

It is still about 10 times faster than the original vary. So if you want, I can use tiny-lru.

@mcollina (Member)

mnemonist is faster than tiny-lru in all possible cases: https://github.com/dominictarr/bench-lru.

@kibertoad (Member)

I suggest pinning mnemonist at a patch version, since it's not at 1.0.0 yet and I expect breaking changes in minor versions.
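Pinning means dropping the range operator so npm installs exactly that version; a sketch of the resulting dependencies block (versions taken from the diff above):

```json
"dependencies": {
  "fastify-plugin": "^4.0.0",
  "mnemonist": "0.39.2"
}
```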

@Uzlopak (Contributor, Author) commented Aug 3, 2022

@kibertoad
pinned

@Uzlopak (Contributor, Author) commented Aug 3, 2022

@climba03003

Would you like to review again, please?

@mcollina mcollina merged commit f523476 into fastify:master Aug 3, 2022
@kurtextrem commented Aug 9, 2022

  1. Out of curiosity, did anyone benchmark other LRU implementations? (If so, I don't want to repeat the work.) I'm mainly asking because e.g. https://github.com/lukeed/flru is a much smaller dep (it has a benchmark at the bottom, so maybe those are also worth benchmarking). There is even a native addon, which might also improve perf of the LRU: https://github.com/adzerk/node-lru-native

  2. I'm curious whether the new parsing implementation is faster than https://github.com/jshttp/vary, and whether it then makes sense to open a PR there as well so everyone benefits from this speedup.

  3. Apart from this, what is the expected memory increase of using the LRU? This could very well be important for some users.

@Uzlopak Uzlopak deleted the perf branch August 9, 2022 15:25
@Uzlopak (Contributor, Author) commented Aug 9, 2022

@kurtextrem

  1. I am actually thinking a lot about making a general decision regarding an LRU for fastify, so that we use the same LRU everywhere.

  2. I don't think we should upstream anything to vary. The caching is something I think would not get past their maintainers. And tbh I don't expect caching in modules from the jshttp org.

  3. I don't think it has any significant memory increase. The Vary header is something very deterministic set by the server, so I strongly assume that in production we actually cache about 20 entries in the LRU cache. It's not like we produce masses of different vary headers on the fly.

@kurtextrem commented Aug 13, 2022

@Uzlopak
Regarding point 1: Do you have any discussion to point me to? In any case, maybe this section is helpful for making decisions for fastify: https://github.com/isaacs/node-lru-cache#performance

Regarding point 2: What I meant is that this PR changes the algorithm for parsing the Vary header in general (excluding the LRU cache here). The two implementations differ completely now, and I wonder which one is actually faster, since the new benchmarks measure "new algorithm + LRU" rather than "only new algorithm". So what if the old algorithm (from the vary npm module) is faster when combined with the LRU? And if it isn't, then pushing the new algorithm to that package might make sense.

Regarding point 3: In that case, I think we should reduce the 1000 reserved slots by default (https://github.com/fastify/fastify-cors/pull/214/files#diff-6e4256d6cce0fe10bc30208d12a22415f2db8cefe0aeebf6fa2830e590539894R70)

@Uzlopak (Contributor, Author) commented Aug 13, 2022

The only thing we used vary for was to add the necessary key to the Vary field if it was not already set, plus of course the special case of *.

I don't even think it would actually hurt to have duplicate keys in Vary.

So the only chokepoint is determining whether the key is already in the provided string. I think even without caching we already have some noticeable gains. I think maybe I can squeeze out more perf...
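For illustration, a minimal sketch of that chokepoint without any caching (the helper name and details are mine, not the PR's):

```js
// Hypothetical helper: add a field to an existing Vary header value only
// if it is not already present; '*' short-circuits everything else.
function appendVary (current, field) {
  if (current === '*' || field === '*') return '*'
  if (current === undefined || current === '') return field

  // The chokepoint: is the key already in the provided string?
  const fields = current.toLowerCase().split(/\s*,\s*/)
  if (fields.includes(field.toLowerCase())) return current

  return current + ', ' + field
}

// appendVary('Accept', 'Origin') -> 'Accept, Origin'
// appendVary('Origin', 'origin') -> 'Origin' (already present, case-insensitive)
```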

We could reduce the cacheSize to 100 if you like, like the 100 used in
https://github.com/fastify/fastify/blob/37ae4ea1f7490e8194e42b1bf85a9db60f048c57/lib/contentTypeParser.js#L40

I am actually kind of unopinionated regarding LRU caching.
