
perf: enable wasm simd #735

Merged
merged 11 commits into main on Apr 27, 2021

Conversation

@ronag (Member) commented Apr 12, 2021

wasm SIMD is still experimental, but looking at the performance impact could be interesting already.

@ronag requested a review from dnlup (April 12, 2021 23:37)
@ronag (Member Author) commented Apr 12, 2021

@dnlup Would you be interested in trying to benchmark a wasm simd build?

@mcollina (Member) left a review comment

lgtm - hopefully this does not cause problems.

@ronag (Member Author) commented Apr 13, 2021

@mcollina I don't think we can merge this, since it still requires a command-line argument to enable experimental SIMD. Maybe Node 16? I'm just very interested in whether or not there are any performance gains.

@mcollina (Member)

Ouch!

@dnlup (Contributor) commented Apr 13, 2021

I ran some benchmarks locally but couldn't see any noticeable differences atm. Not sure why the benchmarks are failing; I did activate the command-line option.

@dnlup (Contributor) commented Apr 13, 2021

@ronag (Member Author) commented Apr 13, 2021

What node version are we using for benchmark CI?

@dnlup (Contributor) commented Apr 13, 2021

> What node version are we using for benchmark CI?

Good catch, 12

https://github.com/nodejs/undici/blob/main/.github/workflows/bench.yml#L18

@ronag (Member Author) commented Apr 21, 2021

@dnlup would you mind running this with the latest benchmark suite on master?

@dnlup (Contributor) commented Apr 21, 2021

The prebench:run script fails on Node 16, also on main. Not sure why atm.

@ronag (Member Author) commented Apr 24, 2021

@dnlup any update on perf numbers?

@dnlup (Contributor) commented Apr 24, 2021

Sorry, I was stuck trying to understand what is wrong with the CI. Running locally:

Node v16.0.0

main

[bench:run] │ Tests               │ Samples │        Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      10 │  6.97 req/sec │  ± 0.68 % │                       - │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  7.14 req/sec │  ± 0.72 % │                + 2.47 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      10 │ 56.27 req/sec │  ± 1.86 % │              + 707.64 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      10 │ 69.13 req/sec │  ± 1.63 % │              + 892.26 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      10 │ 69.18 req/sec │  ± 1.65 % │              + 892.99 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      10 │ 69.20 req/sec │  ± 2.04 % │              + 893.27 % │
[bench:run] 
[bench:run] │ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      20 │  3502.83 req/sec │  ± 2.85 % │                       - │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      30 │  7250.33 req/sec │  ± 2.92 % │              + 106.98 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │     101 │  8809.81 req/sec │  ± 3.26 % │              + 151.51 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      10 │ 13235.04 req/sec │  ± 2.94 % │              + 277.84 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      15 │ 16281.19 req/sec │  ± 2.97 % │              + 364.80 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      30 │ 18726.61 req/sec │  ± 2.75 % │              + 434.61 % │

simd

[bench:run] │ Tests               │ Samples │        Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      10 │  6.48 req/sec │  ± 0.55 % │                       - │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  6.61 req/sec │  ± 0.94 % │                + 2.07 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      20 │ 58.33 req/sec │  ± 2.50 % │              + 800.85 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      10 │ 64.97 req/sec │  ± 2.15 % │              + 903.39 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      10 │ 65.28 req/sec │  ± 0.97 % │              + 908.18 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      10 │ 67.63 req/sec │  ± 0.79 % │              + 944.41 % │
[bench:run] 
[bench:run] │ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      35 │  3166.61 req/sec │  ± 2.84 % │                       - │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  6195.06 req/sec │  ± 2.54 % │               + 95.64 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │     101 │  9301.57 req/sec │  ± 3.00 % │              + 193.74 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      25 │ 15199.33 req/sec │  ± 2.99 % │              + 379.99 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      10 │ 17911.83 req/sec │  ± 2.69 % │              + 465.65 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      30 │ 21642.75 req/sec │  ± 2.70 % │              + 583.47 % │

I would say that there is a noticeable difference.

@ronag (Member Author) commented Apr 24, 2021

Is SIMD still experimental on Node 16?

@ronag (Member Author) commented Apr 24, 2021

@dnlup Can we ship a separate simd assembly and do a runtime check if it's available and load accordingly?

@dnlup (Contributor) commented Apr 24, 2021

> Is SIMD still experimental on Node 16?

I haven't checked the docs, but removing the flag still crashes the script, so I think so.

@dnlup (Contributor) commented Apr 24, 2021

> @dnlup Can we ship a separate simd assembly and do a runtime check if it's available and load accordingly?

We can modify the build script, yes. By "available", is it ok to assume that Node has been launched with the CLI option?

@ronag (Member Author) commented Apr 24, 2021

@ronag (Member Author) commented Apr 24, 2021

Or, alternatively: always try to use the SIMD version first, and if compilation fails then fall back to non-SIMD?
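A minimal sketch of that compile-first, fall-back-on-error idea (hypothetical code, not undici's actual loader; the byte arrays are stand-ins, with the "SIMD" buffer deliberately invalid to simulate an engine that rejects that build):

```javascript
// Hypothetical sketch of the fallback strategy discussed above.
// `simdBytes` is deliberately invalid to simulate an engine rejecting the
// SIMD build; `plainBytes` is a minimal valid (empty) wasm module.
// A real loader would read the two .wasm build artifacts from disk.
const simdBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0xff]) // invalid on purpose
const plainBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00])

function loadParser () {
  for (const bytes of [simdBytes, plainBytes]) {
    try {
      // Throws a WebAssembly.CompileError if the engine cannot compile
      // this build (e.g. SIMD opcodes on a runtime without SIMD support).
      return new WebAssembly.Module(bytes)
    } catch (err) {
      // compilation rejected, try the next candidate build
    }
  }
  throw new Error('no loadable wasm build')
}

const parserModule = loadParser()
console.log(parserModule instanceof WebAssembly.Module) // true
```

The appeal of this approach is that it needs no explicit feature flag check: whatever the reason the SIMD build cannot compile, the loader degrades gracefully.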

@ronag added this to the 4.0 milestone (Apr 24, 2021)
@dnlup (Contributor) commented Apr 24, 2021

> Or alternatively. Always try to use the simd version first and if compilation fails then fallback to non simd?

That sounds better

@dnlup (Contributor) commented Apr 24, 2021

> @dnlup maybe use https://github.com/GoogleChromeLabs/wasm-feature-detect?

I like this one too. Are we ok with introducing it as a runtime dep?

@ronag (Member Author) commented Apr 24, 2021

> @dnlup maybe use https://github.com/GoogleChromeLabs/wasm-feature-detect?
>
> I like this one too. Are we ok with introducing it as a runtime dep?

I'm ok with it. But the feature detect is so trivial we might as well consider inlining it with a ref comment.
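For reference, an inlined detection check might look roughly like this (a sketch: `WebAssembly.validate` returns a boolean rather than throwing, which is what makes the inline check trivial; the probe bytes below only demonstrate the mechanism, whereas a real SIMD probe, like the one in wasm-feature-detect, validates a tiny module containing a v128 instruction):

```javascript
// Sketch of an inlined wasm feature probe. WebAssembly.validate() returns
// a boolean instead of throwing, so detection is a one-liner. A real SIMD
// probe (as in wasm-feature-detect) validates a tiny module whose body
// uses a v128 instruction; engines without SIMD return false for it.
// These bytes only illustrate the mechanism:
const validProbe = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]) // minimal empty module
const invalidProbe = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x02, 0x00, 0x00, 0x00]) // unsupported version

console.log(WebAssembly.validate(validProbe))   // true on any wasm-capable engine
console.log(WebAssembly.validate(invalidProbe)) // false
```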

@mcollina (Member)

I prefer not to add additional runtime dependencies

@ronag (Member Author) commented Apr 24, 2021

I think V8 9.1 no longer requires the --experimental-wasm-simd flag, so this is absolutely worth trying to land in the near term.

@ronag (Member Author) commented Apr 24, 2021

nodejs/node#38273

@ronag (Member Author) commented Apr 24, 2021

You can also update the README with updated bench info.

@ronag (Member Author) commented Apr 25, 2021

@mcollina PTAL. Also could you give us a bench run on this PR?

@ronag (Member Author) commented Apr 25, 2021

@dnlup is this ready for review?

@dnlup (Contributor) commented Apr 26, 2021

> @dnlup is this ready for review?

I have a few doubts:

  • should we test and bench simd separately?
  • should we report the simd bench report separately in the README?
  • are we ok with adding another wasm build, which increases the final bundle size?

Other than that, we are almost ready for review.

As a note, benchmarks on the CI don't show any differences, unlike the ones I have run locally.

@ronag (Member Author) commented Apr 26, 2021

> > @dnlup is this ready for review?
>
> I have a few doubts:
>
>   • should we test and bench simd separately?

I think we can remove the simd versions of test and bench. I believe simd will be enabled by default in Node 16.1 (or soonish). That way we will get it for free in CI.

>   • should we report the simd bench report separately in the README?

No, I think we only report the simd bench with a note. This is the future.

>   • are we ok with adding another wasm build, which increases the final bundle size?

I don't think it's a problem.

> Other than that, we are almost ready for review.
>
> As a note, benchmarks on the CI don't show any differences, unlike the ones I have run locally.

Maybe if @mcollina runs some benchmarks we can get a more accurate result?

@mcollina (Member) commented Apr 26, 2021

I get some really fluctuating results even on my dedicated server; maybe we need to change something in how we benchmark.

These are the latest numbers I got.

master

[bench:server]
[bench:server] > undici@4.0.0-alpha.4 bench:server /home/matteo/repositories/undici
[bench:server] > node benchmarks/server.js
[bench:server]
[bench:run]
[bench:run] > undici@4.0.0-alpha.4 prebench:run /home/matteo/repositories/undici
[bench:run] > node benchmarks/wait.js
[bench:run]
[bench:run]
[bench:run] > undici@4.0.0-alpha.4 bench:run /home/matteo/repositories/undici
[bench:run] > CONNECTIONS=1 node benchmarks/benchmark.js && CONNECTIONS=50 node benchmarks/benchmark.js
[bench:run]
[bench:run] │ Tests               │ Samples │        Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      10 │  4.80 req/sec │  ± 2.44 % │                       - │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      15 │  4.85 req/sec │  ± 2.54 % │                + 1.09 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      25 │ 61.27 req/sec │  ± 2.78 % │             + 1176.16 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      15 │ 62.24 req/sec │  ± 2.34 % │             + 1196.29 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      10 │ 64.20 req/sec │  ± 2.54 % │             + 1237.15 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      10 │ 65.26 req/sec │  ± 1.18 % │             + 1259.28 % │
[bench:run]
[bench:run] │ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      10 │  4914.01 req/sec │  ± 2.23 % │                       - │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  5753.65 req/sec │  ± 2.90 % │               + 17.09 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      30 │  8629.90 req/sec │  ± 2.63 % │               + 75.62 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      55 │  9595.80 req/sec │  ± 2.96 % │               + 95.27 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      15 │  9831.22 req/sec │  ± 2.71 % │              + 100.07 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      35 │ 10863.22 req/sec │  ± 2.85 % │              + 121.07 % │
[bench:run]

simd

[bench:server]
[bench:server] > undici@4.0.0-alpha.4 bench:server /home/matteo/repositories/undici
[bench:server] > node benchmarks/server.js
[bench:server]
[bench:run]
[bench:run] > undici@4.0.0-alpha.4 prebench:run /home/matteo/repositories/undici
[bench:run] > node benchmarks/wait.js
[bench:run]
[bench:run]
[bench:run] > undici@4.0.0-alpha.4 bench:run /home/matteo/repositories/undici
[bench:run] > CONNECTIONS=1 node benchmarks/benchmark.js && CONNECTIONS=50 node benchmarks/benchmark.js
[bench:run]
[bench:run] │ Tests               │ Samples │        Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      15 │  4.63 req/sec │  ± 2.77 % │                       - │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  4.81 req/sec │  ± 2.16 % │                + 3.94 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      25 │ 62.22 req/sec │  ± 2.67 % │             + 1244.58 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      15 │ 64.33 req/sec │  ± 2.47 % │             + 1290.24 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      15 │ 66.08 req/sec │  ± 2.48 % │             + 1327.88 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      10 │ 66.13 req/sec │  ± 1.39 % │             + 1329.08 % │
[bench:run]
[bench:run] │ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      50 │  3546.49 req/sec │  ± 2.90 % │                       - │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      15 │  5692.67 req/sec │  ± 2.48 % │               + 60.52 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      25 │  8478.71 req/sec │  ± 2.62 % │              + 139.07 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      20 │  9766.66 req/sec │  ± 2.79 % │              + 175.39 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      15 │ 10109.74 req/sec │  ± 2.94 % │              + 185.06 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      25 │ 10949.73 req/sec │  ± 2.54 % │              + 208.75 % │

@ronag (Member Author) commented Apr 26, 2021

> I get some really fluctuating results even on my dedicated server; maybe we need to change something in how we benchmark.

Yes. Unsure what though...

It's also weird how 50 connections makes it 150x faster?

But I think we can conclude that simd makes it faster?

@mcollina (Member)

simd makes it faster by roughly 10% across various runs.

@ronag (Member Author) commented Apr 26, 2021

Regarding the benchmarks, I'm also confused as to why the difference between a single connection and 50 connections is so large...

@mcollina (Member)

> Regarding the benchmarks, I'm also confused as to why the difference between a single connection and 50 connections is so large...

me too

@dnlup marked this pull request as ready for review (April 27, 2021 07:19)
@dnlup (Contributor) commented Apr 27, 2021

I found this looking for the error that pops up on Node 12. Not sure if it's fixable; maybe we should fall back to the non-simd version in that case too.

@ronag (Member Author) commented Apr 27, 2021

CI fails

@dnlup (Contributor) commented Apr 27, 2021

> CI fails

Yes, it's because of that error I linked. I think I am going to fall back to the non-wasm build in that case too.

dnlup and others added 2 commits April 27, 2021 10:35
@ronag merged commit bf04793 into main (Apr 27, 2021)
@trivikr (Member) commented May 10, 2021

> simd makes it faster by roughly 10% across various runs.

I too noticed the same ~10% improvement with SIMD in my runs while working on PR #796 with Node.js v16.1.0

@trivikr (Member) commented May 10, 2021

> I think V8 9.1 no longer requires the --experimental-wasm-simd flag, so this is absolutely worth trying to land in the near term.

Verified that WebAssembly SIMD support is available by default from Chrome 91 as per Enabling experimental SIMD support in Chrome.

Should we open a new GitHub issue to track removal of --experimental-wasm-simd (and the SIMD benchmark scripts) once Node.js 16.x ships with V8 9.1 (nodejs/node#38273)?

The flag --experimental-wasm-simd seems to have been added in V8 7.4, based on this tweet and the Chrome releases in Feb 2019. Node.js 12.x shipped with V8 7.4, so retaining SIMD and non-SIMD benchmarks would still be beneficial. Maybe the SIMD benchmarks can be made the default?

WDYT @ronag @mcollina?

@dnlup (Contributor) commented May 10, 2021

> > I think V8 9.1 no longer requires the --experimental-wasm-simd flag, so this is absolutely worth trying to land in the near term.
>
> Verified that WebAssembly SIMD support is available by default from Chrome 91 as per Enabling experimental SIMD support in Chrome.
>
> Should we open a new GitHub issue to track removal of --experimental-wasm-simd (and the SIMD benchmark scripts) once Node.js 16.x ships with V8 9.1 (nodejs/node#38273)?
>
> The flag --experimental-wasm-simd seems to have been added in V8 7.4, based on this tweet and the Chrome releases in Feb 2019. Node.js 12.x shipped with V8 7.4, so retaining SIMD and non-SIMD benchmarks would still be beneficial. Maybe the SIMD benchmarks can be made the default?
>
> WDYT @ronag @mcollina?

If I am not mistaken, we decided to keep simd as the default for the same reasons you have pointed out. The separate script was my fault; I must have forgotten to remove it. Sorry for the late feedback. I am fine with either approach, though; there are good reasons for each one.

@ronag (Member Author) commented May 10, 2021

Always simd.

@Uzlopak deleted the simd branch (February 21, 2024 12:38)
crysmags pushed a commit to crysmags/undici that referenced this pull request Feb 27, 2024
* perf: enable wasm simd

* bench: enable simd

* build: add wasm simd

* ci: use node 16 in benchmarks

* test: add simd test script

* ci: add simd bench

* enable simd by default in tests and benchmarks

* fix machine specs in README.md

Co-authored-by: Robert Nagy <ronagy@icloud.com>

* client: fallback to non-simd on all errors

* fixup: re-enable jest

* fixup

Co-authored-by: Daniele Belardi <dwon.dnl@gmail.com>