
perf: enable wasm simd #735

Merged
merged 11 commits into main on Apr 27, 2021

Conversation

@ronag (Member) commented Apr 12, 2021

wasm SIMD is still experimental, but looking at the performance impact could be interesting already.

@ronag requested a review from dnlup (April 12, 2021 23:37)
@ronag (Member Author) commented Apr 12, 2021

@dnlup Would you be interested in trying to benchmark a wasm simd build?

@mcollina (Member) left a review comment

lgtm - hopefully this does not cause problems.

@ronag (Member Author) commented Apr 13, 2021

@mcollina I don't think we can merge this, since it still requires a command-line argument to enable experimental SIMD. Maybe Node 16? I'm just very interested in whether or not there are any performance gains.

@mcollina (Member)

Ouch!

@dnlup (Contributor) commented Apr 13, 2021

I ran some benchmarks locally but couldn't see any noticeable differences atm. Not sure why the benchmarks are failing; I did activate the command-line option.

@dnlup (Contributor) commented Apr 13, 2021

@ronag (Member Author) commented Apr 13, 2021

What node version are we using for benchmark CI?

@dnlup (Contributor) commented Apr 13, 2021

> What node version are we using for benchmark CI?

Good catch, 12

https://github.com/nodejs/undici/blob/main/.github/workflows/bench.yml#L18

@ronag (Member Author) commented Apr 21, 2021

@dnlup would you mind running this with the latest benchmark suite on master?

@dnlup (Contributor) commented Apr 21, 2021

The prebench:run script fails on Node 16, also on main. Not sure why atm.

@ronag (Member Author) commented Apr 24, 2021

@dnlup any update on perf numbers?

@dnlup (Contributor) commented Apr 24, 2021

Sorry, I was stuck trying to understand what is wrong with the CI. Running locally:

Node v16.0.0

main

[bench:run] │ Tests               │ Samples │        Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      10 │  6.97 req/sec │  ± 0.68 % │                       - │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  7.14 req/sec │  ± 0.72 % │                + 2.47 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      10 │ 56.27 req/sec │  ± 1.86 % │              + 707.64 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      10 │ 69.13 req/sec │  ± 1.63 % │              + 892.26 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      10 │ 69.18 req/sec │  ± 1.65 % │              + 892.99 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      10 │ 69.20 req/sec │  ± 2.04 % │              + 893.27 % │
[bench:run] 
[bench:run] │ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      20 │  3502.83 req/sec │  ± 2.85 % │                       - │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      30 │  7250.33 req/sec │  ± 2.92 % │              + 106.98 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │     101 │  8809.81 req/sec │  ± 3.26 % │              + 151.51 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      10 │ 13235.04 req/sec │  ± 2.94 % │              + 277.84 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      15 │ 16281.19 req/sec │  ± 2.97 % │              + 364.80 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      30 │ 18726.61 req/sec │  ± 2.75 % │              + 434.61 % │

simd

[bench:run] │ Tests               │ Samples │        Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      10 │  6.48 req/sec │  ± 0.55 % │                       - │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  6.61 req/sec │  ± 0.94 % │                + 2.07 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      20 │ 58.33 req/sec │  ± 2.50 % │              + 800.85 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      10 │ 64.97 req/sec │  ± 2.15 % │              + 903.39 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      10 │ 65.28 req/sec │  ± 0.97 % │              + 908.18 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      10 │ 67.63 req/sec │  ± 0.79 % │              + 944.41 % │
[bench:run] 
[bench:run] │ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      35 │  3166.61 req/sec │  ± 2.84 % │                       - │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  6195.06 req/sec │  ± 2.54 % │               + 95.64 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │     101 │  9301.57 req/sec │  ± 3.00 % │              + 193.74 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      25 │ 15199.33 req/sec │  ± 2.99 % │              + 379.99 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      10 │ 17911.83 req/sec │  ± 2.69 % │              + 465.65 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      30 │ 21642.75 req/sec │  ± 2.70 % │              + 583.47 % │

I would say that there is a noticeable difference.

@ronag (Member Author) commented Apr 24, 2021

Is SIMD still experimental on Node 16?

@ronag (Member Author) commented Apr 24, 2021

@dnlup Can we ship a separate simd assembly and do a runtime check if it's available and load accordingly?

@dnlup (Contributor) commented Apr 24, 2021

> Is SIMD still experimental on Node 16?

I haven't checked the docs, but removing the flag still crashes the script, so I think so.

@dnlup (Contributor) commented Apr 24, 2021

> @dnlup Can we ship a separate simd assembly and do a runtime check if it's available and load accordingly?

We can modify the build script, yes. By "available", is it ok to assume that Node has been launched with the CLI option?

@ronag (Member Author) commented Apr 24, 2021

@ronag (Member Author) commented Apr 24, 2021

Or, alternatively: always try to use the SIMD version first, and if compilation fails then fall back to non-SIMD?
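A minimal sketch of that compile-first, fall-back-on-error idea (hypothetical code, not undici's actual loader; the byte arrays are stand-ins, with the "SIMD" buffer deliberately invalid to simulate an engine that rejects that build):

```javascript
// Hypothetical sketch of the fallback strategy discussed above.
// `simdBytes` is deliberately invalid to simulate an engine rejecting the
// SIMD build; `plainBytes` is a minimal valid (empty) wasm module.
// A real loader would read the two .wasm build artifacts from disk.
const simdBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0xff]) // invalid on purpose
const plainBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00])

function loadParser () {
  for (const bytes of [simdBytes, plainBytes]) {
    try {
      // Throws a WebAssembly.CompileError if the engine cannot compile
      // this build (e.g. SIMD opcodes on a runtime without SIMD support).
      return new WebAssembly.Module(bytes)
    } catch (err) {
      // compilation rejected, try the next candidate build
    }
  }
  throw new Error('no loadable wasm build')
}

const parserModule = loadParser()
console.log(parserModule instanceof WebAssembly.Module) // true
```

The appeal of this approach is that it needs no explicit feature flag check: whatever the reason the SIMD build cannot compile, the loader degrades gracefully.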

@ronag added this to the 4.0 milestone (Apr 24, 2021)
@dnlup (Contributor) commented Apr 24, 2021

> Or alternatively. Always try to use the simd version first and if compilation fails then fallback to non simd?

That sounds better

@dnlup (Contributor) commented Apr 24, 2021

> @dnlup maybe use https://github.com/GoogleChromeLabs/wasm-feature-detect?

I like this one too. Are we ok with introducing it as a runtime dep?

@ronag (Member Author) commented Apr 24, 2021

> @dnlup maybe use https://github.com/GoogleChromeLabs/wasm-feature-detect?
>
> I like this one too. Are we ok with introducing it as a runtime dep?

I'm ok with it. But the feature detect is so trivial we might as well consider inlining it with a ref comment.
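For reference, an inlined detection check might look roughly like this (a sketch: `WebAssembly.validate` returns a boolean rather than throwing, which is what makes the inline check trivial; the probe bytes below only demonstrate the mechanism, whereas a real SIMD probe, like the one in wasm-feature-detect, validates a tiny module containing a v128 instruction):

```javascript
// Sketch of an inlined wasm feature probe. WebAssembly.validate() returns
// a boolean instead of throwing, so detection is a one-liner. A real SIMD
// probe (as in wasm-feature-detect) validates a tiny module whose body
// uses a v128 instruction; engines without SIMD return false for it.
// These bytes only illustrate the mechanism:
const validProbe = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]) // minimal empty module
const invalidProbe = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x02, 0x00, 0x00, 0x00]) // unsupported version

console.log(WebAssembly.validate(validProbe))   // true on any wasm-capable engine
console.log(WebAssembly.validate(invalidProbe)) // false
```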

@mcollina (Member)

I prefer not to add additional runtime dependencies

@ronag (Member Author) commented Apr 24, 2021

I think V8 9.1 no longer requires the --experimental-wasm-simd flag, so this is absolutely worth trying to land in the near term.

@ronag (Member Author) commented Apr 24, 2021

nodejs/node#38273

@ronag (Member Author) commented Apr 24, 2021

You can also update the README with updated bench info.

@ronag (Member Author) commented Apr 25, 2021

@mcollina PTAL. Also could you give us a bench run on this PR?

@ronag (Member Author) commented Apr 25, 2021

@dnlup is this ready for review?

@dnlup (Contributor) commented Apr 26, 2021

> @dnlup is this ready for review?

I have a few doubts:

  • should we test and bench simd separately?
  • should we report the simd bench report separately in the README?
  • are we ok with adding another wasm build, which increases the final bundle size?

Other than that, we are almost ready for review.

As a note, benchmarks on the CI don't show any differences, unlike the ones I have run locally.

@ronag (Member Author) commented Apr 26, 2021

> > @dnlup is this ready for review?
>
> I have a few doubts:
>
>   • should we test and bench simd separately?

I think we can remove the simd versions of test and bench. I believe simd will be enabled by default in Node 16.1 (or soonish). That way we will get it for free in CI.

>   • should we report the simd bench report separately in the README?

No, I think we only report the simd bench with a note. This is the future.

>   • are we ok with adding another wasm build, which increases the final bundle size?

I don't think it's a problem.

> Other than that, we are almost ready for review.
>
> As a note, benchmarks on the CI don't show any differences, unlike the ones I have run locally.

Maybe if @mcollina runs some benchmarks we can get a more accurate result?

@mcollina (Member) commented Apr 26, 2021

I get some really fluctuating results even on my dedicated server; maybe we need to change something in how we benchmark.

These are the latest numbers I got.

master

[bench:server]
[bench:server] > undici@4.0.0-alpha.4 bench:server /home/matteo/repositories/undici
[bench:server] > node benchmarks/server.js
[bench:server]
[bench:run]
[bench:run] > undici@4.0.0-alpha.4 prebench:run /home/matteo/repositories/undici
[bench:run] > node benchmarks/wait.js
[bench:run]
[bench:run]
[bench:run] > undici@4.0.0-alpha.4 bench:run /home/matteo/repositories/undici
[bench:run] > CONNECTIONS=1 node benchmarks/benchmark.js && CONNECTIONS=50 node benchmarks/benchmark.js
[bench:run]
[bench:run] │ Tests               │ Samples │        Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      10 │  4.80 req/sec │  ± 2.44 % │                       - │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      15 │  4.85 req/sec │  ± 2.54 % │                + 1.09 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      25 │ 61.27 req/sec │  ± 2.78 % │             + 1176.16 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      15 │ 62.24 req/sec │  ± 2.34 % │             + 1196.29 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      10 │ 64.20 req/sec │  ± 2.54 % │             + 1237.15 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      10 │ 65.26 req/sec │  ± 1.18 % │             + 1259.28 % │
[bench:run]
[bench:run] │ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      10 │  4914.01 req/sec │  ± 2.23 % │                       - │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  5753.65 req/sec │  ± 2.90 % │               + 17.09 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      30 │  8629.90 req/sec │  ± 2.63 % │               + 75.62 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      55 │  9595.80 req/sec │  ± 2.96 % │               + 95.27 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      15 │  9831.22 req/sec │  ± 2.71 % │              + 100.07 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      35 │ 10863.22 req/sec │  ± 2.85 % │              + 121.07 % │
[bench:run]

simd

[bench:server]
[bench:server] > undici@4.0.0-alpha.4 bench:server /home/matteo/repositories/undici
[bench:server] > node benchmarks/server.js
[bench:server]
[bench:run]
[bench:run] > undici@4.0.0-alpha.4 prebench:run /home/matteo/repositories/undici
[bench:run] > node benchmarks/wait.js
[bench:run]
[bench:run]
[bench:run] > undici@4.0.0-alpha.4 bench:run /home/matteo/repositories/undici
[bench:run] > CONNECTIONS=1 node benchmarks/benchmark.js && CONNECTIONS=50 node benchmarks/benchmark.js
[bench:run]
[bench:run] │ Tests               │ Samples │        Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      15 │  4.63 req/sec │  ± 2.77 % │                       - │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      10 │  4.81 req/sec │  ± 2.16 % │                + 3.94 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      25 │ 62.22 req/sec │  ± 2.67 % │             + 1244.58 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      15 │ 64.33 req/sec │  ± 2.47 % │             + 1290.24 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      15 │ 66.08 req/sec │  ± 2.48 % │             + 1327.88 % │
[bench:run] |─────────────────────|─────────|───────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      10 │ 66.13 req/sec │  ± 1.39 % │             + 1329.08 % │
[bench:run]
[bench:run] │ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │      50 │  3546.49 req/sec │  ± 2.90 % │                       - │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │      15 │  5692.67 req/sec │  ± 2.48 % │               + 60.52 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │      25 │  8478.71 req/sec │  ± 2.62 % │              + 139.07 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │      20 │  9766.66 req/sec │  ± 2.79 % │              + 175.39 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │      15 │ 10109.74 req/sec │  ± 2.94 % │              + 185.06 % │
[bench:run] |─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │      25 │ 10949.73 req/sec │  ± 2.54 % │              + 208.75 % │

@ronag (Member Author) commented Apr 26, 2021

> I get some really fluctuating results even on my dedicated server; maybe we need to change something in how we benchmark.

Yes. Unsure what though...

It's also weird how 50 connections makes it 150x faster?

But I think we can conclude that simd makes it faster?

@mcollina (Member)

simd makes it faster by roughly 10% across various runs.

@ronag (Member Author) commented Apr 26, 2021

Regarding the benchmarks, I'm also confused as to why the difference between a single connection and 50 connections is so large...

@mcollina (Member)

> Regarding the benchmarks, I'm also confused as to why the difference between a single connection and 50 connections is so large...

me too

@dnlup marked this pull request as ready for review (April 27, 2021 07:19)
@dnlup (Contributor) commented Apr 27, 2021

I found this looking for the error that pops up on Node 12. Not sure if it's fixable; maybe we should fall back to the non-simd version in that case too.

@ronag (Member Author) commented Apr 27, 2021

CI fails

@dnlup (Contributor) commented Apr 27, 2021

> CI fails

Yes, it's because of that error I linked. I think I am going to fall back to the non-wasm build in that case too.

dnlup and others added 2 commits April 27, 2021 10:35
@ronag merged commit bf04793 into main (Apr 27, 2021)
@trivikr (Member) commented May 10, 2021

> simd makes it faster by roughly 10% across various runs.

I too noticed the same ~10% improvement with SIMD in my runs while working on PR #796 with Node.js v16.1.0

@trivikr (Member) commented May 10, 2021

> I think V8 9.1 no longer requires the --experimental-wasm-simd flag, so this is absolutely worth trying to land in the near term.

Verified that WebAssembly SIMD support is available by default from Chrome 91 as per Enabling experimental SIMD support in Chrome.

Should we open a new GitHub issue to track removal of --experimental-wasm-simd (and the SIMD benchmark scripts) once Node.js 16.x ships with V8 9.1 (nodejs/node#38273)?

The flag --experimental-wasm-simd seems to have been added in V8 7.4, based on this tweet and the Chrome releases in Feb 2019. Node.js 12.x shipped with V8 7.4, so retaining SIMD and non-SIMD benchmarks would still be beneficial. Maybe the SIMD benchmarks can be made the default?

WDYT @ronag @mcollina?

@dnlup (Contributor) commented May 10, 2021

> > I think V8 9.1 no longer requires the --experimental-wasm-simd flag, so this is absolutely worth trying to land in the near term.
>
> Verified that WebAssembly SIMD support is available by default from Chrome 91 as per Enabling experimental SIMD support in Chrome.
>
> Should we open a new GitHub issue to track removal of --experimental-wasm-simd (and the SIMD benchmark scripts) once Node.js 16.x ships with V8 9.1 (nodejs/node#38273)?
>
> The flag --experimental-wasm-simd seems to have been added in V8 7.4, based on this tweet and the Chrome releases in Feb 2019. Node.js 12.x shipped with V8 7.4, so retaining SIMD and non-SIMD benchmarks would still be beneficial. Maybe the SIMD benchmarks can be made the default?
>
> WDYT @ronag @mcollina?

If I am not mistaken, we decided to keep simd as the default for the same reasons you have pointed out. The separate script was my fault; I must have forgotten to remove it. Sorry for the late feedback. I am fine with either approach, though; there are good reasons for each one.

@ronag (Member Author) commented May 10, 2021

Always simd.

@Uzlopak deleted the simd branch (February 21, 2024 12:38)
crysmags pushed a commit to crysmags/undici that referenced this pull request Feb 27, 2024
* perf: enable wasm simd

* bench: enable simd

* build: add wasm simd

* ci: use node 16 in benchmarks

* test: add simd test script

* ci: add simd bench

* enable simd by default in tests and benchmarks

* fix machine specs in README.md

Co-authored-by: Robert Nagy <ronagy@icloud.com>

* client: fallback to non-simd on all errors

* fixup: re-enable jest

* fixup

Co-authored-by: Daniele Belardi <dwon.dnl@gmail.com>