fetch() performance parity #1203

Closed
ignoramous opened this issue Feb 4, 2022 · 64 comments
Labels
enhancement New feature or request fetch

Comments

@ignoramous

This would solve...

Undici fetch isn't as fast as the implementations in other runtimes like Firefox / Deno / Workers.

The implementation should look like...

Measure and improve (e2e latencies, ops/sec, conns/sec, etc.) to match other implementations.

I have also considered...

I hand-rolled an HTTP/2 client with the Node.js stdlib and it was an order of magnitude faster than undici fetch, even without connection pooling.

I must note, though, that the connections were to Cloudflare and Google endpoints, which responded with 1 KB or less per fetch (basically, POST and uncached GET DNS-over-HTTPS requests).
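To illustrate, a minimal sketch of such a hand-rolled client (a sketch only: one session, no pooling, no error recovery; the endpoint and packet handling are illustrative):

// DoH POST over a single HTTP/2 session, using only the node:http2 stdlib.
const http2 = require('node:http2')

const session = http2.connect('https://cloudflare-dns.com')

function doh (query) { // query: a wire-format DNS packet (Buffer)
  return new Promise((resolve, reject) => {
    const req = session.request({
      ':method': 'POST',
      ':path': '/dns-query',
      'content-type': 'application/dns-message',
      'content-length': query.byteLength
    })
    const chunks = []
    req.on('data', (c) => chunks.push(c))
    req.on('end', () => resolve(Buffer.concat(chunks)))
    req.on('error', reject)
    req.end(query)
  })
}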

Additional context

This is a follow-up from: https://news.ycombinator.com/item?id=30164925

Sorry that I didn't properly benchmark the difference between Deno / Firefox / Undici / Cloudflare Workers, but when I deployed the same code on Node v16 LTS, it was instantly apparent that fetch was the cause of the higher latencies.

Possibly related: #22, #208, #952, #902

Other projects: szmarczak/http2-wrapper/issues/71

@ignoramous ignoramous added the enhancement New feature or request label Feb 4, 2022
@ronag
Member

ronag commented Feb 4, 2022

Do you have a reproducible benchmark we can use?

@mcollina
Member

mcollina commented Feb 5, 2022

Undici fetch isn't as fast as the implementations in other runtimes like Firefox / Deno / Workers.

I took a look at the Hacker News comment, and it did not include benchmarks either.

I'm inclined to close this unless somebody would like to prepare a cross-runtime benchmark suite.

@ignoramous
Author

ignoramous commented Feb 5, 2022

Fair. I'll take a stab at extracting the relevant parts from my server to demonstrate this. I should note that the slowness I observed was in a production env (running in a VM in a container behind a network load balancer) that also sees a tonne of other IO.

@ignoramous
Author

ignoramous commented Feb 6, 2022

I wrote up a tiny DoH load-tester in Python and Go (don't ask), which I intend to publish as a GitHub Action by tomorrow. In the meantime, here's a run from my laptop:

deno/runs:3748
percentile p10 p50 p75 p90 p95 p99 p99.9 p99.99 p100
milliseconds 280 296 309 336 411 694 889 898 929
undici/runs:3826
percentile p10 p50 p75 p90 p95 p99 p99.9 p99.99 p100
milliseconds 1140 1666 2171 2811 3302 3941 4501 4540 4995
node:naive-h2/runs:3791
percentile p10 p50 p75 p90 p95 p99 p99.9 p99.99 p100
milliseconds 1713 2067 2570 3276 3975 4269 4336 4530 4534

@ignoramous
Author

ignoramous commented Feb 8, 2022

After fighting a bit with GitHub Actions, a poor man's profiler is finally up:

From the current run, here's Node:

node/runs:1200
percentile p10 p50 p75 p90 p95 p99 p99.9 p99.99 p100
milliseconds 532 1215 1665 3359 3469 20953 21094 21125 21182

here's Deno:

deno/runs:6300
percentile p10 p50 p75 p90 p95 p99 p99.9 p99.99 p100
milliseconds 87 97 104 111 116 127 165 184 189

These are measured with Date.now wrapped around the fetch call. Both local and remote caches are unlikely to interfere because every DoH query is of the form <random-str>.dnsleaktest.com.
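(For reference, a sketch of how each sample is taken; record is a hypothetical percentile accumulator, not part of the actual codebase:)

// Wrap fetch with Date.now() to record per-request latency.
async function timedFetch (url, opts) {
  const start = Date.now()
  try {
    return await fetch(url, opts)
  } finally {
    record(Date.now() - start) // feeds the p50/p99/... percentile stats
  }
}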

Anything in the serverless-dns code-base that may significantly interfere with timing fetch calls is disabled.


The profiler forwards as many DoH requests as it can for 50s to 60s, after which it quits.

Deno's serveDoh doesn't slow down at all and blazes through 6300+ requests. Node's serverHTTPS (+ undici), on the other hand, manages 1200+ requests. I must note that the final batch of 100 queries, served after the ~1100th request, took 20s+ to complete.

Just a note: I haven't used Node's built-in perf module to profile, since it is un-importable on Deno (from what I can tell); I hand-rolled a minimal and inefficient perf counter instead.

@mcollina
Member

mcollina commented Feb 8, 2022

I might be a bit lost on your codebase, but I could not find undici being used anywhere.

I would note that there are plenty of things happening in that module. How have you identified the problem as being in undici's fetch implementation?

@mcollina
Member

mcollina commented Feb 8, 2022

If by any chance you are testing the http/2 implementation, consider that https://github.com/serverless-dns/serverless-dns/blob/b36a9cdb5786b78a7dd8e22a2f0a767e9e340ab1/src/plugins/dns-operation/dnsResolver.js#L398 is highly inefficient, as you will incur the significant overhead of creating a fresh HTTP/2 connection per request. You have to use a connection pool.
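The difference, roughly (a sketch; the names and the pooling policy are illustrative):

const http2 = require('node:http2')

// Inefficient: a fresh TCP + TLS + HTTP/2 handshake per request.
function resolveOnce (authority, path) {
  const session = http2.connect(authority)
  const req = session.request({ ':path': path })
  req.on('close', () => session.close())
  return req
}

// Better: reuse sessions keyed by authority (a minimal pool of one).
const sessions = new Map()
function getSession (authority) {
  let s = sessions.get(authority)
  if (!s || s.closed || s.destroyed) {
    s = http2.connect(authority)
    sessions.set(authority, s)
  }
  return s
}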

@ignoramous
Author

ignoramous commented Feb 8, 2022

I might be a bit lost on your codebase, but I could not find undici being used anywhere.

Dang, we moved to node-fetch (to see if it was any better) and didn't revert back to undici (which we now have).

Here are the results from a Node + undici run:

node/undici/runs:1300
percentile p10 p50 p75 p90 p95 p99 p99.9 p99.99 p100
milliseconds 200 602 965 1121 3083 12267 20400 20407 20467

How have you identified the problem in undici fetch implementation?

What's measured is just the method dnsResolver.js:resolveDnsUpstream, which doesn't do much other than construct a Request and call the Cache API (doesn't exist on Node, and disabled either way on these "profiler" runs), fetch (over undici on Node), or doh2 (unused).

We then rolled our own DNS over UDP / DNS over TCP connection-pooled transport atop Node (dgram and net), and that showed considerable perf improvement (around 15x to 100x):

node/udp/runs:6600
percentile p10 p50 p75 p90 p95 p99 p99.9 p99.99 p100
milliseconds 17 18 19 20 21 25 34 89 131

Of course, a standalone benchmark would be great to have, but I wanted to demonstrate the issue I see with the kind of workload we specifically run (lots and lots of DoH requests), and that bypassing the undici / node-fetch polyfills in the code path (or running on Cloudflare Workers or Deno, which bring their own fetch impls to the table) meant apparent improvements in IO.
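The core of that UDP transport looks roughly like this (a sketch; retries, timeouts, and the actual socket pooling are elided):

// Minimal DNS-over-UDP query with node:dgram.
const dgram = require('node:dgram')

function udpQuery (packet) { // packet: a wire-format DNS query (Buffer)
  return new Promise((resolve, reject) => {
    const sock = dgram.createSocket('udp4')
    sock.once('message', (answer) => { sock.close(); resolve(answer) })
    sock.once('error', reject)
    sock.send(packet, 53, '1.1.1.1')
  })
}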

@mcollina
Member

mcollina commented Feb 8, 2022

I don't think anyone has optimized either fetch or Node's WHATWG streams implementation, so this is a good point and there is likely a lot of headroom there.

Have you tried using undici.request() instead?

Are there some instructions on how to run the benchmarks? Being able to get some good diagnostics would definitely help.
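For anyone following along, the suggested alternative looks roughly like this (a sketch; the URL is illustrative):

const { request } = require('undici')

async function main () {
  // undici.request() skips the WHATWG fetch/streams machinery entirely.
  const { statusCode, body } = await request('http://localhost:3001/')
  console.log(statusCode, await body.text()) // body is a Node Readable with .text()/.json() helpers
}

main()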

@mcollina
Member

mcollina commented Feb 9, 2022

Here are some results from our own benchmarks:

│ Tests               │ Samples │           Result │ Tolerance │ Difference with slowest │
|─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
│ undici - fetch      │      20 │  1028.31 req/sec │  ± 2.71 % │                       - │
|─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
│ http - no keepalive │      10 │  3891.51 req/sec │  ± 2.00 % │              + 278.44 % │
|─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
│ undici - pipeline   │      95 │  6034.47 req/sec │  ± 2.95 % │              + 486.83 % │
|─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
│ http - keepalive    │      50 │  6382.57 req/sec │  ± 2.98 % │              + 520.68 % │
|─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
│ undici - request    │      15 │  8528.35 req/sec │  ± 2.11 % │              + 729.35 % │
|─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
│ undici - stream     │      10 │ 10226.33 req/sec │  ± 2.66 % │              + 894.48 % │
|─────────────────────|─────────|──────────────────|───────────|─────────────────────────|
│ undici - dispatch   │      50 │ 11399.31 req/sec │  ± 2.79 % │             + 1008.54 % │

There is likely a lot of room for improvements.

@ignoramous
Author

Have you tried using undici.request() instead?

I will do so over the coming days and report back.

Are there some instructions on how to run the benchmarks?

Apart from running the profiler.yml gh-action (on your own fork of serverless-dns), you could check out the codebase, and then:

# clone serverless-dns
git clone https://github.com/serverless-dns/serverless-dns.git
cd serverless-dns

# download deps
npm i

# install q, a golang doh client
# https://github.com/natesales/q
echo "deb [trusted=yes] https://repo.natesales.net/apt /" | sudo tee /etc/apt/sources.list.d/natesales.list > /dev/null
sudo apt-get update
sudo apt-get install q

# export QDOH, where the 'run' script goes looking for `q`
export QDOH=q

# install node v15+ / deno v1.17+, if missing
...

# doh query, <random-str>.dnsleaktest.com, is sent to cloudflare-dns.com
# run doh, on node + undici
timeout 60s ./run node p1

# doh query, <random-str>.dnsleaktest.com, is sent to cloudflare-dns.com
# run doh, on deno
timeout 60s ./run deno p1

# dns query, <random-str>.dnsleaktest.com, is sent to 1.1.1.1
# run udp/tcp dns, on node
timeout 60s ./run node p3

Here are some results from our own benchmarks:

Thanks. Sorry if this is a noob question, but how do I generate such logs myself for my tests?

@mcollina
Member

mcollina commented Feb 9, 2022

Thanks. Sorry if this is a noob question, but how do I generate such logs myself for my tests?

Check out #1214 for the fetch benchmarks.

You can run our benchmarks by running:

PORT=3001 node benchmarks/server.js
PORT=3001 node benchmarks/benchmarks.js

@RafaelGSS
Member

RafaelGSS commented Apr 17, 2022

Hello folks! Recently, I spent some time on the fetch performance investigation and decided to write up a report on my findings. I'll share an abstract here (the full report will probably become a blog post at some point).

TL;DR:

After some analysis, I came up with a few PRs improving fetch performance directly:

However, the major bottleneck is in the WebStreams implementation in Node.js core; thus, I don't think there is a clear path for undici.fetch to improve its performance significantly without touching Node.js internals.


Firstly, the benchmark was unfair to undici.fetch, since it was accumulating the server response in a variable, blocking the event loop with GC runs. I solved that in #1283.

After several analyses, I found that WebStreams are very slow (both in microbenchmarks and in the fetch benchmark) - for reference, see https://github.com/RafaelGSS/nodejs-bench-operations/blob/main/RESULTS-v17.md#stream-readablejs. I then ran a comparison replacing WebStreams with native streams and got a 60% to 90% improvement.
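A microbenchmark in the spirit of the linked one (a sketch; chunk size and iteration count are arbitrary, and absolute numbers vary by machine and Node version):

// Push N 1 KiB chunks through a WHATWG ReadableStream vs a node Readable.
const { Readable } = require('node:stream')
const { ReadableStream } = require('node:stream/web')

const N = 1e5
const chunk = Buffer.alloc(1024)

async function benchWeb () {
  let left = N
  const rs = new ReadableStream({
    pull (controller) {
      controller.enqueue(chunk)
      if (--left === 0) controller.close()
    }
  })
  const t0 = process.hrtime.bigint()
  for await (const _ of rs) {} // drain
  return Number(process.hrtime.bigint() - t0) / 1e6
}

async function benchNode () {
  let left = N
  const r = new Readable({
    read () {
      this.push(chunk)
      if (--left === 0) this.push(null)
    }
  })
  const t0 = process.hrtime.bigint()
  for await (const _ of r) {} // drain
  return Number(process.hrtime.bigint() - t0) / 1e6
}

benchWeb().then(async (web) => {
  console.log('web streams :', web.toFixed(1), 'ms')
  console.log('node streams:', (await benchNode()).toFixed(1), 'ms')
})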

Avoiding WebStreams in response
   diff --git a/lib/fetch/index.js b/lib/fetch/index.js
   index 9047976..778ad82 100644
   --- a/lib/fetch/index.js
   +++ b/lib/fetch/index.js
   @@ -50,6 +50,7 @@ const {
    const { kHeadersList } = require('../core/symbols')
    const EE = require('events')
    const { Readable, pipeline } = require('stream')
   +const Readable2 = require('../api/readable')
    const { isErrored, isReadable } = require('../core/util')
    const { dataURLProcessor } = require('./dataURL')
    const { kIsMockActive } = require('../mock/mock-symbols')
   @@ -964,7 +965,7 @@ async function fetchFinale (fetchParams, response) {
        })
    
        // 4. Set response’s body to the result of piping response’s body through transformStream.
   -    response.body = { stream: response.body.stream.pipeThrough(transformStream) }
   +    // response.body = { stream: response.body.stream.pipeThrough(transformStream) }
      }
    
      // 6. If fetchParams’s process response consume body is non-null, then:
   @@ -1731,9 +1732,8 @@ async function httpNetworkFetch (
        })()
      }
    
   +  const { body, status, statusText, headersList } = await dispatch({ body: requestBody })
      try {
   -    const { body, status, statusText, headersList } = await dispatch({ body: requestBody })
   -
        const iterator = body[Symbol.asyncIterator]()
        fetchParams.controller.next = () => iterator.next()
    
   @@ -1797,7 +1797,7 @@ async function httpNetworkFetch (
      // 17. Run these steps, but abort when the ongoing fetch is terminated:
    
      //    1. Set response’s body to a new body whose stream is stream.
   -  response.body = { stream }
   +  response.body = { stream: body }
    
      //    2. If response is not a network error and request’s cache mode is
      //    not "no-store", then update response in httpCache for request.
   @@ -1957,7 +1957,7 @@ async function httpNetworkFetch (
                headers.append(key, val)
              }
    
   -          this.body = new Readable({ read: resume })
   +          this.body = new Readable2(resume, this.abort, headers.get('content-type'))
    
              const decoders = []

If you apply the above git diff and profile the application, it will show createAbortError as one of the functions consuming the most CPU time.

[CPU profile screenshot: createAbortError among the hottest functions]

And these AbortSignals are mostly created by WebStreams instances (even if the instance is not in use), e.g.:

[stack-trace screenshots: AbortSignals constructed from WebStreams internals]

So, avoiding the AbortSignal creation through WebStreams improves performance even further (150% to 200%, based on the initial benchmarks).
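The construction cost is easy to observe in isolation (a sketch; absolute numbers vary by Node version):

// Per-instance cost of AbortController (which allocates an AbortSignal)
// versus the plain-object stand-in used in the patches below.
const N = 1e5

let t0 = process.hrtime.bigint()
for (let i = 0; i < N; i++) new AbortController()
console.log('AbortController:', Number(process.hrtime.bigint() - t0) / N, 'ns/op')

t0 = process.hrtime.bigint()
for (let i = 0; i < N; i++) ({ signal: { addEventListener () {} } })
console.log('plain object   :', Number(process.hrtime.bigint() - t0) / N, 'ns/op')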

Avoiding AbortSignal
diff --git a/lib/fetch/index.js b/lib/fetch/index.js
index 1fbf29b..322e5ae 100644
--- a/lib/fetch/index.js
+++ b/lib/fetch/index.js
@@ -50,6 +50,7 @@ const {
const { kHeadersList } = require('../core/symbols')
const EE = require('events')
const { Readable, pipeline } = require('stream')
+const Readable2 = require('../api/readable')
const { isErrored, isReadable } = require('../core/util')
const { dataURLProcessor } = require('./dataURL')
const { kIsMockActive } = require('../mock/mock-symbols')
@@ -957,14 +958,14 @@ async function fetchFinale (fetchParams, response) {

    // 3. Set up transformStream with transformAlgorithm set to identityTransformAlgorithm
    // and flushAlgorithm set to processResponseEndOfBody.
-    const transformStream = new TransformStream({
-      start () {},
-      transform: identityTransformAlgorithm,
-      flush: processResponseEndOfBody
-    })
+    // const transformStream = new TransformStream({
+    //   start () {},
+    //   transform: identityTransformAlgorithm,
+    //   flush: processResponseEndOfBody
+    // })

    // 4. Set response’s body to the result of piping response’s body through transformStream.
-    response.body = { stream: response.body.stream.pipeThrough(transformStream) }
+    // response.body = { stream: response.body.stream.pipeThrough(transformStream) }
  }

  // 6. If fetchParams’s process response consume body is non-null, then:
@@ -1731,9 +1732,8 @@ async function httpNetworkFetch (
    })()
  }

+  const { body, status, statusText, headersList } = await dispatch({ body: requestBody })
  try {
-    const { body, status, statusText, headersList } = await dispatch({ body: requestBody })
-
    const iterator = body[Symbol.asyncIterator]()
    fetchParams.controller.next = () => iterator.next()

@@ -1775,29 +1775,29 @@ async function httpNetworkFetch (
  // 16. Set up stream with pullAlgorithm set to pullAlgorithm,
  // cancelAlgorithm set to cancelAlgorithm, highWaterMark set to
  // highWaterMark, and sizeAlgorithm set to sizeAlgorithm.
-  if (!ReadableStream) {
-    ReadableStream = require('stream/web').ReadableStream
-  }
-
-  const stream = new ReadableStream(
-    {
-      async start (controller) {
-        fetchParams.controller.controller = controller
-      },
-      async pull (controller) {
-        await pullAlgorithm(controller)
-      },
-      async cancel (reason) {
-        await cancelAlgorithm(reason)
-      }
-    },
-    { highWaterMark: 0 }
-  )
+  // if (!ReadableStream) {
+  //   ReadableStream = require('stream/web').ReadableStream
+  // }
+
+  // const stream = new ReadableStream(
+  //   {
+  //     async start (controller) {
+  //       fetchParams.controller.controller = controller
+  //     },
+  //     async pull (controller) {
+  //       await pullAlgorithm(controller)
+  //     },
+  //     async cancel (reason) {
+  //       await cancelAlgorithm(reason)
+  //     }
+  //   },
+  //   { highWaterMark: 0 }
+  // )

  // 17. Run these steps, but abort when the ongoing fetch is terminated:

  //    1. Set response’s body to a new body whose stream is stream.
-  response.body = { stream }
+  response.body = { stream: body }

  //    2. If response is not a network error and request’s cache mode is
  //    not "no-store", then update response in httpCache for request.
@@ -1870,10 +1870,10 @@ async function httpNetworkFetch (
      fetchParams.controller.controller.enqueue(new Uint8Array(bytes))

      // 8. If stream is errored, then terminate the ongoing fetch.
-      if (isErrored(stream)) {
-        fetchParams.controller.terminate()
-        return
-      }
+      // if (isErrored(stream)) {
+      //   fetchParams.controller.terminate()
+      //   return
+      // }

      // 9. If stream doesn’t need more data ask the user agent to suspend
      // the ongoing fetch.
@@ -1891,16 +1891,16 @@ async function httpNetworkFetch (
      response.aborted = true

      // 2. If stream is readable, error stream with an "AbortError" DOMException.
-      if (isReadable(stream)) {
-        fetchParams.controller.controller.error(new AbortError())
-      }
+      // if (isReadable(stream)) {
+      //   fetchParams.controller.controller.error(new AbortError())
+      // }
    } else {
      // 3. Otherwise, if stream is readable, error stream with a TypeError.
-      if (isReadable(stream)) {
-        fetchParams.controller.controller.error(new TypeError('terminated', {
-          cause: reason instanceof Error ? reason : undefined
-        }))
-      }
+      // if (isReadable(stream)) {
+      //   fetchParams.controller.controller.error(new TypeError('terminated', {
+      //     cause: reason instanceof Error ? reason : undefined
+      //   }))
+      // }
    }

    // 4. If connection uses HTTP/2, then transmit an RST_STREAM frame.
@@ -1957,7 +1957,7 @@ async function httpNetworkFetch (
            headers.append(key, val)
          }

-          this.body = new Readable({ read: resume })
+          this.body = new Readable2(resume, this.abort, headers.get('content-type'))

          const decoders = []

Then I tried avoiding AbortController (which creates the AbortSignal under the hood), and impressively, it bumped the performance by another 20%!

Avoiding AbortController
diff --git a/lib/fetch/request.js b/lib/fetch/request.js
index 0f10e67..ae9c37c 100644
--- a/lib/fetch/request.js
+++ b/lib/fetch/request.js
@@ -29,9 +29,9 @@ let TransformStream

const kInit = Symbol('init')

-const requestFinalizer = new FinalizationRegistry(({ signal, abort }) => {
-  signal.removeEventListener('abort', abort)
-})
+// const requestFinalizer = new FinalizationRegistry(({ signal, abort }) => {
+//   signal.removeEventListener('abort', abort)
+// })

// https://fetch.spec.whatwg.org/#request-class
class Request {
@@ -355,7 +355,7 @@ class Request {

    // 28. Set this’s signal to a new AbortSignal object with this’s relevant
    // Realm.
-    const ac = new AbortController()
+    const ac = { signal: { addEventListener: () => {} } }
    this[kSignal] = ac.signal
    this[kSignal][kRealm] = this[kRealm]

@@ -376,7 +376,7 @@ class Request {
      } else {
        const abort = () => ac.abort()
        signal.addEventListener('abort', abort, { once: true })
-        requestFinalizer.register(this, { signal, abort })
+        // requestFinalizer.register(this, { signal, abort })
      }
    }

@@ -741,7 +741,8 @@ class Request {
    clonedRequestObject[kHeaders][kRealm] = this[kHeaders][kRealm]

    // 4. Make clonedRequestObject’s signal follow this’s signal.
-    const ac = new AbortController()
+    const ac = { signal: { addEventListener: () => {}, abort: () => {} } }
+    // const ac = new AbortController()
    if (this.signal.aborted) {
      ac.abort()
    } else {

Additional Notes

In the first iteration, the Clinic Doctor report showed something wrong with active handles:

[Clinic Doctor graph: active handles fluctuating under undici.fetch]

The active handles drop often, which is far from expected. For comparison, look at the undici.request graph:

[Clinic Doctor graph: steady active handles under undici.request]

There is probably a misconfiguration somewhere: I found that undici.fetch drops many packets during the benchmark, causing TCP retransmissions.

Basically, when a packet is dropped on an active connection it has to be retransmitted, and when that happens frequently, performance suffers.

TCP Retransmission using undici.fetch

root@rafaelgss-desktop:/usr/share/bcc/tools# ./tcpretrans -c
Tracing retransmits ... Hit Ctrl-C to end
^C
LADDR:LPORT               RADDR:RPORT               RETRANSMITS
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52390         99
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52400         99
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52398         99
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52402         99
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52414        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52424        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52412        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52392        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52420        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52396        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52422        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52404        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52410        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52428        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52418        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52408        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52426        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52416        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52406        100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52394        101

For comparison, here are the retransmissions using:

  • http.request (keep alive)
root@rafaelgss-desktop:/usr/share/bcc/tools# ./tcpretrans -c
Tracing retransmits ... Hit Ctrl-C to end
^C
LADDR:LPORT               RADDR:RPORT               RETRANSMITS
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52526          1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52532          1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52506          1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52514          1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52524          1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52512          1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52518          1
  • undici.request
root@rafaelgss-desktop:/usr/share/bcc/tools# ./tcpretrans -c
Tracing retransmits ... Hit Ctrl-C to end
^C
EMPTY
LADDR:LPORT               RADDR:RPORT               RETRANSMITS

Unfortunately, I haven't found out why it's happening. I tried, I swear :P.

@ronag
Member

ronag commented Apr 18, 2022

@benjamingr why is creating abort signals so expensive?

@devsnek
Member

devsnek commented Apr 18, 2022

I would guess it is because createAbortSignal + makeTransferable does a lot of prototype mutation and makes a call into C++, which is basically the bane of JS engine performance.

i think we could move the logic of createAbortSignal into the AbortSignal constructor, sorta like this:

const kDoThing = Symbol('kDoThing');
function createAbortSignal(aborted, reason) {
  return new AbortSignal(aborted, reason, kDoThing);
}

class AbortSignal extends EventTarget {
  constructor(aborted, reason, thing) {
    if (thing !== kDoThing) throw new ERR_ILLEGAL_CONSTRUCTOR();
    super();
    this[kAborted] = aborted;
    this[kReason] = reason;
  }
}

the makeTransferable part is harder cuz c++ but i feel like we could at least make it more static than what makeTransferable is doing, similar to what i did above. also seems weird to me that JSTransferable needs to exist at all, like we already have the api with kTransfer and kDeserialize and whatnot but I don't know all the specifics there.

@ronag
Member

ronag commented Apr 18, 2022

@jasnell @addaleax

@RafaelGSS
Member

@devsnek I didn't mention the analysis I did on Node.js WebStreams (planning to open an issue in Node.js soon), but indeed, makeTransferable definitely plays a part.

[CPU profile screenshot: makeTransferable overhead]

@jasnell
Member

jasnell commented Apr 18, 2022

makeTransferable is an ugly hack, unfortunately. I'd love to find a way to handle it more elegantly but we are fairly strictly limited by the way the structured clone algorithm works and the v8 implementation of it. I have some ideas on how to make improvements there.

also seems weird to me that JSTransferable needs to exist at all, like we already have the api with kTransfer and kDeserialize and whatnot but I don't know all the specifics there.

The way it works is this:

The structured clone algorithm only works by default with ordinary JavaScript objects with enumerable properties. Things like JavaScript classes are not supported. The only way to make objects like WritableStream and ReadableStream consistently transferable per the spec is to treat them as "host objects". The JSTransferable class is exactly how we do that. (A host object is a JavaScript object that is backed by a C++ class.)

When a host object is passed into the structured clone algorithm implementation, v8 will call out to the serialization/deserialization delegate implementation that Node.js provides. Node.js' delegate implementation will check to see what kind of host object it is. If the host object is just a native C++ class that extends from BaseObject, then the C++ class will have functions on it for handling the serialization/deserialization at the native layer. If the host object is a JSTransferable, that means the serialization/deserialization is handled by JavaScript functions on the JavaScript wrapper object -- specifically, the [kTransfer], [kClone], and [kDeserialize] functions. The delegate will ask the JSTransferable host object to serialize itself, and it will do so by calling out to those JavaScript functions. tl;dr: JSTransferable is how we allow a JavaScript class to act as a host object, and kTransfer/kDeserialize do the actual work of that.

But, we have a problem here. Classes like WritableStream and ReadableStream are defined by the standards as not extending from any other class and there are Web Platform Tests that verify that the implementation matches the Web IDL definition. Specifically, we can't have class WritableStream extends JSTransferable because that would violate the standard definition of WritableStream.

The makeTransferable() utility is a hack to get around that. Instead of creating a WritableStream that extends from JSTransferable, makeTransferable() creates an instance of JSTransferable and uses the WritableStream instance as its prototype. It effectively inverts the inheritance (that is, creates a JSTransferable that extends from WritableStream). This is a bit slow for several reasons that should be fairly obvious. Performance-wise, it's not great. But it allows us to implement the streams spec correctly.
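For reference, this is the user-visible behavior all of that machinery enables (a sketch; an in-realm transfer is shown for brevity, but the same mechanism is what lets a stream cross a worker_threads boundary):

// ReadableStream must be transferable per the streams spec.
const { ReadableStream } = require('node:stream/web')

const rs = new ReadableStream({ start (c) { c.enqueue('hi'); c.close() } })
const moved = structuredClone(rs, { transfer: [rs] })
// rs is now locked/detached; moved is a fully working stream.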

@devsnek
Member

devsnek commented Apr 18, 2022

Looking at this more, it seems like we just need to teach v8 how to recognize non-c++-backed "host" objects, and then we wouldn't need to add the JSTransferable superclass.

oop thank you for the writeup james

@jasnell
Member

jasnell commented Apr 18, 2022

Yes, it would be ideal if the structured clone algorithm provided a way to allow arbitrary classes to define how they would be serialized / deserialized. I've raised this conversation with the WHATWG (whatwg/html#7428) before and, unfortunately, it received a pretty chilly response.

An approach based on well-known Symbols (e.g. Symbol.transfer, Symbol.clone, Symbol.deserialize) would be useful, where v8 would recognize that objects that implement these should be passed to the serializer/deserializer delegate. There's some complexity on the deserialization end around how to make the deserializer aware of what classes it needs to handle but that's largely implementation detail.

Unfortunately, without these changes at either the standard (tc-39, whatwg) level or v8 level, JSTransferable will be a necessary evil.

@devsnek
Member

devsnek commented Apr 18, 2022

I don't think we need to do anything at the standards level, if v8 just had an api to tag constructors with a private symbol or similar, it could forward them back to WriteHostObject when it sees them and then we could do whatever we want with them.

@jasnell
Member

jasnell commented Apr 18, 2022

Yeah, doing it entirely within v8 is possible. My concern there would be interop. I would much prefer the mechanism to be standardized so that there's a better hope that it would work consistently across runtimes.

@jasnell
Member

jasnell commented Apr 18, 2022

Within v8, the key limitation is that, inside v8's implementation, only C++-backed objects can be identified as host objects, and only host objects are passed into the delegate to be handled, and since a host object can be nested arbitrarily deep in object graphs, we have to depend on v8 identifying them as it walks the graph being serialized. If we can start there -- teaching v8 to recognize when a pure JavaScript object is to be treated as a host object -- then we've gone a long way towards solving the problem.

@silverwind
Contributor

I'm not sure using registry.npmjs.org as a benchmark endpoint is optimal for accurate benchmarking :)

Yeah it's definitely not ideal. I think they also rate-limit parallel connections. I have observed ECONNRESET when bumping the concurrency too high :)

@ronag
Member

ronag commented Jan 20, 2023

@RafaelGSS TCP Retransmission might be related to tcpNoDelay. Have you tried disabling that?
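(For context, Nagle's algorithm is toggled per socket in Node; a generic sketch, not undici-specific:)

const net = require('node:net')

const socket = net.connect(3001, '127.0.0.1', () => {
  // setNoDelay(true) disables Nagle's algorithm, flushing small writes
  // immediately; passing false re-enables write coalescing.
  socket.setNoDelay(true)
})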

@BSN4

BSN4 commented Jan 20, 2023

Waiting for 19.5.0 to test the fetch results; just saw @RafaelGSS's PR 👍

@RafaelGSS
Member

@RafaelGSS TCP Retransmission might be related to tcpNoDelay. Have you tried disabling that?

That's a good point. I haven't.

@BSN4

BSN4 commented Jan 20, 2023

Using the v20 nightly build, I can see no difference between Deno and Node fetch; in fact, it looks better than Deno... but Bun is slightly faster. Good job, guys.

@KhafraDev
Member

fetch seems to have gotten much slower in v19.7.0 compared to v19.6.1

v19.6.1:

[bench:run] │ Tests               │ Samples │          Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - fetch      │       1 │  766.77 req/sec │  ± 0.00 % │                       - │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │       1 │ 1345.28 req/sec │  ± 0.00 % │               + 75.45 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │       1 │ 2246.10 req/sec │  ± 0.00 % │              + 192.93 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │       1 │ 2372.56 req/sec │  ± 0.00 % │              + 209.42 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │       1 │ 2522.22 req/sec │  ± 0.00 % │              + 228.94 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │       1 │ 3170.79 req/sec │  ± 0.00 % │              + 313.52 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │       1 │ 3273.41 req/sec │  ± 0.00 % │              + 326.91 % │

v19.7.0:

[bench:run] │ Tests               │ Samples │          Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - fetch      │       1 │  564.40 req/sec │  ± 0.00 % │                       - │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │       1 │ 1295.50 req/sec │  ± 0.00 % │              + 129.54 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │       1 │ 1959.11 req/sec │  ± 0.00 % │              + 247.12 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │       1 │ 2297.02 req/sec │  ± 0.00 % │              + 306.99 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │       1 │ 2624.22 req/sec │  ± 0.00 % │              + 364.96 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │       1 │ 2977.49 req/sec │  ± 0.00 % │              + 427.55 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │       1 │ 3473.19 req/sec │  ± 0.00 % │              + 515.38 % │

@mcollina
Member

cc @anonrig

@anonrig
Member

anonrig commented Feb 21, 2023

I don't think this is related to URL, but rather to stream performance.

ReadableStream/WritableStream abort-signal might cause this regression. (nodejs/node#46273) cc @debadree25 @ronag

@debadree25
Member

debadree25 commented Feb 22, 2023

The abort-signal connection feels unlikely; in 19.7 none of the streams PRs seem to introduce any significant change inside WHATWG streams - it's all interop work. Could it be something else?

@ronag
Member

ronag commented Feb 22, 2023

Can someone do a bisect?

@KhafraDev
Member

The likely candidate is the PR that made byte readable streams transferable. However, undici doesn't use byte readable streams, so I'm unsure how it would affect performance this drastically on my machine. Has anyone else run the benchmarks on 19.6.1 vs 19.7.0?

@debadree25
Member

debadree25 commented Feb 22, 2023

Benchmarks look similar on my machine too; attempting the git bisect, but it will take some time 😅

@debadree25
Member

The difference in results between v19.6.1 and v19.7.0 is small, though:

v19.6.1

[bench:run] │ Tests               │ Samples │          Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - fetch      │       1 │  809.35 req/sec │  ± 0.00 % │                       - │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │       1 │ 1244.84 req/sec │  ± 0.00 % │               + 53.81 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │       1 │ 1578.42 req/sec │  ± 0.00 % │               + 95.02 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │       1 │ 2047.72 req/sec │  ± 0.00 % │              + 153.01 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │       1 │ 2312.49 req/sec │  ± 0.00 % │              + 185.72 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │       1 │ 3077.30 req/sec │  ± 0.00 % │              + 280.22 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │       1 │ 3571.34 req/sec │  ± 0.00 % │              + 341.26 % │

v19.7.0

[bench:run] │ Tests               │ Samples │          Result │ Tolerance │ Difference with slowest │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - fetch      │       1 │  781.15 req/sec │  ± 0.00 % │                       - │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ http - no keepalive │       1 │ 1284.11 req/sec │  ± 0.00 % │               + 64.39 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ http - keepalive    │       1 │ 1593.15 req/sec │  ± 0.00 % │              + 103.95 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - pipeline   │       1 │ 2129.34 req/sec │  ± 0.00 % │              + 172.59 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - request    │       1 │ 2427.34 req/sec │  ± 0.00 % │              + 210.74 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - stream     │       1 │ 3134.43 req/sec │  ± 0.00 % │              + 301.26 % │
[bench:run] |─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
[bench:run] │ undici - dispatch   │       1 │ 3457.92 req/sec │  ± 0.00 % │              + 342.67 % │
[bench:run] 

@legendecas
Member

Refactoring JS transferable to move away from the prototype mutation: nodejs/node#47956. Reviews welcomed.

@H4ad
Member

H4ad commented Jul 7, 2023

Node.js 20.3.0:

│ Tests               │ Samples │          Result │ Tolerance │ Difference with slowest │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - fetch      │       1 │  510.90 req/sec │  ± 0.00 % │                       - │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ http - no keepalive │       1 │ 1125.31 req/sec │  ± 0.00 % │              + 120.26 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - request    │       1 │ 1475.09 req/sec │  ± 0.00 % │              + 188.72 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - pipeline   │       1 │ 1521.68 req/sec │  ± 0.00 % │              + 197.84 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ http - keepalive    │       1 │ 1599.12 req/sec │  ± 0.00 % │              + 213.00 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - stream     │       1 │ 2476.90 req/sec │  ± 0.00 % │              + 384.81 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - dispatch   │       1 │ 2792.84 req/sec │  ± 0.00 % │              + 446.65 % │

After nodejs/node#47956:

│ Tests               │ Samples │          Result │ Tolerance │ Difference with slowest │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - fetch      │       1 │  644.45 req/sec │  ± 0.00 % │                       - │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ http - no keepalive │       1 │ 1153.57 req/sec │  ± 0.00 % │               + 79.00 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ http - keepalive    │       1 │ 1354.32 req/sec │  ± 0.00 % │              + 110.15 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - request    │       1 │ 1602.05 req/sec │  ± 0.00 % │              + 148.59 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - pipeline   │       1 │ 1701.39 req/sec │  ± 0.00 % │              + 164.01 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - stream     │       1 │ 2331.69 req/sec │  ± 0.00 % │              + 261.81 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - dispatch   │       1 │ 2645.30 req/sec │  ± 0.00 % │              + 310.47 % │

It would probably be better for one of the folks who already shared benchmarks to re-run them, but this showcases that the PR improved fetch by some margin.

@mcollina
Member

mcollina commented Jul 7, 2023

@RafaelGSS if you have time to refresh your analysis, folks could move to the next bottleneck.

@RafaelGSS
Member

Planning to do it soon!

@mcollina
Member

mcollina commented Sep 7, 2023

Here is some good news: fetch got even faster in Node v20.6.0.

In Node 20.5.0 I got a nice 877.08 req/s out of fetch, while in v20.6.0 I got 932 req/s in my benchmark - a consistent and measurable ~6% improvement.

Given all the latest improvements, we have been able to reduce the gap between fetch() and http.request() from 160% to 120% (data in #2250).

@silverwind
Contributor

silverwind commented Sep 7, 2023

Updated my benchmarks; it does look like this small improvement is visible. Bun and axios are still significantly faster, though.

@mcollina
Member

mcollina commented Feb 19, 2024

A few updates.

The latest benchmarks show that, while axios is still faster, undici.fetch() now passes both node-fetch and got.

I also took the liberty of doing a flamegraph, and apparently the vast majority of the cost is due to the initialization of the WHATWG streams.

[flamegraph screenshot: cost dominated by WHATWG stream initialization]

I can see two ways to speed this up:

  1. make them faster (maybe @RafaelGSS or @anonrig have ideas) - possibly we could have non-transferable WHATWG streams
  2. add a Proxy (or analog) in between, so that we initialize the streams only if stream methods are needed and not for the body mixins (.json(), .text(), etc.)

@KhafraDev wdyt?

@KhafraDev
Member

KhafraDev commented Feb 19, 2024

  1. Node-specific options to web specs will always cause issues (usually with maintaining them); I don't think it's a good idea.
  2. Streams are needed in the body mixins; the body mixin steps call the "fully read body" algorithm, which uses WHATWG streams. Removing streams would make this harder to maintain. I did make an issue about short-circuiting the stream logic when doing something along the lines of new Response('string').text(), which could also be applied.

There is another alternative, as @RafaelGSS experimented with earlier: use node streams instead of webstreams when they're only used internally. As long as we can match the methods to look somewhat similar to webstreams (as in, _read = pull, etc., by extending the class), I wouldn't be against it. Honestly, we could probably change response.body to a node stream and convert it to a webstream when accessed, which happens very rarely.
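A sketch of that last idea (a hypothetical wrapper; undici's internals are hand-waved):

const { Readable } = require('node:stream')

class ResponseLike {
  #nodeBody          // the dispatcher's node Readable
  #webBody = null    // lazily created WHATWG view

  constructor (nodeBody) { this.#nodeBody = nodeBody }

  // Only pay for a web ReadableStream if someone actually touches .body.
  get body () {
    this.#webBody ??= Readable.toWeb(this.#nodeBody)
    return this.#webBody
  }

  // Body mixins consume the node stream directly, skipping webstreams.
  async text () {
    let out = ''
    this.#nodeBody.setEncoding('utf8')
    for await (const chunk of this.#nodeBody) out += chunk
    return out
  }
}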

I should probably do a better write up of how the body mixin methods work because there's a TON of optimizations we could do without infringing on maintainability.

@mcollina
Member

mcollina commented Apr 23, 2024

After more than two years, I'm extremely happy to confirm that we have achieved performance parity with node-fetch on Node.js main branch (v22):

┌─────────┬───────────────────────┬─────────┬────────────────────┬─────────────┬─────────────────────────┐
│ (index) │ Tests                 │ Samples │ Result             │ Tolerance   │ Difference with slowest │
├─────────┼───────────────────────┼─────────┼────────────────────┼─────────────┼─────────────────────────┤
│ 0       │ 'http - no keepalive' │ 101     │ '7247.97 req/sec'  │ '± 3.90 %'  │ '-'                     │
│ 1       │ 'got'                 │ 40      │ '9716.84 req/sec'  │ '± 2.86 %'  │ '+ 34.06 %'             │
│ 2       │ 'axios'               │ 35      │ '10088.40 req/sec' │ '± 2.68 %'  │ '+ 39.19 %'             │
│ 3       │ 'node-fetch'          │ 10      │ '10639.12 req/sec' │ '± 2.34 %'  │ '+ 46.79 %'             │
│ 4       │ 'request'             │ 30      │ '11108.52 req/sec' │ '± 2.76 %'  │ '+ 53.26 %'             │
│ 5       │ 'superagent'          │ 10      │ '14468.50 req/sec' │ '± 1.77 %'  │ '+ 99.62 %'             │
│ 6       │ 'undici - fetch'      │ 10      │ '15149.84 req/sec' │ '± 1.99 %'  │ '+ 109.02 %'            │
│ 7       │ 'undici - pipeline'   │ 101     │ '16352.78 req/sec' │ '± 10.98 %' │ '+ 125.62 %'            │
│ 8       │ 'http - keepalive'    │ 25      │ '16710.83 req/sec' │ '± 2.87 %'  │ '+ 130.56 %'            │
│ 9       │ 'undici - request'    │ 10      │ '26142.70 req/sec' │ '± 2.58 %'  │ '+ 260.69 %'            │
│ 10      │ 'undici - dispatch'   │ 25      │ '26817.31 req/sec' │ '± 2.70 %'  │ '+ 270.00 %'            │
│ 11      │ 'undici - stream'     │ 10      │ '27448.13 req/sec' │ '± 2.00 %'  │ '+ 278.70 %'            │
└─────────┴───────────────────────┴─────────┴────────────────────┴─────────────┴─────────────────────────┘

There is still some way to go to reach undici.request(), but I have a feeling this can be closed.
