buffer: add asString static method #45408

anonrig · 2022-11-10T14:59:17Z

My microbenchmarks show 33% improvement for the following microbenchmark. (Fixes nodejs/performance#3)

cpu: Apple M1 Max
runtime: node v20.0.0-pre (arm64-darwin)

benchmark                    time (avg)             (min … max)       p75       p99      p995
--------------------------------------------------------------- -----------------------------
new Buffer().toString()    1.65 µs/iter     (1.41 µs … 1.95 µs)   1.78 µs   1.95 µs   1.95 µs
Buffer.asString()          1.07 µs/iter        (1 µs … 1.21 µs)   1.09 µs   1.21 µs   1.21 µs
TextDecoder.decode()       8.67 µs/iter     (8.55 µs … 8.78 µs)    8.7 µs   8.78 µs   8.78 µs

Benchmark:

import { bench, run } from "mitata"
import { Buffer } from "node:buffer"

const encoder = new TextEncoder()
const decoded = 'hello world'.repeat(1000)
const encoded = encoder.encode(decoded)

bench("new Buffer().toString()", () => Buffer.from(encoded).toString())
bench("Buffer.asString()", () => Buffer.asString(encoded, 'utf8'))
bench("TextDecoder.decode()", () => (new TextDecoder()).decode(encoded))

await run()

CC @nodejs/buffer

addaleax

Is there a reason why new TextDecoder().decode() couldn’t operate at a similar performance?

I would advise against adding new APIs to Buffer, given that its functionality should at this point be fully covered by standarized Web APIs like Uint8Array and TextEncoder/TextDecoder.

anonrig · 2022-11-10T15:36:02Z

@addaleax TextDecoder creates string_decoder, and in terms of JS/C++ it's a lot costly. To be exact it's 8 times slower than Buffer.asString. I've updated my benchmark.

addaleax · 2022-11-10T15:42:38Z

@anonrig Exactly! I’m saying it shouldn’t be doing that (if it’s about UTF-8 with default options). I’m suggesting to optimize the standardized API, rather than introducing a new, non-standard one (to what is effectively a legacy API).

RReverser · 2022-11-10T16:00:31Z

@addaleax TextDecoder creates string_decoder, and in terms of JS/C++ it's a lot costly. To be exact it's 8 times slower than Buffer.asString. I've updated my benchmark.

In your benchmark you don't reuse the TextDecoder instance, which is not how it's usually used. Most projects create a TextDecoder with the desired conversion options once, and then reuse throughout the app.

anonrig · 2022-11-10T16:36:56Z

@RReverser This is not the case for undici. This is why I had to open this pull request.

RReverser · 2022-11-10T17:05:46Z

@RReverser This is not the case for undici. This is why I had to open this pull request.

Right, but it sounds like a problem that undici could solve easily within the project instead of adding a new API to Node.js itself just to fix it.

anonrig · 2022-11-10T17:09:53Z

@RReverser This is not the case for undici. This is why I had to open this pull request.

Right, but it sounds like a problem that undici could solve easily within the project instead of adding a new API to Node.js itself just to fix it.

I was feeling the same way before @mcollina's review: nodejs/undici#1768 (comment)

tniessen · 2022-11-10T17:30:57Z

TextDecoder is not necessarily comparable to the proposed API here because the newly proposed API does not allow decoding text in a streaming fashion. The statefulness of TextDecoder is what seems to prevent reuse in undici.

RReverser · 2022-11-10T17:35:27Z

The statefulness of TextDecoder is what seems to prevent reuse in undici.

It's only stateful if you use the stream option, but from the snippet above it doesn't seem they do that? (since they create new TextDecoder instance for each .decode(string) anyway).

RReverser · 2022-11-10T17:35:57Z

I was feeling the same way before @mcollina's review: nodejs/undici#1768 (comment)

Hmm, I'll ask on that thread.

RReverser · 2022-11-10T17:40:12Z

Right, I see they use streaming decode in one place actually: https://github.com/anonrig/undici/blob/0637fd97174402ddaa36905d7470f8d2072ec1fb/lib/fetch/body.js#L412

But then, for that static Buffer.asString wouldn't help either, since it's also not stateful and, as such, won't handle streaming correctly.

mcollina · 2022-11-10T18:44:39Z

This won't help undici unfortunately, we need streaming parsing.

jimmywarting · 2022-11-10T21:44:53Z

I’m suggesting to optimize the standardized API, rather than introducing a new, non-standard one (to what is effectively a legacy API).

I would advise against adding new APIs to Buffer, given that its functionality should at this point be fully covered by standarized Web APIs like Uint8Array and TextEncoder/TextDecoder.

👍 Agree, Should improve standard apis (TextEncoder/TextDecoder) instead.
There is also this issue talking about using uint8array instead of buffer #41588 in NodeJS

speaking for my self i discourage any use of Buffer as it's not cross env friendly. It increase the bundle and slows browser down. ( for instance in browsers buffer.toString() is waaay slower than using TextDecoder: source: #39301 )

If new features are being added then ppl will continue to use the legacy Buffer. And more packages is just keep being bloated with the hole browserified buffer module while only using 1% of the Buffer class.

The browser package of the Buffer module haven't been updated much lately. not for 2y and keeps lagging behind in new features.

I'm starting to see new lint rules that starts to forbid the use of Buffer and suggest to use Uint8Array/Text(de)coder/DataView instead.

As such i'm 👎 for this PR

anonrig · 2022-11-10T21:50:49Z

Thank you for all of the reviews, and responses. It's really great to hear some amazing feedback and experiences about encoding. I opened a new pull request to improve and add a fast path for UTF-8 with @addaleax. Feel free to review the changes in there, since fast path makes this pull request obsolete.

PR: #45412

nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Nov 10, 2022

anonrig force-pushed the api/buffer-asstring branch 2 times, most recently from eb39f40 to 4907312 Compare November 10, 2022 15:15

anonrig added the performance Issues and PRs related to the performance of Node.js. label Nov 10, 2022

buffer: add asString static method

569de25

anonrig force-pushed the api/buffer-asstring branch from 4907312 to 569de25 Compare November 10, 2022 15:27

addaleax requested changes Nov 10, 2022

View reviewed changes

addaleax added the semver-minor PRs that contain new features and should be released in the next minor version. label Nov 10, 2022

anonrig closed this Nov 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

buffer: add asString static method #45408

buffer: add asString static method #45408

anonrig commented Nov 10, 2022 •

edited

addaleax left a comment

anonrig commented Nov 10, 2022 •

edited

addaleax commented Nov 10, 2022

RReverser commented Nov 10, 2022

anonrig commented Nov 10, 2022

RReverser commented Nov 10, 2022 •

edited

anonrig commented Nov 10, 2022

tniessen commented Nov 10, 2022

RReverser commented Nov 10, 2022

RReverser commented Nov 10, 2022

RReverser commented Nov 10, 2022 •

edited

mcollina commented Nov 10, 2022

jimmywarting commented Nov 10, 2022

anonrig commented Nov 10, 2022 •

edited

buffer: add asString static method #45408

buffer: add asString static method #45408

Conversation

anonrig commented Nov 10, 2022 • edited

addaleax left a comment

Choose a reason for hiding this comment

anonrig commented Nov 10, 2022 • edited

addaleax commented Nov 10, 2022

RReverser commented Nov 10, 2022

anonrig commented Nov 10, 2022

RReverser commented Nov 10, 2022 • edited

anonrig commented Nov 10, 2022

tniessen commented Nov 10, 2022

RReverser commented Nov 10, 2022

RReverser commented Nov 10, 2022

RReverser commented Nov 10, 2022 • edited

mcollina commented Nov 10, 2022

jimmywarting commented Nov 10, 2022

anonrig commented Nov 10, 2022 • edited

anonrig commented Nov 10, 2022 •

edited

anonrig commented Nov 10, 2022 •

edited

RReverser commented Nov 10, 2022 •

edited

RReverser commented Nov 10, 2022 •

edited

anonrig commented Nov 10, 2022 •

edited