Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: add asString static method #45408

Closed
wants to merge 1 commit into from

Conversation

anonrig
Copy link
Member

@anonrig anonrig commented Nov 10, 2022

My microbenchmarks show 33% improvement for the following microbenchmark. (Fixes nodejs/performance#3)

cpu: Apple M1 Max
runtime: node v20.0.0-pre (arm64-darwin)

benchmark                    time (avg)             (min … max)       p75       p99      p995
--------------------------------------------------------------- -----------------------------
new Buffer().toString()    1.65 µs/iter     (1.41 µs … 1.95 µs)   1.78 µs   1.95 µs   1.95 µs
Buffer.asString()          1.07 µs/iter        (1 µs … 1.21 µs)   1.09 µs   1.21 µs   1.21 µs
TextDecoder.decode()       8.67 µs/iter     (8.55 µs … 8.78 µs)    8.7 µs   8.78 µs   8.78 µs

Benchmark:

import { bench, run } from "mitata"
import { Buffer } from "node:buffer"

const encoder = new TextEncoder()
const decoded = 'hello world'.repeat(1000)
const encoded = encoder.encode(decoded)

bench("new Buffer().toString()", () => Buffer.from(encoded).toString())
bench("Buffer.asString()", () => Buffer.asString(encoded, 'utf8'))
bench("TextDecoder.decode()", () => (new TextDecoder()).decode(encoded))

await run()

CC @nodejs/buffer

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Nov 10, 2022
@anonrig anonrig force-pushed the api/buffer-asstring branch 2 times, most recently from eb39f40 to 4907312 Compare November 10, 2022 15:15
@anonrig anonrig added the performance Issues and PRs related to the performance of Node.js. label Nov 10, 2022
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why new TextDecoder().decode() couldn’t operate at a similar performance?

I would advise against adding new APIs to Buffer, given that its functionality should at this point be fully covered by standarized Web APIs like Uint8Array and TextEncoder/TextDecoder.

@addaleax addaleax added the semver-minor PRs that contain new features and should be released in the next minor version. label Nov 10, 2022
@anonrig
Copy link
Member Author

anonrig commented Nov 10, 2022

@addaleax TextDecoder creates string_decoder, and in terms of JS/C++ it's a lot costly. To be exact it's 8 times slower than Buffer.asString. I've updated my benchmark.

@addaleax
Copy link
Member

@anonrig Exactly! I’m saying it shouldn’t be doing that (if it’s about UTF-8 with default options). I’m suggesting to optimize the standardized API, rather than introducing a new, non-standard one (to what is effectively a legacy API).

@RReverser
Copy link
Member

@addaleax TextDecoder creates string_decoder, and in terms of JS/C++ it's a lot costly. To be exact it's 8 times slower than Buffer.asString. I've updated my benchmark.

In your benchmark you don't reuse the TextDecoder instance, which is not how it's usually used. Most projects create a TextDecoder with the desired conversion options once, and then reuse throughout the app.

@anonrig
Copy link
Member Author

anonrig commented Nov 10, 2022

@RReverser This is not the case for undici. This is why I had to open this pull request.

@RReverser
Copy link
Member

RReverser commented Nov 10, 2022

@RReverser This is not the case for undici. This is why I had to open this pull request.

Right, but it sounds like a problem that undici could solve easily within the project instead of adding a new API to Node.js itself just to fix it.

@anonrig
Copy link
Member Author

anonrig commented Nov 10, 2022

@RReverser This is not the case for undici. This is why I had to open this pull request.

Right, but it sounds like a problem that undici could solve easily within the project instead of adding a new API to Node.js itself just to fix it.

I was feeling the same way before @mcollina's review: nodejs/undici#1768 (comment)

@tniessen
Copy link
Member

TextDecoder is not necessarily comparable to the proposed API here because the newly proposed API does not allow decoding text in a streaming fashion. The statefulness of TextDecoder is what seems to prevent reuse in undici.

@RReverser
Copy link
Member

The statefulness of TextDecoder is what seems to prevent reuse in undici.

It's only stateful if you use the stream option, but from the snippet above it doesn't seem they do that? (since they create new TextDecoder instance for each .decode(string) anyway).

@RReverser
Copy link
Member

I was feeling the same way before @mcollina's review: nodejs/undici#1768 (comment)

Hmm, I'll ask on that thread.

@RReverser
Copy link
Member

RReverser commented Nov 10, 2022

Right, I see they use streaming decode in one place actually: https://github.com/anonrig/undici/blob/0637fd97174402ddaa36905d7470f8d2072ec1fb/lib/fetch/body.js#L412

But then, for that static Buffer.asString wouldn't help either, since it's also not stateful and, as such, won't handle streaming correctly.

@mcollina
Copy link
Member

This won't help undici unfortunately, we need streaming parsing.

@jimmywarting
Copy link

I’m suggesting to optimize the standardized API, rather than introducing a new, non-standard one (to what is effectively a legacy API).

I would advise against adding new APIs to Buffer, given that its functionality should at this point be fully covered by standarized Web APIs like Uint8Array and TextEncoder/TextDecoder.

👍 Agree, Should improve standard apis (TextEncoder/TextDecoder) instead.
There is also this issue talking about using uint8array instead of buffer #41588 in NodeJS

speaking for my self i discourage any use of Buffer as it's not cross env friendly. It increase the bundle and slows browser down. ( for instance in browsers buffer.toString() is waaay slower than using TextDecoder: source: #39301 )

If new features are being added then ppl will continue to use the legacy Buffer. And more packages is just keep being bloated with the hole browserified buffer module while only using 1% of the Buffer class.

The browser package of the Buffer module haven't been updated much lately. not for 2y and keeps lagging behind in new features.

I'm starting to see new lint rules that starts to forbid the use of Buffer and suggest to use Uint8Array/Text(de)coder/DataView instead.

As such i'm 👎 for this PR

@anonrig
Copy link
Member Author

anonrig commented Nov 10, 2022

Thank you for all of the reviews, and responses. It's really great to hear some amazing feedback and experiences about encoding. I opened a new pull request to improve and add a fast path for UTF-8 with @addaleax. Feel free to review the changes in there, since fast path makes this pull request obsolete.

PR: #45412

@anonrig anonrig closed this Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Uint8Array to UTF8 conversion
7 participants