Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blob.text and Blob.arraybuffer are 10x+ slower on 20.x #118

Open
H4ad opened this issue Sep 21, 2023 · 15 comments
Open

Blob.text and Blob.arraybuffer are 10x+ slower on 20.x #118

H4ad opened this issue Sep 21, 2023 · 15 comments

Comments

@H4ad
Copy link
Member

H4ad commented Sep 21, 2023

The blob/blob.js benchmark is taking hours: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1402/console

This probably didn't take so long in the past, there is this PR that changed the Blob's behavior a lot: nodejs/node#45258

From what I saw, the problem is in the JS->C++ transitions, for each piece of data it crosses JS->C++.

Furthermore, the text function expects arraybuffer and then calls decode, when we could just decode the data as soon as we get it from the C++ instead of keeping everything in memory and then discarding it.

I can do some optimizations on the JS side but I think this could be largely improved if we do the heavy operations on C++, I have no idea how to handle the asynchronous code on C++, so I accept some hints or if anyone wants to try work on this.

@H4ad
Copy link
Member Author

H4ad commented Sep 21, 2023

Some benchmark data:

NodeJS 16.20.1

name ops/sec samples
text (128) 17,269 57
text (1024) 10,257 57
arrayBuffer (128) 24,974 53
arrayBuffer (1024) 10,955 57
slice (0, 64) 48,310 67
slice (0, 512) 23,750 66

NodeJS 18.17.0

name ops/sec samples
text (128) 36,176 74
text (1024) 13,691 69
arrayBuffer (128) 36,788 74
arrayBuffer (1024) 14,305 75
slice (0, 64) 49,139 71
slice (0, 512) 26,774 76

NodeJS 20.5.0

name ops/sec samples
text (128) 3,194 74
text (1024) 380 76
arrayBuffer (128) 3,478 80
arrayBuffer (1024) 407 75
slice (0, 64) 45,076 66
slice (0, 512) 16,484 64

@H4ad H4ad changed the title Blob can be faster Blob.text and Blob.arraybuffer are 10x+ slower on 20.x Sep 21, 2023
@benjamingr
Copy link
Member

@jasnell @flakey5

@benjamingr
Copy link
Member

@H4ad run a git bisect to find where the regression is?

@H4ad
Copy link
Member Author

H4ad commented Sep 22, 2023

@benjamingr I will!

Also, this regression was so hard to run this piece of code:

const buff = Buffer.allocUnsafe(1024 ** 2);
const source = new File(buff, 'dummy.txt', options);

const now = performance.now();
source.text().then(() => console.log(`diff: ${performance.now() - now}`));

It took 3s on node 20.x and 80ms on 18.x.

@rluvaton
Copy link
Member

Which public api affected by this?

@H4ad
Copy link
Member Author

H4ad commented Sep 22, 2023

If I understand your question, File is from Buffer which inherits Blob, so any code that inherits and uses Blob will be slowdown when calling the text or arraybuffer.

From what I read, undici uses File and Blob to handle multipart/form-data and files, so is very likely they had this regression too if they call .text.

@jasnell
Copy link
Member

jasnell commented Sep 22, 2023

This will be a tricky one as the new mechanism underlying the C++ implementation of Blob is also going to be used for the QUIC implementation that I'm currently working on. It also serves as the underlying bit for file-backed Blob. There is lots of room for performance improvement here but I would appreciate this not being changed right at this moment as it could lead to conflicts with the ongoing QUIC implementation work. What would probably be the most valuable right now would be just identifying where the key bottlenecks -- identifying those so that we can circle back around to making those pieces faster.

@rluvaton
Copy link
Member

rluvaton commented Sep 23, 2023

In the QUIC implementation, are you changing the Blob?

@debadree25
Copy link
Member

debadree25 commented Sep 26, 2023

Making an initial attempt here nodejs/node#49873 just stuck on how to make the pulling part async similar to how we are doing queueMicrotask in js in cpp, Is there maybe some sort of way to queue jobs in c++ part 🤔

@rluvaton
Copy link
Member

rluvaton commented Sep 26, 2023

so I kinda did git bisect and I suspect it was either created by:

or created by those:

@debadree25
Copy link
Member

Hmm the main problem seems like we pass callback -> callback again passes back value after reading -> repeat 2 JS -> C++ and back calls everytime data being read

@joyeecheung
Copy link
Member

just stuck on how to make the pulling part async similar to how we are doing queueMicrotask in js in cpp, Is there maybe some sort of way to queue jobs in c++ part

Pretty sure you were looking for v8::Isolate::EnqueueMicrotask

@debadree25
Copy link
Member

just stuck on how to make the pulling part async similar to how we are doing queueMicrotask in js in cpp, Is there maybe some sort of way to queue jobs in c++ part

Pretty sure you were looking for v8::Isolate::EnqueueMicrotask

ahah yes! would do the job i guess but bigger problem rn is how to queue them described in nodejs/node#49873 (comment)

@H4ad
Copy link
Member Author

H4ad commented Jan 31, 2024

@jasnell how is going the QUIC Implementation? Do you think we can go back and look for ways to optimize or do you think we should wait until we have more updates on the QUIC implementation?

@debadree25
Copy link
Member

Made some progress on this in nodejs/node#49873 but stuck on some C++ pointer stuff if anyone could help or take a look would be great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants