
Making blob.stream() more stream friendly #42264

Closed
jimmywarting opened this issue Mar 9, 2022 · 5 comments
Labels
buffer Issues and PRs related to the buffer subsystem.

Comments

@jimmywarting

Version

17.5

Platform

mac

Subsystem

No response

What steps will reproduce the bug?

```js
const blob = new Blob(data)

// Sometime later
blob.stream()
```

How often does it reproduce? Is there a required condition?

It reproduces every time. It's not clear to everyone that `stream()` will internally call `blob.arrayBuffer()`.

What is the expected behavior?

The stream should yield small chunks of data so that it does not allocate as much memory as `blob.size` all at once.

What do you see instead?

I see memory spikes: when I create large blobs and later try to read them, memory usage goes way up.
This is one of the reasons we are sticking with fetch-blob instead of Node's own built-in Blob implementation. I wish I did not have to use this package anymore and could also take advantage of transferring Blobs between workers with the built-in transferable Blob.

Additional information

No response

@meixg meixg added the buffer Issues and PRs related to the buffer subsystem. label Mar 9, 2022
@meixg
Member

meixg commented Mar 10, 2022

> later tries to read them that memory goes way up

There used to be an array copy while reading the blob stream; maybe that is the reason. Can you try on the master branch?


There is still a full copy when creating a stream though.

@jimmywarting
Author

Here is where it reads the whole blob into an ArrayBuffer before it starts streaming the data in chunks:

node/lib/internal/blob.js, lines 325 to 328 in a7164fd:

```js
async start() {
  this[kState] = await self.arrayBuffer();
  this[kIndex] = 0;
},
```

@jimmywarting
Author

The `pull` should instead fetch only the needed chunk as an `arrayBuffer()`.

Maybe do something like this pseudocode?

```js
async pull(ctrl) {
  const chunk = await this.slice(start, end).arrayBuffer()
  ctrl.enqueue(new Uint8Array(chunk))
}
```

@jimmywarting
Author

jimmywarting commented Oct 27, 2022

It's somewhat unexpected that all chunks share the same underlying buffer and that each chunk has a non-zero byte offset.

```js
chunks[0].buffer === chunks[1].buffer // currently true

// Should rather be:
chunks[0].buffer !== chunks[1].buffer
chunks[0].byteOffset === 0
```

@jasnell
Member

jasnell commented Oct 27, 2022

I think the work that @flakey5 is looking to do in #45188 should address this.

The initial implementation of Blob was meant to be as simple as possible to get something working. The intent was always to revisit it to implement performance optimizations once it was in use.
