Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileHandle.stream #38350

Closed
ronag opened this issue Apr 22, 2021 · 19 comments
Closed

FileHandle.stream #38350

ronag opened this issue Apr 22, 2021 · 19 comments
Labels
feature request Issues that request new features to be added to Node.js. fs Issues and PRs related to the fs subsystem / file system. good first issue Issues that are suitable for first-time contributors. stale

Comments

@ronag
Copy link
Member

ronag commented Apr 22, 2021

It might be possible to have a more efficient way to pipe a file to a stream by adding a .stream method to FileHandle.

e.g.

FileHandle.prototype.stream = async function (dst, { start = 0, end, signal } = {}) {
 // TODO: this[kRef](), this[kUnref](), if (signal.aborted) throw AbortError()
  let pos = start
  while (true) {
    if (end && pos >= end) {
      break
    }
  
    const { buffer, bytesRead } = await this.read({
      buffer: Buffer.allocUnsafe(Math.min(end - pos, 16384)),
      position: pos,
    })
  
    if (bytesRead === 0) {
      break
    }
  
    if (dst.writableNeedDrain) {
      await EE.once(res, 'drain', { signal })
    }
  
    if (dst.destroyed) {
      break // TODO: break or throw?
    }
  
    dst.write(buffer)
  
    pos += bytesRead
  }
}
const fileHandle = fsp.open(filePath)
try {
  await fileHandle.stream(dst)
} finally {
  await fileHandle.close()
}

vs

await pipeline(fs.createReadStream(filePath), dst)

Less ergonomic but potentially better performance. This idea needs benchmarking.

@ronag ronag added fs Issues and PRs related to the fs subsystem / file system. feature request Issues that request new features to be added to Node.js. good first issue Issues that are suitable for first-time contributors. labels Apr 22, 2021
@ronag
Copy link
Member Author

ronag commented Apr 22, 2021

Could also have a static alternative.

await fsp.stream(filePath, dst)

@ronag
Copy link
Member Author

ronag commented Apr 22, 2021

Another (less efficient) idea is to have a method that returns an async generator.

await pipeline(fsp.generator(filePath), dst)

@kuzmaMinin

This comment has been minimized.

@Ayase-252
Copy link
Member

Could I take this challenge? 😁

Ayase-252 added a commit to Ayase-252/node that referenced this issue Apr 22, 2021
Ayase-252 added a commit to Ayase-252/node that referenced this issue Apr 22, 2021
@benjamingr
Copy link
Member

Why a stream method rather than adding Symbol.asyncIterator? Is the goal perf or ergonomics?

@ronag
Copy link
Member Author

ronag commented Apr 22, 2021

Why a stream method rather than adding Symbol.asyncIterator? Is the goal perf or ergonomics?

Performance.

@jasnell
Copy link
Member

jasnell commented Apr 23, 2021

Just to expand on this a bit to add to the discussion. I don't have a timeline yet on exactly when I'd be able to continue working on this, but as part of the effort around enabling fetch (and a few other things), I've been looking at support for the web platform API standard Body mixin. My preference would be for whatever we do here to be aligned with that API.

So, for instance, let's assume that a FileHandle implemented the Body mixin:

const file = await fs.promises.open('file', 'rw');

file.body;            // A WHATWG ReadableStream
file.arrayBuffer();   // The content of the file as an ArrayBuffer
file.blob();          // The content of the file a Blob
file.formData();      // Doesn't really make sense here so probably good to omit
file.json();          // The content of the file as JSON
file.text();          // The content of the file as a string

Following this pattern, I'd suggest a file.readable() that returns a stream.Readable if the file is readable, and a file.writable() that returns a stream.Writable if the file is writable.

@benjamingr
Copy link
Member

benjamingr commented Apr 24, 2021

What about adding a Symbol.readable symbol like Symbol.asyncIterator that enables getting a readable stream version of stuff (like FileHandles)?

(Edit: obviously it'd be an import { ReadableSymbol } from 'stream' rather than monkey patching the global)

@ronag
Copy link
Member Author

ronag commented Apr 24, 2021

What about adding a Symbol.readable symbol like Symbol.asyncIterator that enables getting a readable stream version of stuff (like FileHandles)?

I would kind of like to move away from readables and towards async iterables. But maybe that’s a bigger discussion.

@benjamingr
Copy link
Member

@ronag

I would kind of like to move away from readables and towards async iterables. But maybe that’s a bigger discussion.

Deprecating streams in favour of async iterables as the contract would be ideal (one stream type, only pull, all modern promises with ease of debugging and syntax assist) eventually would be ideal - it would also make incorporating whatwg streams nicer - but IIRC performance really isn't there.

@ronag
Copy link
Member Author

ronag commented Apr 25, 2021

@ronag

I would kind of like to move away from readables and towards async iterables. But maybe that’s a bigger discussion.

Deprecating streams in favour of async iterables as the contract would be ideal (one stream type, only pull, all modern promises with ease of debugging and syntax assist) eventually would be ideal - it would also make incorporating whatwg streams nicer - but IIRC performance really isn't there.

I think it depends heavily on whether it’s sync or async streams.

Ayase-252 added a commit to Ayase-252/node that referenced this issue Apr 27, 2021
@addaleax
Copy link
Member

I love this comment in #38440:

Why is pipe()/pipeline() so slow?

Because there is an obvious answer to that: Because we're not using a consistent streams model on the native side.

One of the reasons I tried to push for StreamBase adoption while I was a full-time contributor -- and, relevant to this particular issue, made FileHandle a StreamBase on the native side -- is that that way, we can have a much faster piping mechanism by just skipping the JS data layer entirely, and only moving data on the C++ side of things. (This isn't even particularly complex to implement: addaleax@bcbf459. The only reason I haven't opened a PR with that back then is that it would only work for piping between readable FileHandles, sockets, and HTTP/2 streams, which didn't seem like a large enough set of use cases). And piping between native streams -- e.g. a file stream to zlib to HTTP to network or the other direction -- is a really common thing to do for Node.js users.

With suggestions like the implementation over in #38440, or the ones in #38233, we're moving further away from that, because we're introducing more different ways of creating streams that are ultimately backed by native calls, yet make data available in a format that is only compatible through the JS interfaces they use.

From a software design point of view, that's not good. Unfortunately, the way Node.js is developed, with step-by-step iteration without much coordinated planning, does not lend itself to long-term design processes that require effort from multiple people very well (and typing this out makes me wonder if the RFC process suggested by @Trott in #37953 wouldn't be a really good idea here).

So, @ronag, @jasnell, I know you have opinions on things like StreamBase, but please keep this in mind. If you really care about Node.js streams performance, then the way to do that is really a consistent API on the native side, regardless of what API that is. If StreamBase doesn't fit our use cases anymore, great -- but please replace it with an API that is better, and don't just leave parts of Node.js on one model and use a new model on other parts, and then use the fact that we are using one API across the board to make Node.js fast.

@jasnell
Copy link
Member

jasnell commented Apr 27, 2021

@addaleax, as we've discussed before, my only real concerns around StreamBase are the fact that it's extremely difficult to make any significant improvements to without breaking half of the Node.js ecosystem and the fact that it does not provide a way of doing the things I need it to do.

The QUIC implementation in #38233 does not move away from using StreamBase. It uses StreamBase directly. For instance, when doing something like stream.respondWith({ body: fs.promises.open('foo') }), the native layers will consume the FileHandle via the StreamBase API. The only difference is that there's another layer of internal buffering and incremental pulling of the buffered data that StreamBase simply does not implement any support whatsoever for. On the inbound data side, stream.readable() will be backed by a StreamBase to push the data out to the JavaScript side. The QUIC implementation is going to fully support the existing streams API -- it's just also going to support the ability to avoid using the streams API entirely.

As for #38440, I'm absolutely -1 on the way that implementation is done, regardless of the performance boost.

Ideally we would be able to make StreamBase faster and more flexible. Ideally we'd be able to do so in a way that doesn't break everyone. I'm not convinced yet that we can do both.

@addaleax
Copy link
Member

@jasnell Right, and I can totally see that and I don't mind that you're introducing something for QUIC that fits your needs better, I'm just saying that it makes sense to think about how the APIs around that are going to look like in the long run, so that we can also make QUIC even faster in the future.

@ronag
Copy link
Member Author

ronag commented Apr 28, 2021

Because there is an obvious answer to that: Because we're not using a consistent streams model on the native side.

Also because streams are slowish. Both solutions here live in JS land and one is faster than the other.

@ronag
Copy link
Member Author

ronag commented Apr 28, 2021

I'm fine with not doing this. Was just an idea on how get a little better perf with FileHandle (I'm doing it like this in our perf sensitive code). In an ergonomic sense I would be fine with having FileHandle.readable() and/or FileHandle[Symbol.AsyncIteartor]().

@jimmywarting
Copy link

jimmywarting commented Sep 8, 2021

little late to the party... but how about something like

const file = await fs.getFile('./readme.md')
// file instanceof window.File
file.stream() // new whatwg:stream ReadableStream()
file.text() // Promise<string>
file.arrayBuffer() // Promise<ArrayBuffer>
await new Response(file).json()
await new Response(file).formData()

@github-actions
Copy link
Contributor

github-actions bot commented Apr 4, 2022

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Apr 4, 2022
@github-actions
Copy link
Contributor

github-actions bot commented May 4, 2022

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot closed this as completed May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. fs Issues and PRs related to the fs subsystem / file system. good first issue Issues that are suitable for first-time contributors. stale
Projects
Development

Successfully merging a pull request may close this issue.

8 participants
@jasnell @addaleax @jimmywarting @benjamingr @ronag @Ayase-252 @kuzmaMinin and others