Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a buffer pool for fs.ReadStream to boost performance? #14136

Closed
venkatperi opened this issue Jul 8, 2017 · 8 comments
Closed

Use a buffer pool for fs.ReadStream to boost performance? #14136

venkatperi opened this issue Jul 8, 2017 · 8 comments
Labels
buffer Issues and PRs related to the buffer subsystem. fs Issues and PRs related to the fs subsystem / file system.

Comments

@venkatperi
Copy link

venkatperi commented Jul 8, 2017

  • Version:
  • Platform:
  • Subsystem:

REQUEST: Would you consider moving to a pool scheme since it'll boost performance without changing the API?

fs.ReadStream allocates 64KiB (default, change via highWaterMark) sized Buffer objects as it reads through files which are released downstream when no longer needed. With larger files, this can result in a lot of time spent allocating Buffer objects only to have them picked up in GC, which in by itself can add up to a lot.

Reducing allocations with a buffer pool should boost performance across the board with respect to file size without foreseeable changes to the public API.

See post for more details.

Performance vs File Size

Higher is better
81

Percentage Boost in Performance

Average 67% improvement across all sizes
83

@vsemozhetbyt vsemozhetbyt added buffer Issues and PRs related to the buffer subsystem. fs Issues and PRs related to the fs subsystem / file system. labels Jul 8, 2017
@mscdex
Copy link
Contributor

mscdex commented Jul 8, 2017

I'm not sure what the question/request is here, but you can pass your own highWaterMark to fs.createReadStream(), which controls the size of allocated Buffer pools for fs.ReadStream.

@venkatperi
Copy link
Author

The issue isn't the size of the buffer (yes, its size can be changed by setting the highWaterMark) , rather the cost of allocation/release of buffers.

@TimothyGu
Copy link
Member

TimothyGu commented Jul 10, 2017

Unfortunately, while this test may yield promising results in an in vitro setting, actual adoption of pooling is much more difficult. It would require, on the consumer's part, to inform the core when the consumer is done with a specific buffer. This not only detracts from the JS paradigm of relying on GC, but can also reduce the overall security of the platform, as refcounting can be very difficult to get right in more complex projects. Plus, from the perspective of compatibility, it would require an opt-in, which diminishes gains in actual deployment.

To demonstrate how easily it is to get refcounting wrong, the post has:

Writable.prototype.write = function ( chunk, encoding, cb ) {
  ...
  if ( !(chunk && typeof chunk._unref === 'function') ) {
    return this._unref();
  }
  ...
 }

This simply does not work, as many asynchronous streams (including HTTP streams) assume the chunk that's passed into a writable stream will never be changed. There have in fact been issues filed against Node.js (that I am unable to locate right now) about this behavior, but it is resolved that the status quo provides the best performance and is thus kept.

@jorangreef
Copy link
Contributor

jorangreef commented Jul 21, 2017

I think this issue was prematurely closed and should be reopened. Node would benefit from more discussion on the subject.

This [...] detracts from the JS paradigm of relying on GC

The question of GC vs static allocation is not a binary decision, and there are ways to design APIs which avoid the problems @TimothyGu describes. Javascript itself does not force anyone to "rely on GC" and Node already provides for static allocation when it comes to file system read() and write() operations. If anything, the long-term trend in Node's API design has been more and more towards providing APIs which accept not just buffers, but also offsets into buffers to avoid pressuring the GC through needless allocations.

If some aspects of Node's APIs currently do not support pulling data into a user-provided buffer (which would facilitate static allocation), then perhaps this is more an indication of a problem in the design of the API in question, then with the principle of static allocation.

One can definitely write Javascript to use static allocation for the heavy "data plane" work, and to use GC for the light "control plane" work, and I think Node needs to move in this direction to remain relevant. It doesn't have to be xor.

Encouraging users to rely on GC for data plane work makes no sense from a throughput point of view, and is going to lead to more issues relating to memory fragmentation and allocator choices (see #6078, #12805 and #8871 for some recent examples).

@jorangreef
Copy link
Contributor

See #6923 for raw numbers to convince.

@TimothyGu
Copy link
Member

@jorangreef My objections were based on OP's approach for buffer pulling. I'd be happy if you could file a new issue about pulling into user-controlled buffers (instead of Node.js-owned ones in this issue) – which I agree can be very helpful for throughout when done right – and if you can offer some implementable proposals it would be even better. It would also offer a path to reconcile with Web Streams API's BYOB (bring your own buffer) readers in the future, if that happens.

@TimothyGu
Copy link
Member

@jorangreef I just reopened #3403.

@jorangreef
Copy link
Contributor

I'd be happy if you could file a new issue about pulling into user-controlled buffers (instead of Node.js-owned ones in this issue)
...
@jorangreef I just reopened #3403.

Thanks very much @TimothyGu , #3403 is the issue for pulling into user-controlled buffers, appreciate your reopening that.

asynchronous streams (including HTTP streams) assume the chunk that's passed into a writable stream will never be changed

I agree with you on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. fs Issues and PRs related to the fs subsystem / file system.
Projects
None yet
Development

No branches or pull requests

5 participants