
Add stream example #34

Closed
wants to merge 4 commits into from

Conversation

@jasnell (Collaborator) commented May 4, 2020

@addaleax @mcollina ... here's a super simple example of how streams can be supported.

@addaleax ... interestingly, this example hits an Abort inside Node.js when we replace createBrotliCompress with createGzip or createDeflate within the worker.js...

james@ubuntu:~/nearform/piscina/examples/stream$ node index
node[30456]: ../src/node_zlib.cc:323:static void node::{anonymous}::CompressionStream<CompressionContext>::Write(const v8::FunctionCallbackInfo<v8::Value>&) [with bool async = true; CompressionContext = node::{anonymous}::ZlibContext]: Assertion `Buffer::IsWithinBounds(out_off, out_len, Buffer::Length(out_buf))' failed.
 1: 0xa295b0 node::Abort() [node]
 2: 0xa2962e  [node]
 3: 0xae6b1a  [node]
 4: 0xc0251b  [node]
 5: 0xc03ac6  [node]
 6: 0xc04146 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
 7: 0x13a5919  [node]
Aborted (core dumped)

The reason for the abort, as far as I can see, is that the PortDuplex transfers ownership of the buffer chunk over to the main thread using a transferList, which highlights a significant grey area in the Streams API: when using a Writable and passing a Buffer, who takes ownership of that Buffer, the caller or the receiver? I don't think we have a clear answer for that, but we definitely need a fix that does not trigger an Abort.
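
For context, a minimal sketch of the transfer-on-write pattern being described (the class name here is illustrative, not the exact code in this PR):

const { Writable } = require('stream');

class PortWritableSketch extends Writable {
  #port;

  constructor (port, options) {
    super(options);
    this.#port = port;
  }

  _write (chunk, encoding, callback) {
    // Listing chunk.buffer in the transfer list moves ownership of the
    // underlying ArrayBuffer to the receiving thread and detaches it here.
    // If chunk was carved out of Buffer's internal pool (e.g. allocUnsafe,
    // or a small read produced by fs/zlib), the whole pool slab is detached
    // with it, invalidating other Buffers that share that slab.
    this.#port.postMessage(chunk, [chunk.buffer]);
    callback();
  }
}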

@jasnell (Collaborator, Author) commented May 4, 2020

Refs: nodejs/node#33240

@jasnell (Collaborator, Author) commented May 4, 2020

I've updated the example to disable the transfer list in the custom duplex by default, to avoid the Abort.

@addaleax (Collaborator) left a comment

The code LGTM but I’m not really sure about this example as it exemplifies several things that are not best practices, including using Workers for file I/O and compression, both of which are already asynchronous operations in Node.js.

If you want a more real-world example, maybe a simple text-processing Worker (e.g. applying a complex regexp to a stream – even just filtering out valid JS identifiers is quite heavy if you want to do it correctly) would be better?
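
For what it's worth, a rough sketch of the kind of CPU-bound worker that suggestion points at (file name, tokenization, and export shape are illustrative):

// worker.js - keep only the whitespace-separated tokens that are
// syntactically valid JS identifiers. Reserved words are not excluded
// here, for brevity; handling them and full Unicode rules correctly is
// what makes the task genuinely heavy on large inputs.
const IDENTIFIER = /^[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*$/u;

module.exports = (text) => text.split(/\s+/).filter((token) => IDENTIFIER.test(token));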

_read () {
  // Do nothing here. A more complete example would
  // implement proper read/pause behavior.
}
Collaborator

Btw, I’ve thought about porting the code from our internal Worker stdio implementation to npm. This is definitely giving more motivation for that :)

Collaborator Author

Likely would be good I think. I'm considering the possibility of Piscina providing a more complete version of PortDuplex out of the box.

Collaborator

@jasnell The thing is, in the end streaming use cases for Workers are even rarer than use cases for Workers in general. I don’t think we should encourage this pattern if users don’t really need it.

Collaborator Author

That's absolutely fair. The key reason I'm thinking through this is to explore the good and bad ways of using workers, and what the various caveats are.

@jasnell (Collaborator, Author) commented May 4, 2020

... it exemplifies several things that are not best practices, including using Workers for file I/O and compression

Yep, this is actually a lead-in to some React server-side rendering tests I'm looking at, but I wanted to make sure the basic idea for the PortDuplex is sound.

@jasnell (Collaborator, Author) commented May 4, 2020

(To be clear, I'm not in a rush to land this particular example; we can keep tweaking it until it demonstrates the right thing.)

@addaleax (Collaborator) commented May 4, 2020

@jasnell I’ll try to port over the stream code from core, and I’ll try to look into performance improvements there – theoretically, the same Atomics improvements we do for task posting should also make sense for streams, and probably even more so there.

if (typeof chunk === 'string') {
  chunk = Buffer.from(chunk, encoding);
}
const transferList = this.#transfer ? [chunk.buffer] : undefined;


If the buffer was allocated with allocUnsafe, this will cause problems due to pooling, no?
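
For reference, a quick illustration of the pooling behaviour in question (the printed values reflect typical defaults, not guarantees):

const a = Buffer.allocUnsafe(16);
const b = Buffer.allocUnsafe(16);
// Small allocUnsafe buffers are carved out of a shared pool slab, so they
// usually end up as views over the same underlying ArrayBuffer.
console.log(a.buffer === b.buffer);  // usually true
console.log(a.buffer.byteLength);    // 8192 (Buffer.poolSize by default)
// Transferring a.buffer would therefore detach b's storage as well.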

Collaborator Author

Well, there are many ways this is problematic. I'll be updating this so that we always create a new Uint8Array copy of the Buffer and then transfer that new copy. That causes the fewest headaches overall.
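
Roughly, the copy-then-transfer approach described above might look like this (a sketch, not the exact diff):

_write (chunk, encoding, callback) {
  if (typeof chunk === 'string') {
    chunk = Buffer.from(chunk, encoding);
  }
  // Copy the bytes into a freshly allocated, non-pooled ArrayBuffer so
  // that transferring it cannot detach memory shared with other Buffers.
  const copy = new Uint8Array(chunk);
  this.#port.postMessage(copy, [copy.buffer]);
  callback();
}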

// transferList. There are several reasons:
// a) Buffer instances are most often created off a pool
// and share the same underlying common ArrayBuffer,
// transferring those can break Node.js in many ways.
Collaborator

See the comment in the Node.js thread; this is true, but not necessarily a concern in the future. The second concern is definitely there, though :)

@mcollina (Collaborator) left a comment

I think this example is complicated enough to either live in its own module or be a part of this one.

In that case, I would recommend the name https://www.npmjs.com/package/scivolo (water slide).

// TODO(@jasnell): A more complete example would
// handle this error more appropriately.
this.#port.close();
console.error(err);
Collaborator

You should call the callback here.
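
That is, something along these lines (a sketch; the surrounding _destroy signature is assumed from context):

_destroy (err, callback) {
  // TODO(@jasnell): A more complete example would
  // handle this error more appropriately.
  this.#port.close();
  // Propagate the error (or null) so stream cleanup can complete
  // instead of leaving the destroy callback hanging.
  callback(err);
}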

_write (chunk, encoding, callback) {
  if (typeof chunk === 'string') {
    chunk = Buffer.from(chunk, encoding);
  }
Collaborator

This will always be a Buffer; you are not allowing the encoding to be changed.
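
For context (a sketch; the class name and option handling are illustrative): a Writable/Duplex decodes strings into Buffers before _write is called unless decodeStrings: false is passed, so with the current constructor the string branch above is effectively dead code.

const { Duplex } = require('stream');

class PortDuplexSketch extends Duplex {
  #port;

  constructor (port, options = {}) {
    // Opting out of decodeStrings lets strings (and the caller-supplied
    // encoding) reach _write unchanged instead of pre-decoded Buffers.
    super({ ...options, decodeStrings: false });
    this.#port = port;
  }
}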

// will cause unexpected and undefined behavior that
// will likely crash the Node.js process.
this.#port.postMessage(chunk);
callback();
Collaborator

I have a feeling there should be some sort of backpressure built in.

Collaborator Author

Absolutely it should
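
One possible shape for that (illustrative; the chunk/ack message protocol is an assumption, not part of this PR): defer the write callback until the receiving side acknowledges the chunk, so the Writable's highWaterMark can throttle fast producers.

_write (chunk, encoding, callback) {
  this.#port.postMessage({ kind: 'chunk', data: chunk });
  const onMessage = (message) => {
    if (message && message.kind === 'ack') {
      this.#port.off('message', onMessage);
      callback();
    }
  };
  // A real implementation would multiplex acks with incoming data on the
  // port rather than adding a listener per write.
  this.#port.on('message', onMessage);
}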

@jasnell (Collaborator, Author) commented May 5, 2020

I think this example is complicated enough to either live in its own module or be a part of this one.

I agree. It should definitely live in its own module, and we should be very prescriptive about how and why to use it. @addaleax, I know you were looking at porting the code we use in core to wrap stdio for workers; let's bundle that into the new project. I'll get that repo started later tonight.

@jasnell marked this pull request as draft May 5, 2020 20:54

@jasnell (Collaborator, Author) commented May 5, 2020

Converted this PR to a draft so it doesn't get landed. Will be iterating on this further.

@jasnell (Collaborator, Author) commented May 8, 2020

This is going to be revisited later.

@jasnell closed this May 8, 2020
@metcoder95 deleted the stream-example branch September 17, 2023 20:56