Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ReadableStream "owning" type #1271

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

youennf
Copy link
Contributor

@youennf youennf commented Apr 9, 2023

Implement part of streams-for-raw-video-explainer.md.

(See WHATWG Working Mode: Changes for more details.)


Preview | Diff

Copy link
Collaborator

@ricea ricea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General approach looks good. I made some early nitpicks.

We can bikeshed whether type: 'transfer' or a new optional flag transfer: true is better. type: 'transfer' makes it easier to extend TransformStream. transfer: true maintains the 1:1 relationship between types and controllers.

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
@youennf
Copy link
Contributor Author

youennf commented Apr 11, 2023

We can bikeshed whether type: 'transfer' or a new optional flag transfer: true is better.

I went with type: 'transfer' as transfer does not make sense for byob streams.
I guess a separate flag might be useful if we anticipate other types where transfer:true would be useful.

@youennf
Copy link
Contributor Author

youennf commented Apr 11, 2023

A few thoughts:

  • The current algorithm tries to transfer/serialize only if it is sure to be able to enqueue. This seems a good property to me.
  • If transfer/serialization fails, we silently terminate without either throwing or erroring the stream. I wonder whether this will be surprising to web developers.
  • Cloning a transfer:true stream with transfer-only objects would succeed but the second branch would be errored. This seems ok to me.
  • The current transfer-or-clone algorithm has surprising and potentially undesired effects:
    • enqueue(videoFrame) would make sure to transfer videoFrame but enqueue({ videoFrame }) would clone it.
    • We could introduce a second optional parameter to controller.enqueue, something like enqueue(any message, sequence<object> transfer = []) and update accordingly the StructuredTransferOrClone algorithm. This would make it closer to postMessage. In that case, should we keep enqueue(videoFrame) as a shortcut to enqueue(videoFrame, [videoFrame])?

@ricea
Copy link
Collaborator

ricea commented Apr 12, 2023

I went with type: 'transfer' as transfer does not make sense for byob streams.

That makes sense to me.

I guess a separate flag might be useful if we anticipate other types where transfer:true would be useful.

I'm hoping we won't have to add any more controller types, since they are costly in terms of specification and implementation size.

@ricea
Copy link
Collaborator

ricea commented Apr 12, 2023

  • The current algorithm tries to transfer/serialize only if it is sure to be able to enqueue. This seems a good property to me.

Agreed.

  • If transfer/serialization fails, we silently terminate without either throwing or erroring the stream. I wonder whether this will be surprising to web developers.

I would lean towards erroring the stream. Silently losing data is something I want to avoid.

  • Cloning a transfer:true stream with transfer-only objects would succeed but the second branch would be errored. This seems ok to me.

It makes me a bit uncomfortable but I think it's acceptable.

  • The current transfer-or-clone algorithm has surprising and potentially undesired effects:

    • enqueue(videoFrame) would make sure to transfer videoFrame but enqueue({ videoFrame }) would clone it.
    • We could introduce a second optional parameter to controller.enqueue, something like enqueue(any message, sequence<object> transfer = []) and update accordingly the StructuredTransferOrClone algorithm. This would make it closer to postMessage. In that case, should we keep enqueue(videoFrame) as a shortcut to enqueue(videoFrame, [videoFrame])?

If we add two-argument enqueue() we should also add two-argument write(). I would like the two-argument form to work even when not using type: transfer as it is extremely useful in combination with transferable streams. When I was thinking about this previously I was concerned about the extra complexity it adds to pipeTo(), but maybe it's really not that bad now that we use ReadRequest internally.

@youennf
Copy link
Contributor Author

youennf commented Apr 12, 2023

I would lean towards erroring the stream. Silently losing data is something I want to avoid.

Done.

If we add two-argument enqueue() we should also add two-argument write()

Yes, I plan to do a separate PR for WritableStream.
And probably another for transform stream.

I would like the two-argument form to work even when not using type: transfer as it is extremely useful in combination with transferable streams.

I did not add this in this PR but it should be fairly easy to extend it as a follow-up.
I am a bit hesitant to do this at this point:

  • Adding a transfer list would trigger cloning of the data (observable in non transferable streams case)
  • ReadableStreamTee might fail with those transfer-only values.
  • Adding transfer:true explicitly makes it clear that the web page opts into that behavior. It is not a big adoption step, filling the transferList is anyway needed.

@youennf
Copy link
Contributor Author

youennf commented Apr 12, 2023

I think the current version of the PR is ready for reference implementation prototyping.
Should I update the reference implementation in this PR as well?

@ricea
Copy link
Collaborator

ricea commented Apr 12, 2023

I did not add this in this PR but it should be fairly easy to extend it as a follow-up. I am a bit hesitant to do this at this point:

  • Adding a transfer list would trigger cloning of the data (observable in non transferable streams case)
  • ReadableStreamTee might fail with those transfer-only values.
  • Adding transfer:true explicitly makes it clear that the web page opts into that behavior. It is not a big adoption step, filling the transferList is anyway needed.

These are good points. Let's do it your way.

@MattiasBuelens
Copy link
Collaborator

  • We could introduce a second optional parameter to controller.enqueue, something like enqueue(any message, sequence<object> transfer = []) and update accordingly the StructuredTransferOrClone algorithm. This would make it closer to postMessage.

I like that a lot! 🙂 I understand that it's a bit out-of-scope for this PR, but I'd love to see that in a follow-up PR.

When we do add this, I would suggest we use an options object though, to align with postMessage(message, { transfer }). That makes it easier to add extra options for enqueue() and write() in the future.

ResetQueue would then transfer away ownership of all transfer lists still in the queue, without needing a separate [[isTransferring]] flag. In essence, type: "transfer" would merely force enqueue(chunk) to always be handled as enqueue(chunk, { transfer: [chunk] }).

I think the current version of the PR is ready for reference implementation prototyping.
Should I update the reference implementation in this PR as well?

Yes, please. Each spec change on the Streams standard should be accompanied with new WPT tests, and the reference implementation should pass those new tests.

@ricea
Copy link
Collaborator

ricea commented Apr 12, 2023

Should I update the reference implementation in this PR as well?

Yes, please do.

@youennf
Copy link
Contributor Author

youennf commented Apr 12, 2023

OK, I'll start doing it.
Rethinking a bit, I wonder the following:

  • Would type:'serialize' be a better name than type:'transfer' (which conflicts a bit with transform streams, transferable streams and we sometimes serialize and not transfer chunks)?
  • Is the syntactic sugar for transferable objects (enqueue(videoFrame) instead of enqueue(videoFrame, [videoFrame])) a good idea now that we have the additional transfer parameter?

@youennf
Copy link
Contributor Author

youennf commented Apr 12, 2023

For testing, it seems we use node.js with WPT and the ref implementation.
Are there any transferable object like MessagePort supported in node.js?
Ditto for serializable, with say VideoFrame?

@ricea
Copy link
Collaborator

ricea commented Apr 13, 2023

OK, I'll start doing it. Rethinking a bit, I wonder the following:

  • Would type:'serialize' be a better name than type:'transfer' (which conflicts a bit with transform streams, transferable streams and we sometimes serialize and not transfer chunks)?

This maybe needs a bit more bikeshedding. Another alternative I can think of is type: 'owning'.

  • Is the syntactic sugar for transferable objects (enqueue(videoFrame) instead of enqueue(videoFrame, [videoFrame])) a good idea now that we have the additional transfer parameter?

I really like the syntactic sugar personally, and I think being a different type of stream is a good enough excuse to use it. I'd like to get other people's input on this because there may be footguns I'm overlooking.

@domenic
Copy link
Member

domenic commented Apr 13, 2023

I really like the syntactic sugar personally, and I think being a different type of stream is a good enough excuse to use it. I'd like to get other people's input on this because there may be footguns I'm overlooking.

I would be most comfortable with requiring enqueue(videoFrame, { transfer: [videoFrame] }). That is the most like other web platform APIs that do transferring, such as structuredClone() and postMessage(). (Note: the { transfer: [x] } form is the modern version that replaces [x], so indeed, @MattiasBuelens's suggestion in #1271 (comment) is a good one.)

However, I recognize it is verbose. So we might want to blaze a new path here and have enqueue() be the first sort of "auto-transfer" API. We should think carefully about how that works, and what pattern we might be setting up for future such APIs.

For testing, it seems we use node.js with WPT and the ref implementation.
Are there any transferable object like MessagePort supported in node.js?
Ditto for serializable, with say VideoFrame?

You might run up against the limits of the reference implementation here, for which I apologize. It appears that for transferrable streams #1053 we did not make any reference implementation changes and just skipped running the relevant tests against the reference implementation. But that was mostly fine because it didn't change the streams API itself. This change is more intrusive to the streams API so it would be a real shame to let the reference implementation get out of sync.

Your main tools for trying to tackle this are:

  • Patching the window in which the reference implementation and test runs: https://github.com/whatwg/streams/blob/main/reference-implementation/run-web-platform-tests.js#L48 . In particular you could put things from the Node.js environment in here.
  • Node.js does support globalThis.structuredClone: https://nodejs.org/api/all.html#all_globals_structuredclonevalue-options . I think that might be enough to implement what you need, with some creativity.
  • ArrayBuffers are transferable, and plain JS objects are cloneable. So if you put those in the tests you write, it should work in any environment. You probably also want to add separate WPTs for VideoFrame specifically, since that's an important use case, but keep those in a separate file from the generic clone/transfer ones, so that we can exclude them when running reference implementation tests in Node.

@youennf
Copy link
Contributor Author

youennf commented Apr 13, 2023

Another alternative I can think of is type: 'owning'.

I'll change to this, it is less confusing than transfer.

I would be most comfortable with requiring enqueue(videoFrame, { transfer: [videoFrame] }).

Let's start with this and let's do the shortcut as a follow-up.

  • ReadableStreamTee might fail with those transfer-only values.

ReadableStreamTee would error both branches according the current algorithm.
Maybe we should relax this rule to only error the second branch. Let's also look at this in a follow-up.

@youennf youennf force-pushed the add-support-for-stream-transfer-type branch 2 times, most recently from d465e3f to aacc034 Compare April 13, 2023 20:27
@youennf youennf changed the title Add support for ReadableStream "transfer" type Add support for ReadableStream "owning" type Apr 13, 2023
@youennf youennf force-pushed the add-support-for-stream-transfer-type branch 2 times, most recently from b52ba6b to 841d2fd Compare April 14, 2023 06:28
@youennf youennf marked this pull request as ready for review April 14, 2023 06:34
@youennf
Copy link
Contributor Author

youennf commented Apr 14, 2023

PR is ready for review.
If moving forward, this could be completed by the following tasks:

  • Add WritableStream & TransformStream support
  • Investigate shorter enqueue-with-transfer syntax
  • Investigate transferring chunks when transferring owning streams
  • Investigate ReadableStreamTee to only error clone 2 branch if structure cloning.
  • Investigate ReadableStreamTee realtime mode

youennf and others added 2 commits June 7, 2023 18:18
Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>
…ebidl

Co-authored-by: Kagami Sascha Rosylight <saschanaz@outlook.com>
@youennf youennf force-pushed the add-support-for-stream-transfer-type branch from b700260 to eed4579 Compare June 7, 2023 16:38
@youennf youennf force-pushed the add-support-for-stream-transfer-type branch from eed4579 to 9146111 Compare June 7, 2023 20:00
@youennf
Copy link
Contributor Author

youennf commented Jun 7, 2023

PR is green and was approved. Is it ready for being merged?

Copy link
Member

@saschanaz saschanaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, should we care about Symbol.dispose ECMA262 Stage 3 proposal? It looks reasonable for this API to support that, and if so it should be better sooner than later to prevent webcompat issue.

@@ -651,8 +651,15 @@ enum ReadableStreamType { "bytes" };
<p>For an example of how to set up a readable byte stream, including using the different
controller interface, see [[#example-rbs-push]].

<p>Setting any value other than "{{ReadableStreamType/bytes}}" or undefined will cause the
{{ReadableStream()}} constructor to throw an exception.
<p>Can be set to "<dfn enum-value for="ReadableStreamType">owning</dfn>" to signal that the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<p>Can be set to "<dfn enum-value for="ReadableStreamType">owning</dfn>" to signal that the
<p>If set to "<dfn enum-value for="ReadableStreamType">owning</dfn>", it signals that the

Can this and the above be "If set to"? "Can be set to" made sense when there was only one possibility, but repeating that makes less sense to me.

index.bs Outdated
{{ReadableStream()}} constructor to throw an exception.
<p>Can be set to "<dfn enum-value for="ReadableStreamType">owning</dfn>" to signal that the
constructed {{ReadableStream}} will own chunks (via transfer or serialization) before enqueuing them.
This ensures that enqueued chunks are not mutable by the source.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This ensures that enqueued chunks are not mutable by the source.
This ensures that enqueued chunks are not mutated by the source.

"mutable by" sounds a bit unnatural.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change it to cannot be mutated by the source

@@ -2950,13 +2969,21 @@ The following abstract operations support the implementation of the
<div algorithm>
<dfn abstract-op lt="ReadableStreamDefaultControllerEnqueue"
id="readable-stream-default-controller-enqueue">ReadableStreamDefaultControllerEnqueue(|controller|,
|chunk|)</dfn> performs the following steps:
|chunk|, |transferList|)</dfn> performs the following steps:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand your comment, I mean Streams have no existing example that receives empty list, while it has several optional parameters in section 9. But either way is fine I guess.

index.bs Outdated
@@ -5188,7 +5217,7 @@ The following abstract operations support the implementation of the
id="writable-stream-default-controller-close">WritableStreamDefaultControllerClose(|controller|)</dfn>
performs the following steps:

1. Perform ! [$EnqueueValueWithSize$](|controller|, [=close sentinel=], 0).
1. Perform ! [$EnqueueValueWithSize$](|controller|, [=close sentinel=], 0, undefined).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this undefined be « » for consistency, or vise versa?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right let's do that.

Comment on lines +6635 to +6644
<div algorithm>
<dfn abstract-op lt="StructuredTransferOrClone">StructuredTransferOrClone(|value|, |transferList|)</dfn>
performs the following steps:
1. Let |serialized| be ? [$StructuredSerializeWithTransfer$](|value|, |transferList|).
1. Let |deserialized| be ? [$StructuredDeserializeWithTransfer$](|serialized|, [=the current Realm=]).
1. Return |deserialized|.\[[Deserialized]].
</div>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Please file an issue to not keep the duplication too long, or let me know so that I can file one)

@youennf youennf requested a review from saschanaz June 12, 2023 11:55
Copy link
Member

@saschanaz saschanaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with a remaining question about Symbol.dispose.

@youennf
Copy link
Contributor Author

youennf commented Jun 12, 2023

LGTM, with a remaining question about Symbol.dispose.

Oh sorry, I missed the question.
The plan is to support this at some point in a follow-up PR, we will concentrate on ReadableStream and others first.

index.bs Outdated Show resolved Hide resolved
@youennf
Copy link
Contributor Author

youennf commented Jun 15, 2023

Is it ready to merge?

@ricea
Copy link
Collaborator

ricea commented Jul 12, 2023

Sorry for the radio silence. I'm trying to catch up on the latest changes, but I've been a bit swamped.

@jan-ivar
Copy link

Any updates on this? We're trying to solve w3c/webtransport#507.

@MattiasBuelens
Copy link
Collaborator

@ricea Friendly ping. If you could find some time to review this, that'd be appreciated. 😉

@jasnell
Copy link

jasnell commented May 4, 2024

Overall I'm happy with the general idea but this feels rather heavyweight for the use case. Perhaps transfer can be built into the existing model without introducing a new stream "type".

For instance, ReadableStreamDefaultController can have a new method controller.clone(value, transferList) that is like controller.enqueue(...) but clones/transfers the value when enqueued.

@MattiasBuelens
Copy link
Collaborator

@jasnell That would imply that a stream can hold both "owned" and "unowned" chunks at the same time. This would mean a bunch more bookkeeping and passing around extra parameters:

  • The stream's queue would need to record for each chunk whether it is owned or not, so ResetQueue knows to only call the dispose steps on owned chunks. We'll need to extend value-with-size to have an extra isOwned item.
  • ReadableStreamTee would no longer unconditionally clone chunks to both branches. Instead, it needs to first check for each chunk received from the original stream whether it was owned or not. We'd need to extend read request's chunk steps to have an extra parameter isOwned to store that information.
    • Should this information be exposed through ReadableStreamDefaultReader.read() too? Something like returning a { done, value, owned } tuple instead? (This is probably a very bad idea... 😅)
  • TransformStreamDefaultController would need the same treatment, and get a clone() method.
  • We would need to extend this to WritableStream too, e.g. WritableStreamDefaultWriter.writeCloned(chunk, { transfer: [] }) or .write(chunk, { clone: true, transfer: [] })
  • Would we need to inform a WritableStream's sink or a TransformStream's transformer that a written chunk is now owned by the stream, in case it wants to forward that chunk to Readable|TransformStreamDefaultController.clone()? That would mean passing an isOwned parameter to sink.write() or transformer.transform() somehow... 😕

All in all, I think it's easier to have a separate type for owning streams.

@MattiasBuelens
Copy link
Collaborator

Hmm, how does this interact with transferable streams?

Sure, we transfer in controller.enqueue() to push the chunk onto the stream's queue. But if the stream has been transferred, then:

  • the cross-realm writable we create should also be of type owning
  • the message we post through the internal MessagePort containing the chunk should have a proper transfer list

To support this, I think we'll have to store the transfer list alongside each chunk in the queue, so we can re-transfer the chunk later on. 🤔

@jasnell
Copy link

jasnell commented May 9, 2024

@MattiasBuelens ... sorry I wasn't clear in my comment here #1271 (comment) ... what I was suggesting is ONLY cloning/transferring at the point the chunk is enqueued, forgoing the rest of the proposed "owning" semantics.

@saschanaz
Copy link
Member

Random thought about being heavyweight, what about:

let readable = new ReadableStream({
  start(controller) {
     // Wrap the chunk with a new interface (name chosen randomly)
     controller.enqueue(new Ownership(chunk));
  }
});
// ResetQueue checks whether the chunk is Ownership, and if yes,
// run dispose steps of the inner data of Ownership.
readable.cancel();
// Same for WritableStream, it would handle chunks that are Ownership.
readable.pipeTo(writable)

@youennf
Copy link
Contributor Author

youennf commented May 24, 2024

I also think a stream that only handles owned chunks is simpler to reason about and as powerful.
Some cases for individually owned chunks can be covered with structuredClone though not for disposing in case of cancel.

@saschanaz
Copy link
Member

saschanaz commented May 24, 2024

I think having a separate representation of ownership instead of changing the internal of streams would be easier; when you pipe you pass the ownership without having to worry about whether the destination is owning stream or not.

(But doing so would have bigger effect in the web platform...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants