Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement streams by adopting the reference implementation #3200

Draft
wants to merge 64 commits into
base: main
Choose a base branch
from

Conversation

ninevra
Copy link
Contributor

@ninevra ninevra commented Jun 3, 2021

This PR implements the whatwg streams standard by adding the reference implementation as a git subtree and adapting it to run in jsdom. Most but not all streams features work or can be made to work. Merging future upstream changes to the reference implementation is reasonably practical.

What do people think? Is adopting the reference impl worth pursuing further? Any ideas on how to fix the prototype/realms issues?

Missing streams features

  • Transferable streams: jsdom's postMessage() implementation doesn't seem to support transferlists at all, so this feature's somewhat meaningless. The tests fail; I haven't investigated why in particular.
  • ArrayBuffer ownership by readable byte streams: the spec requires readable byte streams to do a lot of transfering ArrayBuffers and checking if they're detached. I don't know of any way to do this, neither in pure ecmascript nor in node. The reference implementation faked the impossible operations with a bunch of property-mangling and copying; this PR makes them no-ops. This causes the relevant tests to fail.

Remaining Bugs

  • The AsyncIterator prototype is taken from the Node realm, not the jsdom realm.
  • The Promise prototype is taken from the Node realm, not the jsdom realm.
  • The patched-global tests fail due to vm: in strict mode, assigning functions to certain globals throws, but succeeds nodejs/node#38918. One possible solution might be to install interfaces on the window prior to calling vm.createContext().
  • Some tests that I expected to just fail instead crash or timeout the test harness.
  • I updated the web-platform-tests submodule to get the latest streams tests. Many other tests were incidentally affected (renamed files, new tests, etc.). I've tracked down some of the changes, but a bunch of failures remain & need to be validated (or the update rolled back).

Merging upstream changes

git subtree pull --squash -P lib/jsdom/living/streams https://github.com/whatwg/streams.git main (approximately)

Alternatives

  • Wait for Node itself to acquire a streams impl, and then inject it (or a wrapper around it) into the sandbox. Would allow proper ArrayBuffer handling for byte streams. Might never happen, though there's been a bit of interest in the Node repos recently.
  • Reimplement streams from scratch. I'll open a draft PR shortly with the very beginnings of such an implementation.
  • Don't implement streams. Blocks full implementation of e.g. FileAPI, Blob, fetch.

TODO

ninevra added 30 commits May 27, 2021 10:08
git-subtree-dir: lib/jsdom/living/streams
git-subtree-split: 033c6d900c438371083785828c659e3150cf4847
Several types must retain globalObject references, for the purpose of 
creating other impls, but may be constructed without running their 
constructors.
These tests crash the harness in confusing ways, and in any case can't 
pass until postMessage() is more completely implemented.
This reverts commit 393edff.

Import rewrites are no longer needed now that files have been moved to
the appropriate directory.
8a7d92b5 Resolve BYOB reads immediately on cancel

git-subtree-dir: lib/jsdom/living/streams
git-subtree-split: 8a7d92b559c2b3f5028d9f48b49de90044be35c1
Should help clarify when a value is a promise _capability_ versus just a 
promise.
Better reflects its current purpose.
@ninevra
Copy link
Contributor Author

ninevra commented Jun 23, 2021

So, looking into the AsyncIterator prototype: it's harder to fix these than the Iterator prototype, because there are no ecmascript globals that use them. The Iterator prototype (in current webidl2js) is accessed by https://github.com/jsdom/webidl2js/blob/ab63e7e8ed59659dd961eef0ac0e56060db19870/lib/output/utils.js#L39-L42, which goes through Array. Afaik, the only way to access the AsyncIterator prototype is to construct an async generator function:

Object.getPrototypeOf(
  Object.getPrototypeOf(
    (async function* () {}).prototype // an anonymous prototype
  ) // the AsyncGenerator prototype
) // the AsyncIterator prototype

which presumably has to be executed as a script in the vm in order to get the right realm.

This could maybe be achieved by adding code to webidl2js that checks on whether the globalObject is a vm context and if true runs the above script in it, but that would

  • add jsdom-specific behavior to webidl2js and
  • be subtly dependent on the details of jsdom's implementation. For instance, const ctx = {}; vm.createContext(ctx); vm.runInContext('globalThis', ctx) === ctx is false, meaning that if the generated wrappers were to end up bundled into a vm script as Constructor and prototype reform tracking #2727 speculates, such a check might break.

My current best idea for how to get the AsyncIterator prototype into webidl2js is to expose a configuration function,

function setInCtorRegistry(globalObject, key, value) {
  const ctorRegistry = utils.initCtorRegistry(globalObject);
  ctorRegistry[key] = value;
}

or less generally

function registerAsyncIteratorPrototype(globalObject, value) {
  setInCtorRegistry(globalObject, "%AsyncIteratorPrototype%", value)
}

which could be called by the user before install()ing interfaces.

@domenic what do you think?

@domenic
Copy link
Member

domenic commented Jun 23, 2021

I like the setInCtorRegistry idea! (Although for a public API I'd call it setInConstructorRegistry :).)

@ninevra
Copy link
Contributor Author

ninevra commented Jun 23, 2021

Cool! One issue that raises is that all the names in the constructor registry then become (at least technically) part of that public API. At one time that was just the names of webidl types; after the sync iterator prototype fixes it now includes some stringly-typed intrinsics ("%Array%", "%IteratorPrototype%").

(I can imagine it eventually expanding to include all intrinsics, allowing us to replace code like new this._globalObject.Uint8Array() with new getFromCtorRegistry(this._globalObject, "Uint8Array")() to guard against scripts overwriting the global Uint8Array etc.)

Design questions: Should the stringly-typed things maybe be replaced with symbols? How much of the constructor registry should be documented, and should any be explicitly reserved for internal use?

I'll try get a draft of this up and running and see what other issues come up.

@domenic
Copy link
Member

domenic commented Jun 23, 2021

Note that per the currently-stalled jsdom/webidl2js#242 the plan is to use the normal names as much as possible (e.g. "Array" instead of "%Array%").

I think the % variants are fine though for things that aren't globals.

I don't think symbols have any advantages here over strings; that seems like an unnecessary level of indirection.

Documenting the new public API (setInConstructorRegistry) seems like a good start, including what the expected parameters are for it. (I think currently it's "%Array%", "%IteratorPrototype%", and you'd add "%AsyncIteratorPrototype%". Then over time we'd update it if we, e.g., merged jsdom/webidl2js#242.)

@ninevra
Copy link
Contributor Author

ninevra commented Jun 29, 2021

I found a possible workaround for nodejs/node#38918: if we run scripts in the vm that define empty properties for all of the globals we later install, subsequent attempts to set them appear to behave correctly. This allows running the tee and patched-global tests (which turned up a bug 🎉, now fixed). Could just do this for the streams globals necessary to run the tests, or could do it for most or all jsdom globals.

Comment on lines +22 to +23
// We need to set the 'name' property:
// eslint-disable-next-line prefer-arrow-callback
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW you can do

const size = chunk => chunk.byteLength;
sizeFunctionWeakMap.set(globalObject, size);

The size function's name property will be automatically set.


Also, do we need to create the function in globalObject, to ensure the prototype is correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is straight from the reference impl. Personally I have a mild preference for function size() {} over const size = , I think it makes what's going on a little more clear.

I think you're right about the prototype, but that's a jsdom-wide problem, isn't it? E.g. Object.getPrototypeOf(window.Headers) !== window.Function.prototype. Not sure it's worth hacking around that issue here specifically, unless there's a really simple fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using function size() { } is important because per spec the result has a .prototype property, whereas arrow functions do not. (I wonder if that's tested.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ninevra

I think you're right about the prototype, but that's a jsdom-wide problem, isn't it?

Ah yes, indeed.


@domenic Printing

new ByteLengthQueuingStrategy({ highWaterMark: 100 }).size.prototype

gives undefined in Chrome and Firefox. The Streams spec calls CreateBuiltinFunction without calling MakeConstructor, which is the abstract op that adds the prototype property. So it would appear that we should in fact use arrow functions after all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ninevra
Copy link
Contributor Author

ninevra commented Jul 3, 2021

I've identified two places here where webidl2js extensions might be useful:

Expose realm intrinsics to impls

In order to fix the Promise prototype, impls need to be able to access their relevant realm's Promise intrinsic (not just this._globalObject.Promise).

webidl2js could capture the Promise intrinsic at initCtorRegistry() like it does for IteratorPrototype etc. Probably we'd want a nicer public util API for accessing intrinsics than globalObject[utils.ctorRegistrySymbol], though; getIntrinsic(key, globalObject)?

Alternatives include: an install-time hook of some sort, allowing the impl to save any intrinsics it needs in e.g. a WeakMap from globalObjects to intrinsics (sort of like the sizefn WeakMaps).

(Byte streams need the Uint8Array and ArrayBuffer intrinsics too.)

(In theory webidl2js could automate some of these conversions [which would incidentally allow e.g. producing Promise<Wrapper> from a returned Promise<Impl>] but that's very non-trivial, and ArrayBuffer afaik can't be moved between realms without copying.)

init(impl, globalObject)

Many of the stream type impls need to save a reference to their relevant realms. The usual idiom is to do this in the impl constructor; but the impl constructor is not guaranteed to be called, and in fact the streams reference implementation often does not call it (I think in order to stay closer to the spec). Currently I'm setting impl._globalObject in the relevant setUp abstract operations, but that's sort of a confusion of concerns, and results in somewhat unreadable code (since this._globalObject gets used in an impl file that never sets it).

If init() received the globalObject as an argument, that would simplify this (& be a safer idiom for general use).

@domenic
Copy link
Member

domenic commented Jul 7, 2021

In order to fix the Promise prototype, impls need to be able to access their relevant realm's Promise intrinsic (not just this._globalObject.Promise).

Can you explain how these are different?

The spec wants you to use current realm, not relevant realm. But getting the current realm is hard with current architecture, as discussed in #2727.

However the relevant global object should be the same as this._globalObject, so I think this._globalObject.Promise should work, as long as we compromise by saying relevant is good enough.

I can understand how there's a problem with webidl2js-generated Promise instances; that's what jsdom/webidl2js#242 attempts to solve. (Feel free to take over that PR yourself by submitting a new version that responds to the outstanding review comments.) But for impl code, I think this._globalObject.Promise should work.

init(impl, globalObject)

This idea makes a lot of sense to me.

@ninevra
Copy link
Contributor Author

ninevra commented Jul 7, 2021

In order to fix the Promise prototype, impls need to be able to access their relevant realm's Promise intrinsic (not just this._globalObject.Promise).

Can you explain how these are different?

The spec wants you to use current realm, not relevant realm. But getting the current realm is hard with current architecture, as discussed in #2727.

However the relevant global object should be the same as this._globalObject, so I think this._globalObject.Promise should work, as long as we compromise by saying relevant is good enough.

The issue is that user code can mess with this._globalObject, so this._globalObject.Promise can be an arbitrary user-defined value, whereas streams are only supposed to create actual Promises. Basically it's the difference between the initial value of this._globalObject.Promise and the current value.

There's some web-platform-tests that mess with globalThis.Promise and globalThis.Promise.prototype.then to make sure they don't affect the promises returned by streams (e.g. readable-streams/patched-global.any.js); using this._globalObject.Promise causes those to fail.

The current implementation passes these tests because it uses wrong-realm Promises; adopting in-realm promises should require capturing both Promise and Promise.prototype.then before the user can modify them (which is what the reference implementation does, though its method won't work here).

@domenic
Copy link
Member

domenic commented Jul 7, 2021

Ah, I see, that makes perfect sense.

In general jsdom is not very robust against people messing with the intrinsics. We get some protection for free from the impl/wrapper separation, e.g. we can guarantee that this.getAttributeNS() is calling the impl, which is relatively hidden, instead of the possibly-overridden wrapper. But we call tons of methods without being super-cautious about saving them ahead of time.

So my inclination would be to just fail these tests for now. Maybe in the future we could revisit the problem generally, but I'd like to do so in an automated fashion, e.g. using some derivative of https://github.com/jackbsteinberg/get-originals-rewriter

@ninevra
Copy link
Contributor Author

ninevra commented Jul 7, 2021

Okay, I'll continue with using this._globalObject.Promise for now then, and see what tests fail.

@TimothyGu pointed out in jsdom/webidl2js#247 (comment) that we actually often need to capture properties of intrinsics, like Object.prototype. I think that's true here as well, that we have to capture Promise.prototype.then, because either Promise or Promise.prototype could get modified by the user. That seems like a strong argument in favor of an Impl.install() hook and/or a general rewriting approach like you suggest, rather than my original suggestion of capturing a partial list of intrinsics in webidl2js.

@armanbilge
Copy link

Thrilled to see this PR, just want to heap some encouragement/appreciation :) we'd find this very useful for testing some modules for Scala.js libs such as fs2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants