Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whatwg Streams interface #12

Open
rescribet opened this issue Jan 28, 2019 · 19 comments
Open

Whatwg Streams interface #12

rescribet opened this issue Jan 28, 2019 · 19 comments

Comments

@rescribet
Copy link

Is there any particular reason the current streams API is based on the event-based nodejs model, rather than the promise-based whatwg spec?

The latter has already been deployed in most browsers, and allows any type, so passing Quad instances as chunks would seem to be compliant, allowing full spec deference making this spec a lot simpler and cross-compatible.

The guys over at nodejs also seem to have started on implementing the whatwg spec, so switching might prevent premature obsolescence.

@rescribet rescribet changed the title Streams interface Whatwg Streams interface Jan 28, 2019
@awwright
Copy link
Member

awwright commented Jan 28, 2019 via email

@RubenVerborgh
Copy link
Member

Is there any particular reason the current streams API is based on the event-based nodejs model

It's not, it's based on the subset of it that is fast 🙂
e.g., no backpressure handling etc.

rather than the promise-based whatwg spec?

It is terribly slow still for the moment.

Source: had to roll my own iterator library for an RDF query engine that uses Terms internally, because Node.js streams (one magnitude) and promise-based streams (two magnitudes) are slower.

@rescribet
Copy link
Author

e.g., no backpressure handling etc.

C-code without bounds checking should be significantly faster as well, but there are reasons why the features are there of course. One or two orders is a pretty large hit anyway, but I fear the features left out will find their way back into the spec over time (e.g. @awwright calling guarantees)

promise-based streams (two magnitudes) are slower.

The streams interface is relatively new and will probably gain performance improvements over time, especially when native support has been written into the engine (rather than building something custom). But it's already fast enough for the guys over at facebook to use it to stream video

Any promise based solution would have the overhead of instantiating, scheduling, and calling those promises, doing that for every statement separately would severely degrade performance. The current rdfjs spec emits one quad at-a-time, was the Promise-based implementation you tested also doing that?

@RubenVerborgh
Copy link
Member

I fear the features left out will find their way back into the spec over time

The low-level spec provides everything a higher-level spec needs to build in such features as back-pressure handling and promises.

Any promise based solution would have the overhead of instantiating, scheduling, and calling those promises

Exactly.

doing that for every statement separately would severely degrade performance.

But that's the way you do (async) streams.

@bergos
Copy link
Member

bergos commented Jan 29, 2019

Besides the Quad streams our RDFJS packages will produce, they need to be feed by streams from other packages and need to write via streams to other packages. If you look at available, used and maintained packages, Node.js streams are the current standard. They are also available as separate package. In the latest version they can be used with Symbol.asyncIterator. As async loops are a JS language feature and not just a spec on top of JS, I think we are on the save side. This was also already discussed in #129.

We also discussed it in one of the calls and we decided to use the subset of Node.js streams. As this topic was already closed some time ago, I will close this issue. If there are new strong arguments for WHATWG streams, we can open the issue anytime.

@bergos bergos closed this as completed Jan 29, 2019
@elf-pavlik elf-pavlik transferred this issue from rdfjs/data-model-spec Mar 9, 2019
@elf-pavlik elf-pavlik reopened this Mar 9, 2019
@elf-pavlik
Copy link
Member

Transfered to stream-spec repo of @rdfjs and reopened to clarify few things.

I tried adding parser-n3 based on n3.js to some @solid related project and webpack bundle includes ~70k from readable-stream repo which adds quite a lot to recommended performance budgets: https://medium.com/@addyosmani/the-cost-of-javascript-in-2018-7d8950fbb5d4

Source: had to roll my own iterator library for an RDF query engine that uses Terms internally, because Node.js streams (one magnitude) and promise-based streams (two magnitudes) are slower.

@RubenVerborgh do you refer to https://github.com/RubenVerborgh/AsyncIterator ? N3.js doesn't seem to use it. I see a lot of packages from @comunica, node-quadstore and some other modules in https://www.npmjs.com/browse/depended/asynciterator

I see that @mcollina besides readable-stream also works on https://github.com/nodejs/whatwg-stream/ maybe he would have some input on defining RDF/JS Streams interfaces in a way that they will align with a future direction of node and whatwg streams.

@RubenVerborgh
Copy link
Member

Nope, N3.js should work out of the box without any stream dependencies, depending on what exactly you use.

@mcollina
Copy link

mcollina commented Mar 9, 2019

I think the best way is to add support for async iterators - when they are available. Both streams implementation offer them, they are fast, and they are more ergonomic and intuitive. They do the right think with error handling, which is always tricky.

@RubenVerborgh
Copy link
Member

Thanks @mcollina, but It's more nuanced than that:

  1. ES6 AsyncIterator is still at least a magnitude slower. Matters if you commonly have millions of elements.

  2. We only use a subset of the Stream interface; that subset is covered by simpler and more performant libs like https://github.com/RubenVerborgh/AsyncIterator

We are keeping an eye on ES6 AsyncIterator performance (I wish we had live tests in a browser, @joachimvh), but it's not looking good. Logical, because the protocol does more than the subset of stream we use.

@elf-pavlik
Copy link
Member

Thanks @mcollina and my bad with splitting conversation between here and twitter: https://twitter.com/elfpavlik/status/1104500178432806912

We had some prior conversation where more than one person expressed 👍 for ES Asynciterator rdfjs/dataset-spec#42 (comment)

@RubenVerborgh
Copy link
Member

RubenVerborgh commented Mar 9, 2019

^ On high-level. Do this on low-level, and be prepared to handle streams one magnitude slower 😉

@elf-pavlik
Copy link
Member

Can we find a way to provide both:

  • speed due to

We only use a subset of the Stream interface; that subset is covered by simpler and more performant libs like https://github.com/RubenVerborgh/AsyncIterator

  • libraries like parsers and serializes relying only on features included in Web Platform and apps using them having to send readable-stream to the browser? Not sure if we could use dependency injection where application would pass to the parser (or other lib) one of
    • WHATWG ReadableStream
    • node Readable
    • your fast asynciterator module

Last two AFAIK have common interface but not the WHATWG ReadableStream.

@RubenVerborgh
Copy link
Member

RubenVerborgh commented Mar 9, 2019 via email

@mcollina
Copy link

mcollina commented Mar 9, 2019

Agreed. You can easily add async interator support to your module and achieve better usability. It also does not remove your fast API when there is a need to.

@elf-pavlik
Copy link
Member

Sounds like:

  • if one has processing speed as priority one would use asynciterator module and send extra ~55k to the browser
  • if one wants to have read() which currently this spec requires on Streams but also has priority to use Symbol.asyncIterator over speed, would use readable-stream module and send extra ~70k to the browser
  • if one has priority to minimize code send to the browser and wants to use Symbol.asyncIterator provided by WHATWG ReadableStream, out of luck since this spec requires read()?

@mcollina
Copy link

You can provide an implementation of Symbol.asyncIterator without using any stream library.

@elf-pavlik
Copy link
Member

elf-pavlik commented Mar 10, 2019

I probably need to reformulate the problem. How we could implement those two parsers

So that one could use them in a web browser, parse response.body (a WHATWG ReadableStream), and add quads to the store. Currently both parsing and adding to store relies on Sink#import interface which expects a Stream with node style read() method on it. Besides parsers also quadstore module also implements Sink#import: https://github.com/beautifulinteractions/node-quadstore#rdfstoreprototypeimport

Would all three examples above and other similar modules, provide alternative interface which would expect ES AsyncIterator and return one? Application could choose which interface to import and ES modules could allow granular fetching or tree shaking.

Still, currently @rdfjs modules seem to directly import node stream eg. https://github.com/rubensworks/jsonld-streaming-parser.js/blob/master/lib/JsonLdParser.ts#L5

BTW it also seems that even if application would want to use fast asynciterator module, libraries above will bundle in readable-stream anyways since it gets imported there.

We also have this issue, to distribute those libraries as ES modules and not depend on Common JS and bundling: rdfjs-base/parser-n3#4

Which also might have benefits in cases where application uses dynamic import() to lazy load functionality.

@RubenVerborgh
Copy link
Member

  • if one has processing speed as priority one would use asynciterator module and send extra ~55k to the browser

4k gzipped

  • if one wants to have read() which currently this spec requires on Streams but also has priority to use Symbol.asyncIterator over speed, would use readable-stream module and send extra ~70k to the browser

asynciterator also has read().

  • if one has priority to minimize code send to the browser and wants to use Symbol.asyncIterator provided by WHATWG ReadableStream, out of luck since this spec requires read()?

yes

@RubenVerborgh
Copy link
Member

Would all three examples above and other similar modules, provide alternative interface which would expect ES AsyncIterator and return one?

We probably want an opt-in spec for ES6 AsyncIterator. Although it wouldn't be much of a spec; it's basically just ES6 AsyncIterator. We might want to add that to stream-spec itself (but might be good to have it orthogonal).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants