Implement createObjectURL/Blob from File API #16167

Closed
3 of 5 tasks
bmeck opened this issue Oct 12, 2017 · 21 comments
Assignees
Labels
feature request Issues that request new features to be added to Node.js. stale

Comments

@bmeck
Member

bmeck commented Oct 12, 2017

Tracking Issue to allow Loaders to create in-memory URLs that can be imported for things like code coverage:

@bmeck bmeck self-assigned this Oct 12, 2017
@bmeck
Member Author

bmeck commented Oct 12, 2017

@bcoe Here ^

@TimothyGu
Member

For reference: https://w3c.github.io/FileAPI/

@bcoe
Contributor

bcoe commented Oct 12, 2017

@bmeck @TimothyGu I'd be interested in pitching in on this work, along with being one of the early consumers with Istanbul ... designing the Blob and BlobStore bit sounds interesting. Do you picture we'd be exposing existing structures in V8?

@mscdex added the feature request label Oct 12, 2017
@bmeck
Member Author

bmeck commented Oct 12, 2017

@bcoe great! Unfortunately V8 does not expose Blobs in the File API sense; the blobs in v8.h are snapshot blobs, which are a very different beast. The File API spec is quite thorough about what should be done. We should avoid File for now, though, since I can't think of a clear use case.

The important bit to the BlobStore is that it works across workers. If a worker makes a url using URL.createObjectURL it should be available in all threads.

If you need any help I can assist when I have a bit more free time or if you schedule something in advance I will make time.

@refack
Contributor

refack commented Oct 12, 2017

For reference - What is BlobStore?

@bmeck
Member Author

bmeck commented Oct 12, 2017

@refack it is the place that url string => Blob mapping is stored by the environment. See spec.

It is used such that it can share URLs across workers so you can do multi-threaded processing: https://jsfiddle.net/ctyvm1tr/1/
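
To make the "url string => Blob mapping" concrete, here is a minimal single-threaded sketch. The class name, method names, and the `blob:example:` prefix are all hypothetical and are not Node.js internals; it assumes a runtime where `Blob` is a global (Node.js 18+ or a browser). Sharing across workers would need the map to live somewhere every thread can reach, which this sketch does not attempt.

```javascript
// Hypothetical sketch of a blob store: a map from URL string to Blob.
// Names and URL format are illustrative only.
class BlobStore {
  #entries = new Map();
  #nextId = 1;

  // Register a Blob and return a newly minted URL string for it.
  create(blob) {
    const url = `blob:example:${this.#nextId++}`;
    this.#entries.set(url, blob);
    return url;
  }

  // Look up a registered Blob; undefined if unknown or already revoked.
  resolve(url) {
    return this.#entries.get(url);
  }

  // Drop the mapping so the store no longer keeps the Blob alive.
  revoke(url) {
    return this.#entries.delete(url);
  }
}
```

A caller would hold on to the string returned by `create()` and hand it to whatever needs to load the data, then call `revoke()` once it is no longer needed.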

@bcoe
Contributor

bcoe commented Oct 12, 2017

@bmeck I intend to make some time this weekend to read through the spec and play with the existing APIs in the browser. Once I know more than basically nothing, I would definitely be interested in arranging a quick screen share.

Is there any prior art in the codebase that shares state across workers that we could build on?

@bmeck
Member Author

bmeck commented Oct 12, 2017

@bcoe nothing in this realm that is sane to read that I know of. I know game engines use it, but that isn't helpful since I don't know their internals.

@Fishrock123
Member

I'm not really certain what the point of this is given we have an existing file system api and various types of buffers. Could this please be elaborated on before implementation? Thanks.

@jasnell
Member

jasnell commented Oct 13, 2017

So I've been working on this but I've been behind due to other pressing matters. It's very much something that I would like to see. To be specific: I already have an implementation underway, I just haven't had the time to finish it. My goal is to have an initial implementation by mid to late November.

In terms of what the implementation would provide:

  • A node::blob::Blob native class that represents an immutable chunk of data. This could represent a file on disk, it could represent an allocated chunk of memory, etc. There would be a corresponding JS object but the key point of node::blob::Blob is that the data is held at the native layer without ever crossing into JS unless a FileReader is used.

  • A node::blob::BlobStore native class that is essentially an addressable store for node::blob::Blob objects. This is essentially a relatively straightforward map-like object.

  • JavaScript level Blob, File and FileReader classes implemented per the spec. These would be backed by the node::blob::Blob.

  • An implementation of URL.createObjectURL(). There would be both C++ and JS implementations of this method, allowing a URL to be generated for a node::blob::Blob within a node::blob::BlobStore.

While this all may seem complicated, the interfaces here are rather simple. A File Blob, for instance, is a thin wrapper on top of libuv's existing file system operations for reading a file. This would essentially just end up being a FileReader-based alternative to fs.createReadStream(). It's really quite lightweight in the details. The key issue with File, however, is the requirement to support MIME types, which we currently do not handle within Core. That will take some thinking to figure out.

For Blob in general, it is really nothing more than a persistent allocated chunk of memory. It would be possible to create a Blob from one or more TypedArray objects. I'm sketching out additional APIs for the http and http2 modules that would allow a response to draw data from a Blob rather than through the Streams API. There is already something analogous in the http2 implementation in the form of the respondWithFile() and respondWithFD() APIs. Basically, the idea would be to prepare chunks of allocated memory at the native layer, with data that never passes into the JS layer (unless absolutely necessary to do so), then use those to source the data for responses. In early benchmarking this yields a massive boost in throughput without the usual backpressure control issues.
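
As an illustration of "a Blob from one or more TypedArray objects", here is a small sketch using the standard `Blob` constructor as it exists today. It assumes `Blob` and `TextDecoder` are globals (Node.js 18+ or a browser); the variable names and the `readBack` helper are made up for the example, and it says nothing about the http/http2 APIs mentioned above.

```javascript
// Two byte chunks that will become one immutable Blob.
const header = new Uint8Array([0x68, 0x69]); // "hi"
const body = new Uint8Array([0x21]);         // "!"

// The parts are copied at construction time, so later writes to the
// source arrays do not change the Blob's contents.
const blob = new Blob([header, body], { type: 'text/plain' });
header[0] = 0x00;

// Read the bytes back out and decode them as UTF-8.
async function readBack() {
  return new TextDecoder().decode(await blob.arrayBuffer());
}
```

Here `blob.size` is 3 and `readBack()` resolves to "hi!", even though `header` was modified after the Blob was created.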

There is certainly a cost, and there are aspects of the implementation that are non-trivial, but the benefits are quite real.

FWIW, I'm not entirely sold on the idea of implementing the File and FileReader portions of this model yet, so I haven't worked on those pieces and could easily be talked out of doing so.

@bcoe
Contributor

bcoe commented Oct 13, 2017

@jasnell my personal interest in this API surface is a follow on from:

#15445

The goal being to facilitate test-coverage and other transpilation steps in .mjs files.

I'm picturing that one could instrument code for coverage using pseudo code that looks something like this:

import fs from 'fs'
import path from 'path'
import url from 'url'

// `istanbul.instrument` stands in for whatever instrumenter API is used.
export async function resolve(specifier, parentModuleURL, defaultResolver) {
  const resolved = new url.URL(specifier, parentModuleURL)
  const ext = path.extname(resolved.pathname)
  // path.extname() includes the leading dot
  if (ext === '.mjs') {
    const source = fs.readFileSync(resolved.pathname, 'utf8')
    const instrumented = istanbul.instrument(source)
    const blob = new Blob([instrumented], { type: 'text/javascript' })
    return {
      url: URL.createObjectURL(blob),
      format: 'esm'
    }
  } else {
    return defaultResolver(specifier, parentModuleURL)
  }
}

Does it seem like I'm on the same page as to how this API could potentially be used?

...an aside:

I keep coming back to the argument that @guybedford's work on #15445 should be exposed through an API hook rather than just a flag. In the world of developer tools, it's often the case that a few transformations need to be performed in sequence, e.g.,

  • a TypeScript transpilation step translates TypeScript typing into valid ES2015.
  • a Babel transpilation step handles bleeding-edge features, e.g., class decorators.
  • Istanbul runs, adding line counters to each line of (now ES2015) code.

I don't hate the idea of using createObjectURL() to facilitate the transpilation step ... but now that I sit down and hammer out some pseudo code, I'm not immediately seeing how one could compose the multi-step transformations (described above) using the --loader flag.

In the land of require.extensions one is able to create a stack of the prior transformations being applied, and a multistep transpilation can be applied without each actor knowing about the other (this is important, given the fractal nature of developer toolchains).
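
The stacking idea can be sketched independently of any loader API. In the hypothetical example below, each transform is a plain `(source, filename) => source` function that knows nothing about the others; `composeTransforms`, `stripTypes`, and `addCounter` are illustrative stand-ins, not real TypeScript, Babel, or Istanbul calls.

```javascript
// Apply each transform in order, feeding the output of one into the next.
function composeTransforms(transforms) {
  return (source, filename) =>
    transforms.reduce((code, transform) => transform(code, filename), source);
}

// Toy stand-ins for real tools.
const stripTypes = (code) => code.replace(/: number/g, '');
const addCounter = (code) => `__hits++;\n${code}`;

// The pipeline owner decides the order; the transforms stay independent.
const pipeline = composeTransforms([stripTypes, addCounter]);
```

Calling `pipeline('let x: number = 1;', 'a.mjs')` first strips the annotation and then prepends the counter line. A loader hook that wanted the same composability would need some way to collect such transforms from independent tools.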

CC: @demurgos, @iarna

@devsnek
Member

devsnek commented Jun 19, 2018

with the new worker api i'd like to get this all working primarily to support new Worker('blob:uuid')

@bmeck that should be enough reasoning to land mimes yea?

@guybedford
Contributor

This work would be great to see.

@bcoe it's best not to try and see this as the final picture on the matter I think, but rather allow it to inform the discussions. The use case you describe is one very much understood by the modules group, that will be polished in due course.

Would also be interested to hear your thoughts on #18914 as it is a goal of mine to get that going again, just not sure how much to prioritise it right now.

@jimmywarting

jimmywarting commented Feb 28, 2020

Don't really need FileReader now that there are new reading methods on Blob:

  • blob.text() (promise)
  • blob.arrayBuffer() (promise)
  • blob.stream() whatwg readable stream
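
For reference, a short sketch of the three reading methods listed above. It assumes `Blob` is a global (Node.js 18+ or a browser); `readThreeWays` is a made-up helper name.

```javascript
async function readThreeWays(blob) {
  // text(): the whole content decoded as UTF-8.
  const text = await blob.text();

  // arrayBuffer(): the whole content as raw bytes.
  const bytes = new Uint8Array(await blob.arrayBuffer());

  // stream(): Uint8Array chunks; count the bytes as they arrive.
  let streamed = 0;
  const reader = blob.stream().getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    streamed += value.byteLength;
  }

  return { text, byteLength: bytes.byteLength, streamed };
}
```

The first two methods materialize the entire content at once, while `stream()` lets the caller work through it piece by piece.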

@jasnell
Member

jasnell commented Feb 13, 2021

So folks know, I've already started work on the async blobs piece. And that is a prereq for the filesystem blobs. Expect a pr soonish.

@jasnell
Member

jasnell commented Aug 12, 2021

URL.createObjectURL() and URL.revokeObjectURL() have landed.
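
A minimal usage sketch of the two methods, assuming a Node.js version where both `Blob` and `URL` are globals (18+). It only shows creating and revoking the URL; what can consume a `blob:` URL afterwards depends on the runtime and is not covered here.

```javascript
// Wrap some module source text in a Blob.
const blob = new Blob(['export default 42;'], { type: 'text/javascript' });

// Register the Blob and get back a string URL that starts with "blob:".
const objectUrl = URL.createObjectURL(blob);

// Release the registration once the URL is no longer needed.
URL.revokeObjectURL(objectUrl);
```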

@jimmywarting

jimmywarting commented Aug 14, 2021

URL.createObjectURL() and URL.revokeObjectURL() have landed.

...And blob#stream() and some other minor stuff! Cool! What's next? The File class?

I suppose an async blob source is supported now (from #39693) ...or?

I'm not entirely sure what the underlying data structure looks like anymore (how it's handled in the backend): whether it still behaves like one large ArrayBuffer bucket as before, or like a blobParts array that holds all chunks with an offset and size. What happens under the hood if you slice a large blob? Does it take up more memory, or is it just a reference now? What would happen, e.g., if I did:

const GiB = 2 ** 30
const blob = new Blob([new Uint8Array(2 * GiB)]) // 2 GiB
const concat = new Blob([blob, blob]) // 4 GiB
concat.slice(0, 2 * GiB)
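
The same sequence at a scale that is cheap to run, assuming `Blob` is a global (Node.js 18+ or a browser). It shows only the observable sizes, which the File API defines; whether concatenation and `slice()` copy bytes or share them is an implementation detail this sketch cannot answer.

```javascript
// Same shape as the question above, with 4 bytes instead of 2 GiB.
const part = new Blob([new Uint8Array(4)]); // 4 bytes
const concat = new Blob([part, part]);      // 8 bytes
const head = concat.slice(0, 4);            // bytes 0..3 of concat
```

The resulting sizes are `part.size === 4`, `concat.size === 8`, and `head.size === 4`.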

@jimmywarting

Reading a blob's text is a problem when the blob is larger than 500 MiB.
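
One possible workaround, assuming the goal is to process the text rather than hold it all in a single string, is to decode the blob incrementally from `blob.stream()`. The helper name `forEachTextChunk` is made up for this sketch, and it assumes `Blob` and `TextDecoder` are globals (Node.js 18+ or a browser).

```javascript
// Decode a Blob as UTF-8 piece by piece, passing each piece to onChunk.
async function forEachTextChunk(blob, onChunk) {
  const decoder = new TextDecoder('utf-8');
  const reader = blob.stream().getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } buffers a multi-byte character that is split
    // across two chunks instead of emitting a replacement character.
    const text = decoder.decode(value, { stream: true });
    if (text) onChunk(text);
  }
  // Flush whatever the decoder is still holding.
  const tail = decoder.decode();
  if (tail) onChunk(tail);
}
```

The callback sees the text in order, so a line splitter or parser can consume it without the full content ever existing as one string.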

@github-actions
Contributor

github-actions bot commented Sep 6, 2022

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Sep 6, 2022
@jasnell
Member

jasnell commented Sep 6, 2022

This was done


10 participants