Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sea: add ability to embed auxiliary data asset for use with sea #50941

Closed
wants to merge 2 commits into from

Conversation

pipobscure
Copy link
Contributor

@pipobscure pipobscure commented Nov 27, 2023

When bundling code into a single executable application sometimes it is necessary to include non-code assets. While this is possible to do as base64 encoded blobs in the code itself, adding these assets as a separate resource makes things a lot smaller.

When bundling code into a single executable application sometimes it is
necessary to include non-code assets. While this is possible to do as
base64 encoded blobs in the code itself, adding these assets as a
separate resource makes things a lot smaller.
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/single-executable

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. single-executable Issues and PRs related to single-executable applications util Issues and PRs related to the built-in util module. labels Nov 27, 2023
@pipobscure
Copy link
Contributor Author

pipobscure commented Nov 27, 2023

@joyeecheung this is the feature we talked about at nodeconf.eu If you could be so kind as to review, guide me through the PR-process and/or discuss, I'd appreciate it.

For an example of how this could be used see @pipobscure/sea.

@joyeecheung
Copy link
Member

joyeecheung commented Nov 27, 2023

Ah, sorry, I should've mentioned that I had another implementation done earlier nodejs/single-executable#68 (comment) (branch at https://github.com/joyeecheung/node/tree/sea-assets) which is a bit more flexible. I was holding it off because I was working on deflaking the SEA tests on the main branch (otherwise it might be unwise to add more SEA features while the main branch is flaking). See nodejs/single-executable#68 (comment) for an overview of that design. I do think that design is a bit easier to use for a more generic case though (it supports multiple files and we could add e.g. compression using the configuration in the future), and it eliminates an additional postject command. Does that also work with your needs?

@pipobscure
Copy link
Contributor Author

I personally like the separation of concerns that injecting the data as a separate resource gives you. I also think that adding actual multi-asset capability should be part of VFS (or other higher level system) and to keep this simple. So I'd prefer this to be just a simple getAsset()/getEmbeddedData() that get's a single read-only buffer. But that is a strong opinion weakly held.

The one thing that I really would like to see is the ability to access the data as a buffer without copying. The copy overhead seems really silly just to prevent modification. I understand the impulse, but that could also be done by not exposing the modification APIs. As a minimum there should be an option to explicitly ask for the non-copy buffer.

TL;DR: your design fulfils what we need as well. I'll work with it.

@pipobscure pipobscure closed this Nov 27, 2023
@joyeecheung
Copy link
Member

joyeecheung commented Nov 27, 2023

Yes, I agree the copying overhead is undesirable. I am not sure how serious things could be if users do modify that segment in practice though, so it's just a defensive interface for now (alas, it seems there's no such thing as immutable ArrayBuffer in JavaScript...for now). Maybe putting it behind an ArrayBuffer to avoid out-of-bound access is arealdy safe enough.

@joyeecheung
Copy link
Member

After landing the SEA test updates I opened #50960. To work around the copying issue, there is an interface returning a Blob that allows read-only access to the the asset without copying the whole buffer - although that forces users to access the API asynchronously for now.

It seems modifying the returned array buffer would lead to a crash because the segments used are protected (at least on macOS and Linux). I think there isn't much we can do to prevent modification if we return the data in ArrayBuffer or any ArrayBufferViews - there is a TC39 proposal very early in the works though https://github.com/tc39/proposal-limited-arraybuffer. Not sure if there are better ways to return it as read-only besides in a Blob, which still comes with a bit of overhead for now.

@pipobscure
Copy link
Contributor Author

Thanks! That’s awesome!

Exposing the blob is ultimately not all that useful, because any way of accessing it also copies the data.

blob.arraybuffer() - makes a copy
blob.text() - converts to text and thereby copies
blob.stream().getReader() - has a byob mode, but that copies into a provided array buffer, so again it does do a copy, it just saves the allocation

Unless the destination of the data is to be streamed out unchanged of course.

What is wrong with saying in docs here is an ArrayBuffer, but if you mess with it you’ll crash? Then as a developer I could just adhere to that and live happily ever after.

@joyeecheung
Copy link
Member

Exposing the blob is ultimately not all that useful, because any way of accessing it also copies the data.

Ah right, with Blob there would still be some copying at the end of the day.

What is wrong with saying in docs here is an ArrayBuffer, but if you mess with it you’ll crash?

I think that depends on how comfortable people are with this. I added a comment to #50960 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-ci PRs that need a full CI run. single-executable Issues and PRs related to single-executable applications util Issues and PRs related to the built-in util module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants