Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support building for externally shared js builtins #91

Merged
merged 3 commits into from Apr 14, 2024

Conversation

mochaaP
Copy link
Contributor

@mochaaP mochaaP commented Jan 26, 2024

Initial support for loading unbundled module in AddExternalizedBuiltin.
Refactored to use esbuild under the hood.

  • Reduces downstream distribution package size (by not shipping wasm twice and not base64-encoding it)
  • Provides a cleaner stacktrace
  • Easier to patch

To enable this, pass EXTERNAL_PATH=/global/node_modules/cjs-module-lexer to build.js.
You shall also pass this path to --shared-builtin-cjs_module_lexer/dist/lexer-path in Node.js's configure.py, with the extra /dist part in the path.

Reference:

nodejs/node@ca5be26b318
nodejs/node#44376
https://src.fedoraproject.org/rpms/nodejs-cjs-module-lexer/pull-request/2
nodejs/undici#2643

@khardix
Copy link
Contributor

khardix commented Jan 29, 2024

@mhdawson This is closely related to my/our work on distro-level WASM blob rebuilds. Could you take a look at this?

@mochaaP
Copy link
Contributor Author

mochaaP commented Jan 29, 2024

cc @guybedford request for review :)

@mhdawson
Copy link
Member

@khardix thanks for the heads up, will take a look tomorrow.

@mhdawson
Copy link
Member

mhdawson commented Jan 31, 2024

@mochaaP I took a look through and could you confirm my understanding that the following:

  1. The way the wasm blob is built does not change.
  2. Instead of always bundling the wasm blob into dist/lexer.js and dist/lexer.mjs, if EXTERNAL_PATH is specified, instead of bundling the wasm blob dist/lexer.js and dist/lexer.mjs will read the wasm blob from
    EXTERNAL_PATH/dist/lexer.wasm at runtime. If EXTERNAL_PATH is not specified then the wasm blob is bundled into dist/lexer.js and dist/lexer.mjs just like before.
  3. When built for EXTERNAL_PATH at runtime when Node.js is built with --shared-builtin-cjs_module_lexer/dist/lexer-path=<EXTERNAL_PATH> the expectd contents of EXTERNAL_PATH at runtime will be:
   <EXTERNAL_PATH>/lexer.js
                  /dist/lexer.wasm

Node.js will use <EXTERNAL_PATH>/lexer.js as the builtin for dist/lexar and then read the wasm blob from <EXTERNAL_PATH>/dist/lexer.wasm
4) EXTERNAL_PATH should be able to be any path on the filesystem, what you show above it just an example of how you'd plan to use it.

Overall I think that makes sense to me except that it seems to be that lexer.js and lexer.wasm should both be at the same level instead of one extra dist for lexer.wasm. I'd have to double check but I think when specifying --shared-builtin-cjs_module_lexer/dist/lexer-path= that should point to a directory with the contents of dist/lexer.js.

If the expectation was for

     <EXTERNAL_PATH>/lexer.js
                    /lexer.wasm

and the build step was also updated to copy lib/lexer.wasm to dist/lexer.wasm when EXTERNAL_PATH was specified, then providing package that installs the contents of the dist after the build to EXTERNAL_PATH would be more obvious.

Futher I assume that in most cases people will want to use both:
--shared-builtin-cjs_module_lexer/lexer-path
--shared-builtin-cjs_module_lexer/dist/lexer-path

In that context it might make sense to change where files get written when EXTERNAL_PATH is defined so that a directory
called externalized-dist is created with something like this:

externalized-dist/lexer.js
                 /wasm-lexer/lexer.js
                 /wasm-lexer/lexer.mjs
                 /wasm-lexer/lexer.wasm

And then the contents of externalize-dist is bundled into the package to be built/distributed separately. If that makes sense the possibly copying over the licence and maybe the readme into externalized-dist might make sense as well.

The current case where Node.js uses files from the top level as well as dist seems a bit odd to me, but would breaking to fix, but since use of the externalized build will be new it offers the opportunity to tweak a bit.

I think in this case when building to externalize the lexer, Node.js could then set
--shared-builtin-cjs_module_lexer=<EXTERNAL_PATH>/lexer
--shared-builtin-cjs_module_lexer/dist=<EXTERNAL_PATH>/wasm-lexer/lexer

Also note this is based on how I remeber these options working in terms of what you provide as a path, but it was a while back when I added the option so not 100% sure without some testing.

Just some thoughts, will need @guybedford to comment on wether any of these suggestions make any sense.

@mochaaP
Copy link
Contributor Author

mochaaP commented Feb 1, 2024

@mhdawson:

The current case where Node.js uses files from the top level as well as dist seems a bit odd to me

See https://github.com/nodejs/node/blob/9a25438a62186fa5bd2603536d99dcccd0acd893/lib/internal/modules/esm/translators.js#L82-L93. Loading builtins will need an exact match for the import specifier, so importing from a subdirectory is non-applicable, since there is no filesystem present.

In that context it might make sense to change where files get written when EXTERNAL_PATH is defined so that a directory called externalized-dist is created with something like this:

A problem with the externalized-dist approach is that we run npm build, then npm pack at the %build stage in Fedora, and unpack that tarball to the target path in the %install stage. Only files specified in package.json (to be exact, files, exports and main fields) will be written to the tarball, which excludes your proposed externalized-dist.

Initial support for loading unbundled module in `AddExternalizedBuiltin`.

- Reduces downstream distribution package size (by not shipping wasm twice
  and not base64-encoding it)
- Provides a cleaner stacktrace
- Easier to patch

To enable this, pass `EXTERNAL_PATH=/global/node_modules/cjs-module-lexer`
to `build.js`.
You shall also pass this path to `--shared-builtin-cjs_module_lexer/dist/lexer-path`
in Node.js's `configure.py`, with the extra `/dist` part in the path.

Signed-off-by: Zephyr Lykos <git@mochaa.ws>
@mochaaP
Copy link
Contributor Author

mochaaP commented Feb 1, 2024

Converted indentation in loadWasm.js from tabs to spaces.

@mhdawson
Copy link
Member

mhdawson commented Feb 1, 2024

See https://github.com/nodejs/node/blob/9a25438a62186fa5bd2603536d99dcccd0acd893/lib/internal/modules/esm/translators.js#L82-L93. Loading builtins will need an exact match for the import specifier, so importing from a subdirectory is non-applicable, since there is no filesystem present.

My point is that using something outside of the dist directory seems odd since if dist is meant to contain externally used assets, also using files outside of dist is non-intuative, at least to me.

@mhdawson
Copy link
Member

mhdawson commented Feb 1, 2024

A problem with the externalized-dist approach is that we run npm build, then npm pack at the %build stage in Fedora, and unpack that tarball to the target path in the %install stage. Only files specified in package.json (to be exact, files, exports and main fields) will be written to the tarball, which excludes your proposed externalized-dist.

If the approach makes sense to @guybedford then I think adding externalized-dist to the "files" section in the package.json should be acceptable. Depending on how the package was built, dist or externalized dist would be empty but I doubt the extra space of a empty directory would be objectionable. I think the main thing is if the concept of everything being used externally being a single directly or not makes sense/is intended. I look to @guybedford to comment on that.

@mochaaP
Copy link
Contributor Author

mochaaP commented Feb 2, 2024

I think it's better to keep the current directory structure in case any third-party package depends on the dist folder.

@mhdawson
Copy link
Member

mhdawson commented Feb 2, 2024

@mochaaP what I was suggesting would only change things if the package used the EXTERNAL_PATH option, would third party packages be able to find the files in the Fedora delivered install?

@mochaaP
Copy link
Contributor Author

mochaaP commented Feb 3, 2024

I think that still breaks things, especially when there weren't any obvious outcomes 🧐

@guybedford
Copy link
Collaborator

Thanks for creating the PR, can't we just make the external path the default build approach instead of bifurcating the entire build process? The base64 technique comes from es-module-shims which runs in the browser primarily hence the benefits of a single file, but since cjs-module-lexer is mostly for Node.js usage we could just use this technique by default and have a single file that reads a local Wasm file. I'd prefer if we could not separate the loadWasm behind a dynamic import, and especially an extensionless import. Let's just inline the entire routine rather?

@guybedford
Copy link
Collaborator

When importing this in Node.js itself, how is the extra Wasm file loaded from a path inside of Node.js though?

@mochaaP
Copy link
Contributor Author

mochaaP commented Feb 12, 2024

@guybedford:

can't we just make the external path the default build approach instead of bifurcating the entire build process?

Node.js loads builtins as a single script file under the internal/deps namespace (see src/node_builtins.cc). Hence, the external build approach is not suitable for embedded use in upstream, and it must be inlined as base64.

However, if us (downstream distribution) decided to unbundle the module, we could get the benefit of faster startups and smaller update sizes due to reduced libnode.so size and shave off the need for an extra base64 step.

When importing this in Node.js itself, how is the extra Wasm file loaded from a path inside of Node.js though?

When importing the module from libnode.so, it's loaded from a base64 buffer, as described above. If we opted in to use externally shared modules instead, it will resolve the module path from EXTERNAL_PATH and read it from there.

I'd prefer if we could not separate the loadWasm behind a dynamic import, and especially an extensionless import. Let's just inline the entire routine rather?

This is handled by esbuild (see the build.js change). I don't remember the exact reason for this, but probably related to tree-shaking (btw, that didn't help). Fixed in e17dd9d.

@guybedford
Copy link
Collaborator

Ideally the best way we could achieve these same goals would be by fully migrating this project as a C++ build, work which was already outlined in https://github.com/anonrig/commonjs-lexer. There would be performance benefits to be had as well that way.

Otherwise it seems this PR doesn't apply to anything other than a very custom wrapping approach - my major concern there is that even if we land this, we aren't necessarily testing that embedding itself so it seems a little fragile.

I would be happy to land the PR as-is if @mhdawson also approves. Just talking through the wider concerns, given that I can't comment that well on the particular embedding.

@mhdawson
Copy link
Member

I'll chat with @khardix on Friday, if he's good with the PR as is, I'm ok as well. I still think the structure could be better, but not enough to block progress on this.

@mhdawson
Copy link
Member

Talked to @khardix , looks like its good from his perspective. He also mentioned that something similar went into undici nodejs/undici#2643 so it is good to be consistent.

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mochaaP
Copy link
Contributor Author

mochaaP commented Mar 4, 2024

@guybedford review request :)

@khardix
Copy link
Contributor

khardix commented Apr 4, 2024

Hm, and since this is approved, can anyone push the merge button? 😇

I see the failed CI, but GH no longer show me any details as to why.

@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 4, 2024

This request was automatically failed because there were no enabled runners online to process the request for more than 1 days.

iirc it's a github actions outage

@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 7, 2024

@guybedford could you take a look at this?

@mhdawson
Copy link
Member

mhdawson commented Apr 8, 2024

There does not seem to be any easy way to restart the action for the test.

@mhdawson
Copy link
Member

mhdawson commented Apr 8, 2024

@mochaaP I can't seem to make a change to one of the files to re-trigger the CI run. @mochaaP could you push a change, possibly adding the CR at the end of package.json?

@mochaaP mochaaP force-pushed the unbundled-external-builtins branch from f678b6c to e17dd9d Compare April 9, 2024 00:53
@khardix
Copy link
Contributor

khardix commented Apr 10, 2024

@mhdawson The workflow requires approval from maintainer to run.

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM again

@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 11, 2024

Oh I think the culprit here is that ubuntu-18.04 is no longer available in hosted runners.

Consider migrate to ubuntu-latest (latest LTS)?

@mochaaP mochaaP force-pushed the unbundled-external-builtins branch 2 times, most recently from e742a8d to dedfe17 Compare April 12, 2024 12:12
@mochaaP mochaaP force-pushed the unbundled-external-builtins branch from dedfe17 to 7ca131a Compare April 12, 2024 12:12
@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 12, 2024

@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 12, 2024

since the workflow has changed drastically, re-requesting reviews.

@mhdawson
Copy link
Member

@mochaaP were all the workflow changes needed because of the move to unbuntu-latest ?

Needs a review from @guybedford, since he's the primary maintainer for this repo.

@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 12, 2024

nope, but I think it's better done in this way :)

@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 12, 2024

also node.js.yml is unused.

@guybedford guybedford merged commit 1c5072c into nodejs:main Apr 14, 2024
1 check passed
@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 14, 2024

Thanks! I would be grateful if you could publish a new release, so we don't have to rely on snapshot versions / patch it ourselves ☺️

@guybedford
Copy link
Collaborator

@mochaaP 1.3.0 has been released with this change.

@mochaaP
Copy link
Contributor Author

mochaaP commented Apr 28, 2024

cc @guybedford I need a quick patch here: afcf9f1ce4a10c0l62411e9d9e106b5c0072ef783

i somehow missed an undefined in src/lexer.js 🤒

guybedford added a commit that referenced this pull request Apr 28, 2024
guybedford added a commit that referenced this pull request Apr 28, 2024
* Revert "tests: fix loading wasm when targeting external deps (#94)"

This reverts commit bcc6ce4.

* Revert "fix: cannot resolve path when build with EXTERNAL_PATH (#93)"

This reverts commit 8ea767a.

* Revert "Support building for externally shared js builtins (#91)"

This reverts commit 1c5072c.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants