Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: add support for externally shared js builtins #44376

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
38 changes: 38 additions & 0 deletions BUILDING.md
Expand Up @@ -1052,6 +1052,44 @@ To make `./myModule.js` available via `require('myModule')` and
> .\vcbuild link-module './myModule.js' link-module './myModule2.js'
```

## Building to use shared dependencies at runtime

By default Node.js is built so that all dependencies are bundled into
the Node.js binary itself. This provides a single binary that includes
the correct versions of all dependencies on which it depends.

Some Node.js distributions, however, prefer to manage dependencies.
A number of `configure` options are provided to support this use case.

* For dependencies with native code, the first set of options allow
Node.js to be built so that it uses a shared library
at runtime instead of building and including the depenency
mhdawson marked this conversation as resolved.
Show resolved Hide resolved
mhdawson marked this conversation as resolved.
Show resolved Hide resolved
in the Node.js binary itself. These options are in the
`Shared libraries` section of the `configure` help
(run `./configure --help` to get the complete list).
They provide the ability to enable the use of a shared library,
to set the name of the shared library, and to set the paths that
contain the include and shared library files.

* For dependencies with JavaScript code (including WASM), the second
set of options allow the Node.js binary to be built so that it loads
the JavaScript for dependencies at runtime instead of being built into
the Node.js binary itself. These options are in the `Shared builtins`
section of the `configure` help
(run `./configure --help` to get the complete list). They
provide the ability to set the path to an external JavaScript file
for the dependency to be used at runtime.

It is the responsibility of any distribution
shipping with these options to:

* ensure that the shared dependencies available at runtime
match what is expected by the Node.js binary. A
mismatch may result in crashes or unexpected behavior.
* fully test that Node.js operates as expected with the
external dependencies. There may be little or no test coverage
within the Node.js project CI for these non-default options.

## Note for downstream distributors of Node.js

The Node.js ecosystem is reliant on ABI compatibility within a major release.
Expand Down
31 changes: 31 additions & 0 deletions configure.py
Expand Up @@ -57,6 +57,11 @@
with open ('tools/icu/icu_versions.json') as f:
icu_versions = json.load(f)

sharable_builtins = {'cjs_module_lexer/lexer': 'deps/cjs-module-lexer/lexer.js',
targos marked this conversation as resolved.
Show resolved Hide resolved
'cjs_module_lexer/dist/lexer': 'deps/cjs-module-lexer/dist/lexer.js',
'undici/undici': 'deps/undici/undici.js'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this means that distros can now provide different versions of these dependencies. Are we okay with that?

Copy link
Member Author

@mhdawson mhdawson Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this as very similar to supporting external shared libraries and it is motivated by a similar use case. Since it does not affect what we ship I think it's up to the distros that use it to keep the versions the same or compatible in what they ship just like they do for shared libraries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to carefully weigh this. I see the motivation but not quite convinced yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasnell can you outline your concerns in a bit more detail? We have 2 distro's asking for the functionality in order to avoid having to hack in their own solutions?

}

# create option groups
shared_optgroup = parser.add_argument_group("Shared libraries",
"Flags that allows you to control whether you want to build against "
Expand All @@ -70,6 +75,9 @@
"library you want to build against.")
http2_optgroup = parser.add_argument_group("HTTP2",
"Flags that allows you to control HTTP2 features in Node.js")
shared_builtin_optgroup = parser.add_argument_group("Shared builtins",
"Flags that allows you to control whether you want to build against "
"internal builtins or shared files.")

# Options should be in alphabetical order but keep --prefix at the top,
# that's arguably the one people will be looking for most.
Expand Down Expand Up @@ -422,6 +430,16 @@

parser.add_argument_group(shared_optgroup)

for builtin in sharable_builtins:
builtin_id = 'shared_builtin_' + builtin + '_path'
shared_builtin_optgroup.add_argument('--shared-builtin-' + builtin + '-path',
action='store',
dest='node_shared_builtin_' + builtin.replace('/', '_') + '_path',
help='Path to shared file for ' + builtin + ' builtin. '
'Will be used instead of bundled version at runtime')

parser.add_argument_group(shared_builtin_optgroup)

static_optgroup.add_argument('--static-zoslib-gyp',
action='store',
dest='static_zoslib_gyp',
Expand Down Expand Up @@ -1951,6 +1969,19 @@ def make_bin_override():
configure_inspector(output)
configure_section_file(output)

# configure sharable builtins
output['variables']['node_builtin_sharable_builtins'] = []
for builtin in sharable_builtins:
builtin_id = 'node_shared_builtin_' + builtin.replace('/', '_') + '_path'
if getattr(options, builtin_id):
if options.with_intl is 'none':
option_name = '--shared-builtin-' + builtin + '-path'
error(option_name + ' is incompatible with --with-intl=none' )
else:
output['defines'] += [builtin_id.upper() + '=' + getattr(options, builtin_id)]
else:
output['variables']['node_builtin_sharable_builtins'] += [sharable_builtins[builtin]]

# Forward OSS-Fuzz settings
output['variables']['ossfuzz'] = b(options.ossfuzz)

Expand Down
106 changes: 106 additions & 0 deletions doc/contributing/maintaining-dependencies.md
@@ -0,0 +1,106 @@
# Maintaining Dependencies

Node.js depends on additional components beyond the Node.js code
itself. These dependencies provide both native and JavaScript code
and are built together with the code under the `src` and `lib`
directories to create the Node.js binaries.

All dependencies are located within the `deps` directory.

Any code which meets one or more of these conditions should
be managed as a dependency:

* originates in an upstream project and is maintained
in that upstream project.
* is not built from the `preferred form of the work for
making modifications to it` (see
[GNU GPL v2, section 3.](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
when `make node` is run. A good example is
WASM code generated from JavaScript(the preferred form).
targos marked this conversation as resolved.
Show resolved Hide resolved
mhdawson marked this conversation as resolved.
Show resolved Hide resolved
Typically generation is only supported on a subset of platforms, needs
additional tools, and is pre-built outside of the `make node`
step and then committed as a WASM binary in the directory
for the dependency under the `deps` directory.

By default all dependencies are bundled into the Node.js
binary, however, `configure` options should be available to
use an externalized version at runtime when:

* the dependency provides native code and is available as
a shared library in one or more of the common Node.js
distributions.
* the dependency provides JavaScript and is not built
from the `preferred form of the work for making modifications
to it` when `make node` is run.

Many distributions use externalized dependencies for one or
more of these reasons:

1. They have a requirement to build everything that they ship
from the `preferred form of the work for making
modifications to it`. This means that they need to
replace any pre-built components (for example WASM
binaries) with an equivalent that they have built.
2. They manage the dependency separately as it is used by
more applications than just Node.js. Linking against
a shared library allows them to manage updates and
CVE fixes against the library instead of having to
patch all of the individual applications.
3. They have a system wide configuration for the
dependency that all applications should respect.

## Supporting externalized dependencies with native code.

Support for externalized dependencies with native code for which a
shared library is available can added by:

* adding options to `configure.py`. These are added to the
shared\_optgroup and include an options to:
* enable use of a shared library
* set the name of the shared library
* set the path to the directory with the includes for the
shared library
* set the path to where to find the shared library at
runtime
* add a call to configure\_library() to `configure.py` for the
library at the end of list of existing configure\_library() calls.
If there are additional libraries that are required it is
possible to list more than one with the `pkgname` option.
* in `node.gypi` guard the build for the dependency
with `node_shared_depname` so that is is only built if
the dependency is being bundled into Node.js itself. For example:

```text
[ 'node_shared_brotli=="false"', {
'dependencies': [ 'deps/brotli/brotli.gyp:brotli' ],
}],
```

## Supporting externalizable dependencies with JavaScript codeIZA

Support for an externalizable dependency with JavaScript code
can be added by:

* adding an entry to the `sharable_builtins` map in
`configure.py`. The path should correspond to the file
within the deps directory that is normally bundled into
Node.js. For example `deps/cjs-module-lexer/lexer.js`.
This will add a new option for building with that dependency
externalized. After adding the entry you can see
the new option by running `./configure --help`.

* adding a call to `AddExternalizedBuiltin` to the constructor
for BuildinLoader in `src/node_builtins.cc` for the
dependency using the `NODE_SHARED_BUILTLIN` #define generated for
the dependency. After running `./configure` with the new
option you can find the #define in `config.gypi`. You can cut and
paste one of the existing entries and then update to match the
inport name for the dependency and the #define generated.

## Supporting non-externalized dependencies with JavaScript code

If the depenency consists of JavaScript in the
mhdawson marked this conversation as resolved.
Show resolved Hide resolved
mhdawson marked this conversation as resolved.
Show resolved Hide resolved
`preferred form of the work for making modifications to it`, it
can be added as a non-externalizable dependency. In this case
simply add the path to the JavaScript file in the `deps_files`
list in the `node.gyp` file.
4 changes: 1 addition & 3 deletions node.gyp
Expand Up @@ -48,9 +48,7 @@
'deps/v8/tools/tickprocessor-driver.mjs',
'deps/acorn/acorn/dist/acorn.js',
'deps/acorn/acorn-walk/dist/walk.js',
'deps/cjs-module-lexer/lexer.js',
'deps/cjs-module-lexer/dist/lexer.js',
'deps/undici/undici.js',
'<@(node_builtin_sharable_builtins)',
],
'node_mksnapshot_exec': '<(PRODUCT_DIR)/<(EXECUTABLE_PREFIX)node_mksnapshot<(EXECUTABLE_SUFFIX)',
'conditions': [
Expand Down
43 changes: 43 additions & 0 deletions src/node_builtins.cc
Expand Up @@ -32,6 +32,24 @@ BuiltinLoader BuiltinLoader::instance_;

BuiltinLoader::BuiltinLoader() : config_(GetConfig()), has_code_cache_(false) {
LoadJavaScriptSource();
#if defined(NODE_HAVE_I18N_SUPPORT)
#ifdef NODE_SHARED_BUILTIN_CJS_MODULE_LEXER_LEXER_PATH
AddExternalizedBuiltin(
"internal/deps/cjs-module-lexer/lexer",
STRINGIFY(NODE_SHARED_BUILTIN_CJS_MODULE_LEXER_LEXER_PATH));
#endif // NODE_SHARED_BUILTIN_CJS_MODULE_LEXER_LEXER_PATH

#ifdef NODE_SHARED_BUILTIN_CJS_MODULE_LEXER_DIST_LEXER_PATH
AddExternalizedBuiltin(
"internal/deps/cjs-module-lexer/dist/lexer",
STRINGIFY(NODE_SHARED_BUILTIN_CJS_MODULE_LEXER_DIST_LEXER_PATH));
#endif // NODE_SHARED_BUILTIN_CJS_MODULE_LEXER_DIST_LEXER_PATH

#ifdef NODE_SHARED_BUILTIN_UNDICI_UNDICI_PATH
AddExternalizedBuiltin("internal/deps/undici/undici",
STRINGIFY(NODE_SHARED_BUILTIN_UNDICI_UNDICI_PATH));
#endif // NODE_SHARED_BUILTIN_UNDICI_UNDICI_PATH
#endif // NODE_HAVE_I18N_SUPPORT
}

BuiltinLoader* BuiltinLoader::GetInstance() {
Expand Down Expand Up @@ -219,6 +237,31 @@ MaybeLocal<String> BuiltinLoader::LoadBuiltinSource(Isolate* isolate,
#endif // NODE_BUILTIN_MODULES_PATH
}

#if defined(NODE_HAVE_I18N_SUPPORT)
void BuiltinLoader::AddExternalizedBuiltin(const char* id,
const char* filename) {
std::string source;
int r = ReadFileSync(&source, filename);
if (r != 0) {
fprintf(stderr,
"Cannot load externalized builtin: \"%s:%s\".\n",
id,
filename);
ABORT();
return;
}

icu::UnicodeString utf16 = icu::UnicodeString::fromUTF8(
icu::StringPiece(source.data(), source.length()));
auto source_utf16 = std::make_unique<icu::UnicodeString>(utf16);
Comment on lines +252 to +254
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably needs some sort of #ifdef guard for --without-intl?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax I was wondering about that. If we don't have intl, do you know how typically convert to utf16 if intl is not enabled ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#37954 (comment) has some prior discussion on that (tl;dr: we don’t, we can use V8 in cases in which an Isolate is available but not here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer to the discussion, I can see that is trying to the code where I'd grabbed using icu for the conversion from :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up making it so that the --without-intl reports an error if you try to configure it along with an externally shared js builtin and then guarding the code so that if --without-intl is used without the new options it is excluded.

My take is that it is acceptable for the new options to not be availble when --without-intl. My reasoning is both that people using --without-intl is a small set of people at this point in time and that those that do are unlikely to be interested in externalizing builtins and that rolling our own icu convertion for this unlikely case is not worth the work/additional code in Node.js. If that's wrong or we add native conversion helpers at some later time we can always do the work to remove that restriction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also did confirm as Anna mentioned that an isolate is not available in the method where we do the coversions so using V8 is not possible in this case.

Add(id,
UnionBytes(reinterpret_cast<const uint16_t*>((*source_utf16).getBuffer()),
utf16.length()));
// keep source bytes for builtin alive while BuiltinLoader exists
GetInstance()->externalized_source_bytes_.push_back(std::move(source_utf16));
}
#endif // NODE_HAVE_I18N_SUPPORT

// Returns Local<Function> of the compiled module if return_code_cache
// is false (we are only compiling the function).
// Otherwise return a Local<Object> containing the cache.
Expand Down
11 changes: 11 additions & 0 deletions src/node_builtins.h
Expand Up @@ -3,10 +3,14 @@

#if defined(NODE_WANT_INTERNALS) && NODE_WANT_INTERNALS

#include <list>
#include <map>
#include <memory>
#include <set>
#include <string>
#if defined(NODE_HAVE_I18N_SUPPORT)
#include <unicode/unistr.h>
#endif // NODE_HAVE_I18N_SUPPORT
#include <vector>
#include "node_mutex.h"
#include "node_union_bytes.h"
Expand Down Expand Up @@ -120,11 +124,18 @@ class NODE_EXTERN_PRIVATE BuiltinLoader {
static void HasCachedBuiltins(
const v8::FunctionCallbackInfo<v8::Value>& args);

#if defined(NODE_HAVE_I18N_SUPPORT)
static void AddExternalizedBuiltin(const char* id, const char* filename);
#endif // NODE_HAVE_I18N_SUPPORT

static BuiltinLoader instance_;
BuiltinCategories builtin_categories_;
BuiltinSourceMap source_;
BuiltinCodeCacheMap code_cache_;
UnionBytes config_;
#if defined(NODE_HAVE_I18N_SUPPORT)
std::list<std::unique_ptr<icu::UnicodeString>> externalized_source_bytes_;
#endif // NODE_HAVE_I18N_SUPPORT

// Used to synchronize access to the code cache map
Mutex code_cache_mutex_;
Expand Down