Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: register external references for source code #47055

Merged
merged 2 commits into from May 10, 2023

Conversation

kvakil
Copy link
Contributor

@kvakil kvakil commented Mar 13, 2023

Currently we use external strings for internalized builtin source code. However when a snapshot is taken, any external string whose resource is not registered is flattened into a SeqString (see ref). The result is that any module source code stored in the snapshot does not use external strings after deserialization. This patch registers an external string resource for each internalized builtin's source. The savings are substantial: ~1.9 MB of heap memory per isolate, or ~44% of an otherwise empty isolate's heap usage:

$ node --expose-gc -p 'gc(),process.memoryUsage().heapUsed'
4190968
$ ./node --expose-gc -p 'gc(),process.memoryUsage().heapUsed'
2327536

The savings can be even higher for user snapshots which may include more internal modules.

Doing this with the existing UnionBytes abstraction was tricky, because UnionBytes only creates an external string resource when ToStringChecked is called. We change UnionBytes to no longer own the data, and shift ownership of the data (in the case where the data is not static) to a new external resource class. We can then register any resource whose data is static, since we know those are precisely the internalized builtins.

Refs: https://github.com/v8/v8/blob/d2c8fbe9ccd1a6ce5591bb7dd319c3c00d6bf489/src/snapshot/serializer.cc#L633

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Mar 13, 2023
@kvakil kvakil added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 13, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 13, 2023
@nodejs-github-bot
Copy link
Collaborator

@anonrig anonrig added feature request Issues that request new features to be added to Node.js. request-ci Add this label to start a Jenkins CI on a PR. and removed feature request Issues that request new features to be added to Node.js. labels Mar 13, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 13, 2023
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

src/node_builtins.h Outdated Show resolved Hide resolved
src/node_builtins.cc Outdated Show resolved Hide resolved
@kvakil
Copy link
Contributor Author

kvakil commented Mar 15, 2023

OK, PTAL. The big change is that we now only have one map by moving the complexity into UnionBytes following @joyeecheung's advice. UnionBytes has a resource_ which does the actual "owning". The registration is done by a new generated RegisterExternalReferencesForInternalizedBuiltinCode, so we don't need to do any runtime branching on whether the resources in the sources_ map are internalized or externalized builtins.

Copy link
Member

@joyeecheung joyeecheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much nicer, thanks, a couple of remaining suggestions, but over all looks good

src/node_builtins.cc Outdated Show resolved Hide resolved
src/node_union_bytes.h Outdated Show resolved Hide resolved
src/util.cc Outdated Show resolved Hide resolved
test/cctest/test_per_process.cc Outdated Show resolved Hide resolved
test/cctest/test_per_process.cc Outdated Show resolved Hide resolved
@kvakil
Copy link
Contributor Author

kvakil commented Mar 15, 2023

Thanks for the comments! They should all be addressed now.

Copy link
Member

@joyeecheung joyeecheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@nodejs-github-bot
Copy link
Collaborator

@kvakil kvakil added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 17, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 17, 2023
@nodejs-github-bot
Copy link
Collaborator

@kvakil kvakil added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 10, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 10, 2023
@nodejs-github-bot
Copy link
Collaborator

@kvakil kvakil added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 10, 2023
@kvakil kvakil added the request-ci Add this label to start a Jenkins CI on a PR. label May 9, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label May 9, 2023
@nodejs-github-bot
Copy link
Collaborator

@kvakil
Copy link
Contributor Author

kvakil commented May 9, 2023

Can someone please re-run the failing Jenkins build for me? I want to get this merged in, it's been languishing for a while

@nodejs-github-bot
Copy link
Collaborator

@legendecas
Copy link
Member

Sorry for the delay. Adding to my watch list to land.

@legendecas legendecas added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels May 10, 2023
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label May 10, 2023
@nodejs-github-bot nodejs-github-bot merged commit 0736d0b into nodejs:main May 10, 2023
64 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 0736d0b

targos pushed a commit that referenced this pull request May 12, 2023
Currently we use external strings for internalized builtin source code.
However when a snapshot is taken, any external string whose resource
is not registered is flattened into a SeqString (see ref). The
result is that module source code stored in the snapshot does not
use external strings after deserialization. This patch registers an
external string resource for each internalized builtin's source. The
savings are substantial: ~1.9 MB of heap memory per isolate, or ~44%
of an otherwise empty isolate's heap usage:

```console
$ node --expose-gc -p 'gc(),process.memoryUsage().heapUsed'
4190968
$ ./node --expose-gc -p 'gc(),process.memoryUsage().heapUsed'
2327536
```

The savings can be even higher for user snapshots which may include
more internal modules.

The existing UnionBytes implementation was ill-suited, because it only
created an external string resource when ToStringChecked was called,
but we need to register the external string resources before the
isolate even exists. We change UnionBytes to no longer own the data,
and shift ownership of the data to a new external resource class
called StaticExternalByteResource.  StaticExternalByteResource are
either statically allocated (for internalized builtin code) or owned
by the static `externalized_builtin_sources` map, so they will only be
destructed when static resources are destructed. We change JS2C to emit
statements to register a string resource for each internalized builtin.

Refs: https://github.com/v8/v8/blob/d2c8fbe9ccd1a6ce5591bb7dd319c3c00d6bf489/src/snapshot/serializer.cc#L633
PR-URL: #47055
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
}
// OK to get the raw pointer, since externalized_builtin_sources owns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kvakil Are you sure that v8::String::ExternalStringResourceBase doesn't expect to own the allocation? I have another PR that is affected by this and in my case v8::String::ExternalStringResourceBase::Dispose calls free on the received string when destroying the isolate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmomtchev this is V8's documentation/default implementation of v8::String::ExternalStringResourceBase::Dispose:

/**
 * Internally V8 will call this Dispose method when the external string
 * resource is no longer needed. The default implementation will use the
 * delete operator. This method can be overridden in subclasses to
 * control how allocated external string resources are disposed.
 */
virtual void Dispose() { delete this; }

Since this diff has StaticExternalByteResource override Dispose to be empty, we never execute delete this:

void Dispose() override {
  // We ignore Dispose calls from V8, even if we "own" a resource via
  // owning_ptr_. All instantiations of this class are static or owned by a
  // static map, and will be destructed when static variables are destructed.
}

so I believe this is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. backport-requested-v18.x PRs awaiting manual backport to the v18.x-staging branch. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. snapshot Issues and PRs related to the startup snapshot
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants