
Node Startup Improvement #27196

Closed
suresh-srinivas opened this issue Apr 12, 2019 · 18 comments
Labels
performance Issues and PRs related to the performance of Node.js.

Comments

@suresh-srinivas
Contributor

Is your feature request related to a problem? Please describe.
Node takes around 60 ms to start on a modern Skylake server, executing ~230M instructions to run a mostly empty function (see below). Looking at the startup timeline view (captured with trace events), it loads and compiles 56 JavaScript core library files during this time, and much of the startup time is due to that.

"events.js"
"internal/trace_events_async_hooks.js"
"async_hooks.js"
"internal/errors.js"
"internal/validators.js"
"internal/async_hooks.js"
"internal/safe_globals.js"
"util.js"
"internal/util/inspect.js"
"internal/util.js"
"internal/util/types.js"
"internal/encoding.js"
"buffer.js"
"internal/buffer.js"
"internal/process/per_thread.js"
"internal/process/main_thread_only.js"
"internal/process/stdio.js"
"assert.js"
"internal/assert.js"
"fs.js"
"path.js"
"internal/constants.js"
"internal/fs/utils.js"
"internal/url.js"
"internal/querystring.js"
"internal/process/warning.js"
"internal/process/next_tick.js"
"internal/process/promises.js"
"internal/fixed_queue.js"
"internal/inspector_async_hook.js"
"timers.js"
"internal/linkedlist.js"
"internal/priority_queue.js"
"internal/timers.js"
"internal/modules/cjs/loader.js"
"vm.js"
"url.js"
"internal/modules/cjs/helpers.js"
"console.js"
"tty.js"
"net.js"
"stream.js"
"internal/streams/pipeline.js"
"internal/streams/end-of-stream.js"
"internal/streams/legacy.js"
"_stream_readable.js"
"internal/streams/buffer_list.js"
"internal/streams/destroy.js"
"internal/streams/state.js"
"_stream_writable.js"
"_stream_duplex.js"
"_stream_transform.js"
"_stream_passthrough.js"
"internal/net.js"
"internal/stream_base_commons.js"
"internal/tty.js"

The test code.

'use strict';

module.exports.empty = (event, context, callback) => {
    const start = new Date().getTime();

    // nothing here

    const response = {
        statusCode: 200,
        headers: {
            'Access-Control-Allow-Origin': '*', // Required for CORS support to work
        },
        body: JSON.stringify({
            startTimestamp: start,
            endTimestamp: new Date().getTime()
        }),
    };
    callback(null, response);
};

const handler = require('./handler');

handler.empty(null, null,function(err, results) {
    console.log(results);
});

The timeline view of Node Startup for the above empty program. The green parts are the compilation. There are 56 JavaScript files that are compiled.
nodestartup-original

Describe the solution you'd like
Compilation of these core libraries could be done ahead of time and cached, and the entire generated code snapshot could be loaded in one shot. Multiple copies of Node on the same machine could share it.

There is also some V8 GC activity that could be postponed until 100 ms or later.

It would be great to get startup down to 10-15 ms by default.

@joyeecheung @hashseed @jasnell @mcollina @addaleax

Describe alternatives you've considered
Using the codecache feature, compilation time is reduced, but each library's compiled code is still loaded one at a time. Notice the green has shrunk, but each of the files is still loaded one by one.
nodestartup-codecache

FaaS providers are also starting to keep pre-warmed instances.

@suresh-srinivas
Contributor Author

suresh-srinivas commented Apr 12, 2019

perf stat -e cycles,instructions

|                   | Cycles      | Instructions |
| ----------------- | ----------- | ------------ |
| empty (default)   | 170,637,353 | 230,444,605  |
| empty (codecache) | 101,290,689 | 114,769,349  |

@richardlau
Member

Duplicate of #17058?

@hashseed
Member

Yup. Ultimately you want to use startup snapshot instead of code cache. That way you also won't run into GCs.

This is currently blocked by refactoring startup into a static and a dynamic (depending on environment and command line args) part.

@joyeecheung
Member

joyeecheung commented Apr 12, 2019

This is currently blocked by refactoring startup into a static and a dynamic (depending on environment and command line args) part.

I believe I have done most of this part, another part left would be a more organized way to bookkeep the external reference registration. We could um, copy-paste whatever external references created by existing module initializers run during bootstrap for now if we just want to do an initial implementation, but as people keep adding new FunctionTemplates etc., it's not going to be very maintainable (and if it simply crashes when the references are not properly registered without proper hints, it would be quite confusing to contributors who are not familiar with the snapshot process).

Anyway I am working on a new iteration of #17058 and so far it's safe to snapshot before pre-execution. I should be able to have a valid POC soon, the bookkeeping part may or may not block it but that would be much easier to tackle than the bootstrap refactoring.

@bnoordhuis bnoordhuis added the performance Issues and PRs related to the performance of Node.js. label Apr 12, 2019
@bnoordhuis
Member

There might be some short-term low-hanging fruit to reap. There are a lot of intertwined dependencies that are probably not strictly necessary.

An instrumented node -e '' prints this for me. Seems kind of excessive for doing nothing.

LookupAndCompile internal/per_context/setup.js
LookupAndCompile internal/per_context/domexception.js
LookupAndCompile internal/bootstrap/primordials.js
LookupAndCompile internal/bootstrap/loaders.js
LookupAndCompile internal/bootstrap/node.js
LookupAndCompile internal/util.js
LookupAndCompile internal/errors.js
LookupAndCompile events.js
LookupAndCompile buffer.js
LookupAndCompile internal/util/types.js
LookupAndCompile internal/util/inspect.js
LookupAndCompile internal/assert.js
LookupAndCompile internal/validators.js
LookupAndCompile internal/buffer.js
LookupAndCompile internal/process/per_thread.js
LookupAndCompile internal/process/main_thread_only.js
LookupAndCompile internal/process/stdio.js
LookupAndCompile internal/async_hooks.js
LookupAndCompile internal/process/task_queues.js
LookupAndCompile internal/process/promises.js
LookupAndCompile internal/fixed_queue.js
LookupAndCompile internal/console/global.js
LookupAndCompile internal/console/constructor.js
LookupAndCompile internal/util/inspector.js
LookupAndCompile internal/url.js
LookupAndCompile internal/querystring.js
LookupAndCompile internal/constants.js
LookupAndCompile path.js
LookupAndCompile internal/encoding.js
LookupAndCompile timers.js
LookupAndCompile internal/linkedlist.js
LookupAndCompile internal/timers.js
LookupAndCompile internal/priority_queue.js
LookupAndCompile internal/process/execution.js
LookupAndCompile internal/process/warning.js
LookupAndCompile internal/main/eval_string.js
LookupAndCompile internal/bootstrap/pre_execution.js
LookupAndCompile internal/options.js
LookupAndCompile internal/modules/cjs/helpers.js
LookupAndCompile url.js
LookupAndCompile internal/idna.js
LookupAndCompile internal/inspector_async_hook.js
LookupAndCompile internal/util/debuglog.js
LookupAndCompile internal/modules/cjs/loader.js
LookupAndCompile vm.js
LookupAndCompile util.js
LookupAndCompile fs.js
LookupAndCompile internal/fs/utils.js
CompileFunction [eval]-wrapper
diff --git a/src/node_contextify.cc b/src/node_contextify.cc
index 087727d93a..1266a3d338 100644
--- a/src/node_contextify.cc
+++ b/src/node_contextify.cc
@@ -985,6 +985,10 @@ void ContextifyContext::CompileFunction(
   // Argument 2: filename
   CHECK(args[1]->IsString());
   Local<String> filename = args[1].As<String>();
+  {
+    Utf8Value filename_s(args.GetIsolate(), filename);
+    printf("%s %s\n", __func__, *filename_s);
+  }
 
   // Argument 3: line offset
   CHECK(args[2]->IsNumber());
diff --git a/src/node_native_module.cc b/src/node_native_module.cc
index 08c0ab16e3..62e332574b 100644
--- a/src/node_native_module.cc
+++ b/src/node_native_module.cc
@@ -299,6 +299,7 @@ MaybeLocal<Function> NativeModuleLoader::LookupAndCompile(
   Local<String> source = source_it->second.ToStringChecked(isolate);
 
   std::string filename_s = id + std::string(".js");
+  printf("%s %s\n", __func__, filename_s.c_str());
   Local<String> filename =
       OneByteString(isolate, filename_s.c_str(), filename_s.size());
   Local<Integer> line_offset = Integer::New(isolate, 0);

@joyeecheung
Member

joyeecheung commented Apr 12, 2019

@bnoordhuis BTW you can get a more complete list (including the bindings loaded) with process._rawDebug(process.moduleLoadList.join('\n')).

A lot of these dependencies come from the fact that we set the browser globals by default. We could make them load lazily before snapshot is implemented and shipped, but after snapshot is in core we would want them to be included in the snapshot - I believe I've managed to clean up most of them so that they are now side-effect free and can be included in the snapshot.

@suresh-srinivas
Contributor Author

Not every day do you get to see a 2X improvement for node -e {}. Nice work @joyeecheung and other collaborators.

|              | nodejs v10.13.0 | nodejs master | v10.13.0 / master |
| ------------ | --------------- | ------------- | ----------------- |
| cycles       | 149,778,245     | 77,046,107    | 1.9               |
| instructions | 210,776,025     | 91,695,996    | 2.3               |

We will analyze this some more. Let us know where we can help.

@suresh-srinivas
Contributor Author

The new trace for empty shows it is down to 21 ms. Very impressive. @hashseed, there is ~1 ms of GC scavenge around the 15 ms mark. Any suggestions?

There are also some sparse bits of script_parse_and_compile green. Is that loading the cached code? It would be nice to batch it and cut some more.

nodestartup-master

@jasnell
Copy link
Member

jasnell commented Apr 18, 2019

FYI... Enabling the node.vm trace event category will cause the trace event file to include the file names of the core modules so you can see exactly how long each one adds during bootstrap. Enabling the node.bootstrap category will include milestone events so you can see exactly how long between various bootstrap phases.

@suresh-srinivas
Contributor Author

Thanks @jasnell. Yes, I did trace with those categories included, but the file names are not being included. Maybe I should report that separately. Let me know if it works correctly for you.


~/git/node-master/node.latest --trace-event-categories v8,node,node.async_hooks,node.bootstrap,node.vm.script --trace-event-file-pattern '${pid}-${rotation}.log' -e {}

It was hard to see the bootStrapComplete in the Chrome view. I looked at the logfile after it was processed with
jq '.traceEvents' 104695-1.log > 104695.processed.log

Bootstrap is down to 17 ms.

| start ts          | bootstrapComplete ts | bootstrap (µs) |
| ----------------- | -------------------- | -------------- |
| 3,204,644,094,501 | 3,204,644,111,624    | 17,123         |

@jasnell
Member

jasnell commented Apr 18, 2019

Hmm.. will have to test in a bit that it still works as expected.

@suresh-srinivas
Contributor Author

Other events for bootstrap. If I am reading this right, it is ~1 ms from nodeStart to v8Start and ~4 ms from v8Start to Environment. We have to account for the other 12 ms.

| nodeStart         | v8Start           | Environment       | v8Start - nodeStart (µs) | Environment - v8Start (µs) |
| ----------------- | ----------------- | ----------------- | ------------------------ | -------------------------- |
| 3,204,644,091,814 | 3,204,644,092,877 | 3,204,644,097,060 | 1,063                    | 4,183                      |

@hashseed
Member

hashseed commented Apr 18, 2019

Do I understand correctly that these data points are from embedding code cache for internal modules?

In that case, a GC is quite understandable. Every time we deserialize code for a module, there is going to be a quite sizeable allocation.

If we are aiming for eventually using startup snapshot, then I don't think there is any point in tweaking GC to squeeze out that 1ms. Maybe you want to play around with --initial-old-space-size though.

@joyeecheung
Member

joyeecheung commented Apr 18, 2019

BTW with #27147 you can capture a CPU profile with more detailed call stacks. For instance when running node -p process.versions (I've bumped the sampling interval to 10us to be more accurate)

Screen Shot 2019-04-18 at 12 49 48 PM

As you can see in the screenshot, basically every startup mode consists of three phases (modulo the per-context scripts, which run too early to profile at the moment):

  1. internal/bootstrap/loader.js - the first top anonymous function (blue) in the screenshot
  2. internal/bootstrap/node.js - the second top anonymous function (green)
  3. internal/main/*.js - the third top anonymous function (pink)

(Note that actually 1 and 2 can be merged but we still keep them separate for readability)

The execution of the per-context scripts as well as 1 and 2 above will be eliminated by the snapshot integration. Then we'll need to move as much as possible from 3 to 2 while preserving the invariant that there should be no environment-dependent code in 1 or 2.

Screen Shot 2019-04-18 at 12 49 57 PM

As shown above, the majority of time is spent compiling native modules (with code cache in my branch) - this is not entirely eliminable with the snapshot integration. At the moment, to be safe, we only load modules that are used during bootstrap on demand. We could load and cache more when building the snapshot, but we'll have to refactor and make sure that the modules loaded do not do anything environment-dependent (roughly, they need to be side-effect-free) during their first evaluation - this applies not only to the JS code but also to the initializers of the bindings they load. This issue is surprisingly common in the code base, and it's a bit difficult to enforce the side-effect constraint at the moment (considering the native modules are all um... CJS).

@jasnell
Member

jasnell commented Jun 26, 2020

The conversation here appears to have stalled out. Should this remain open?

@gireeshpunathil
Member

Closing as inactive for more than a year; please re-open if there is still outstanding work.

@joyeecheung
Member

joyeecheung commented Oct 19, 2020

BTW this is what I currently get with --cpu-prof

Screen Shot 2020-10-20 at 2 57 48 AM

It has improved since we shipped the embedded snapshot for the bootstrap (steps 1 and 2 are gone compared to #27196 (comment)). The remaining work is sketched out in https://docs.google.com/document/d/15bu038I36oILq5t4Qju1sS2nKudVB6NSGWz00oD48Q8/edit, though the bigger goal is to enable userland snapshots for user applications :). Not sure how far we've come compared to what we had in the OP (since I don't have the same machine to benchmark), but I am pretty sure we've improved a lot with code cache and the snapshot, and we've probably hit the goal of reducing the number to 10-15 ms.

@joyeecheung
Member

BTW the new tracking issue for snapshot integration is #35711
