lib: add diagnostics_channel events to module loading #44340

Qard · 2022-08-22T11:57:11Z

Adding module.cjs.load and module.esm.load diagnostics_channel events so APMs don't need to patch the module system to know when a particular module has been loaded.

nodejs-github-bot · 2022-08-22T11:57:15Z

Review requested:

@nodejs/modules

lib/internal/modules/cjs/loader.js

bmeck · 2022-08-22T12:44:28Z

I'd be more comfortable not sending something that will be directly patched as it won't work for ESM

…

On Mon, Aug 22, 2022, 7:39 AM Stephen Belanger ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In lib/internal/modules/cjs/loader.js <#44340 (comment)>: > @@ -797,6 +800,7 @@ Module._load = function(request, parent, isMain) { throw new ERR_UNKNOWN_BUILTIN_MODULE(request); } + onModuleLoad.publish(module); The hasSubscribers check is only relevant if you're building input objects. If you just pass stuff in as-is it will be a no-op when there's no subscribers. Do we want to match anyway though just for consistency? — Reply to this email directly, view it on GitHub <#44340 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABZJI2NJF3NWVW462NJJ3TV2NYJNANCNFSM57HMHFCQ> . You are receiving this because you are on a team that was mentioned.Message ID: ***@***.***>

GeoffreyBooth · 2022-08-22T15:52:35Z

Since the ESM loader can also load CommonJS, should the event perhaps just be "module load" without the specific of whether it's ESM or CommonJS? I assume there's a payload with details like what module was loaded (?) so that could be where we specify which loader was used, which format the file was recognized as, etc.

GeoffreyBooth · 2022-08-22T15:58:16Z

test/parallel/test-diagnostics-channel-module-esm-load.js

+  seen = module;
+}));
+
+import('http').then(http => {


Can you write a test in ESM (in an .mjs file) to confirm that works too? What about static import statements?

Not sure how that would work. Wouldn't the module graph already be resolved/loaded by the time any code runs to actually register a subscriber? Maybe it could work in a loader?

That's why I asked 😄

We should add a test that shows expected esm behavior.

simon-id · 2022-08-23T09:47:07Z

Don't forget to add it to the docs here: https://nodejs.org/api/diagnostics_channel.html#built-in-channels

rochdev · 2022-08-23T21:33:53Z

Could more events be added? There are use cases where having just one event is not enough, for example capturing how long each require call took which would require 2 events surrounding the function. Also including more data than just the module would be helpful, for example the path to the entry point, whether it's a core module, the base path, etc. Basically most of what require-in-the-middle has been doing historically, except the version since that's specific to package managers. Basically I think there could be 3 events: 1 when require is called, 1 when it finishes, and 1 when the module is actually loaded with module resolution information included as well.

trevnorris · 2022-08-25T21:31:16Z

@Qard Thanks for doing this. I just want to let you know that this would cover all the use cases for the data we've been collecting internally.

trevnorris · 2022-09-27T20:27:47Z

@Qard Would you mind fixing the lint error so all the tests can pass?

Qard · 2022-09-27T21:24:10Z

There's some more changes I've been meaning to make. I'll try to get to that after NodeConf EU. Just had a very busy few weeks. 😅

GeoffreyBooth · 2022-09-28T00:28:00Z

There’s some more changes I’ve been meaning to make. I’ll try to get to that after NodeConf EU. Just had a very busy few weeks. 😅

Thanks. Let’s also please not land this until all open questions are resolved.

Qard · 2022-10-18T03:48:53Z

Going to redo this on #44943 after it lands.

GeoffreyBooth · 2022-10-18T17:55:17Z

Going to redo this on #44943 after it lands.

Thanks. Will that other PR enable things like I mentioned in #44340 (comment), where we could have more generic events like “module load” and then additional properties for extra data like module specifier, module system (CommonJS or ESM), resolved path, etc.?

Qard · 2022-12-15T06:24:22Z

I've reimplemented this on top of #44943. Ignore the first commit, that's inherited from that PR. I'll rebase after that lands to fix this PR. :)

GeoffreyBooth · 2022-12-15T06:30:47Z

FYI there's a lot of refactoring in progress for module loading, including starting to merge together the CJS and ESM loaders. I would wait until the dust settles on that work before landing this. I would also suggest you make the events less tied to subsystem: like module resolution, module loading; whether it was from require or import, etc.

Qard · 2022-12-15T06:49:49Z

That's fine. Rebasing this before TracingChannel lands mainly just to prove its functionality and to gather feedback on this pattern to verify it works for the APM case.

As for splitting it up into resolution and loading, there's not really much value apart from tracking the overall load. Probably best to keep it simple for now. We can always add more events later if anyone sees specific value in it. I would like to add an extra field to the context object though to indicate if it was a fresh load or a cache hit though.

Also, I split it into separate tracing channels for import and require because it's likely there will be users that will care about one and not the other. It's much more performant to subscribe to multiple sets of channels as needed rather than to try and filter out events you don't care about.

lib/internal/modules/esm/loader.js

Flarna · 2022-12-20T11:31:22Z

doc/api/diagnostics_channel.md

+another async task is triggered internally which fails and then the sync part
+of the function then throws and error two `error` events will be emitted, one
+for the sync error and one for the async error.
+
 ### Built-in Channels


The new channels should be added here

Yep, will do when we finalize this. Need to wait for the ESM changes @GeoffreyBooth mentioned first. Thanks for the reminder though. :)

Which changes in particular?

The refactoring/merging stuff you mentioned. Not in a huge rush for this PR, it can wait for those changes.

doc/api/diagnostics_channel.md

GeoffreyBooth · 2023-01-26T01:18:30Z

Adding module.cjs.load and module.esm.load diagnostics_channel events

In #44710 we’re starting the process toward merging together the CommonJS and ESM loaders. Can these events please be renamed to just load? Or maybe them reflect the source, like module.require.load / require.import.load; or separately track the resolve and load steps?

aduh95 · 2023-04-22T10:27:24Z

Lots of conflicts here.

GeoffreyBooth

I’m going to put my foot down and ask that the event names not include cjs or esm. The two loaders already share code, and will overlap more and more over time. It will become a constraint in the future to know whether to fire a diagnostics event for cjs or esm when such an event is on a merged codepath.

I think ideally the event names would map to the steps like resolve and load, and have a metadata object with additional info like what the parent module was, whether it was via require or import, etc.

Qard · 2023-04-22T16:38:29Z

Categorization type data like that should be part of the event name, not the message. The whole point of diagnostics_channel is to be able to filter down to the specific types of messages you want using the channel name. If we remove all that differentiation from the name then we just have a firehose that we need to always filter back down on the other end. I disagree that should be removed, though I think maybe it would be better expressed as things loaded with import and things loaded with require rather than cjs/esm which is more a reference to the syntax than how it was loaded. 🤔

GeoffreyBooth · 2023-04-22T16:50:50Z

maybe it would be better expressed as things loaded with import and things loaded with require rather than cjs/esm which is more a reference to the syntax than how it was loaded.

Yes, that’s more future proof, as the syntax used in user source code is unrelated to how we implement the module loader internals. You could also distinguish between static and dynamic import if that’s helpful in analytics. I guess the question is, what module-related events do users want to be aware of, and why? Understanding the use cases should hopefully make it clearer what the names and filtering should be. My only concern was that I don’t want the names of these channels to constrain us in future work on the module loaders, and basing the names around user source code would achieve that.

Just an idea, would analytics users care if a module is loaded as a result of first-party application code or from a dependency? Like maybe they want to filter out events triggered by stuff under node_modules to only focus on the code that they wrote? This is what I mean by wanting to understand the use cases.

Qard · 2023-04-22T18:33:46Z

The use case this is specifically designed for is APMs wanting to measure all loaded modules, the time they took to load, and ideally structured in a way that we can reconstruct the dependency graph from the load information. So knowing the parent, or parents, of a given module load. Having the unresolved and resolved strings could be valuable but we don't really care when those happen so we probably do need separate events for those, we can just include that data in the load events. It would also be important to differentiate cache hits from actual from-disk loads. Basically, we want to be able to compute startup time impact of different modules and map out dependencies for things like vulnerability warnings having the context of how you're transitively depending on a vulnerable module.

My plan is to update this now that TracingChannel has landed and use this in a way that would enable APMs to produce a complete startup trace as we do with request traces to show spans for each loaded module and the timing of when the load occurs.

As for loading of user code vs node_modules, if we have a handy flag already available we can pass that along, otherwise we can just continue parsing the module path to identify the source. We already do that plus some extra stuff to differentiate built-in modules too.

lib/diagnostics_channel.js

lib/internal/modules/esm/loader.js

GeoffreyBooth · 2023-12-07T05:29:53Z

lib/internal/modules/cjs/loader.js

@@ -151,6 +152,10 @@ const {
 } = require('internal/util/types');

 const { kEvaluated } = internalBinding('module_wrap');
+
+const { tracingChannel } = require('diagnostics_channel');
+const onLoad = tracingChannel('module.require');


Suggested change

const onLoad = tracingChannel('module.require');

const onRequire = tracingChannel('module.require');

GeoffreyBooth · 2023-12-07T05:30:24Z

lib/internal/modules/cjs/loader.js

@@ -1231,7 +1236,11 @@ Module.prototype.require = function(id) {
  }
  requireDepth++;
  try {
-    return Module._load(id, this, /* isMain */ false);
+    return onLoad.traceSync(Module._load, {


Suggested change

return onLoad.traceSync(Module._load, {

return onRequire.traceSync(Module._load, {

GeoffreyBooth · 2023-12-07T05:31:20Z

lib/internal/modules/cjs/loader.js

+      __proto__: null,
+      parentFilename: this.filename,
+      id,
+    }, Module, id, this, /* isMain */ false);


What are the performance implications of this? It seems like even if an app isn’t using Diagnostics Channel, its function still wraps every module load. Surely this must slow things down slightly?

GeoffreyBooth · 2023-12-07T05:31:34Z

lib/internal/modules/esm/loader.js

@@ -28,6 +28,9 @@ const {
 } = require('internal/modules/esm/utils');
 let defaultResolve, defaultLoad, defaultLoadSync, importMetaInitializer;

+const { tracingChannel } = require('diagnostics_channel');
+const onLoad = tracingChannel('module.import');


Suggested change

const onLoad = tracingChannel('module.import');

const onImport = tracingChannel('module.import');

GeoffreyBooth · 2023-12-07T05:31:54Z

lib/internal/modules/esm/loader.js

-      this, url, undefined, evalInstance, false, false);
-    this.loadCache.set(url, undefined, job);
-    const { module } = await job.run();
+    const namespace = await onLoad.tracePromise(async () => {


Suggested change

const namespace = await onLoad.tracePromise(async () => {

const namespace = await onImport.tracePromise(async () => {

GeoffreyBooth · 2023-12-07T05:32:52Z

lib/internal/modules/esm/loader.js

-    const moduleJob = await this.getModuleJob(specifier, parentURL, importAttributes);
-    const { module } = await moduleJob.run();
-    return module.getNamespace();
+    return onLoad.tracePromise(async () => {


Suggested change

return onLoad.tracePromise(async () => {

return onImport.tracePromise(async () => {

GeoffreyBooth · 2023-12-07T05:34:05Z

lib/internal/modules/esm/loader.js

+      __proto__: null,
+      parentURL: '<eval>',
+      url,
+    });


So this is capturing ESM code that comes from --eval or STDIN? Shouldn’t we do the same for CommonJS string input?

Maybe? I'm not sure we actually need this here. I'm still working on this and figuring out testing and some changes that probably need to be made.

GeoffreyBooth · 2023-12-07T05:35:22Z

test/fixtures/source-map/output/source_map_enclosing_function.snapshot

-
+node:diagnostics_channel:315
+        throw err;
+        ^


I think the previous stack trace is much more useful. I see that you’re importing hideStackFrames but it appears unused, perhaps it can help preserve the old stack trace?

GeoffreyBooth · 2023-12-07T05:36:23Z

test/parallel/test-diagnostics-channel-module-import-error.js

@@ -0,0 +1,65 @@
+'use strict';


Lately we’ve been trying to write new tests in ESM using the test runner. Would you mind updating these?

That's not how diagnostics_channel tests have been written though. That wouldn't match with the rest of the diagnostics_channel tests.

I don’t think that’s a strong argument for continuing with the old pattern. Eventually it’ll be more important that things work correctly in ESM than in CommonJS, so we avoid creating tech debt for ourselves by writing tests in ESM now.

And if there are existing Diagnostics Channel tests in CommonJS, all the more reason to have new ones in ESM since we already have coverage for the “in CommonJS” case.

Eventually it’ll be more important

that's a pretty bold claim. if it's important to have full coverage, then tests should be authored in both (or evaluated in both), rather than excluding one for the other's sake.

I also find the new test runner-style tests much easier to read, and we can take advantage of improvements it offers like the snapshot tool. Especially with regard to some of these tests that are asserting stack traces, those are ideal things to use the snapshot tool for so that future PRs don’t need to update unrelated tests every time stack traces change.

I think the best way forward would be to do a separate PR to refactor the existing tests that get modified in this one, to migrate them to the new style and snapshot tool, so that then this PR doesn’t need to edit “unrelated” tests to add Diagnostics Channel to the CommonJS loader. There are several contributors helping migrate the out/message tests into the new style who might be interested in helping if you want to open an issue listing the tests that need migrating.

GeoffreyBooth · 2023-12-07T05:38:03Z

test/parallel/test-domain-set-uncaught-exception-capture-after-load.js

@@ -2,6 +2,8 @@
 const common = require('../common');
 const assert = require('assert');

+Error.stackTraceLimit = Infinity;


Could you make this 0 or 1 so that we exclude the CommonJS loader internals from the stack trace? Or filter them out using one of the snapshot helpers? I want to be able to refactor those without causing unrelated tests to fail.

No. The trace functions appear in the middle of the stack trace somewhere. Any changes to things which appear in the stack trace will require updating the corresponding output pattern as it's fairly strict about count and ordering. It'd probably be better if the tests were more generalized, but I feel like that's beyond the scope of this PR.

Then please use the new test runner and its snapshot ability to remove the stack trace lines that are irrelevant. See https://github.com/nodejs/node/blob/fc102f2180dfc8e1654ce672185f5bf97ec21b53/test/parallel/test-node-output-console.mjs and other test files that import common/assertSnapshot.js.

Again, this is not a test I wrote or have the context to rewrite correctly. I've edited it just to make this case pass. If you think that test should work differently feel free to rewrite it.

Ah, I was assuming that this corresponded with one of the snapshots that changed in your PR, where the CommonJS loader internals are leaking. If it doesn’t, then sure I don’t care about this. In general I don’t want more CommonJS stack traces being asserted as part of tests; I don’t want to repeat PRs like #44197.

GeoffreyBooth · 2023-12-11T06:10:29Z

test/parallel/test-repl.js

+      /^ {4}at .*/, // at Channel.runStores
+      /^ {4}at .*/, // at TracingChannel.traceSync
+      /^ {4}at .*/, // at Module.require
+      /^ {4}at .*/, // at require


I’m glad the function names aren’t in the trace being asserted, but the number of stack frames is, which means that if I add so much as an extra function call to this path within the CommonJS loader then I break this test (just as you did, adding the diagnostics channel). I understand that this test was fragile before, but rather than ignoring the problem (by just increasing the number of frames) it would be better to just solve it, to assert less so that CommonJS loader refactors don’t break this test.

Agreed, but I think it's beyond the scope of this PR to define new testing patterns for stack traces. Unless you have a concrete solution to provide I don't think I'll be able to do anything about it here.

more work to do based on feedback and failing tests

nodejs-github-bot added the needs-ci PRs that need a full CI run. label Aug 22, 2022

simon-id reviewed Aug 22, 2022

View reviewed changes

lib/internal/modules/cjs/loader.js Outdated Show resolved Hide resolved

GeoffreyBooth reviewed Aug 22, 2022

View reviewed changes

jasnell previously approved these changes Aug 28, 2022

View reviewed changes

rochdev mentioned this pull request Sep 21, 2022

fs plugin support DataDog/dd-trace-js#2395

Open

Trott force-pushed the main branch from 2d76238 to ca3ed36 Compare November 12, 2022 01:49

Qard force-pushed the diagnostics-channel-for-module-loads branch 5 times, most recently from 92affe4 to 99f0481 Compare December 15, 2022 06:16

Qard force-pushed the diagnostics-channel-for-module-loads branch from 99f0481 to c565e0e Compare December 15, 2022 06:32

Qard mentioned this pull request Dec 16, 2022

diagnostics_channel: add tracing channel #44943

Merged

mcollina reviewed Dec 17, 2022

View reviewed changes

lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved

Flarna reviewed Dec 20, 2022

View reviewed changes

lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved

Flarna reviewed Dec 20, 2022

View reviewed changes

tlhunter reviewed Jan 26, 2023

View reviewed changes

doc/api/diagnostics_channel.md Outdated Show resolved Hide resolved

GeoffreyBooth requested changes Apr 22, 2023

View reviewed changes

ljharb reviewed Apr 23, 2023

View reviewed changes

lib/diagnostics_channel.js Outdated Show resolved Hide resolved

lib/diagnostics_channel.js Outdated Show resolved Hide resolved

lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved

Qard mentioned this pull request May 6, 2023

ESM loaders need a way to list exports in load() #47888

Closed

Qard force-pushed the diagnostics-channel-for-module-loads branch from c565e0e to fb35626 Compare May 6, 2023 22:50

lib: add diagnostics_channel events to module loading

9595bfb

Qard force-pushed the diagnostics-channel-for-module-loads branch from fb35626 to 9595bfb Compare December 5, 2023 12:11

Qard requested a review from GeoffreyBooth December 5, 2023 12:11

GeoffreyBooth reviewed Dec 7, 2023

View reviewed changes

GeoffreyBooth reviewed Dec 11, 2023

View reviewed changes

Qard added the wip Issues and PRs that are still a work in progress. label Dec 11, 2023

Qard mentioned this pull request Mar 26, 2024

Proposal for a simple, universal module loader hooks API to replace require() monkey-patching #52219

Open

	const onLoad = tracingChannel('module.require');
	const onRequire = tracingChannel('module.require');

	return onLoad.traceSync(Module._load, {
	return onRequire.traceSync(Module._load, {

	const onLoad = tracingChannel('module.import');
	const onImport = tracingChannel('module.import');

	const namespace = await onLoad.tracePromise(async () => {
	const namespace = await onImport.tracePromise(async () => {

	return onLoad.tracePromise(async () => {
	return onImport.tracePromise(async () => {

lib: add diagnostics_channel events to module loading #44340

Are you sure you want to change the base?

lib: add diagnostics_channel events to module loading #44340

Conversation

Qard commented Aug 22, 2022

nodejs-github-bot commented Aug 22, 2022

bmeck commented Aug 22, 2022 via email

GeoffreyBooth commented Aug 22, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

simon-id commented Aug 23, 2022

rochdev commented Aug 23, 2022

trevnorris commented Aug 25, 2022

trevnorris commented Sep 27, 2022

Qard commented Sep 27, 2022

GeoffreyBooth commented Sep 28, 2022

Qard commented Oct 18, 2022 • edited

GeoffreyBooth commented Oct 18, 2022

Qard commented Dec 15, 2022

GeoffreyBooth commented Dec 15, 2022

Qard commented Dec 15, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

GeoffreyBooth commented Jan 26, 2023

aduh95 commented Apr 22, 2023

GeoffreyBooth left a comment

Choose a reason for hiding this comment

Qard commented Apr 22, 2023

GeoffreyBooth commented Apr 22, 2023 • edited

Qard commented Apr 22, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

GeoffreyBooth Dec 11, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Qard commented Oct 18, 2022 •

edited

GeoffreyBooth commented Apr 22, 2023 •

edited

Qard commented Apr 22, 2023 •

edited

GeoffreyBooth Dec 11, 2023 •

edited