Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report: add support for Workers #31386

Closed
wants to merge 7 commits into from
Closed

Conversation

addaleax
Copy link
Member

Only the last commit is concerned with the report feature itself. The other commits are work leading up to it, making features like this easier to add in general, and related cleanup.


src: better encapsulate native immediate list

Refactor for clarity and reusability. Make it more obvious that the
list is a FIFO queue.

src: exclude C++ SetImmediate() from count

There is no real reason to manage a count manually, given that
checking whether there are C++ callbacks is a single pointer
comparison.

This makes it easier to add other kinds of native C++ callbacks
that are managed in a similar way.

src: add a threadsafe variant of SetImmediate()

Add a variant of SetImmediate() that can be called from any thread.
This allows removing the AsyncRequest abstraction and replaces it
with a more generic mechanism.

src: remove AsyncRequest

Remove AsyncRequest from the source code, and replace its
usage with threadsafe SetImmediate() calls. This has the
advantage of being able to pass in any function, rather than
one that is defined when the AsyncRequest is “installed”.

This necessitates two changes:

  • The stopping flag (which was only used in one case and ignored
    in the other) is now a direct member of the Environment class.
  • Workers no longer have their own libuv handles, requiring
    manual management of their libuv ref count.

As a drive-by fix, the can_call_into_js variable was turned
into an atomic variable. While there have been no bug reports,
the flag is set from Stop(env) calls, which are supposed to
be possible from any thread.

src: add interrupts to Environments/Workers

Allow doing what V8’s v8::Isolate::RequestInterrupt() does for V8.
This also works when there is no JS code currently executing.

src: move MemoryInfo() for worker code to .cc files

This is a) the right thing to do anyway because these functions
can not be inlined by the compiler and b) avoids compilation warnings
in the following commit.

report: add support for Workers

Include a report for each sub-Worker of the current Node.js instance.

This adds a feature that is necessary for eventually making the report
feature stable, as was discussed during the last collaborator summit.

Refs: openjs-foundation/summit#240

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@addaleax addaleax added worker Issues and PRs related to Worker support. report Issues and PRs related to process.report. labels Jan 16, 2020
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Jan 16, 2020
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@addaleax addaleax added the wip Issues and PRs that are still a work in progress. label Jan 16, 2020
@nodejs-github-bot
Copy link
Collaborator

@addaleax addaleax removed the wip Issues and PRs that are still a work in progress. label Jan 16, 2020
@nodejs-github-bot
Copy link
Collaborator

Copy link
Member

@richardlau richardlau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to find time to look at the technical aspects. Left a few comments re. tests.

Also maybe bump the NODE_REPORT_VERSION?

constexpr int NODE_REPORT_VERSION = 1;

I'm a bit unsure on the criterea for bumping the version (#28121 (comment)), but this does add a new section and header value.

cc @boneskull as a tool author that consumes the report.

test/common/report.js Show resolved Hide resolved
test/common/report.js Show resolved Hide resolved
@richardlau
Copy link
Member

I'm a bit unsure on the criterea for bumping the version (#28121 (comment)), but this does add a new section and header value.

Actually reskimming the PR it looks like the support for workers is to nest reports inside the report so this probably does need a version bump.

src/node_report.cc Show resolved Hide resolved
src/node_report.cc Show resolved Hide resolved
@gireeshpunathil
Copy link
Member

@addaleax - this is awesome!

  • should we inhibit workers from taking reports? do you see any potential downside of doing that?
  • placement of the workers section: right now you are placing it after the libuv section. Given that the worker's report section is more or less identical to that of main thread, does it make sense to have the worker's sections as siblings to the main, as opposed to a component?
  • the main thread is blocked while workers are iterated and report collected, with the block duration proportional to the thread count, should we document this?
  • some process wide sections (such as the header, environment, shared libs etc.) are replicated for each worker. While this is not a problem now, I guess we could refactor GetReport that records sections that are env specific and then compose a master report by combining env specific sections and process specific ones?

@gireeshpunathil
Copy link
Member

also, I am finding it difficult to relate the other commits with add support for Workers - can you pls help a little? especially the remove AsyncRequest and its dependent commits.

@addaleax
Copy link
Member Author

I'm a bit unsure on the criterea for bumping the version (#28121 (comment)), but this does add a new section and header value.

Actually reskimming the PR it looks like the support for workers is to nest reports inside the report so this probably does need a version bump.

@richardlau Done, although I agree that there should be documentation on the circumstances under which the version is bumped – otherwise it doesn’t seem like a useful value.

@addaleax - this is awesome!

@gireeshpunathil Thanks! :)

  • should we inhibit workers from taking reports? do you see any potential downside of doing that?

I think the question should be whether there are any upsides to inhibiting workers from taking reports. I don’t see any, and I’m not a fan of arbitrary restrictions, so I’d go with “no” for now. (I.e. the “obvious” downside is that it makes the report feature less powerful.)

  • placement of the workers section: right now you are placing it after the libuv section. Given that the worker's report section is more or less identical to that of main thread, does it make sense to have the worker's sections as siblings to the main, as opposed to a component?

I don’t think so. Workers are sub-objects of their parent threads, and in particular, listing them as siblings would lose the tree structure of Workers, i.e. a Worker started by a Worker would be indistinguishable from a Worker started by the main thread in the case of listing them as siblings (unless information like the parent thread id or similar is added).

  • the main thread is blocked while workers are iterated and report collected, with the block duration proportional to the thread count, should we document this?

I’ve added a section to the report docs, PTAL 👍

  • some process wide sections (such as the header, environment, shared libs etc.) are replicated for each worker. While this is not a problem now, I guess we could refactor GetReport that records sections that are env specific and then compose a master report by combining env specific sections and process specific ones?

Yes, I’ve thought about that too. For now, duplicating the information seemed somewhat reasonable because it means that any Worker’s report itself is a fully valid report that can be looked at independently, but I also see the benefit of excluding per-process information.

(And in particular, for the environment variables, it’s not really correct to just use the “global” env vars, imo. But then again, that problem seems independent from this PR – we’re already doing this for reports generated from Workers.)

also, I am finding it difficult to relate the other commits with add support for Workers - can you pls help a little? especially the remove AsyncRequest and its dependent commits.

As the PR description says – it’s work leading up to it and related cleanup. Removing AsyncRequest doesn’t need to happen here, but it is closely related to adding the interrupt feature that the Worker reports use, so it fits in here as general cleanup.

@gireeshpunathil
Copy link
Member

re: worker taking reports - because of this

env->ForEachWorker([&](Worker* w) {

will it loose out taking report for the main thread?

@addaleax
Copy link
Member Author

re: worker taking reports - because of this

env->ForEachWorker([&](Worker* w) {

will it loose out taking report for the main thread?

Reports taken from Workers don’t include parent threads, that’s correct. I would also consider this the expected behaviour.

@jasnell
Copy link
Member

jasnell commented Jan 17, 2020

Regarding versioning, we really ought to have used semver for the versioning model on the format. Perhaps it's not too late to switch?

@cjihrig
Copy link
Contributor

cjihrig commented Jan 17, 2020

Regarding versioning, we really ought to have used semver for the versioning model on the format. Perhaps it's not too late to switch?

It was originally going to use semver, but I was asked to change it in #28121 (comment).

@addaleax
Copy link
Member Author

Let's move the versioning question to nodejs/diagnostics#349, it's definitely bigger than this PR.

@addaleax
Copy link
Member Author

Fwiw, this is weirdly breaking one specific HTTP/2 test on Windows. I'm investigating, but that means that this PR still needs at least a minor fixup.

@addaleax
Copy link
Member Author

I've pushed a commit that seems to resolve the Windows issue for me, locally (hopefully CI will confirm that). It restores behaviour that was present before the first commit in this PR, where the native SetImmediate() queue only ran callbacks that were scheduled before the queue was being run (and not during it).

The HTTP/2 implementation seems to rely on that in some way, but honestly, that seems like a bug waiting to happen (and maybe even the reason for all the Windows HTTP/2 flakiness?), so I'd like to investigate more tomorrow.

@nodejs-github-bot
Copy link
Collaborator

MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
Refactor for clarity and reusability. Make it more obvious that the
list is a FIFO queue.

PR-URL: nodejs#31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
There is no real reason to manage a count manually, given that
checking whether there are C++ callbacks is a single pointer
comparison.

This makes it easier to add other kinds of native C++ callbacks
that are managed in a similar way.

PR-URL: nodejs#31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
Add a variant of `SetImmediate()` that can be called from any thread.
This allows removing the `AsyncRequest` abstraction and replaces it
with a more generic mechanism.

PR-URL: nodejs#31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
Remove `AsyncRequest` from the source code, and replace its
usage with threadsafe `SetImmediate()` calls. This has the
advantage of being able to pass in any function, rather than
one that is defined when the `AsyncRequest` is “installed”.

This necessitates two changes:

- The stopping flag (which was only used in one case and ignored
  in the other) is now a direct member of the `Environment` class.
- Workers no longer have their own libuv handles, requiring
  manual management of their libuv ref count.

As a drive-by fix, the `can_call_into_js` variable was turned
into an atomic variable. While there have been no bug reports,
the flag is set from `Stop(env)` calls, which are supposed to
be possible from any thread.

PR-URL: nodejs#31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
Allow doing what V8’s `v8::Isolate::RequestInterrupt()` does for V8.
This also works when there is no JS code currently executing.

PR-URL: nodejs#31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
This is a) the right thing to do anyway because these functions
can not be inlined by the compiler and b) avoids compilation warnings
in the following commit.

PR-URL: nodejs#31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
Include a report for each sub-Worker of the current Node.js instance.

This adds a feature that is necessary for eventually making the report
feature stable, as was discussed during the last collaborator summit.

Refs: openjs-foundation/summit#240

PR-URL: nodejs#31386
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
de2c68c moved this call to
the destructor, under the assumption that that would essentially
be equivalent to running it as part of the callback since the
worker would be destroyed along with the callback.

However, the actual code in
`Environment::RunAndClearNativeImmediates()` comes with the subtlety
that testing whether a JS exception has been thrown
happens between the invocation of the callback and its destruction,
leaving a possible exception from `JoinThread()` potentially
unhandled (and unintentionally silenced through the `TryCatch`).

This affected exceptions thrown from the `'exit'` event of the
Worker, and made the `parallel/test-worker-message-type-unknown`
test flaky, as the invalid message was sometimes only received
during the Worker thread’s exit handler.

Fix this by moving the `JoinThread()` call back to where it was
before.

Refs: nodejs#31386

PR-URL: nodejs#31468
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
MylesBorins pushed a commit to addaleax/node that referenced this pull request Apr 1, 2020
Prevent mistakes like the one fixed by the previous commit
by destroying the callback immediately after it has been called.

PR-URL: nodejs#31468
Refs: nodejs#31386
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
Refactor for clarity and reusability. Make it more obvious that the
list is a FIFO queue.

Backport-PR-URL: #32301
PR-URL: #31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
There is no real reason to manage a count manually, given that
checking whether there are C++ callbacks is a single pointer
comparison.

This makes it easier to add other kinds of native C++ callbacks
that are managed in a similar way.

Backport-PR-URL: #32301
PR-URL: #31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
Add a variant of `SetImmediate()` that can be called from any thread.
This allows removing the `AsyncRequest` abstraction and replaces it
with a more generic mechanism.

Backport-PR-URL: #32301
PR-URL: #31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
Remove `AsyncRequest` from the source code, and replace its
usage with threadsafe `SetImmediate()` calls. This has the
advantage of being able to pass in any function, rather than
one that is defined when the `AsyncRequest` is “installed”.

This necessitates two changes:

- The stopping flag (which was only used in one case and ignored
  in the other) is now a direct member of the `Environment` class.
- Workers no longer have their own libuv handles, requiring
  manual management of their libuv ref count.

As a drive-by fix, the `can_call_into_js` variable was turned
into an atomic variable. While there have been no bug reports,
the flag is set from `Stop(env)` calls, which are supposed to
be possible from any thread.

Backport-PR-URL: #32301
PR-URL: #31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
Allow doing what V8’s `v8::Isolate::RequestInterrupt()` does for V8.
This also works when there is no JS code currently executing.

Backport-PR-URL: #32301
PR-URL: #31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
This is a) the right thing to do anyway because these functions
can not be inlined by the compiler and b) avoids compilation warnings
in the following commit.

Backport-PR-URL: #32301
PR-URL: #31386
Refs: openjs-foundation/summit#240
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
Include a report for each sub-Worker of the current Node.js instance.

This adds a feature that is necessary for eventually making the report
feature stable, as was discussed during the last collaborator summit.

Refs: openjs-foundation/summit#240

Backport-PR-URL: #32301
PR-URL: #31386
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
de2c68c moved this call to
the destructor, under the assumption that that would essentially
be equivalent to running it as part of the callback since the
worker would be destroyed along with the callback.

However, the actual code in
`Environment::RunAndClearNativeImmediates()` comes with the subtlety
that testing whether a JS exception has been thrown
happens between the invocation of the callback and its destruction,
leaving a possible exception from `JoinThread()` potentially
unhandled (and unintentionally silenced through the `TryCatch`).

This affected exceptions thrown from the `'exit'` event of the
Worker, and made the `parallel/test-worker-message-type-unknown`
test flaky, as the invalid message was sometimes only received
during the Worker thread’s exit handler.

Fix this by moving the `JoinThread()` call back to where it was
before.

Refs: #31386

Backport-PR-URL: #32301
PR-URL: #31468
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
MylesBorins pushed a commit that referenced this pull request Apr 1, 2020
Prevent mistakes like the one fixed by the previous commit
by destroying the callback immediately after it has been called.

Backport-PR-URL: #32301
PR-URL: #31468
Refs: #31386
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
@codebytere codebytere mentioned this pull request Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. notable-change PRs with changes that should be highlighted in changelogs. report Issues and PRs related to process.report. worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants