Skip to content

Commit

Permalink
cli: add --trace-atomics-wait flag
Browse files Browse the repository at this point in the history
Adds a flag that helps with debugging deadlocks due to incorrectly
implemented `Atomics.wait()` calls.

PR-URL: #33292
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
  • Loading branch information
addaleax committed May 15, 2020
1 parent d3a8a23 commit c7eeef5
Show file tree
Hide file tree
Showing 7 changed files with 178 additions and 0 deletions.
15 changes: 15 additions & 0 deletions benchmark/worker/atomics-wait.js
@@ -0,0 +1,15 @@
'use strict';
/* global SharedArrayBuffer */

const common = require('../common.js');
const bench = common.createBenchmark(main, {
n: [1e7]
});

function main({ n }) {
const i32arr = new Int32Array(new SharedArrayBuffer(4));
bench.start();
for (let i = 0; i < n; i++)
Atomics.wait(i32arr, 0, 1); // Will return immediately.
bench.end(n);
}
30 changes: 30 additions & 0 deletions doc/api/cli.md
Expand Up @@ -816,6 +816,33 @@ added: v12.0.0
Set default [`tls.DEFAULT_MIN_VERSION`][] to 'TLSv1.3'. Use to disable support
for TLSv1.2, which is not as secure as TLSv1.3.

### `--trace-atomics-wait`
<!-- YAML
added: REPLACEME
-->

Print short summaries of calls to [`Atomics.wait()`][] to stderr.
The output could look like this:

```text
(node:15701) [Thread 0] Atomics.wait(<address> + 0, 1, inf) started
(node:15701) [Thread 0] Atomics.wait(<address> + 0, 1, inf) did not wait because the values mismatched
(node:15701) [Thread 0] Atomics.wait(<address> + 0, 0, 10) started
(node:15701) [Thread 0] Atomics.wait(<address> + 0, 0, 10) timed out
(node:15701) [Thread 0] Atomics.wait(<address> + 4, 0, inf) started
(node:15701) [Thread 1] Atomics.wait(<address> + 4, -1, inf) started
(node:15701) [Thread 0] Atomics.wait(<address> + 4, 0, inf) was woken up by another thread
(node:15701) [Thread 1] Atomics.wait(<address> + 4, -1, inf) was woken up by another thread
```

The fields here correspond to:

* The thread id as given by [`worker_threads.threadId`][]
* The base address of the `SharedArrayBuffer` in question, as well as the
byte offset corresponding to the index passed to `Atomics.wait()`
* The expected value that was passed to `Atomics.wait()`
* The timeout passed to `Atomics.wait`

### `--trace-deprecation`
<!-- YAML
added: v0.8.0
Expand Down Expand Up @@ -1205,6 +1232,7 @@ Node.js options that are allowed are:
* `--tls-min-v1.1`
* `--tls-min-v1.2`
* `--tls-min-v1.3`
* `--trace-atomics-wait`
* `--trace-deprecation`
* `--trace-event-categories`
* `--trace-event-file-pattern`
Expand Down Expand Up @@ -1474,12 +1502,14 @@ $ node --max-old-space-size=1536 index.js
```

[`--openssl-config`]: #cli_openssl_config_file
[`Atomics.wait()`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics/wait
[`Buffer`]: buffer.html#buffer_class_buffer
[`SlowBuffer`]: buffer.html#buffer_class_slowbuffer
[`process.setUncaughtExceptionCaptureCallback()`]: process.html#process_process_setuncaughtexceptioncapturecallback_fn
[`tls.DEFAULT_MAX_VERSION`]: tls.html#tls_tls_default_max_version
[`tls.DEFAULT_MIN_VERSION`]: tls.html#tls_tls_default_min_version
[`unhandledRejection`]: process.html#process_event_unhandledrejection
[`worker_threads.threadId`]: worker_threads.html##worker_threads_worker_threadid
[Chrome DevTools Protocol]: https://chromedevtools.github.io/devtools-protocol/
[REPL]: repl.html
[ScriptCoverage]: https://chromedevtools.github.io/devtools-protocol/tot/Profiler#type-ScriptCoverage
Expand Down
4 changes: 4 additions & 0 deletions doc/node.1
Expand Up @@ -363,6 +363,10 @@ but the option is supported for compatibility with older Node.js versions.
Set default minVersion to 'TLSv1.3'. Use to disable support for TLSv1.2 in
favour of TLSv1.3, which is more secure.
.
.It Fl -trace-atomics-wait
Print short summaries of calls to
.Sy Atomics.wait() .
.
.It Fl -trace-deprecation
Print stack traces for deprecations.
.
Expand Down
45 changes: 45 additions & 0 deletions src/node.cc
Expand Up @@ -227,11 +227,56 @@ int Environment::InitializeInspector(
}
#endif // HAVE_INSPECTOR && NODE_USE_V8_PLATFORM

#define ATOMIC_WAIT_EVENTS(V) \
V(kStartWait, "started") \
V(kWokenUp, "was woken up by another thread") \
V(kTimedOut, "timed out") \
V(kTerminatedExecution, "was stopped by terminated execution") \
V(kAPIStopped, "was stopped through the embedder API") \
V(kNotEqual, "did not wait because the values mismatched") \

static void AtomicsWaitCallback(Isolate::AtomicsWaitEvent event,
Local<v8::SharedArrayBuffer> array_buffer,
size_t offset_in_bytes, int64_t value,
double timeout_in_ms,
Isolate::AtomicsWaitWakeHandle* stop_handle,
void* data) {
Environment* env = static_cast<Environment*>(data);

const char* message;
switch (event) {
#define V(key, msg) \
case Isolate::AtomicsWaitEvent::key: \
message = msg; \
break;
ATOMIC_WAIT_EVENTS(V)
#undef V
}

fprintf(stderr,
"(node:%d) [Thread %" PRIu64 "] Atomics.wait(%p + %zx, %" PRId64
", %.f) %s\n",
static_cast<int>(uv_os_getpid()),
env->thread_id(),
array_buffer->GetBackingStore()->Data(),
offset_in_bytes,
value,
timeout_in_ms,
message);
}

void Environment::InitializeDiagnostics() {
isolate_->GetHeapProfiler()->AddBuildEmbedderGraphCallback(
Environment::BuildEmbedderGraph, this);
if (options_->trace_uncaught)
isolate_->SetCaptureStackTraceForUncaughtExceptions(true);
if (options_->trace_atomics_wait) {
isolate_->SetAtomicsWaitCallback(AtomicsWaitCallback, this);
AddCleanupHook([](void* data) {
Environment* env = static_cast<Environment*>(data);
env->isolate()->SetAtomicsWaitCallback(nullptr, nullptr);
}, this);
}

#if defined HAVE_DTRACE || defined HAVE_ETW
InitDTrace(this);
Expand Down
4 changes: 4 additions & 0 deletions src/node_options.cc
Expand Up @@ -435,6 +435,10 @@ EnvironmentOptionsParser::EnvironmentOptionsParser() {
"throw an exception on deprecations",
&EnvironmentOptions::throw_deprecation,
kAllowedInEnvironment);
AddOption("--trace-atomics-wait",
"trace Atomics.wait() operations",
&EnvironmentOptions::trace_atomics_wait,
kAllowedInEnvironment);
AddOption("--trace-deprecation",
"show stack traces on deprecations",
&EnvironmentOptions::trace_deprecation,
Expand Down
1 change: 1 addition & 0 deletions src/node_options.h
Expand Up @@ -140,6 +140,7 @@ class EnvironmentOptions : public Options {
std::string redirect_warnings;
bool test_udp_no_try_send = false;
bool throw_deprecation = false;
bool trace_atomics_wait = false;
bool trace_deprecation = false;
bool trace_exit = false;
bool trace_sync_io = false;
Expand Down
79 changes: 79 additions & 0 deletions test/parallel/test-trace-atomics-wait.js
@@ -0,0 +1,79 @@
'use strict';
require('../common');
const assert = require('assert');
const child_process = require('child_process');
const { Worker } = require('worker_threads');

if (process.argv[2] === 'child') {
const i32arr = new Int32Array(new SharedArrayBuffer(8));
assert.strictEqual(Atomics.wait(i32arr, 0, 1), 'not-equal');
assert.strictEqual(Atomics.wait(i32arr, 0, 0, 10), 'timed-out');

new Worker(`
const i32arr = require('worker_threads').workerData;
Atomics.store(i32arr, 1, -1);
Atomics.notify(i32arr, 1);
Atomics.wait(i32arr, 1, -1);
`, { eval: true, workerData: i32arr });

Atomics.wait(i32arr, 1, 0);
assert.strictEqual(Atomics.load(i32arr, 1), -1);
Atomics.store(i32arr, 1, 0);
Atomics.notify(i32arr, 1);
return;
}

const proc = child_process.spawnSync(
process.execPath,
[ '--trace-atomics-wait', __filename, 'child' ],
{ encoding: 'utf8', stdio: [ 'inherit', 'inherit', 'pipe' ] });

if (proc.status !== 0) console.log(proc);
assert.strictEqual(proc.status, 0);

const SABAddress = proc.stderr.match(/Atomics\.wait\((?<SAB>.+) \+/).groups.SAB;
const actualTimeline = proc.stderr
.replace(new RegExp(SABAddress, 'g'), '<address>')
.replace(new RegExp(`\\(node:${proc.pid}\\) `, 'g'), '')
.replace(/\binf(inity)?\b/gi, 'inf')
.replace(/\r/g, '')
.trim();
console.log('+++ normalized stdout +++');
console.log(actualTimeline);
console.log('--- normalized stdout ---');

const begin =
`[Thread 0] Atomics.wait(<address> + 0, 1, inf) started
[Thread 0] Atomics.wait(<address> + 0, 1, inf) did not wait because the \
values mismatched
[Thread 0] Atomics.wait(<address> + 0, 0, 10) started
[Thread 0] Atomics.wait(<address> + 0, 0, 10) timed out`;

const expectedTimelines = [
`${begin}
[Thread 0] Atomics.wait(<address> + 4, 0, inf) started
[Thread 1] Atomics.wait(<address> + 4, -1, inf) started
[Thread 0] Atomics.wait(<address> + 4, 0, inf) was woken up by another thread
[Thread 1] Atomics.wait(<address> + 4, -1, inf) was woken up by another thread`,
`${begin}
[Thread 0] Atomics.wait(<address> + 4, 0, inf) started
[Thread 0] Atomics.wait(<address> + 4, 0, inf) was woken up by another thread
[Thread 1] Atomics.wait(<address> + 4, -1, inf) started
[Thread 1] Atomics.wait(<address> + 4, -1, inf) did not wait because the \
values mismatched`,
`${begin}
[Thread 0] Atomics.wait(<address> + 4, 0, inf) started
[Thread 1] Atomics.wait(<address> + 4, -1, inf) started
[Thread 0] Atomics.wait(<address> + 4, 0, inf) was woken up by another thread
[Thread 1] Atomics.wait(<address> + 4, -1, inf) did not wait because the \
values mismatched`,
`${begin}
[Thread 0] Atomics.wait(<address> + 4, 0, inf) started
[Thread 0] Atomics.wait(<address> + 4, 0, inf) did not wait because the \
values mismatched
[Thread 1] Atomics.wait(<address> + 4, -1, inf) started
[Thread 1] Atomics.wait(<address> + 4, -1, inf) did not wait because the \
values mismatched`
];

assert(expectedTimelines.includes(actualTimeline));

9 comments on commit c7eeef5

@Trott
Copy link
Member

@Trott Trott commented on c7eeef5 May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible this commit is causing this problem that's showing up in CI?:

09:10:33 ../src/node.cc:256:10: error: 'message' may be used uninitialized in this function [-Werror=maybe-uninitialized]
09:10:33    fprintf(stderr,
09:10:33    ~~~~~~~^~~~~~~~
09:10:33            "(node:%d) [Thread %" PRIu64 "] Atomics.wait(%p + %zx, %" PRId64
09:10:33            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
09:10:33                ", %.f) %s\n",
09:10:33                ~~~~~~~~~~~~~~
09:10:33            static_cast<int>(uv_os_getpid()),
09:10:33            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
09:10:33            env->thread_id(),
09:10:33            ~~~~~~~~~~~~~~~~~
09:10:33            array_buffer->GetBackingStore()->Data(),
09:10:33            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
09:10:33            offset_in_bytes,
09:10:33            ~~~~~~~~~~~~~~~~
09:10:33            value,
09:10:33            ~~~~~~
09:10:33            timeout_in_ms,
09:10:33            ~~~~~~~~~~~~~~
09:10:33            message);
09:10:33            ~~~~~~~~
09:10:33 cc1plus: all warnings being treated as errors
09:10:33 make[2]: *** [libnode.target.mk:340: /home/iojs/build/workspace/node-test-commit-linux-containered/out/Release/obj.target/libnode/src/node.o] Error 1

@addaleax
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Trott Yes, maybe? It’s a false positive, though – can you check whether adding default: UNREACHABLE(); to the switch statement helps?

@Trott

This comment was marked as resolved.

@Trott

This comment was marked as resolved.

@Trott
Copy link
Member

@Trott Trott commented on c7eeef5 May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Trott Yes, maybe? It’s a false positive, though – can you check whether adding default: UNREACHABLE(); to the switch statement helps?

Current master:
https://ci.nodejs.org/job/node-test-commit-linux-containered/20250/

Adding default: UNREACHABLE();: https://ci.nodejs.org/job/node-test-commit-linux-containered/20251/

@Trott
Copy link
Member

@Trott Trott commented on c7eeef5 May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Trott Yes, maybe? It’s a false positive, though – can you check whether adding default: UNREACHABLE(); to the switch statement helps?

Current master:
https://ci.nodejs.org/job/node-test-commit-linux-containered/20250/

Adding default: UNREACHABLE();: https://ci.nodejs.org/job/node-test-commit-linux-containered/20251/

That seems to have fixed it. Is that a legitimate fix/workaround that should be PR'ed?

@addaleax
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Trott Hm yeah … it might be better to return; instead of UNREACHABLE();, or to actually add an “unknown event” message, to account for the fact that V8 might add new event types to that list. Currently the list is exhaustive, but the compiler doesn’t know that, and it’s kinda right to complain about it.

@Trott
Copy link
Member

@Trott Trott commented on c7eeef5 May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe assert(0)?

@addaleax
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#33451 :)

Please sign in to comment.