Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node crash with tracing enabled #46376

Closed
dunhor opened this issue Jan 26, 2023 · 2 comments · Fixed by #46380
Closed

Node crash with tracing enabled #46376

dunhor opened this issue Jan 26, 2023 · 2 comments · Fixed by #46380

Comments

@dunhor
Copy link

dunhor commented Jan 26, 2023

Version

v18.13.0

Platform

Linux devloop-Surface-Laptop-4 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

This was originally encountered when trying to capture a trace of the TypeScript compiler running, however I can also reliably reproduce the same crash with a much simpler invocation that doesn't require setting up/installing a complex project.

$ node --trace-events-enabled /usr/lib/node_modules/npm/bin/npm-cli.js set init-license MIT
Aborted (core dumped)

How often does it reproduce? Is there a required condition?

See "additional information" below for a possible condition. From my experience, the commands that cause a crash - like the one from above - reproduce 100% of the time. That said, not all invocations with tracing enabled will reproduce the issue.

What is the expected behavior?

The node process does not crash and the trace file is written to disk in its entirety

What do you see instead?

Aborted (core dumped)

On top of that, the trace file contains incomplete JSON

Additional information

Running under the debugger, I see a crash (abort) with this stack:

Thread 2 (Thread 0x7ffff7a42640 (LWP 2148334) "node"):
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348118080) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737348118080) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737348118080, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7a88476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7a6e7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000000000a9d38a in uv_mutex_lock (mutex=mutex@entry=0x5278de0 <mutex>) at ../deps/uv/src/unix/thread.c:352
#6  0x000000000164dfe8 in post (kind=UV__WORK_FAST_IO, q=0x5328190) at ../deps/uv/src/threadpool.c:143
#7  uv__work_submit (loop=loop@entry=0x5415bd8, w=w@entry=0x5328178, kind=kind@entry=UV__WORK_FAST_IO, work=work@entry=0x1656270 <uv__fs_work>, done=done@entry=0x1655b60 <uv__fs_done>) at ../deps/uv/src/threadpool.c:270
#8  0x000000000165953d in uv_fs_write (loop=0x5415bd8, req=0x5328028, file=<optimized out>, bufs=0x7ffff7a3e720, nbufs=<optimized out>, off=-1, cb=0xc685f0 <node::tracing::NodeTraceWriter::StartWrite(uv_buf_t)::{lambda(uv_fs_s*)#1}::_FUN(uv_fs_s*)>) at ../deps/uv/src/unix/fs.c:2191
#9  0x0000000000c687ce in node::tracing::NodeTraceWriter::WriteToFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, int) [clone .part.70] ()
#10 0x0000000000c68e2a in node::tracing::NodeTraceWriter::FlushPrivate() ()
#11 0x0000000001652906 in uv__async_io (loop=0x5415bd8, w=<optimized out>, events=<optimized out>) at ../deps/uv/src/unix/async.c:163
#12 0x0000000001664e44 in uv__io_poll (loop=loop@entry=0x5415bd8, timeout=<optimized out>) at ../deps/uv/src/unix/epoll.c:374
#13 0x000000000165326e in uv_run (loop=0x5415bd8, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:406
#14 0x00007ffff7adab43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007ffff7b6ca00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The code is calling abort because acquiring a mutex is failing

void uv_mutex_lock(uv_mutex_t* mutex) {
  if (pthread_mutex_lock(mutex))
    abort();
}

If I look at other threads, the main thread is executing here:

Thread 1 (Thread 0x7ffff7a437c0 (LWP 2148331) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5327fe8) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x5327fe8) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5327fe8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007ffff7ad9ac1 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5327f98, cond=0x5327fc0) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x5327fc0, mutex=0x5327f98) at ./nptl/pthread_cond_wait.c:627
#5  0x0000000001661879 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:800
#6  0x0000000000c67993 in node::tracing::NodeTraceWriter::Flush(bool) ()
#7  0x0000000000c6422d in node::tracing::Agent::Flush(bool) ()
#8  0x0000000000c66c2b in node::tracing::NodeTraceBuffer::Flush() ()
#9  0x0000000001ad3222 in v8::platform::tracing::TracingController::StopTracing() ()
#10 0x0000000000c6504a in node::tracing::Agent::Disconnect(int) ()
#11 0x0000000000ab39f3 in node::DefaultProcessExitHandler(node::Environment*, int) ()
#12 0x0000000000af5083 in node::Environment::Exit(int) ()
#13 0x0000000000db0230 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#14 0x0000000000db176f in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#15 0x00000000016ef579 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit ()
#16 0x00000000016734d0 in Builtins_InterpreterEntryTrampoline ()
#17 0x00002ec5934c15b9 in ?? ()
#18 0x000006fbf30c60e9 in ?? ()
#19 0x0000000600000000 in ?? ()
#20 0x00002ec5934c1689 in ?? ()
#21 0x000006fbf30c27d9 in ?? ()
#22 0x0000000000000000 in ?? ()

My non-expert guess is that the JS code calls process.exit and the tear-down of state leads to a race condition where the mutex is trying to be used after it has been destroyed. That said, trying to repro this with a simple example that just calls process.exit(0) does not produce the same crash, so I'm guessing that there's at least some amount of unpredictability in the repro (or my assessment could be wrong).

I suppose it's also worth mentioning that I haven't observed this failure on Windows, but that's perhaps because EnterCriticalSection has void return type, along with the fact that calling EnterCriticalSection after DeleteCriticalSection doesn't necessarily cause problems because a lot of the necessary state "lives" inside of the CRITICAL_SECTION object.

@santigimeno
Copy link
Member

santigimeno commented Jan 26, 2023

The problem seems to be caused by this change: 08950fc.
The reason is that there might be pending writes in the tracing agent after we shutdown libuv. At this point the libuv threadpool doesn't exist anymore, causing the crash. (@bnoordhuis already pointed out at this possibility) /cc @theanarkh

santigimeno added a commit to santigimeno/node that referenced this issue Jan 27, 2023
Otherwise there might be pending tracing fs writes which lead to a crash
because the libuv threadpool is already gone.

Fixes: nodejs#46376.
santigimeno added a commit to santigimeno/node that referenced this issue Jan 27, 2023
Otherwise there might be pending tracing fs writes which lead to a crash
because the libuv threadpool is already gone.

Fixes: nodejs#46376
@theanarkh
Copy link
Contributor

Yes, i think #46380 can fix this. or we can check if the threadpool exsits before calling uv_fs_write.

nodejs-github-bot pushed a commit that referenced this issue Jan 31, 2023
Otherwise there might be pending tracing fs writes which lead to a crash
because the libuv threadpool is already gone.

Fixes: #46376
PR-URL: #46380
Reviewed-By: theanarkh <theratliter@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
ruyadorno pushed a commit that referenced this issue Feb 1, 2023
Otherwise there might be pending tracing fs writes which lead to a crash
because the libuv threadpool is already gone.

Fixes: #46376
PR-URL: #46380
Reviewed-By: theanarkh <theratliter@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
juanarbol pushed a commit that referenced this issue Mar 3, 2023
Otherwise there might be pending tracing fs writes which lead to a crash
because the libuv threadpool is already gone.

Fixes: #46376
PR-URL: #46380
Reviewed-By: theanarkh <theratliter@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
juanarbol pushed a commit that referenced this issue Mar 3, 2023
Otherwise there might be pending tracing fs writes which lead to a crash
because the libuv threadpool is already gone.

Fixes: #46376
PR-URL: #46380
Reviewed-By: theanarkh <theratliter@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
juanarbol pushed a commit that referenced this issue Mar 5, 2023
Otherwise there might be pending tracing fs writes which lead to a crash
because the libuv threadpool is already gone.

Fixes: #46376
PR-URL: #46380
Reviewed-By: theanarkh <theratliter@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants