
[v10.x] src: dispose of V8 platform in process.exit() #26048

Closed
wants to merge 2 commits

Conversation

MylesBorins
Member

Calling process.exit() calls the C exit() function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: #24403
Refs: #25007

PR-URL: #25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
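For illustration only, here is a minimal standalone C++ sketch — not code from this PR or from Node.js — of the kind of race the commit message describes: `exit()` destroys objects with static storage duration on the calling thread while another thread may still be using them.

```cpp
// Minimal sketch (not Node.js source): exit() runs the destructor of `config`
// on the main thread while the detached worker thread may still be using it.
#include <cstdio>
#include <cstdlib>
#include <string>
#include <thread>

struct Config {
  std::string name = "default";
  ~Config() { name.clear(); }  // runs inside exit(), not at thread join
};

static Config config;  // static storage duration: destroyed by exit()

int main() {
  std::thread worker([] {
    for (;;) {
      // Once exit() has started, this may read a partially destroyed object.
      std::printf("%zu\n", config.name.size());
    }
  });
  worker.detach();
  std::exit(0);  // like process.exit(): static destructors run immediately
}
```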

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. v10.x labels Feb 11, 2019
BethGriggs
BethGriggs previously approved these changes Feb 11, 2019
BethGriggs pushed a commit that referenced this pull request Feb 13, 2019
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: #24403
Refs: #25007

Backport-PR-URL: #26048
PR-URL: #25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@BethGriggs
Member

Landed in 3c3f350

@BethGriggs BethGriggs closed this Feb 13, 2019
@MylesBorins MylesBorins reopened this Feb 19, 2019
@MylesBorins
Member Author

This commit breaks some repos on v10.x, for example gulp-util:

> gulp-util@3.0.8 test /Users/mborins/code/gulp-util
> jshint *.js lib/*.js test/*.js && mocha

sh: line 1: 23190 Abort trap: 6           jshint *.js lib/*.js test/*.js
npm ERR! Test failed.  See above for more details.

We are going to back it out of the v10.15.2 proposal; we can then run CITGM against this PR and figure out what is going on. FWIW, there were no CITGM breakages with this change on 11.x, so it is possible that some other changes are required.

@gireeshpunathil I know that you wanted this to land on 10.x to fix some other related issues; perhaps you want to dig in and try to get this back into the 10.15.2 proposal.

@MylesBorins MylesBorins dismissed stale reviews from gireeshpunathil and BethGriggs February 19, 2019 16:51

Needs new reviews as this PR broke 10.x

@BethGriggs
Member

BethGriggs left a comment

-1 until CITGM errors are resolved

@addaleax
Member

@MylesBorins Do you think you could create a core dump/stack trace for the crash you are seeing? I can’t reproduce it locally.

@MylesBorins
Member Author

@addaleax I'll dig back in a bit later this week. What command should I run to get the core dump? 😅

@gireeshpunathil
Member

gireeshpunathil commented Feb 20, 2019

@MylesBorins - also, can you tell on which platform the issue is seen?

@addaleax
Member

addaleax commented Feb 20, 2019

@MylesBorins Err, I’m not sure how to do that reliably on macOS. Running ulimit -c unlimited before the crashing command could help, but I’m not sure where the file would end up. (If you get a core dump, the exact node executable that was used to generate it would be helpful for investigating.)

@gireeshpunathil
Member

Unable to recreate locally.
@BethGriggs had a core dump which she gave me, but lldb does not recognize it.
I suggested a couple more options in the hope of getting a good core.

For anyone else interested, it looks like it is an easy recreate on macOS Mojave with this step (node v10.15.2-rc.0): npm install gulp-util

@gireeshpunathil
Member

(lldb) bt
* thread #9, stop reason = signal SIGSTOP
  * frame #0: 0x00000001006f3528 node`v8::internal::Parser::ParseFunctionLiteral(v8::internal::AstRawString const*, v8::internal::Scanner::Location, v8::internal::FunctionNameValidity, v8::internal::FunctionKind, int, v8::internal::FunctionLiteral::FunctionType, v8::internal::LanguageMode, v8::internal::ZoneList<v8::internal::AstRawString const*>*, bool*) + 216
    frame #1: 0x00000001006f49b3 node`v8::internal::Parser::DoParseFunction(v8::internal::ParseInfo*, v8::internal::AstRawString const*) + 1347
    frame #2: 0x00000001006f4230 node`v8::internal::Parser::ParseFunction(v8::internal::Isolate*, v8::internal::ParseInfo*, v8::internal::Handle<v8::internal::SharedFunctionInfo>) + 720
    frame #3: 0x000000010071a4e0 node`v8::internal::parsing::ParseFunction(v8::internal::ParseInfo*, v8::internal::Handle<v8::internal::SharedFunctionInfo>, v8::internal::Isolate*) + 832
    frame #4: 0x00000001002cd54c node`v8::internal::Compiler::Compile(v8::internal::Handle<v8::internal::SharedFunctionInfo>, v8::internal::Compiler::ClearExceptionFlag) + 924
    frame #5: 0x00000001002cdafd node`v8::internal::Compiler::Compile(v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag) + 173
    frame #6: 0x00000001007afe94 node`v8::internal::Runtime_CompileLazy(int, v8::internal::Object**, v8::internal::Isolate*) + 132
    frame #7: 0x00001c866d25be3d
    frame #8: 0x00001c866d21232c
    frame #9: 0x00001c866d2118d5
    frame #10: 0x00001c866d2118d5
...
    frame #20: 0x000000010009e410 node`node::worker::MessagePort::OnMessage() + 320
    frame #21: 0x00000001009a789e node`uv__async_io + 317
    frame #22: 0x00000001009b7295 node`uv__io_poll + 1934
    frame #23: 0x00000001009a7d11 node`uv_run + 315
    frame #24: 0x00000001000d16f1 node`node::worker::Worker::Run() + 1467
    frame #25: 0x00007fff7d1b9661 libsystem_pthread.dylib`_pthread_body + 340
    frame #26: 0x00007fff7d1b950d libsystem_pthread.dylib`_pthread_start + 377
    frame #27: 0x00007fff7d1b8bf9 libsystem_pthread.dylib`thread_start + 13

(lldb) di -s 0x1006f3510 -c 20
node`v8::internal::Parser::ParseFunctionLiteral:
    0x1006f3510 <+192>: movb   $0x48, %r8b
    0x1006f3513 <+195>: movl   $0x0, -0x30(%rbp)
    0x1006f351a <+202>: leaq   0x106a96b(%rip), %rcx     ; v8::internal::FLAG_runtime_stats
    0x1006f3521 <+209>: testq  %rdi, %rdi
    0x1006f3524 <+212>: je     0x1006f3530               ; <+224>
    0x1006f3526 <+214>: movl   (%rcx), %ecx
->  0x1006f3528 <+216>: testl  %ecx, %ecx

(lldb) register read rcx
     rcx = 0x0000000000000000
(lldb) 

RCX is supposed to hold v8::internal::FLAG_runtime_stats

if (V8_UNLIKELY(FLAG_runtime_stats) && did_preparse_successfully) {

(lldb) thread select 1
* thread #1, stop reason = signal SIGSTOP
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7cff1d82 libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff7d1bd824 libsystem_pthread.dylib`_pthread_join + 626
    frame #2: 0x00000001009b2b68 node`uv_thread_join + 14
    frame #3: 0x00000001000d1c1a node`node::worker::Worker::JoinThread() + 38
    frame #4: 0x0000000100029067 node`node::Environment::stop_sub_worker_contexts() + 89
    frame #5: 0x0000000100029001 node`node::Environment::Exit(int) + 43
    frame #6: 0x000000010023667f node`v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo*) + 623

OK, got one crash with this version. Though this one (SIGSEGV, an unhandled crash) is different from what @BethGriggs reported (SIGABRT, a controlled crash), chances are that the root cause is the same?

Here, the main thread has cleaned up the execution environment and is waiting for the workers to stop, but one of the workers does not seem to have stopped. I don't know if there is an associated PR that applies here. @addaleax will know!

@addaleax
Member

RCX is supposed to hold v8::internal::FLAG_runtime_stats

That’s a static global variable, I’m not sure how accessing it could fail? This looks more like execution was stopped at some random point due to the signal?

here, the main thread has cleaned up the execution environment and is waiting for the workers to stop, but one of the worker does not seem to have stopped.

What is the Worker thread doing at this point? Or is that the first stack trace in your comment?

I don't know if there is an associated PR that applies here.

Not that I know of, at least without knowing why the Worker hasn’t stopped.

@gireeshpunathil
Member

@addaleax - yes, the first stack trace is for the worker.
rcx - let me look at it again!

@gireeshpunathil
Member

I made two mistakes last time: (i) I used the master source, and (ii) I blindly trusted lldb's annotations on the right-hand side.
However, given this sequence, rcx definitely comes from the static area (known to the compiler, computed as a fixed offset from the current instruction), rcx is null, and it is being dereferenced.

    0x1006f351a <+202>: leaq   0x106a96b(%rip), %rcx     ; v8::internal::FLAG_runtime_stats
    0x1006f3521 <+209>: testq  %rdi, %rdi
    0x1006f3524 <+212>: je     0x1006f3530               ; <+224>
    0x1006f3526 <+214>: movl   (%rcx), %ecx
->  0x1006f3528 <+216>: testl  %ecx, %ecx

Moving on, I examined several dumps and got different patterns, but one thing in common: worker threads are running while the main thread is waiting.

Could it be that isolate->TerminateExecution() was not called as part of the worker cleanup? But that does not explain the originally reported issue (no separate isolates exist in that test).
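As a hedged sketch of the kind of cleanup being wondered about here — the type `WorkerHandle` and the members `worker_isolate_` and `tid_` are hypothetical names, not the actual node::worker::Worker internals — the general pattern is to terminate any JS running on the worker's isolate before joining its thread:

```cpp
// Illustrative only; not the real Worker shutdown path.
#include <uv.h>
#include <v8.h>

struct WorkerHandle {
  v8::Isolate* worker_isolate_;  // isolate owned by the worker thread
  uv_thread_t tid_;              // the worker's OS thread

  void Stop() {
    // TerminateExecution() may be called from another thread; it forces any
    // JS currently running on worker_isolate_ (e.g. a long parse/compile) to
    // unwind, so the subsequent join does not wait on it indefinitely.
    worker_isolate_->TerminateExecution();
    uv_thread_join(&tid_);
  }
};
```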

@gireeshpunathil
Member

Able to get the original crash! (gulp-util, no worker in the picture)

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff5864223e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff586f8c1c libsystem_pthread.dylib`pthread_kill + 285
    frame #2: 0x00007fff585ab1c9 libsystem_c.dylib`abort + 127
    frame #3: 0x00000001009b28ea node`uv_mutex_lock + 20
    frame #4: 0x00000001000b4cf4 node`node::NodePlatform::ForIsolate(v8::Isolate*) + 32
    frame #5: 0x00000001000b4d9b node`node::NodePlatform::CallOnForegroundThread(v8::Isolate*, v8::Task*) + 33
    frame #6: 0x000000010058e7a1 node`v8::internal::IncrementalMarkingJob::Task::RunInternal() + 369
    frame #7: 0x00000001000b4ac8 node`node::PerIsolatePlatformData::RunForegroundTask(std::__1::unique_ptr<v8::Task, std::__1::default_delete<v8::Task> >) + 142
    frame #8: 0x00000001000b4472 node`node::PerIsolatePlatformData::FlushForegroundTasksInternal() + 630
    frame #9: 0x00000001000b4668 node`node::PerIsolatePlatformData::Shutdown() + 24
    frame #10: 0x00000001000b4618 node`node::PerIsolatePlatformData::~PerIsolatePlatformData() + 24

...

    frame #14: 0x00000001000b4a11 node`node::NodePlatform::Shutdown() + 43
    frame #15: 0x000000010003b332 node`node::$_0::Dispose() + 18
    frame #16: 0x0000000100029006 node`node::Environment::Exit(int) + 48
    frame #17: 0x000000010023667f node`v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo*) + 623
    frame #18: 0x0000000100235bc1 node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous 

...

    frame #44: 0x0000000100042d8a node`node::Start(uv_loop_s*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) + 326
    frame #45: 0x00000001000428b2 node`node::Start(int, char**) + 711
    frame #46: 0x0000000100001034 node`start + 52

There are only two threads; the other thread is the inspector, on uv_sem_wait.

The mutex does not have a destructor of its own and so goes down only with the NodePlatform object, so I am not sure what is happening here!

@addaleax
Member

@gireeshpunathil Looks like the issue is that we’re trying to recursively lock per_isolate_mutex_, once from NodePlatform::Shutdown() and once from NodePlatform::ForIsolate?

This reminds me of bafd808. We removed the FlushForegroundTasksInternal() call entirely and replaced it with checks that should make sure that there are no tasks to run, but maybe older versions of V8 behave differently?

I’m wondering whether it would be okay to just remove that call, without replacing it with checks.
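For context, a reduced sketch (not the real node::NodePlatform implementation) of the re-entrant locking pattern described above: a plain libuv mutex is acquired in Shutdown() and then again, on the same thread, from ForIsolate(). uv_mutex_t is not recursive, so the second acquisition deadlocks or aborts, matching frame #3 of the trace.

```cpp
// Reduced illustration of the suspected recursive-lock path; names simplified.
#include <uv.h>

struct PlatformSketch {
  uv_mutex_t per_isolate_mutex_;  // plain (non-recursive) mutex

  PlatformSketch() { uv_mutex_init(&per_isolate_mutex_); }
  ~PlatformSketch() { uv_mutex_destroy(&per_isolate_mutex_); }

  void ForIsolate() {
    uv_mutex_lock(&per_isolate_mutex_);   // second acquisition -> abort
    // ... look up the per-isolate data ...
    uv_mutex_unlock(&per_isolate_mutex_);
  }

  void Shutdown() {
    uv_mutex_lock(&per_isolate_mutex_);
    // Flushing foreground tasks here can run a V8 task (e.g. incremental
    // marking) that calls back into ForIsolate() while the mutex is held.
    ForIsolate();
    uv_mutex_unlock(&per_isolate_mutex_);
  }
};
```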

@gireeshpunathil
Member

Wow, that matches!

But the further progression from FlushForegroundTasksInternal and the presence of v8::internal::IncrementalMarkingJob::Task::RunInternal in the stack indicate that there ARE pending/scheduled tasks, right?

But at this very edge of termination, making sure GC activities are fully completed etc. does not seem to be required, so this makes sense to me!

@BethGriggs
Member

I have done a test build on macOS, and removing the FlushForegroundTasksInternal() call fixes the crash seen with gulp-util. Thanks @addaleax, @gireeshpunathil!

@MylesBorins
Member Author

Did this end up getting landed?

@BethGriggs
Member

@MylesBorins, no. On merge, this either needs to remove the FlushForegroundTasksInternal() call at bafd808#diff-5b8e0acae193b6f40922fb5cda94eec8L268, or we should raise a separate PR to remove that line on v10.x and make sure both backports land together?

@MylesBorins
Member Author

@BethGriggs do you want to push those changes to this PR?

addaleax and others added 2 commits March 27, 2019 11:37
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: nodejs#24403
Refs: nodejs#25007

PR-URL: nodejs#25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Node first calls `Isolate::Dispose`, then
`NodePlatform::UnregisterIsolate`.
This again calls `PerIsolatePlatformData::Shutdown`, which (before this
patch) called `FlushForegroundTasksInternal`, which might call
`RunForegroundTask` if it finds foreground tasks to be executed. This
will fail however, since `Isolate::GetCurrent` was already reset during
`Isolate::Dispose`.
Hence remove the check to `FlushForegroundTasksInternal` and add checks
instead that no more foreground tasks are scheduled.

Refs: v8#86

PR-URL: nodejs#25653
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
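A simplified, self-contained sketch of the shape of that second commit — the real patch lives in src/node_platform.cc and uses the platform's own task queues and CHECK macros; the queue members here are hypothetical stand-ins:

```cpp
// Not the exact Node.js patch; illustrates "don't flush at shutdown, assert
// that nothing is left to run instead".
#include <cassert>
#include <memory>
#include <queue>

struct Task { virtual ~Task() = default; virtual void Run() = 0; };

struct PerIsolateDataSketch {
  std::queue<std::unique_ptr<Task>> foreground_tasks_;
  std::queue<std::unique_ptr<Task>> foreground_delayed_tasks_;

  void Shutdown() {
    // Before: something like FlushForegroundTasksInternal() ran here, which
    // could execute a task after Isolate::Dispose() and re-enter platform
    // locks during NodePlatform::Shutdown().
    //
    // After: nothing should be scheduled this late, so only assert that.
    assert(foreground_tasks_.empty());
    assert(foreground_delayed_tasks_.empty());
  }
};
```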

@BethGriggs
Member

@MylesBorins, I cherry-picked the commit we need and rebased the PR

@BethGriggs BethGriggs self-requested a review March 27, 2019 15:20
BethGriggs pushed a commit that referenced this pull request Mar 28, 2019
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: #24403
Refs: #25007

Backport-PR-URL: #26048
PR-URL: #25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
BethGriggs pushed a commit that referenced this pull request Mar 28, 2019
Node first calls `Isolate::Dispose`, then
`NodePlatform::UnregisterIsolate`.
This again calls `PerIsolatePlatformData::Shutdown`, which (before this
patch) called `FlushForegroundTasksInternal`, which might call
`RunForegroundTask` if it finds foreground tasks to be executed. This
will fail however, since `Isolate::GetCurrent` was already reset during
`Isolate::Dispose`.
Hence remove the check to `FlushForegroundTasksInternal` and add checks
instead that no more foreground tasks are scheduled.

Refs: v8#86

Backport-PR-URL: #26048
PR-URL: #25653
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@BethGriggs
Member

Landed on v10.x-staging

@BethGriggs BethGriggs closed this Mar 28, 2019