exit race between main and worker threads #25007

Closed
gireeshpunathil opened this issue Dec 13, 2018 · 46 comments
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. confirmed-bug Issues with confirmed bugs. lib / src Issues and PRs related to general changes in the lib or src directory. process Issues and PRs related to the process subsystem.

Comments

@gireeshpunathil
Member

gireeshpunathil commented Dec 13, 2018

  • Version: v11.0.0
  • Platform: all
  • Subsystem: worker, process, src

Sample test case to reproduce the issue:

'use strict'
const { Worker, isMainThread, parentPort } = require('worker_threads')

if (isMainThread) {
  // Spawn as many workers as requested on the command line, then exit immediately.
  const count = Number(process.argv[2])
  for (let i = 0; i < count; i++)
    new Worker(__filename)
  process.exit(0)
} else {
  // Each worker keeps posting messages back to the parent.
  setInterval(() => {
    parentPort.postMessage('Hello, world!')
  }, 1)
}

The flakiness of the test is largely influenced by thread scheduling order, the number of CPUs, and the load on the system.

First reported on AIX and Linux through sequential/test-cli-syntax.js. The more you run it, the more varied the failures: SIGSEGV, SIGABRT, SIGILL... depending on where the main and worker threads happen to be at exit time.

The root cause is that there is no specified destruction order for, or identified ownership of, the C++ global objects that are destructed while other threads are still running.
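
For illustration only, here is a minimal standalone C++ sketch (not Node.js code) of the kind of race described above: exit() on one thread runs the destructor of a global object while a detached thread is still using it.

// Minimal sketch of an exit-time static destructor racing with another thread.
// Illustrative only; names and structure are hypothetical, not Node.js source.
#include <cstdlib>
#include <thread>
#include <vector>

struct Registry {
  std::vector<int> items;
  ~Registry() { items.clear(); }  // invoked by exit() on the exiting thread
};

static Registry registry;  // C++ global with a non-trivial destructor

int main() {
  std::thread helper([] {
    for (;;) registry.items.push_back(1);  // may overlap with ~Registry()
  });
  helper.detach();
  // Like process.exit(): static destructors run while 'helper' is still alive,
  // which is undefined behavior and can surface as SIGSEGV/SIGABRT/SIGILL.
  std::exit(0);
}

Depending on scheduling, the helper thread may touch registry.items during or after its destruction, which matches the variety of signals observed in CI.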

Refs: #24403

@gireeshpunathil gireeshpunathil added confirmed-bug Issues with confirmed bugs. process Issues and PRs related to the process subsystem. lib / src Issues and PRs related to general changes in the lib or src directory. worker Issues and PRs related to Worker support. labels Dec 13, 2018
@gireeshpunathil
Member Author

/cc @nodejs/workers @nodejs/process

@gireeshpunathil
Member Author

On a side note: why would the worker module be loaded even in the absence of the --experimental-worker flag? I believe the answer is that all the internal modules are loaded at bootstrap, irrespective of their requirement at runtime.

Can we change that? At least in this case, it would save a lot of resources (threads, memory) for use cases that do not require workers.

@Trott
Member

Trott commented Dec 13, 2018

@bengl was looking at this earlier but I don't know if he has anything to add (and if he did, he'd probably put it in the other issue). Pinging him here anyway just in case...

@addaleax
Member

why would the worker module be loaded even in the absence of the --experimental-worker flag?

As far as I can tell, it’s only the native binding that’s loaded unconditionally (to figure out whether we’re in a worker or not during bootstrap).

I believe the answer is that all the internal modules are loaded at bootstrap, irrespective of their requirement at runtime.

That’s not the case; test/parallel/test-bootstrap-modules.js tests this.

At least in this case, it would save a lot of resources (threads, memory) for use cases that do not require workers.

The memory overhead is probably not huge, and just loading the worker module does not spawn any threads on its own.

@gireeshpunathil
Member Author

$ gdb ./node_g
(gdb) b pthread_create
Breakpoint 1 at 0xd2fe40
(gdb) r
Starting program: ./node_g 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x0000000000d2fe40 in pthread_create@plt ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 libgcc-4.8.5-28.el7.x86_64 libstdc++-4.8.5-28.el7.x86_64
(gdb) bt
#0  0x0000000000d2fe40 in pthread_create@plt ()
#1  0x0000000000fd8acf in uv_thread_create (tid=0x3396790, 
    entry=0xe9be78 <node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start()::{lambda(void*)#1}::_FUN(void*)>, arg=0x3395f60) at ../deps/uv/src/unix/thread.c:213
#2  0x0000000000e9bf1a in node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start (
    this=0x3395f60) at ../src/node_platform.cc:63
#3  0x0000000000e998e8 in node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner (
    this=0x3395bf0, thread_pool_size=4) at ../src/node_platform.cc:178
#4  0x0000000000ea795f in __gnu_cxx::new_allocator<node::WorkerThreadsTaskRunner>::construct<node::WorkerThreadsTaskRunner<int&> > (this=0x7fffffffdcfd, __p=0x3395bf0)
    at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/ext/new_allocator.h:120
#5  0x0000000000ea68c4 in std::allocator_traits<std::allocator<node::WorkerThreadsTaskRunner> >::_S_construct<node::WorkerThreadsTaskRunner<int&> >(std::allocator<node::WorkerThreadsTaskRunner>&, std::allocator_traits<std::allocator<node::WorkerThreadsTaskRunner> >::__construct_helper*, (node::WorkerThreadsTaskRunner<int&>&&)...) (__a=..., __p=0x3395bf0)
    at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/alloc_traits.h:253
#6  0x0000000000ea54d5 in std::allocator_traits<std::allocator<node::WorkerThreadsTaskRunner> >::construct<node::WorkerThreadsTaskRunner<int&> >(std::allocator<node::WorkerThreadsTaskRunner>&, node::WorkerThreadsTaskRunner<int&>*, (node::WorkerThreadsTaskRunner<int&>&&)...) (__a=..., 
    __p=0x3395bf0) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/alloc_traits.h:399
#7  0x0000000000ea37aa in std::__shared_ptr<node::WorkerThreadsTaskRunner, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<node::WorkerThreadsTaskRunner>, int&> (
    this=0x7fffffffde30, __tag=..., __a=...)
    at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:1124
#8  0x0000000000ea1692 in std::shared_ptr<node::WorkerThreadsTaskRunner>::shared_ptr<std::allocator<node::WorkerThreadsTaskRunner>, int&> (this=0x7fffffffde30, __tag=..., __a=...)
    at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr.h:316
#9  0x0000000000e9fe4e in std::allocate_shared<node::WorkerThreadsTaskRunner, std::allocator<node::WorkerThreadsTaskRunner>, int&> (__a=...)
    at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr.h:588
#10 0x0000000000e9e7fb in std::make_shared<node::WorkerThreadsTaskRunner, int&> ()
    at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr.h:604
#11 0x0000000000e9a18f in node::NodePlatform::NodePlatform (this=0x3395b00, 
    thread_pool_size=4, tracing_controller=0x3395920) at ../src/node_platform.cc:293
#12 0x0000000000dbbdcc in Initialize (this=0x3359680 <node::v8_platform>, thread_pool_size=4)
    at ../src/node.cc:239
#13 0x0000000000dc26d4 in node::InitializeV8Platform (thread_pool_size=4)
    at ../src/node.cc:1893
#14 0x0000000000dc2cd7 in node::Start (argc=1, argv=0x338b4d0) at ../src/node.cc:2122
#15 0x0000000001e13199 in main (argc=1, argv=0x7fffffffe048) at ../src/node_main.cc:126
(gdb) 

@addaleax - this is what I see, am I missing something?

@gireeshpunathil
Member Author

Ok, apologies; those are the normal worker-pool threads created at every bootup. I confused them with worker_threads threads; my bad.

@addaleax
Member

@gireeshpunathil Okay, that clears it up :)

I think worker_threads’ threads are affected as well, though…

@gireeshpunathil gireeshpunathil removed the worker Issues and PRs related to Worker support. label Dec 13, 2018
@gireeshpunathil
Member Author

Ok, I edited the description to take the worker module out of the limelight. I was misled by the word worker in WorkerThreadsTaskRunner - that is also what led to writing this test.

So just to clarify (for myself and others): this test case now only serves to easily recreate / amplify the issue through many worker threads, but workers are not the real culprits.

@joyeecheung
Member

As far as I can tell, it’s only the native binding that’s loaded unconditionally (to figure out whether we’re in a worker or not during bootstrap).

I believe that should be unnecessary - opened #25017

@gireeshpunathil
Member Author

As part of debugging #24921 I happened to run the entire CI with _exit replacing the normal exit in Environment::Exit. The only test that fails is pseudo-tty/test-set-raw-mode-reset-process-exit. I am not advocating _exit as the favored fix, just stating it here.
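
For context, a minimal sketch (not the actual Environment::Exit code) of the difference that substitution exploits: exit() runs atexit handlers and static C++ destructors, while _exit() terminates the process immediately and skips both - presumably why only a cleanup-dependent TTY test notices.

// Sketch of exit() vs. _exit(); illustrative only, not Node.js source.
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

struct Cleanup {
  ~Cleanup() { std::puts("static destructor ran"); }  // reached only via exit()
};
static Cleanup cleanup;

int main(int argc, char**) {
  std::atexit([] { std::puts("atexit handler ran"); });  // reached only via exit()
  if (argc > 1)
    _exit(0);   // terminates at once: no atexit handlers, no static destructors
  std::exit(0); // runs atexit handlers and static destructors (the racy path)
}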

@gireeshpunathil
Member Author

gireeshpunathil commented Dec 15, 2018

On another note, if we want to restrict global destructors from being invoked on helper threads (avoiding the word worker here, so as not to confuse them with worker_threads), what would be a starting point? For example we could do:

if (! env->is_main_thread()) return;

leaving that action for the main thread. However, how do we identify such destructors? Do we have a static list somewhere? /cc @addaleax @joyeecheung
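
A rough sketch of the kind of guard being proposed, with the caveat that the thread check here is hypothetical - a global destructor has no Environment to ask, so this stand-in compares against a thread id captured at startup:

// Hypothetical illustration of skipping a global's teardown on non-main threads.
// Not Node.js code, and (as discussed below) not considered a feasible fix.
#include <thread>

static const std::thread::id main_thread_id = std::this_thread::get_id();

struct GlobalState {
  ~GlobalState() {
    // Leave the cleanup for the main thread; bail out on helper threads.
    if (std::this_thread::get_id() != main_thread_id)
      return;
    // ... actual cleanup would go here ...
  }
};
static GlobalState global_state;

int main() { return 0; }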

@addaleax
Member

@gireeshpunathil I don’t think that’s a feasible solution… the problem occurs mostly when the main thread exits, as far as I have seen so far. Also, I don’t think there’s a safe way for us to tell from a destructor whether we are on the main thread or not (and in the context of embedders there might be multiple main threads).

As for a list of object destructions that are potentially problematic… here’s the best thing I’ve come up with so far based on looking at the symbol tables:

*Potentially* problematic symbols
$ nm node| grep '[0-9a-f]* [dDbB]'|egrep -v 'args(_[0-9]*_*)*$'|egrep -v '_ZZ?N2v8'|grep -v '_ZZN6icu_63'|grep N4node|grep -v '_ZTV'|grep -v trace_event_unique_atomic|grep -v available_category| awk '{print $3}'|c++filt|sort
guard variable for node::crypto::NewRootCertStore()::root_certs_vector
guard variable for node::crypto::NewRootCertStore()::root_certs_vector_mutex
node::(anonymous namespace)::Parser::settings
node::(anonymous namespace)::Parser::settings
node::cares_wrap::(anonymous namespace)::ares_library_mutex
node::crypto::extra_root_certs_loaded
node::crypto::Initialize(v8::Local<v8::Object>, v8::Local<v8::Value>, v8::Local<v8::Context>, void*)::init_once
node::crypto::NewRootCertStore()::root_certs_vector
node::crypto::NewRootCertStore()::root_certs_vector_mutex
node::crypto::NodeBIO::GetMethod()::method
node::crypto::root_certs
node::crypto::root_cert_store
node::debug_symbols_generated
node::Environment::kNodeContextTagPtr
node::Environment::Start(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool)::init_once
node::Environment::thread_local_env
node::environ_mutex
node::fs::kPathSeparator
node::http2::Headers::Headers(v8::Isolate*, v8::Local<v8::Context>, v8::Local<v8::Array>)::zero
node::http2::Http2Session::callback_struct_saved
node::http2::Origins::Origins(v8::Isolate*, v8::Local<v8::Context>, v8::Local<v8::String>, unsigned long)::zero
node::http_parser_version
node::init_modpending_once
node::inspector::(anonymous namespace)::start_io_thread_async
node::inspector::(anonymous namespace)::start_io_thread_semaphore
node::inspector::protocol::NodeTracing::TraceConfig::RecordModeEnum::RecordAsMuchAsPossible
node::inspector::protocol::NodeTracing::TraceConfig::RecordModeEnum::RecordContinuously
node::inspector::protocol::NodeTracing::TraceConfig::RecordModeEnum::RecordUntilFull
node::inspector::protocol::StringUtil::kNotFound
node::linux_at_secure
node::llhttp_version
node::loader::EXTENSIONS
node::modlist_addon
node::modlist_builtin
node::modlist_internal
node::modlist_linked
node::node_is_initialized
node::node_isolate
node::node_isolate_mutex
node::options_parser::DebugOptionsParser::instance
node::options_parser::EnvironmentOptionsParser::instance
node::options_parser::PerIsolateOptionsParser::instance
node::options_parser::PerProcessOptionsParser::instance
node::performance::performance_node_start
node::performance::performance_v8_start
node::performance::timeOrigin
node::performance::timeOriginTimestamp
node::per_process_loader
node::per_process::metadata
node::per_process_opts
node::per_process_opts_mutex
node::process_mutex
node::prog_start_time
node::PromiseRejectCallback(v8::PromiseRejectMessage)::rejectionsHandledAfter
node::PromiseRejectCallback(v8::PromiseRejectMessage)::unhandledRejections
node::provider_names
node::reverted
node::SigintWatchdogHelper::instance
node::thread_local_modpending
node::tracing::g_agent
node::url::(anonymous namespace)::hex
node::v8_initialized
node::v8_is_profiling
node::v8_platform
node::worker::(anonymous namespace)::next_thread_id
node::worker::(anonymous namespace)::next_thread_id_mutex

@gireeshpunathil
Member Author

thanks @addaleax. this is a pretty huge list!

So what viable options exist for us, in your opinion? At the moment it looks like many tests are affected, and though this started with harmless-looking tests on AIX, there is evidence that it has spread to some Linux variants.

Also, just wondering what recent changes could have triggered this. The AIX CI machine I work with seems to have the best reproduction frequency, so do you advise a bisect? It is laborious and may turn out inconclusive, but we might get some vital hints.

@addaleax
Member

@gireeshpunathil Most of the things in this list are not even C++ classes but rather primitive types, so those could be omitted…

I don’t have a clue as to what might have caused this, or how best to bisect it. And in the worst case, it might just be a timing-based race condition that we caused through some performance changes. :/

@addaleax
Member

Okay, this one looks like a real bug that could be part of this:

==12846== Possible data race during write of size 8 at 0x61E64E0 by thread #1
==12846== Locks held: 1, at address 0x61FD960
==12846==    at 0x7EB227: node::tracing::TracingController::~TracingController() (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x82F11F: node::tracing::Agent::~Agent() (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x74E74C: node::._233::~._233() (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x5DEF040: __run_exit_handlers (exit.c:108)
==12846==    by 0x5DEF139: exit (exit.c:139)
==12846==    by 0x728F40: node::Environment::Exit(int) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x74E880: node::Exit(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x9C8E0C: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x9CA157: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x1658B2D: ??? (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x17F39368816D: ???
==12846==    by 0x17F39368816D: ???
==12846==
==12846== This conflicts with a previous read of size 8 by thread #11
==12846== Locks held: 2, at addresses 0x649A618 0x649B9A0
==12846==    at 0x709BC5: node::AsyncWrap::AsyncReset(double, bool) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x72EC0A: node::HandleWrap::HandleWrap(node::Environment*, v8::Local<v8::Object>, uv_handle_s*, node::AsyncWrap::ProviderType) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x7C7A3D: node::worker::MessagePort::MessagePort(node::Environment*, v8::Local<v8::Context>, v8::Local<v8::Object>) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x7C7BCE: node::worker::MessagePort::New(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x9C8682: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x9CAA6C: v8::internal::Builtins::InvokeApiFunction(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::HeapObject>) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0xCEE0BF: v8::internal::Execution::New(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x95D723: v8::Function::NewInstanceWithSideEffectType(v8::Local<v8::Context>, int, v8::Local<v8::Value>*, v8::SideEffectType) const (in /home/sqrt/src/node/master/out/Release/node)
==12846==  Address 0x61e64e0 is 0 bytes inside a block of size 96 alloc'd
==12846==    at 0x4C3184A: operator new(unsigned long) (vg_replace_malloc.c:334)
==12846==    by 0x82EB06: node::tracing::Agent::Agent() (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x754AF7: node::InitializeV8Platform(int) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x7561D5: node::Start(int, char**) (in /home/sqrt/src/node/master/out/Release/node)
==12846==    by 0x5DCDB96: (below main) (libc-start.c:310)
==12846==  Block was alloc'd by thread #1
==12846==

We probably should at least keep v8_platform alive while other threads are running…?
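
A rough sketch of that ordering idea, under assumptions - hypothetical types, not the actual change that later landed in #25061: quiesce background threads and destroy the platform deterministically before exit() gets to the remaining static destructors.

// Rough sketch with hypothetical types; not the Node.js implementation.
#include <cstdlib>
#include <memory>
#include <thread>
#include <vector>

struct Platform {
  std::vector<std::thread> pool;
  void Shutdown() {
    for (auto& t : pool)          // join all background threads first
      if (t.joinable()) t.join();
    pool.clear();
  }
};

static std::unique_ptr<Platform> platform;  // stand-in for node::v8_platform

[[noreturn]] void ExitProcess(int code) {
  if (platform) {
    platform->Shutdown();  // no platform threads left running...
    platform.reset();      // ...and the platform itself is gone...
  }
  std::exit(code);         // ...before exit() runs the remaining static destructors
}

int main() {
  platform = std::make_unique<Platform>();
  ExitProcess(0);
}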

@addaleax
Member

addaleax commented Dec 15, 2018

There’s also at least one other possible race condition in the platform code, which I think has been caused by e273abc, but which I don’t quite understand:

==13124== Possible data race during read of size 1 at 0x1FFEFFF7F0 by thread #1
==13124== Locks held: none
==13124==    at 0x4C33ECD: my_memcmp (hg_intercepts.c:211)
==13124==    by 0x4C341B8: mutex_destroy_WRK (hg_intercepts.c:850)
==13124==    by 0x4C382D1: pthread_mutex_destroy (hg_intercepts.c:873)
==13124==    by 0x8D4CC8: uv_mutex_destroy (thread.c:279)
==13124==    by 0x7E99C7: node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner(int) (in /home/sqrt/src/node/master/out/Release/node)
==13124==    by 0x7E9BC1: node::NodePlatform::NodePlatform(int, node::tracing::TracingController*) (in /home/sqrt/src/node/master/out/Release/node)
==13124==    by 0x754BD3: node::InitializeV8Platform(int) (in /home/sqrt/src/node/master/out/Release/node)
==13124==    by 0x7561D5: node::Start(int, char**) (in /home/sqrt/src/node/master/out/Release/node)
==13124==    by 0x5DCDB96: (below main) (libc-start.c:310)
==13124== 
==13124== This conflicts with a previous write of size 4 by thread #4
==13124== Locks held: none
==13124==    at 0x5B9E192: __lll_unlock_wake (lowlevellock.S:365)
==13124==    by 0x5B987DE: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:54)
==13124==    by 0x5B987DE: pthread_mutex_unlock (pthread_mutex_unlock.c:345)
==13124==    by 0x4C349B3: mutex_unlock_WRK (hg_intercepts.c:1097)
==13124==    by 0x4C382FD: pthread_mutex_unlock (hg_intercepts.c:1115)
==13124==    by 0x8D4D38: uv_mutex_unlock (thread.c:305)
==13124==    by 0x7E7016: node::(anonymous namespace)::PlatformWorkerThread(void*) (in /home/sqrt/src/node/master/out/Release/node)
==13124==    by 0x4C36FF7: mythread_wrapper (hg_intercepts.c:389)
==13124==    by 0x5B946DA: start_thread (pthread_create.c:463)
==13124==  Address 0x1ffefff7f0 is on thread #1's stack
==13124==  in frame #4, created by node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner(int) (???:)

addaleax added a commit to addaleax/node that referenced this issue Dec 15, 2018
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: nodejs#24403
Refs: nodejs#25007
@gireeshpunathil
Member Author

thanks @addaleax - I just reverted the changes pertinent to e273abc and it did not fail in 100+ runs!

I will let it run for more time tonight and share the result tomorrow.

@addaleax
Member

@gireeshpunathil I just opened #25061 to take up the __run_exit_handlers() problem again…

I just reverted the changes pertinent to e273abc and it did not fail in 100+ runs!

I’ve stared at this code for an hour now, becoming somewhat convinced that it’s correct… this seems like bad news? Does “not fail in 100+ runs” mean that it’s likely to cause these issues here? :/

/cc @ofrobots

@gireeshpunathil
Member Author

@addaleax - yes, 500+ runs now, and no failure. So the said commit is very likely to have caused the issues.

@addaleax
Member

@gireeshpunathil 9f7e3a4 also changed a lot of the code introduced in e273abc … do you think you can tell if the former might be the cause rather than the latter?

That would be nice, because it's just cleanup that we could revert without issues (although it would of course be nice to understand this first).

gireeshpunathil added a commit to gireeshpunathil/node that referenced this issue Jan 23, 2019
A number of tests that were recently marked `flaky` have been
proven to fail for the reason identified in
nodejs#25007, with the resolution
identified in nodejs#25061

Revoke the flaky designation of all these tests, as the said
PR has landed.

PR-URL: nodejs#25611
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com>
addaleax pushed a commit that referenced this issue Jan 23, 2019
A number of tests that were recently marked `flaky` have been
proven to fail for the reason identified in
#25007, with the resolution
identified in #25061

Revoke the flaky designation of all these tests, as the said
PR has landed.

PR-URL: #25611
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com>
@gireeshpunathil
Member Author

Almost all of the tests that had manifested exit-race issues are back in action, so if they all stay happy for the next week or so in regular CI, we should be good to close this issue.

@Trott Trott unpinned this issue Jan 24, 2019
gireeshpunathil added a commit to gireeshpunathil/node that referenced this issue Jan 29, 2019
Execute many module loads in a worker in a loop
while exiting from the main thread at arbitrary
execution points, and make sure that the workers
quiesce without crashing.

`worker_threads` are not necessarily the subject of
testing; they are used for easy simulation of
multi-thread scenarios.

Refs: nodejs#25007
PR-URL: nodejs#25083
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
targos pushed a commit that referenced this issue Jan 29, 2019
Execute many module loads in a worker in a loop
while exiting from the main thread at arbitrary
execution points, and make sure that the workers
quiesce without crashing.

`worker_threads` are not necessarily the subject of
testing; they are used for easy simulation of
multi-thread scenarios.

Refs: #25007
PR-URL: #25083
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
@gireeshpunathil
Member Author

This pattern is not observed anymore in CI, thanks to #25061. So, good to close.

gireeshpunathil added a commit to gireeshpunathil/node that referenced this issue Feb 8, 2019
Execute JS code in a worker through the same vm context
while exiting from the main thread at arbitrary
execution points, and make sure that the workers
quiesce without crashing.

`worker_threads` are not necessarily the subject of
testing; they are used for easy simulation of
multi-thread scenarios.

Refs: nodejs#25007

PR-URL: nodejs#25085
Reviewed-By: Anna Henningsen <anna@addaleax.net>
addaleax pushed a commit that referenced this issue Feb 8, 2019
Execute JS code in a worker through the same vm context
while exiting from the main thread at arbitrary
execution points, and make sure that the workers
quiesce without crashing.

`worker_threads` are not necessarily the subject of
testing; they are used for easy simulation of
multi-thread scenarios.

Refs: #25007

PR-URL: #25085
Reviewed-By: Anna Henningsen <anna@addaleax.net>
BethGriggs pushed a commit that referenced this issue Feb 13, 2019
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: #24403
Refs: #25007

Backport-PR-URL: #26048
PR-URL: #25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
BethGriggs pushed a commit to MylesBorins/node that referenced this issue Mar 27, 2019
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: nodejs#24403
Refs: nodejs#25007

PR-URL: nodejs#25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
BethGriggs pushed a commit that referenced this issue Mar 28, 2019
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: #24403
Refs: #25007

Backport-PR-URL: #26048
PR-URL: #25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
BethGriggs pushed a commit that referenced this issue Apr 17, 2019
sequential/test-inspector-debug-end and
parallel/test-child-process-execfile

Of late these have been failing on AIX. Debugging the core dump
suggested that this is a side effect of the exit race that is
described in #25007.
Mark these as flaky on AIX until that is resolved.

Refs: #25047
Refs: #25029

PR-URL: #25126
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
BethGriggs pushed a commit that referenced this issue Apr 28, 2019
sequential/test-inspector-debug-end and
parallel/test-child-process-execfile

Of late these have been failing on AIX. Debugging the core dump
suggested that this is a side effect of the exit race that is
described in #25007.
Mark these as flaky on AIX until that is resolved.

Refs: #25047
Refs: #25029

PR-URL: #25126
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
BethGriggs pushed a commit that referenced this issue Apr 29, 2019
A number of tests that were recently marked `flaky` have been
proven to fail for the reason identified in
#25007, with the resolution
identified in #25061

Revoke the flaky designation of all these tests, as the said
PR has landed.

PR-URL: #25611
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this issue May 10, 2019
sequential/test-inspector-debug-end and
parallel/test-child-process-execfile

Of late these have been failing on AIX. Debugging the core dump
suggested that this is a side effect of the exit race that is
described in #25007.
Mark these as flaky on AIX until that is resolved.

Refs: #25047
Refs: #25029

PR-URL: #25126
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
BethGriggs pushed a commit that referenced this issue May 10, 2019
A number of tests that were recently marked `flaky` have been
proven to fail for the reason identified in
#25007, with the resolution
identified in #25061

Revoke the flaky designation of all these tests, as the said
PR has landed.

PR-URL: #25611
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com>
MylesBorins pushed a commit that referenced this issue May 16, 2019
sequential/test-inspector-debug-end and
parallel/test-child-process-execfile

Of late these have been failing on AIX. Debugging the core dump
suggested that this is a side effect of the exit race that is
described in #25007.
Mark these as flaky on AIX until that is resolved.

Refs: #25047
Refs: #25029

PR-URL: #25126
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
MylesBorins pushed a commit that referenced this issue May 16, 2019
A number of tests that were recently marked `flaky` have been
proven to fail for the reason identified in
#25007, with the resolution
identified in #25061

Revoke the flaky designation of all these tests, as the said
PR has landed.

PR-URL: #25611
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com>