Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on v12.19.0 #35620

Closed
jambagigirish opened this issue Oct 12, 2020 · 18 comments
Closed

Segfault on v12.19.0 #35620

jambagigirish opened this issue Oct 12, 2020 · 18 comments
Labels
addons Issues and PRs related to native addons. c++ Issues and PRs that require attention from people who are familiar with C++. confirmed-bug Issues with confirmed bugs. node-api Issues and PRs related to the Node-API.

Comments

@jambagigirish
Copy link

jambagigirish commented Oct 12, 2020

  • Version:v12.19.0
  • Platform: RHEL
  • Subsystem:

What steps will reproduce the bug?

Using node js and running the component test with cucumber-js

How often does it reproduce? Is there a required condition?

After Upgrading to v12.19.0 - issue has been consistent

What is the expected behavior?

What do you see instead?

/home/user1/GIT/cp1/r1/r1/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2c61)[0x7f9e881fbc61]
/lib64/libpthread.so.0(+0xf6d0)[0x7f9ea0eb06d0]
node[0x9cf96e]
node(_ZN2v88internal13GlobalHandles40InvokeSecondPassPhantomCallbacksFromTaskEv+0x19b)[0xd17c4b]
node(_ZN2v88internal14CancelableTask3RunEv+0x23)[0xc953e3]
node(_ZN4node22PerIsolatePlatformData17RunForegroundTaskESt10unique_ptrIN2v84TaskESt14default_deleteIS3_EE+0xc4)[0xa86d54]
node(_ZN4node22PerIsolatePlatformData28FlushForegroundTasksInternalEv+0x165)[0xa87a15]
node[0x136f0ae]
node[0x1382165]
node(uv_run+0x11f)[0x136f8ef]
node(_ZN4node16NodeMainInstance3RunEv+0x1f6)[0xa5aac6]
node(_ZN4node5StartEiPPc+0x2ac)[0x9e85cc]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9ea0af6445]
node[0x9819b5]

Additional information

I am using native code build with node-gyp

@addaleax
Copy link
Member

I am using native code build with node-gyp

Can you share that native code?

@jambagigirish
Copy link
Author

jambagigirish commented Oct 13, 2020

this is one of the cpp files where nodejs sends data to convert/map to c++ byte array to send to sockets

static void Activate(const FunctionCallbackInfo<Value>& args);

///// CODE
void ActivationInit(v8::Local<v8::Object> &exports, HMODULE handle1)
{
    NODE_SET_METHOD(exports, "Activate", Activate);
}

Isolate* isolate = args.GetIsolate();
  unsigned char szUser[256];


  if (args.Length() != 4)
  {
    HandleWrongParams(isolate);
    return;
  }
  try
  {
    unsigned long param1 = getParam1(args[0], isolate);
    unsigned long param2 = getParam2(args[1], isolate);
    unsigned long param3 = getParam3(args[2], isolate);
    memset(szUser, 0, 256);
    getParamUString(args[3], szUser, 256, isolate);

    BOOL result= SendActivateMessageToSocket(param1, param2, param4, szUser);
    Local<Boolean> res = Boolean::New(isolate, result);
    args.GetReturnValue().Set(res);
  }
  catch(exception &e)
  {
    if (NULL != isolate)
    {
        isolate->ThrowException(Exception::TypeError(String::NewFromUtf8(isolate, e.what())));
    }
    return;
  }
DWORD getParam1(Local<Value> val, Isolate* isolate)
{
  if (val->NumberValue( isolate->GetCurrentContext()).ToChecked() < 0) throw invalid_argument("Negative value received.");
  return (DWORD)(val->Uint32Value(isolate->GetCurrentContext()).ToChecked());
}

Update: node js calls Activate() from nodejs code

@addaleax
Copy link
Member

this is one of the cpp files where nodejs sends data to convert/map to c++ byte array to send to sockets

This is obviously an incomplete file.

@jambagigirish
Copy link
Author

cpp files are too big to share here. with version 12.18.00 I didn't see any issues. Are there any breaking changes from version 12.18 to 12.19?

@addaleax
Copy link
Member

Are there any breaking changes from version 12.18 to 12.19?

None that are intentionally breaking, no, and none that were reported so far.

cpp files are too big to share here

Okay, well, let me be clear here: I’m trying to help you. If you can’t show us a reproduction of the issue, that’s harder. Without looking at the causing source code, that’s even harder. The file size limit for Github comment attachements is 25 MB, and I honestly doubt a bit that you have C++ files larger than that (the biggest .cc file in all of Node.js is less than 1 MB, for reference).

Looking at your code to try to figure out what’s going on is a offer from our side that I don’t really need to make here – I do this in my free time these days, and you’re using Node.js and all the work put into it for free as well. If you can’t share your source code and you’re doing closed-source development, that means you’re taking on responsibility for solving issues with your source code yourself. That’s fine – you’re absolutely free to do that. But if you want help, you need to provide information so that people can actually help you.

That being said, the stack trace points to an issue with garbage collection. That also means that the transition from 12.18.0 to 12.19.0 might not even be the main cause here, and that GC timing has changed instead. If you do suspect that this might be a bug in Node.js, and you cannot share more information, you can bisect the range between the two tags on github to figure out what commit introduced this crash.

@jambagigirish
Copy link
Author

jambagigirish commented Oct 14, 2020

Capture

Thanks for you response. Main issue is that native code is divided over multiple files as shown below. all these files do the same processing as shown in above snippet. As its a production/copyrighted c++ code I can't share the much c++ code here. Sorry for that.

I am using node-gyp library to build c++ code. May be I can divide the issue between 2 repos.

After trying to build debug version I wasn't able to get the valid-readable stack trace.

I tried to generate something similar to this but no luck-#33283

@ronag
Copy link
Member

ronag commented Oct 14, 2020

@jambagigirish I think your best bet is to build and bisect Node yourself and try to find which commit caused the regression you are seeing.

Alternatively, try to create a minimal reproducible example that we can use, otherwise there is not much we can do to help.

@PoojaDurgad PoojaDurgad added addons Issues and PRs related to native addons. c++ Issues and PRs that require attention from people who are familiar with C++. labels Oct 19, 2020
@koonpeng
Copy link

I am getting the same error (similar backtrace) as well with rclnodejs. There is a reproducible code snippet posted here RobotWebTools/rclnodejs#713, however running it requires ros2.

@koonpeng
Copy link

This appears to be the commit causing the error a6b6556. I'm guessing some destructors is failing to check if the finalize callback is valid.

@hertzg
Copy link

hertzg commented Oct 20, 2020

I have also noticed a segfault in my package which is using ffi-napi. There were no code changes on the package side but the segfault only happens on 12.19.0.

Node version Release Date Workflow Run
v10.22.1 2020-09-15 ✔️ Passed
v12.18.4 2020-09-15 ✔️ Passed
v14.11.0 2020-09-15 ✔️ Passed
v14.12.0 2020-09-22 ✔️ Passed
v14.13.0 2020-09-29 ✔️ Passed
v12.19.0 2020-10-06 🛑 Failed
v14.13.1 2020-10-07 ✔️ Passed
v14.14.0 2020-10-15 ✔️ Passed

@Flarna
Copy link
Member

Flarna commented Oct 20, 2020

fyi @nodejs/n-api

@mhdawson
Copy link
Member

created nodejs/abi-stable-node#411 to make sure we talk about it in the next N-API team meeting.

@mhdawson
Copy link
Member

From an initial look at a6b6556, I don't think it's correct because Finalize (

inline void Finalize(bool is_env_teardown = false) override {
) can still be called even if the user did not provide a finalizer and _finalize_callback is NULL.

If that's correct we should likely back out the change.

@mhdawson mhdawson added the node-api Issues and PRs related to the Node-API. label Oct 23, 2020
@mhdawson
Copy link
Member

We discussed in the N-API meeting today and @gabrielschulhof agreed with my assessment, will create PR to revert

mhdawson added a commit to mhdawson/io.js that referenced this issue Oct 23, 2020
Fixes: nodejs#35620

This reverts commit a6b6556 which
changed finalization behavior related to N-API. We will investigate
the original issue with the test separately.
@mhdawson
Copy link
Member

PR for revert: #35777

@Flarna Flarna added the confirmed-bug Issues with confirmed bugs. label Oct 27, 2020
@Flarna Flarna closed this as completed in 0b45382 Oct 27, 2020
@hertzg
Copy link

hertzg commented Oct 28, 2020

Will there be a patch release (having the revert commit) for v12 Or only after a proper patch is introduced?

@mhdawson
Copy link
Member

@hertzg #35777 is tagged for backport to 12.x and 14.x (based on the lts-watch-v12.x and lts-watch-v14.x) labels. Changes typically need to "bake" a bit in master/current before going back to LTS releases, but as a revert I'm thinking this might be able to skip that. @BethGriggs what's your thoughts on that ?

From nodejs/Release#494 , It looks like there might be a patch release but it does not seem to have a date and then the next planned release for the 12.x line is v12.20.0 and is scheduled for 17 Nov, it would be nice to get it into that one at the latest.

gabrielschulhof pushed a commit to gabrielschulhof/node that referenced this issue Nov 2, 2020
Currently, a reference is being unlinked from the list of references
tracked by the environment when `v8impl::Reference::Delete` is called.
This causes a leak when deletion must be deferred because the finalizer
hasn't yet run, but the finalizer does not run because environment
teardown is in progress, and so no more gc runs will happen, and the
`FinalizeAll` run that happens during environment teardown does not
catch the reference because it's no longer in the list. The test below
will fail when running with ASAN:
```
./node ./test/node-api/test_worker_terminate_finalization/test.js
```

OTOH if, to address the above leak, we make a special case to not
unlink a reference during environment teardown, we run into a
situation where the reference gets deleted by
`v8impl::Reference::Delete` but does not get unlinked because it's
environment teardown time. This leaves a stale pointer in the linked
list which will result in a use-after-free in `FinalizeAll` during
environment teardown. The test below will fail if we make the above
change:
```
./node -e "require('./test/node-api/test_instance_data/build/Release/test_ref_then_set.node');"
```

Thus, we unlink a reference precisely when we destroy it – in its
destructor.

Refs: nodejs#34731
Refs: nodejs#34839
Refs: nodejs#35620
Refs: nodejs#35777
Fixes: nodejs#35778
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
targos pushed a commit that referenced this issue Nov 3, 2020
Fixes: #35620

This reverts commit a6b6556 which
changed finalization behavior related to N-API. We will investigate
the original issue with the test separately.

PR-URL: #35777
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
MylesBorins pushed a commit that referenced this issue Nov 3, 2020
Fixes: #35620

This reverts commit a6b6556 which
changed finalization behavior related to N-API. We will investigate
the original issue with the test separately.

PR-URL: #35777
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
MylesBorins pushed a commit that referenced this issue Nov 3, 2020
Fixes: #35620

This reverts commit a6b6556 which
changed finalization behavior related to N-API. We will investigate
the original issue with the test separately.

PR-URL: #35777
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
gabrielschulhof pushed a commit that referenced this issue Nov 5, 2020
Currently, a reference is being unlinked from the list of references
tracked by the environment when `v8impl::Reference::Delete` is called.
This causes a leak when deletion must be deferred because the finalizer
hasn't yet run, but the finalizer does not run because environment
teardown is in progress, and so no more gc runs will happen, and the
`FinalizeAll` run that happens during environment teardown does not
catch the reference because it's no longer in the list. The test below
will fail when running with ASAN:

```
./node ./test/node-api/test_worker_terminate_finalization/test.js
```

OTOH if, to address the above leak, we make a special case to not
unlink a reference during environment teardown, we run into a
situation where the reference gets deleted by
`v8impl::Reference::Delete` but does not get unlinked because it's
environment teardown time. This leaves a stale pointer in the linked
list which will result in a use-after-free in `FinalizeAll` during
environment teardown. The test below will fail if we make the above
change:

```
./node -e "require('./test/node-api/test_instance_data/build/Release/test_ref_then_set.node');"
```

Thus, we unlink a reference precisely when we destroy it – in its
destructor.

Refs: #34731
Refs: #34839
Refs: #35620
Refs: #35777
Fixes: #35778
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #35933
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
@BethGriggs
Copy link
Member

@mhdawson just to close the loop, @MylesBorins has pulled #35777 into the v12.20.0 release proposal (#35950)

danielleadams pushed a commit that referenced this issue Nov 9, 2020
Currently, a reference is being unlinked from the list of references
tracked by the environment when `v8impl::Reference::Delete` is called.
This causes a leak when deletion must be deferred because the finalizer
hasn't yet run, but the finalizer does not run because environment
teardown is in progress, and so no more gc runs will happen, and the
`FinalizeAll` run that happens during environment teardown does not
catch the reference because it's no longer in the list. The test below
will fail when running with ASAN:

```
./node ./test/node-api/test_worker_terminate_finalization/test.js
```

OTOH if, to address the above leak, we make a special case to not
unlink a reference during environment teardown, we run into a
situation where the reference gets deleted by
`v8impl::Reference::Delete` but does not get unlinked because it's
environment teardown time. This leaves a stale pointer in the linked
list which will result in a use-after-free in `FinalizeAll` during
environment teardown. The test below will fail if we make the above
change:

```
./node -e "require('./test/node-api/test_instance_data/build/Release/test_ref_then_set.node');"
```

Thus, we unlink a reference precisely when we destroy it – in its
destructor.

Refs: #34731
Refs: #34839
Refs: #35620
Refs: #35777
Fixes: #35778
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #35933
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
BethGriggs pushed a commit that referenced this issue Nov 16, 2020
Fixes: #35620

This reverts commit a6b6556 which
changed finalization behavior related to N-API. We will investigate
the original issue with the test separately.

PR-URL: #35777
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
MylesBorins pushed a commit that referenced this issue Nov 16, 2020
Fixes: #35620

This reverts commit a6b6556 which
changed finalization behavior related to N-API. We will investigate
the original issue with the test separately.

PR-URL: #35777
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
BethGriggs pushed a commit that referenced this issue Dec 9, 2020
Currently, a reference is being unlinked from the list of references
tracked by the environment when `v8impl::Reference::Delete` is called.
This causes a leak when deletion must be deferred because the finalizer
hasn't yet run, but the finalizer does not run because environment
teardown is in progress, and so no more gc runs will happen, and the
`FinalizeAll` run that happens during environment teardown does not
catch the reference because it's no longer in the list. The test below
will fail when running with ASAN:

```
./node ./test/node-api/test_worker_terminate_finalization/test.js
```

OTOH if, to address the above leak, we make a special case to not
unlink a reference during environment teardown, we run into a
situation where the reference gets deleted by
`v8impl::Reference::Delete` but does not get unlinked because it's
environment teardown time. This leaves a stale pointer in the linked
list which will result in a use-after-free in `FinalizeAll` during
environment teardown. The test below will fail if we make the above
change:

```
./node -e "require('./test/node-api/test_instance_data/build/Release/test_ref_then_set.node');"
```

Thus, we unlink a reference precisely when we destroy it – in its
destructor.

Refs: #34731
Refs: #34839
Refs: #35620
Refs: #35777
Fixes: #35778
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #35933
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
BethGriggs pushed a commit that referenced this issue Dec 10, 2020
Currently, a reference is being unlinked from the list of references
tracked by the environment when `v8impl::Reference::Delete` is called.
This causes a leak when deletion must be deferred because the finalizer
hasn't yet run, but the finalizer does not run because environment
teardown is in progress, and so no more gc runs will happen, and the
`FinalizeAll` run that happens during environment teardown does not
catch the reference because it's no longer in the list. The test below
will fail when running with ASAN:

```
./node ./test/node-api/test_worker_terminate_finalization/test.js
```

OTOH if, to address the above leak, we make a special case to not
unlink a reference during environment teardown, we run into a
situation where the reference gets deleted by
`v8impl::Reference::Delete` but does not get unlinked because it's
environment teardown time. This leaves a stale pointer in the linked
list which will result in a use-after-free in `FinalizeAll` during
environment teardown. The test below will fail if we make the above
change:

```
./node -e "require('./test/node-api/test_instance_data/build/Release/test_ref_then_set.node');"
```

Thus, we unlink a reference precisely when we destroy it – in its
destructor.

Refs: #34731
Refs: #34839
Refs: #35620
Refs: #35777
Fixes: #35778
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #35933
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
BethGriggs pushed a commit that referenced this issue Dec 15, 2020
Currently, a reference is being unlinked from the list of references
tracked by the environment when `v8impl::Reference::Delete` is called.
This causes a leak when deletion must be deferred because the finalizer
hasn't yet run, but the finalizer does not run because environment
teardown is in progress, and so no more gc runs will happen, and the
`FinalizeAll` run that happens during environment teardown does not
catch the reference because it's no longer in the list. The test below
will fail when running with ASAN:

```
./node ./test/node-api/test_worker_terminate_finalization/test.js
```

OTOH if, to address the above leak, we make a special case to not
unlink a reference during environment teardown, we run into a
situation where the reference gets deleted by
`v8impl::Reference::Delete` but does not get unlinked because it's
environment teardown time. This leaves a stale pointer in the linked
list which will result in a use-after-free in `FinalizeAll` during
environment teardown. The test below will fail if we make the above
change:

```
./node -e "require('./test/node-api/test_instance_data/build/Release/test_ref_then_set.node');"
```

Thus, we unlink a reference precisely when we destroy it – in its
destructor.

Refs: #34731
Refs: #34839
Refs: #35620
Refs: #35777
Fixes: #35778
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #35933
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addons Issues and PRs related to native addons. c++ Issues and PRs that require attention from people who are familiar with C++. confirmed-bug Issues with confirmed bugs. node-api Issues and PRs related to the Node-API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants