Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async hook stack has become corrupted #3585

Closed
Tchoupinax opened this issue May 21, 2021 · 7 comments
Closed

async hook stack has become corrupted #3585

Tchoupinax opened this issue May 21, 2021 · 7 comments

Comments

@Tchoupinax
Copy link

Hello,

I met strange error after I added sentry to my project. When the error is thrown, the project crashes immediatly.

this:

 Error: async hook stack has become corrupted (actual: 286, expected: 0)

or this:

 Error: async hook stack has become corrupted (actual: 908, expected: 0)

sometimes this:

 node[1]: ../src/api/callback.cc:125:void node::InternalCallbackScope::Close(): Assertion `(env_->execution_async_id()) == (0)' failed.

I know it comes with sentry because I tried to remove Sentry.init and the error disappears.
In my project, I'm using the async_hooks API and I fear a conflict with your tracing SDK. I am using async_hooks this way:

const { AsyncLocalStorage } = require('async_hooks');
const store = new AsyncLocalStorage();

// ...

app.use((req, res, next) => {
  store.enterWith(new Map());

  const instance = store.getStore();
  instance.set('request-id', uuid());

  return next();
});

I am declaring Sentry this way

Sentry.init({
    environment: process.env.NODE_ENV,
    dsn: '__',
    integrations: [
      // enable HTTP calls tracing
      new Sentry.Integrations.Http({ tracing: true }),
      // enable Express.js middleware tracing
      new Tracing.Integrations.Express({ app: router }),
    ],

    // Set tracesSampleRate to 1.0 to capture 100%
    // of transactions for performance monitoring.
    // We recommend adjusting this value in production
    tracesSampleRate: 1.0,
  });

  // RequestHandler creates a separate execution context using domains, so that every
  // transaction/span/breadcrumb is attached to its own Hub instance
  router.use(Sentry.Handlers.requestHandler());
  // TracingHandler creates a trace for every incoming request
  router.use(Sentry.Handlers.tracingHandler());

Do you know how this issue could be solved ?

Thank you for your work :)

Version

Node: 16 (Using Docker image node:16-alpine)

Packages

@sentry/node: 6.4.1
@sentry/tracing: 6.4.1

@Tchoupinax
Copy link
Author

Update: issue seems to not appear with node:14-alpine and node:15-alpine. (Only my analyze, maybe I'm wrong)
Is issue also exists but it is not shown ? Or Node v16 brings a breaking change ?

@kamilogorek kamilogorek mentioned this issue May 26, 2021
9 tasks
@hongbo-miao
Copy link

hongbo-miao commented May 26, 2021

Got same error, only appears in Node.js 16. Not in Node.js 14 or 15. Copy useful info here:


Package + Version

  • @sentry/node

Version:

6.4.1

Description

Use Node.js 16.2.0 (latest version writing this), and simply use the demo at
https://docs.sentry.io/platforms/node/guides/express/

import express from "express";
import * as Sentry from "@sentry/node";

// or using CommonJS
// const express = require('express');
// const Sentry = require('@sentry/node');

const app = express();

Sentry.init({ dsn: "https://examplePublicKey@o0.ingest.sentry.io/0" });

// The request handler must be the first middleware on the app
app.use(Sentry.Handlers.requestHandler());

// All controllers should live here
app.get("/", function rootHandler(req, res) {
  res.end("Hello world!");
});

// The error handler must be before any other error middleware and after all controllers
app.use(Sentry.Handlers.errorHandler());

// Optional fallthrough error handler
app.use(function onError(err, req, res, next) {
  // The error id is attached to `res.sentry` to be returned
  // and optionally displayed to the user for support.
  res.statusCode = 500;
  res.end(res.sentry + "\n");
});

app.listen(3000);

The server can start successfully, however, if hit any endpoint such as going to http://localhost:3000 in this case, the server will crash with error

/Users/hm/.nvm/versions/node/v16.2.0/bin/node[89516]: ../src/api/callback.cc:125:void node::InternalCallbackScope::Close(): Assertion `(env_->execution_async_id()) == (0)' failed.
 1: 0x10ebbf7e5 node::Abort() (.cold.1) [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
 2: 0x10d872b39 node::Abort() [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
 3: 0x10d8729a1 node::Assert(node::AssertionInfo const&) [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
 4: 0x10d7a2cdd node::InternalCallbackScope::Close() [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
 5: 0x10d7a25de node::InternalCallbackScope::~InternalCallbackScope() [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
 6: 0x10d813b88 node::Environment::RunTimers(uv_timer_s*) [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
 7: 0x10e267877 uv__run_timers [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
 8: 0x10e26c0ad uv_run [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
 9: 0x10d7a3ea4 node::SpinEventLoop(node::Environment*) [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
10: 0x10d8b4414 node::NodeMainInstance::Run(node::EnvSerializeInfo const*) [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
11: 0x10d84097d node::Start(int, char**) [/Users/hm/.nvm/versions/node/v16.2.0/bin/node]
12: 0x7fff2057a621 start [/usr/lib/system/libdyld.dylib]
13: 0x7

I located, without line app.use(Sentry.Handlers.requestHandler());, it won't have this issue.

@kamilogorek
Copy link
Contributor

I just verified it locally, using exactly version 16.2.0 and had no luck reproducing it 🤔

image

@hongbo-miao
Copy link

hongbo-miao commented May 26, 2021

@kamilogorek I will try to provide a repo with more detailed steps to reproduce today. Thanks!

@hongbo-miao
Copy link

hongbo-miao commented May 26, 2021

@kamilogorek sorry for early misleading demo code.

Here you are! https://github.com/Hongbo-Miao/bug-nodejs-16
Just follow the step in the readme to reproduce.

I further did some research.

  1. I found if removing OpenTelemetry by commenting out
    https://github.com/Hongbo-Miao/bug-nodejs-16/blob/main/src/index.ts#L2
    The server won't have any issue.

  2. Or commenting out Sentry this line
    https://github.com/Hongbo-Miao/bug-nodejs-16/blob/main/src/app.ts#L7
    The server won't have any issue neither.

So I am guessing some code in OpenTelemetry affects Sentry.

@kamilogorek
Copy link
Contributor

This is some kind of regression in Node v16.2.0. It's working fine in v16.1.0.

The problem however, is incorrect request rejections handling between domains and async_hooks. Even simplest domain.create().run(() => next()) as an express middleware, when used with async_hooks (which are used in opentelemetry for context management - https://github.com/open-telemetry/opentelemetry-js/blob/dfb77372d311587c3462a43497c2ec4b3867e9fa/packages/opentelemetry-node/src/NodeTracerProvider.ts#L61-L63)` will throw an error. For some strange reason, when used together, they cannot talk with each other correctly.

It's most likely that error thrown within async_hooks context is not correctly propagated to the domain, and it crashes the process instead of being handled there. Might be related:

nodejs/node#38503
nodejs/node#38145

One additional information here though, is that what's causing an actual error that is crashing the process, is TCP not being able to connect to the OpenTelemetry server, which you can see when using Node v16.1.0

{"stack":"Error: connect ECONNREFUSED 127.0.0.1:55681\n    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1133:16)\n    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:131:17)","message":"connect ECONNREFUSED 127.0.0.1:55681","errno":"-61","code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":"55681","name":"Error"}

55681 is default port for OpenTelemetry, and because you are using CollectorTraceExporter, and not providing url override, it'll try to send request to localhost:55681, which will reject, and effectively fail and crash the process.

There's nothing that we can do from our side right now, as it's infrastructure specific issue, not the SDK itself.
We need to wait for Node to state clearly if it's going to be fixed, and if not, then update docs appropriately.

@hongbo-miao
Copy link

hongbo-miao commented May 27, 2021

Thanks @kamilogorek!
I am aware of I don't have OpenTelemetry server in this demo repo. I am using this as a demo to reproduce the bug.
Yeah just confirmed it only started from 16.2.0. Opened a ticket on Node.js side at nodejs/node#38814.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants