Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context occasionally gets corrupted #51

Open
Quantumplation opened this issue Dec 2, 2019 · 15 comments
Open

Context occasionally gets corrupted #51

Quantumplation opened this issue Dec 2, 2019 · 15 comments

Comments

@Quantumplation
Copy link

Hello!

Trying to debug a pretty tricky issue. We use this library to thread a distributed traceId down to our logs and shared HTTP library across many many microservices. In rare cases, we're seeing logs and requests hop from one traceId to another. In essence, this code:

      const oldTrace = clsHooked.getNamespace('DEFAULT_NAMESPACE').get('context').traceId;
      const res = await httpClient.get(someUrl);
      const newTrace = clsHooked.getNamespace('DEFAULT_NAMESPACE').get('context').traceId;
      if(oldTrace !== newTrace) {
        console.log('~~ TRACE ID MISMATCH: ', oldTrace, newTrace);
        process.exit(1);
      }

Fails pretty consistently, and I'm trying to figure out why. I'm still grappling with how async_hooks and cls-hooked work in general, but my understanding is that things like older or non-standard promise libraries or custom thenables (as mentioned in #37) can cause this to happen.

Any advice on tracking down exactly what's happening?

@SupernaviX
Copy link

Worth noting that the httpClient here is an axios instance, and doesn't seem to be using custom thenables.

@Quantumplation
Copy link
Author

(@SupernaviX is a coworker of mine 😅)

Another thing worth noting is that this service has babel transpilation with the following plugins/config:

"babel-polyfill": "^6.26.0",
"babel-preset-bluebird": "^1.0.1",
"babel-preset-env": "^1.6.0",
"babel-register": "^6.26.0",
{
  "presets": ["bluebird", [
    "env", {
      "targets": {
        "node": "current"
      }
    }
  ]]
}

@Jeff-Lewis
Copy link
Owner

I'm thinking this is related to #36. Which version of node are you running?

@Quantumplation
Copy link
Author

@Jeff-Lewis

/srv/package $ node --version
v10.17.0

@Quantumplation
Copy link
Author

@Jeff-Lewis

It's worth noting that our reproduce this involves hammering the service with GET requests. As far as I know there are no POST requests hitting the service.

@Quantumplation
Copy link
Author

@Jeff-Lewis any updates?

@william00179
Copy link

william00179 commented Jan 22, 2020

+1 to this issue.

We are currently using cls-hooked with Express. We have it as a middleware and are setting a few request specific values such as a client and tenant ID which are used for routing the connection to the correct database.

We have noticed our service is generating 404's (as part of validation from database queries) as the connections were being routed to the wrong databases. As part of debugging this issue we wrote some scripts to load the server calling GET's with valid information but quickly swapping between two different tenants and thus the connections should be routed to different databases.

Within 30 seconds or less, we would get a 404 and upon inspecting the logs, the request would be using the tenant ID from the previous request.

We have not been able to make any inroads on fixing this bug except for the scripts which easily reproduce it. We don't have any problems in times of low concurrent requests.

We are running Node 12.13.0.

@Strate
Copy link

Strate commented Feb 12, 2020

@william00179 hello, do you managed your issue somehow? Because we want to start use cls-hooked library to track tenant-id too.

@william00179
Copy link

We have not been able to resolve this problem and removed cls-hooked.

@TheBestMoshe
Copy link

@william00179 What are you using instead?

@william00179
Copy link

We just went back to using req.local

@microsoftly
Copy link

I've encountered similar issues.

Node v8.17.0

@goulash1971
Copy link

@Quantumplation / @william00179 ... not sure if this wil help however ... we have found that things can get messed up when there are complex promise chains and/or awaits within try/catch blocks because of the way in which the promises are resolved.

We ended up working with async_hooks directly and managing the storage differently for different types of async resource.

We also created "frames" (think of them as storage that can be linked to multiple async resources) and then attaching the frame for a resource to any child async resource that is created (via the triggerAsyncId).

This enabled us to retain the storage even if the triggering async resource (aka the parent async resource) was destroyed prior to the child async resource.

Trust me when I say it is even more complex to write the code than it is to explain :)

TBH I would not say that we are 10000% certain that we have completely nailed it, but this work along with some coding rules to keep promise chains short and try/catch with await calls as clean as possible, means that we haven't hit any issues.

My advice would be to to look at the async resource types that are being initialised and destroyed, and also what's happening in the promiseResolve hook, that will give you some direction on where things are going awry.

@kjhangiani
Copy link

Has anyone found a resolution to this? We appear to be experiencing the same issue as described here. Still running some tests to see if I can triage the issue further, but the description sounds exactly like what we see - data that is leaking between requests.

@natoszme
Copy link

natoszme commented Jul 7, 2022

Same issue here! Is there any other soluction other than stop using cls-hooked?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants