
memory leak problem #398

Closed
joway opened this issue Jun 15, 2018 · 39 comments

@joway

joway commented Jun 15, 2018

We have added the APM agent to our Express app; our code is:

require('elastic-apm-node').start({
    serviceName: 'activity-service',
    serverUrl: 'http://es-apm-svc.infra.svc.cluster.local:8200',
    transactionSampleRate: 0.2,
    transactionMaxSpans: 32,
    flushInterval: 5,
    serverTimeout: 10,
    maxQueueSize: 50,
})

global.Promise = require('bluebird')

const PORT = process.env.PORT || 7000
const app = require('./app')

app.listen(PORT, function () {
    console.info('Node app is running, port:', PORT)
})

But we find that memory usage keeps increasing until it hits the memory limit.

[screenshot: graph of memory usage climbing until it hits the limit]

We also put the same code into a Koa application, but the problem still exists.

We found issue #243, but @Qard said that it has been fixed?

We also tried changing our config to:

require('elastic-apm-node').start({
    serviceName: 'activity-service',
    serverUrl: 'http://es-apm-svc.infra.svc.cluster.local:8200',
    transactionSampleRate: 0.2,
    transactionMaxSpans: 32,
    flushInterval: 5,
    serverTimeout: 10,
    maxQueueSize: 50,
    sourceLinesErrorAppFrames: 0,
    sourceLinesErrorLibraryFrames: 0,
})

But that didn't work.

Our dependencies are:

    "mongoose": "4.9.9",
    "express": "4.12.4",
    "bluebird": "3.1.1",

and the applications run in Kubernetes.

@watson
Member

watson commented Jun 15, 2018

Thanks for reporting this.

I have a few follow-up questions to help us get to the bottom of the leak:

  • Are you using the latest version of the agent?
  • Are you running with any other APM agent loaded at the same time as Elastic APM?
  • Do you still see a memory leak if you disable span stack traces using captureSpanStackTraces: false?

@joway
Author

joway commented Jun 15, 2018

Hi, @watson

  1. We are using the latest version (1.6.0)
  2. No, it's the first line above all other code, and we don't use any other APM agent
  3. Yes, it still leaks when I set captureSpanStackTraces: false

@watson
Member

watson commented Jun 18, 2018

Thanks for the details. Which version of Node.js are you running and do you mind sharing the entire dependencies section from your package.json file with us?

@joway
Author

joway commented Jun 19, 2018

@watson Node 10 and Node 8.9

{
 "dependencies": {
    "@types/lodash": "^4.14.106",
    "aws-sdk": "^2.130.0",
    "bluebird": "^3.5.1",
    "cache-manager": "^2.6.0",
    "cache-manager-store-redis": "^0.2.0",
    "config": "^1.26.2",
    "debug": "^2.6.9",
    "deepcopy": "^0.6.3",
    "elastic-apm-node": "^1.6.0",
    "kafka-node": "^2.2.3",
    "kcors": "^2.2.1",
    "koa": "^2.3.0",
    "koa-bodyparser": "^3.2.0",
    "koa-json-body": "^5.3.0",
    "koa-morgan": "^1.0.1",
    "koa-router": "^7.2.1",
    "koa2-newrelic": "^2.2.0",
    "lodash": "^4.17.4",
    "mongoose": "^4.12.1",
    "newrelic": "^1.40.0",
    "pow-mongodb-fixtures": "^0.14.0",
    "raven": "^2.2.1",
    "redis": "^2.8.0",
    "request": "^2.83.0",
    "request-promise": "^4.2.2",
    "sa-sdk-node": "^1.1.1"
  },
  "devDependencies": {
    "ava": "^0.16.0",
    "eslint": "^3.4.0",
    "eslint-config-airbnb-base": "^10.0.1",
    "eslint-plugin-import": "^2.7.0",
    "nock": "^8.0.0",
    "nodemon": "^1.12.1",
    "rewire": "^2.5.2",
    "sinon": "^1.17.5",
    "sinon-as-promised": "^4.0.2",
    "supertest": "^2.0.0"
  }
}

@Thetortuga

Hello,

We have been tracking a memory leak in our code too for a while. After some investigation, it seems that it could come from this agent.
It only appears when we execute external HTTP requests with the request-promise module.
Analyzing memory dumps, we see that a lot of "Transaction" instances are still in memory.

@watson
Member

watson commented Jun 20, 2018

@joway I can see you're running New Relic at the same time as Elastic APM. We have seen situations where running multiple APM agents in the same app generates memory leaks, so it's not something we advise doing. Would it be possible for you to disable New Relic for a short time to see if that removes the memory leak?

@joway
Author

joway commented Jun 21, 2018

@watson I have removed newrelic from my code, but I haven't uninstalled it from package.json.

@joway
Author

joway commented Jun 21, 2018

@watson I have now removed everything newrelic-related, both in the code and in package.json, and the problem still exists.

@joway
Author

joway commented Jun 21, 2018

By the way, does apm-agent-nodejs write any local files? We use Kubernetes, and when no volume is mounted, files end up being written to memory.

@watson
Member

watson commented Jun 21, 2018

Ok, thank you so much for helping us debug this problem. It's super hard to reproduce these issues, so it helps a lot. Approximately how many requests per minute does your application get where you see this memory leak?

> By the way, does apm-agent-nodejs write any local files? We use Kubernetes, and when no volume is mounted, files end up being written to memory.

No, the agent never writes anything to disk. It does use an in-memory queue, but that is flushed and sent to the APM Server when it reaches 100 transactions (configurable using maxQueueSize) or after 10 seconds (configurable using flushInterval), whichever happens first.
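
For reference, here's a minimal sketch of where those two settings live (the service name and URL are placeholders; the values shown are the defaults mentioned above):

require('elastic-apm-node').start({
    serviceName: 'my-service',          // placeholder
    serverUrl: 'http://localhost:8200', // placeholder
    maxQueueSize: 100,                  // flush the in-memory queue after 100 transactions
    flushInterval: 10                   // ...or after 10 seconds, whichever comes first
})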

@Qard
Contributor

Qard commented Jun 22, 2018

I've so far been unable to reproduce this leak with mongoose and bluebird under either express or koa. I also did some testing around request-promise and got nothing from that either. I suspect it's likely related to context loss, but it's impossible to verify without a reproducible test case. The most likely candidates to investigate next I think would be aws-sdk, kafka-node and cache-manager. Can you share a bit about how you are using those? I'm especially interested in aws-sdk, since it contains quite a bit.

@Thetortuga

In my case, I am using a custom Transaction to handle some UDP traffic. @joway On which kind of Transaction / Span are you experiencing the leak?
I am trying on my side to reproduce the problem with a minimal code example, but so far I can't.
My application is running on a legacy server, outside Docker, and I am not using kafka or aws-sdk.
As a quick fix, I disabled http instrumentation, as that seems to fix the issue.

@watson
Member

watson commented Jun 27, 2018

@Thetortuga Thanks for the details. It might be two different memory leaks that you're experiencing. When you say that the memory leak disappears when you disable http instrumentation, how do you do that?

@joway I know it might sound silly, but I just want to make sure that the memory leak you're experiencing is actually related to Elastic APM. Does it disappear if you disable the agent by setting active: false in the config?

@joway
Author

joway commented Jun 28, 2018

@watson Yes, I'm sure it's caused by the APM agent. I'm trying to create demo code to reproduce it: https://github.com/joway/elastic-apm-demo

So far, the first thing I've found is that enabling source-map-support/register reproduces the memory leak.

But my production application just uses ES5 JavaScript without source-map-support/register, and it also has the memory leak.

I wonder whether one of my packages uses source-map-support/register...

@Thetortuga

@watson Yes, this is what I'm beginning to suspect. Do you want me to open a new issue about it?
I disabled http instrumentation with the disableInstrumentations: ['http'] configuration.

@mcolmant

mcolmant commented Jul 4, 2018

Hi,

We are experiencing the same problem. We have a Node.js/Express REST API (Node 8.11.3) hosted on Heroku.
The memory keeps growing at runtime until it reaches the dyno memory limit.
Somehow, it seems that the memory used by the agent is not correctly released.
Once the agent is disabled via the environment variable, memory usage remains very low.

We use the APM agent in version 1.6.0, with default parameters.
Do you think an upgrade can resolve this problem?
What can we do to limit this overhead?

Thanks.

@watson
Member

watson commented Jul 4, 2018

@mcolmant Thanks for letting us know.

To narrow down what's going on, it's best to try these things first (see the sketch after this list for where each option goes):

  1. Ensure you're running the latest version of the agent before anything else
  2. If the memory leak still persists, try to configure the agent with captureSpanStackTraces: false
  3. If that doesn't fix it, try instead to configure the agent with disableInstrumentations: ['http']
  4. If that doesn't fix it, try instead to configure the agent with instrument: false
  5. If that doesn't fix it, try instead to configure the agent with active: false

Let me know if and at which point the memory leak disappears.
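
A minimal sketch of where each option goes (keep your existing serviceName/serverUrl, and try the options one at a time):

require('elastic-apm-node').start({
    serviceName: 'my-service',              // your existing settings stay as they are
    captureSpanStackTraces: false,          // step 2
    // disableInstrumentations: ['http'],   // step 3 (instead of the line above)
    // instrument: false,                   // step 4
    // active: false                        // step 5
})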

@Thetortuga

In the end I was wrong: disableInstrumentations: ['http'] did not mitigate the leak.
In my case, captureSpanStackTraces: false had no effect either.
My colleagues have taken over this issue and are trying steps 4 and 5.

@PestoP

PestoP commented Jul 4, 2018

Hey! I'm one of the colleagues x)

With 1.8.2:

  1. Leak
  2. Leak
  3. No Leak
  4. No Leak

We're trying to reproduce this in a small environment; we'll let you know.

@watson
Member

watson commented Jul 4, 2018

Thanks @PestoP. Yes it would be super useful for us to have a small app where the problem can be reproduced 👍

@fejalish

fejalish commented Aug 28, 2018

Hi, we are having the exact same issue in our app. We've run into it in our production environment as well as during load testing (with devtools Heap Timelines showing the memory increase).

After some testing we've found that:

  • ELASTIC_APM_ACTIVE: false does not leak
  • ELASTIC_APM_INSTRUMENT: false does not leak
  • ELASTIC_APM_CAPTURE_SPAN_STACK_TRACES: false does leak

But ELASTIC_APM_DISABLE_INSTRUMENTATIONS: http did not leak for us, though nothing showed up in APM Kibana at all, so it was basically like having APM disabled anyway.

We're running Node 8.9.* with elastic-apm-node 1.10.1 using Koa (with Postgresql and Redis).

@watson
Member

watson commented Aug 28, 2018

@fejalish thanks for reporting this. Is it possible for you to either share an example app where the leak occurs, or take some heap snapshots and compare them in the Chrome Developer Tools to get an idea of which objects are leaking?

@fejalish

fejalish commented Aug 28, 2018

@watson I was going through the heap snapshots to see if I could figure out what was leaking, and saw lots of Transaction, Array, Object, and closure constructors, all referencing IncomingMessage, AsyncHook, _tickCallback(), and elasticAPMCallbackWrapper() objects.

Would you like me to send you the snapshots somehow? I have snapshots with and without the leak.

@watson
Member

watson commented Aug 28, 2018

@fejalish I would not recommend sharing these heap snapshots if they are from a production environment as they might contain personally identifiable information - so an example app would be better. But if you're sure they do not, can you upload them to Dropbox or something similar and share them with me by sending a link to watson at elastic.co?

@simllll
Contributor

simllll commented Sep 5, 2018

Slightly off-topic: I had a similar experience with a "memory leak", but in my case it was because I have long-running transactions, which collect a lot of spans. So far it looks like setting transactionMaxSpans to e.g. 100 (anything is better than unlimited ;)) solved this for me.
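
For reference, a minimal sketch of that setting (100 is just the value I picked, not a recommendation):

require('elastic-apm-node').start({
    transactionMaxSpans: 100  // stop recording spans after 100 per transaction, so long-running transactions can't grow without bound
})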

@Davidiusdadi

I also ran into this issue in production and setting ELASTIC_APM_CAPTURE_SPAN_STACK_TRACES=false solved it for now.

Here is how it affected the memory usage:

[screenshot: memory usage graph]

The first drop is an emergency restart due to the memory leak; the second drop is when we deployed with ELASTIC_APM_CAPTURE_SPAN_STACK_TRACES=false.

I suspect the issue is in apm-agent-nodejs or one of its dependencies.

Just to give you the full context:
I bootstrap apm-agent-nodejs as you see below in order to solve a local issue where different packages overwrite prepareStackTrace (#588). I'm just posting it because the memory leak might be related to that code (though it probably isn't).

The following code allows any library to overwrite prepareStackTrace any way it likes, while at the same time ensuring that the error-callsites code used by apm-agent-nodejs is always executed.

type prepTrace = (error: Error, structuredStackTrace: any[]) => any
type Wrapper = prepTrace & { impl?: prepTrace }

let prepareStackTraceImpl: prepTrace | undefined = undefined

Object.defineProperty(Error, 'prepareStackTrace', {
    // Libraries may install their own prepareStackTrace; we only keep their implementation.
    set: (v: Wrapper) => {
        if (v.impl) {
            prepareStackTraceImpl = v.impl
            return
        }
        prepareStackTraceImpl = v
    },
    get() {
        if (!prepareStackTraceImpl) {
            return
        }
        // Hand back a wrapper that records the callsites first (as error-callsites does),
        // then delegates to whichever implementation was installed last.
        const wrapper: Wrapper = (err, callsites) => {
            // the following call is copied from the error-callsites package
            Object.defineProperty(err, '__error_callsites', {
                enumerable: false,
                configurable: true,
                writable: false,
                value: callsites
            })

            return wrapper.impl.call(Error, err, callsites)
        }
        wrapper.impl = prepareStackTraceImpl

        return wrapper
    }
})

const apm = require('elastic-apm-node').start()

@junereycasuga

Is there any fix for this issue yet?

@Qard Qard self-assigned this Oct 24, 2018
@Qard
Contributor

Qard commented Oct 24, 2018

Unfortunately, no. We haven't yet been able to reproduce the issue ourselves, and no reproduction case has been given.

If you are encountering memory issues like this it would be a huge help to us if you could please share your package.json dependencies, agent settings, Node.js version and, if possible, code for a simplified app that reproduces this issue.

@Qard Qard removed their assignment Oct 25, 2018
@TitTurtle

TitTurtle commented Nov 2, 2018

Hello,
We are experiencing the same kind of problem on our production server too.

Like joway, we use these packages:
"bluebird": "^3.4.6",
"request": "^2.81.0",
"request-promise": "^4.2.1",

Also using these:
"hapi": "^17.6.0",
"apollo-engine": "^0.5.4",
"apollo-server-hapi": "^2.1.0",
"elastic-apm-node": "^1.12.0"
"mysql2": "^1.4.1",
"sequelize": "^4.38.0",
"dataloader": "^1.3.0",

bluebird overrides Promise:
global.Promise = require('bluebird').getNewLibraryCopy()

Node version 8.12.0.

The agent is loaded with the default configuration.
A custom transaction is created each time a GraphQL resolver is accessed. Some spans are created using Sequelize model hooks (before and after).
The transaction is ended at the end of the GraphQL resolver, the status is set to "success", and the custom context is set with the GraphQL query.
Multiple transactions are used in parallel.
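
Roughly how we wrap a resolver (a simplified sketch; the function and variable names here are made up for illustration):

const apm = require('elastic-apm-node')

async function resolveWithApm (resolverName, graphqlQuery, resolve) {
    // one custom transaction per GraphQL resolver call
    const trans = apm.startTransaction(resolverName, 'custom')
    apm.setCustomContext({ query: graphqlQuery })  // custom context carries the GraphQL query
    try {
        return await resolve()
    } finally {
        if (trans) {
            trans.result = 'success'  // status set to "success"
            trans.end()               // transaction ends when the resolver finishes
        }
    }
}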

After less than 24h of uptime, the heap was 328 MB.
~230 MB of that was from Transaction objects (639 of them).
The transactions Map had ~25k entries (with multiple references to the same Transaction objects).
The transactions were of type custom but also request.


After analysing some heap dumps of the process:
some Transactions are not released from the Map in the async-hook.js file.
There are a lot of keys with null values, and a lot of keys referencing the same Transaction object (same memory address), that stay in the Map forever (never destroyed nor garbage collected).

From what I understand, some PROMISE-type hooks are init-ed in async-hook.js but never destroyed. This might be linked to the promise cache used in the dataloader package (storing a Promise in a map or a variable), and/or to local Promise caches that store resolved Promises for reuse, or to PromiseWrap.
It also seems that some internal Transactions (intercepted HTTP requests) are not destroyed and stick around.

A way to release these PROMISE resources might be to use the promiseResolve callback of async_hooks (I tried it in a local solution and it seems to do the job, but I can't really tell since it's not in production), though this might just move the leak somewhere else.
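
Roughly what I mean, as a standalone async_hooks sketch (just the idea, not the agent's actual code):

const asyncHooks = require('async_hooks')

const transactions = new Map()  // asyncId -> transaction, similar to the agent's context map

asyncHooks.createHook({
    init (asyncId, type, triggerAsyncId) {
        // propagate the transaction of the parent async resource to the new one
        if (transactions.has(triggerAsyncId)) {
            transactions.set(asyncId, transactions.get(triggerAsyncId))
        }
    },
    destroy (asyncId) {
        transactions.delete(asyncId)  // PROMISE resources often never reach this callback
    },
    promiseResolve (asyncId) {
        transactions.delete(asyncId)  // the extra cleanup point I experimented with
    }
}).enable()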

Note that, for now, I can't reproduce this in a test project, so it might also be linked to incorrect use of the APM client in our production software.

Hope this can help.

@Qard Qard self-assigned this Nov 5, 2018
@Qard
Contributor

Qard commented Nov 9, 2018

The problem with using promiseResolve is that a promise can be listened to with then(...) from multiple paths, and clearing the transaction from the map in promiseResolve will result in only the first one binding the context correctly.

I'm working on trying to reproduce the issue more consistently in a reduced test case. If we can identify more specifically what interaction leads to this leak, we can hopefully figure out a way to work around it.

It would be nice if we could figure out a way to map async contexts to the current transaction through a weakmap, but async_hooks is based off numeric context ids and weakmaps don't support numeric keys. 🤔
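
For illustration, this is the limitation in plain JavaScript, since async_hooks only hands out numeric ids:

const contexts = new WeakMap()
contexts.set({}, 'ok')    // fine: object keys are allowed, and entries are garbage collected with their key
contexts.set(42, 'nope')  // throws TypeError: Invalid value used as weak map key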

For now, you can try disabling async_hooks by using asyncHooks: false in the config or ELASTIC_APM_ASYNC_HOOKS=false in environment variables.
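
A minimal sketch of that workaround (keep the rest of your existing config as it is):

require('elastic-apm-node').start({
    asyncHooks: false  // disable the async_hooks-based context tracking
})

// or, without code changes, start the process with
// ELASTIC_APM_ASYNC_HOOKS=false in the environment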

@swift1911

Is there any progress on this issue?

@watson
Member

watson commented Nov 13, 2018

Thanks for being patient with us on this. It's been a complicated process to debug and narrow down the culprit.

But we now think that we have a fix. We just released version 1.14.3 of the agent which fixes the memory leak that we identified. Please try it out and let us know if it also fixed it for you 😃

@swift1911

@watson Tested in my production env, and it seems to be fixed :)
[screenshot: memory usage graph]

@junereycasuga

Yeah, it seems to be fixed. We have only tested it in our performance testing environment; hopefully we'll get it into our production environment soon. Thanks @watson @Qard

@watson
Member

watson commented Nov 15, 2018

Great to hear!

I'll close this issue for now. If it turns out to persist, please leave a comment here and we'll reopen the issue.

@Thetortuga

Hi. It appears to be OK for us too! Thanks!

@ambika93

ambika93 commented Jan 8, 2019

@watson I'm using version 2.0.6 and I'm still facing the same issue, with all default configuration.
Node dependencies below:
"aerospike": "3.0.2",
"async": "^2.6.0",
"bitset.js": "~4.0.1",
"body-parser": "^1.14.1",
"commander": "^2.9.0",
"ejs": "^2.3.4",
"express": "~4.9.6",
"kafka-node": "^2.6.1",
"lodash": "~3.10.1",
"moment": "~2.10.6",
"morgan": "~1.3.2",
"mysql": "^2.10.2",
"newrelic": "~1.29.0",
"node-uuid": "^1.4.7",
"nodemailer": "0.7.1",
"on-finished": "~2.3.0",
"q": "^1.4.1",
"redis": "^2.4.2",
"request": "^2.69.0"
I'm using New Relic but it is currently disabled.

@watson
Member

watson commented Jan 9, 2019

@ambika93 I believe that what caused this particular issue has been fixed. You might be experiencing another, similar issue though. Could I get you to open a new GitHub issue instead, so that we don't "spam" the people in this thread while we work out what's going on?

@ambika93

ambika93 commented Jan 9, 2019

@watson, Thanks for responding. I've opened a separate thread for the same. #772
