
zlib deflate results in a memory leak #8871

Open · jasonmcaffee opened this issue Sep 30, 2016 · 73 comments

Labels: confirmed-bug, memory, zlib

@jasonmcaffee commented Sep 30, 2016

  • Version: 6.7.0
  • Platform: Darwin ITs-MacBook-Pro.local 15.6.0 Darwin Kernel Version 15.6.0: Thu Jun 23 18:25:34 PDT 2016; root:xnu-3248.60.10~1/RELEASE_X86_64 x86_64
  • Subsystem: zlib

I'm using the graylog2 package for logging, and we ran into significant memory leak issues.
After tracking it down, I found that zlib.deflate is the source of the issue.
The issue is magnified when running the code inside Docker with the latest Node distribution.

Running the code below on my MacBook Pro results in memory spiking to ~3GB, which is then released back down to ~600MB.
Running the code in the latest Node Docker distro results in memory spiking to ~3GB, and it is never released.

let zlib = require('zlib');

let message = {
  some:"data"
};
let payload = new Buffer(JSON.stringify(message));

for (var i = 0; i < 30000; ++i) {
  zlib.deflate(payload, function (err, buffer) {
  });
}

// keep the process alive so memory usage can be observed
setTimeout(() => {}, 2000000);

This has resulted in our docker containers crashing due to memory exhaustion.

@addaleax added the zlib and memory labels Sep 30, 2016
@mscdex (Contributor) commented Sep 30, 2016

Are you sure the node version is the same between the two?

@jasonmcaffee (Author)

Yes. I've also recreated the issue on v4.5 and 5.12.

@smyt1 commented Sep 30, 2016

@jasonmcaffee have you experimented with the sync version zlib.deflateSync? I see the same thing with the async version but the sync version seems more tame

@bnoordhuis (Member) commented Oct 1, 2016

The loop is creating 30,000 concurrent zlib.deflate requests that are handed off to a threadpool; i.e., it's creating a giant backlog. They are dispatched eventually and that is why memory goes down again on OS X.

On Linux (or more precisely, with glibc), something else happens. When you run it through strace -cfe mmap,munmap,mprotect,brk you can see what is going on:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.63    3.803199          32    119985           brk
  0.20    0.007680          12       632           mmap
  0.12    0.004748           7       675           munmap
  0.05    0.001882          30        62           mprotect
------ ----------- ----------- --------- --------- ----------------
100.00    3.817509                121354           total

Node.js uses malloc/new and glibc translates those overwhelmingly to brk system calls. The problem is that the "program break" goes up but a few allocations with longer lifetimes make it impossible for the break to go down again. There is no real memory leak, it's ordinary - but rather catastrophic - memory fragmentation.

IOW, can confirm but node.js is not the root cause. Maybe we can sidestep it by using a custom allocator like jemalloc but that has a lot of ramifications, it's not something we can do lightly.
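
For application code hitting this, the backlog itself can be avoided by not queueing all 30,000 requests at once. A minimal sketch of one way to do that, using util.promisify and bounded batches (the helper name and batch size are illustrative, not from Node or this thread):

const util = require('util');
const zlib = require('zlib');

const deflate = util.promisify(zlib.deflate);
const payload = Buffer.from(JSON.stringify({ some: 'data' }));

// Process the requests in bounded batches so the threadpool queue (and the
// memory pinned by pending requests) stays small at any point in time.
async function deflateAllBounded(total, batchSize) {
  for (let i = 0; i < total; i += batchSize) {
    const batch = [];
    for (let j = 0; j < batchSize && i + j < total; j++) {
      batch.push(deflate(payload));
    }
    await Promise.all(batch); // wait before queueing the next batch
  }
}

deflateAllBounded(30000, 64).then(() => {
  console.log('done, rss:', process.memoryUsage().rss);
});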

@bnoordhuis added the confirmed-bug label Oct 1, 2016
@jasonmcaffee (Author)

Thanks @bnoordhuis for the breakdown and explanation.

@kevinburke

FWIW, we just fixed a problem that had similar symptoms, and our solution was to disable transparent huge pages. I'd encourage you to investigate and see if a) they are turned on, and b) if disabling them fixes your problem.
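
On Linux, the current THP mode can be checked by reading the standard sysfs file; a small sketch (the bracketed value in the output is the active mode):

const fs = require('fs');

// Standard kernel location on Linux; prints e.g. "always [madvise] never".
const thpPath = '/sys/kernel/mm/transparent_hugepage/enabled';

try {
  console.log('THP:', fs.readFileSync(thpPath, 'utf8').trim());
} catch (err) {
  console.log('Could not read THP setting (non-Linux system?):', err.message);
}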

@jorangreef (Contributor)

Could this be an issue if multiple clients are triggering async deflate calls within a few ms of each other? Should people just avoid deflate() altogether?

@jorangreef (Contributor)

Does this go much further than just deflate()? Could this be affecting most async calls which allocate memory on Linux?

@TylerBrock

Does anyone know which versions of Node do not have this bug? We are seeing this on the latest Node 8, but also on all versions of 6 that I sampled.

I run the code snippet above in node invoked with --expose-gc and call global.gc() to free the memory, but 3 GB is still resident no matter what I do. This is killing us in production as we have to restart our containers every couple of hours.
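
For reference, a minimal sketch of that measurement (the file name is arbitrary; global.gc() is only defined when node is started with --expose-gc):

// Run as: node --expose-gc repro.js
const zlib = require('zlib');

const payload = Buffer.from(JSON.stringify({ some: 'data' }));
for (let i = 0; i < 30000; ++i) {
  zlib.deflate(payload, () => {});
}

// Once the backlog has drained, force a GC and print memory usage; as reported
// above, the RSS can stay high even after the JS heap has been collected.
setTimeout(() => {
  if (global.gc) global.gc();
  console.log(process.memoryUsage());
}, 60 * 1000);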

@bnoordhuis (Member)

@TylerBrock Did you read through this issue and the linked issues? If you did, then you already know the answer.

@TylerBrock

I did, but I'm still not sure, so I apologize if I missed something.

People have suggested it could be the new zlib, the new V8, or something in crypto (possibly related to zlib), but going back to versions of Node with older zlib and V8 (checked via process.versions) yielded similar results.

Would you mind summarizing where we are at?
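
For reference, the bundled library versions mentioned above can be printed with a one-liner:

// Prints the Node.js, V8 and zlib versions bundled with the running binary.
console.log(process.versions.node, process.versions.v8, process.versions.zlib);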

@TimothyGu (Member) commented Jul 20, 2017

@TylerBrock

From #8871 (comment):

There is no real memory leak, it's ordinary - but rather catastrophic - memory fragmentation.

IOW, can confirm but node.js is not the root cause. Maybe we can sidestep it by using a custom allocator like jemalloc but that has a lot of ramifications, it's not something we can do lightly.

In other words, this is not a Node.js, or zlib, or V8 issue, but rather caused by how the system memory allocator works.

@bnoordhuis (Member)

Disabling transparent huge pages (or setting it to madvise) may also help.

@TylerBrock

Must be something different on my end because THP is already set to madvise. Thanks for the clarification.

@STRML (Contributor) commented Sep 8, 2017

@TylerBrock Did you come to any solution? Disabling THP on our end gave us a little bit of headroom, but it's to the point where a machine blows up every day or so due to the use of permessage-deflate in a websocket lib. We'd rather not disable compression but are running out of viable options.

@TylerBrock

@STRML We just went back to node 6. Things run for months and months without leaking memory.

It's a bummer though, I'd love to use node 8. It's much faster, more memory efficient (when not leaking), and has async/await.

What I don't understand is why memory fragmentation is an issue now. Did the allocation strategy or allocator change between 6 and 8?

@addaleax (Member) commented Sep 9, 2017

Did the allocation strategy or allocator change between 6 and 8?

Not in Node – possibly in V8, but I don’t think it would have effects like this.

Fwiw, I couldn’t reproduce measurable differences in behaviour/memory usage using the original issue comment script here.

@STRML (Contributor) commented Sep 9, 2017

That's interesting @TylerBrock, I wouldn't have expected the version to matter. I see the same results in all versions I tested (>= 4).

I have noticed that deflateSync doesn't cause nearly the same problems, topping out at about 280MB vs 3GB for deflate(). It seems the application side could actually mitigate this problem somewhat by limiting concurrency on deflate().
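
A minimal sketch of such a concurrency cap around the callback API (the helper and the maxInflight value are illustrative, not from any library):

const zlib = require('zlib');

// Allow at most `maxInflight` deflate() calls in the threadpool at once and
// queue the rest, instead of handing all of them off up front.
function createDeflateLimiter(maxInflight) {
  let inflight = 0;
  const queue = [];

  function next() {
    if (inflight >= maxInflight || queue.length === 0) return;
    inflight++;
    const { buf, cb } = queue.shift();
    zlib.deflate(buf, (err, out) => {
      inflight--;
      next();
      cb(err, out);
    });
  }

  return function deflateLimited(buf, cb) {
    queue.push({ buf, cb });
    next();
  };
}

// Usage: same call shape as zlib.deflate, but with a bounded backlog.
const deflateLimited = createDeflateLimiter(16);
const payload = Buffer.from(JSON.stringify({ some: 'data' }));
for (let i = 0; i < 30000; i++) {
  deflateLimited(payload, () => {});
}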

@puzpuzpuz (Member)

I did a small experiment with lazy initialization of the zlib stream, and it seems to help keep the RSS around 200MB instead of 2.7GB on the original script. The code is as ugly (and buggy) as it can be, but hopefully the idea is clear enough: puzpuzpuz@b4be58d

I'm not sure if that approach makes any sense, but maybe it could help to lower the footprint in scenarios with a lot of implicit zlib stream allocations, just like zlib.deflate() does in the original script.
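
For readers unfamiliar with the idea, a conceptual sketch of lazy initialization (illustrative only, not the actual change in puzpuzpuz@b4be58d or #34048):

const zlib = require('zlib');

// Defer creating the native-backed Deflate stream until the first write, so
// wrappers that are created but never used don't pin native zlib state.
class LazyDeflate {
  constructor(options) {
    this._options = options;
    this._stream = null; // allocated on demand
  }

  _ensure() {
    if (this._stream === null) {
      this._stream = zlib.createDeflate(this._options); // allocation happens here
    }
    return this._stream;
  }

  write(chunk, cb) {
    return this._ensure().write(chunk, cb);
  }

  end(cb) {
    if (this._stream === null) {
      if (cb) process.nextTick(cb);
      return;
    }
    this._stream.end(cb);
  }
}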

@addaleax (Member)

@puzpuzpuz It’s certainly worth experimenting with – does it fix the issue here for you? If yes, I’d say go for a PR :)

@puzpuzpuz (Member)

@addaleax yes, it seems to mitigate the issue (see #8871 (comment)). According to my experiment, each zlib instance occupies around 87KB when the original script is run. I'll try to polish the patch and submit a fix PR.

@puzpuzpuz (Member) commented Jun 26, 2020

I've submitted #34048, which implements lazy zlib initialization where possible. The behavior of the fix for the original script is described here: #8871 (comment)

The difference from the latest master can be clearly seen with this variation of the script:

const zlib = require('zlib');

const message = {
  some: "data"
};
const payload = Buffer.from(JSON.stringify(message));

setInterval(() => {
  for (let i = 0; i < 30000; ++i) {
    zlib.deflate(payload, () => { });
  }
}, 2000);

With master I get 16GB RSS after about 1 min of execution and it keeps slowly growing. With the fix RSS fluctuates, but never goes beyond 4GB. OS info: Linux apechkurov-laptop 4.15.0-108-generic #109-Ubuntu SMP Fri Jun 19 11:33:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Any feedback, testing on other platforms and code reviews are appreciated.

@puzpuzpuz (Member)

Closing this one as #34048 has landed. Please let me know if you still face the same behavior, so I can reopen the issue.

@lpinca (Member) commented Jul 30, 2020

@puzpuzpuz It seems that #34048 only mitigates the issue but does not fix it. Am I right?

I'm using this test on Ubuntu 18.04

'use strict';

const util = require('util');
const zlib = require('zlib');

const deflate = util.promisify(zlib.deflate);
const payload = Buffer.from(JSON.stringify({ some: 'data' }));

async function testLeak() {
  const promises = [];

  for (let i = 0; i < 30000; i++) {
    promises.push(deflate(payload));
  }

  await Promise.all(promises);
}

async function main() {
  console.time('testLeak');

  for (let i = 0; i < 10; i++) {
    await testLeak();
  }

  console.timeEnd('testLeak');
}

main()
  .then(function () {
    setTimeout(function () {
      console.log('Memory:', process.memoryUsage());
    }, 10000);
  })
  .catch(console.error);

and the memory usage grows linearly with the number of times testLeak() is run.

@puzpuzpuz (Member)

@puzpuzpuz It seems that #34048 only mitigates the issue but does not fix it. Am I right?

@lpinca That may be true, as I did some experiments with the fix applied to the reproducer script, but would appreciate any feedback from the community. I'm going to reopen this issue, if necessary, once we get some feedback. Feel free to reopen it now, if you think that's necessary.

The fix seems to be decreasing the fragmentation, but it doesn't get rid of it completely. So, probably it's correct to say that it "mitigates" the issue, rather than "fixes" it.

I've also tried to run your snippet on the latest master:

$ ./node zlib.js 
testLeak: 15.255s
Memory: {
  rss: 540753920,
  heapTotal: 12722176,
  heapUsed: 2812944,
  external: 495556,
  arrayBuffers: 17590
}

And on b0b52b2 (which doesn't include the fix):

$ ./node zlib.js 
testLeak: 19.915s
Memory: {
  rss: 8469770240,
  heapTotal: 4333568,
  heapUsed: 2677264,
  external: 495098,
  arrayBuffers: 17590
}

The RSS is noticeably lower with the fix, yet it's still relatively large when compared with the heap size.

@lpinca (Member) commented Jul 30, 2020

Yes, I get similar results on Linux.

With the mitigation patch

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.15   54.458683          37   1466699           mprotect
  5.55    3.242945         303     10698           munmap
  1.10    0.641740         113      5679           mmap
  0.20    0.119330          27      4353           brk
------ ----------- ----------- --------- --------- ----------------
100.00   58.462698               1487429           total

Without the mitigation patch

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 61.95    0.636388          11     59694           brk
 20.25    0.208015          27      7767           munmap
 16.07    0.165120           8     20336           mprotect
  1.73    0.017728           5      3915           mmap
------ ----------- ----------- --------- --------- ----------------
100.00    1.027251                 91712           total

On Windows the memory usage is stable at ~70 MB with or without the mitigation fix. I should try with jemalloc on Linux. I think it makes sense to reopen.

@puzpuzpuz (Member)

Yes, I get similar results on Linux.

I was also using Ubuntu 18.04 with glibc 2.27.

I think it makes sense to reopen.

OK, reopening it then.

@puzpuzpuz reopened this Jul 30, 2020
@puzpuzpuz (Member)

I should try with jemalloc on Linux.

@lpinca I've tried that with jemalloc 3.6.0-11 and here is the output:

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ./node zlib.js
testLeak: 13.778s
Memory: {
  rss: 334114816,
  heapTotal: 131715072,
  heapUsed: 103527424,
  external: 472453668,
  arrayBuffers: 471975702
}

@alestor123

Try using - zlib-bug-inflate

@lpinca (Member) commented Jul 30, 2020

@puzpuzpuz

That's without the mitigation patch right? I also tried with jemalloc and there is an order of magnitude difference (~40 MB with the patch and ~400 MB without it).

@puzpuzpuz (Member)

@lpinca no, that's with the patch. I may be doing something wrong, as I don't see an order of magnitude difference.

@lpinca (Member) commented Jul 30, 2020

Nvm, it seems it was me who did something wrong:

luigi@ubuntu:~/leak$ node -v
v14.7.0
luigi@ubuntu:~/leak$ env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 node index.js
testLeak: 57.830s
Memory: {
  rss: 46489600,
  heapTotal: 4251648,
  heapUsed: 2202536,
  external: 998285,
  arrayBuffers: 17590
}
luigi@ubuntu:~/leak$ nvm use 14.6.0
Now using node v14.6.0 (npm v6.14.6)
luigi@ubuntu:~/leak$ node -v
v14.6.0
luigi@ubuntu:~/leak$ env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 node index.js
testLeak: 1:24.327 (m:ss.mmm)
Memory: {
  rss: 49336320,
  heapTotal: 3989504,
  heapUsed: 2189736,
  external: 997652,
  arrayBuffers: 17590
}

It seems jemalloc is a working workaround.

@puzpuzpuz (Member)

It seems jemalloc is a working workaround.

@lpinca you probably mean that jemalloc works more or less like #34048 when used with pre-14.7.0 versions of node, right? At least, that's how it looks.

@lpinca (Member) commented Jul 30, 2020

Better than that, I mean that with jemalloc there is no leak at all.

luigi@ubuntu:~/leak$ node -v
v14.7.0
luigi@ubuntu:~/leak$ node index.js
testLeak: 1:11.205 (m:ss.mmm)
Memory: {
  rss: 700317696,
  heapTotal: 132943872,
  heapUsed: 101497112,
  external: 493479181,
  arrayBuffers: 492498486
}
luigi@ubuntu:~/leak$ env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 node index.js
testLeak: 57.279s
Memory: {
  rss: 49631232,
  heapTotal: 9494528,
  heapUsed: 2323328,
  external: 998285,
  arrayBuffers: 17590
}
luigi@ubuntu:~/leak$ nvm use 14.6
Now using node v14.6.0 (npm v6.14.6)
luigi@ubuntu:~/leak$ node -v
v14.6.0
$ env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 node index.js
testLeak: 1:24.564 (m:ss.mmm)
Memory: {
  rss: 51523584,
  heapTotal: 4513792,
  heapUsed: 2273040,
  external: 997652,
  arrayBuffers: 17590
}

@puzpuzpuz (Member)

Better than that, I mean that with jemalloc there is no leak at all.

Strictly speaking, memory fragmentation is not a leak, and the allocator should be able to get rid of it once the allocation rate drops and fewer allocated memory chunks remain. If jemalloc doesn't show noticeable fragmentation in the discussed scenario, that's great.

As a summary, it would be great to get some feedback from the community on v14.7.0 (which includes #34048), and on v14.7.0 (or any previous version) with jemalloc.

@lpinca (Member) commented Jul 30, 2020

Yes "leak" is not the correct term in this case. My point is that with jemalloc memory usage remains pretty much stable regardless of how many times the testLeak() function in the above example is run. This is not true with glibc even with #34048 applied.

@intellix

How does this look in 2023 with Node 18+? There have been no comments on this issue for 3 years, so I'm wondering if it's safe to use permessage-deflate to compress WebSocket traffic; I was pointed to this issue.

@STRML (Contributor) commented Oct 23, 2023 via email

@cyantree commented Dec 3, 2023

I've run the test from #8871 (comment) on my machine with the following results:

  • Win 10
  • Ryzen 7 1800X
  • 32 GB RAM

testLeak: 4:14.889 (m:ss.mmm)
Memory: {
  rss: 95129600,
  heapTotal: 5390336,
  heapUsed: 3328040,
  external: 1524293,
  arrayBuffers: 18659
}

During the test the process memory usage consistently oscillated between around 600 MB and 6.4 GB.
