Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post-mortem out of memory analysis #239

Closed
paulrutter opened this issue Oct 4, 2018 · 26 comments
Closed

Post-mortem out of memory analysis #239

paulrutter opened this issue Oct 4, 2018 · 26 comments

Comments

@paulrutter
Copy link

Hi,

While reading the readme on https://github.com/nodejs/diagnostics, one of the domains is "Heap and Memory Analysis", although i cannot find anything about post-mortem out of memory analysis.

In our usecase, we run several hundred nodejs processes (in a Docker container), which are restricted to run with very low memory (+- 80MB). When these go out of memory, we want to be able to analyze what's on the heap when it happened.

For that purpose, we created "node-oom-heapdump" (https://github.com/blueconic/node-oom-heapdump), which does exactly that.

Is post-mortem out of memory analysis planned to be included in the Node.js core?

@joyeecheung
Copy link
Member

joyeecheung commented Oct 4, 2018

Not in node core, but I tend to use https://github.com/nodejs/llnode to diagnose coredumps produced during OOM (it probably needs a few tweaks to generate coredumps in docker when the process crashes though). You can at least see the stacktrace (C++ and JS) when the OOM happens which tend to be useful (unless the process is not running the code that causes the problem when it crashes). You can also scan the heap and find the types with the most instances which is also helpful most of the times.

@paulrutter
Copy link
Author

Thanks!

I have seen that one in the past but found it to be a bit too low-level for my purposes.
Of course, it might provide more insight, but the threshold to be able to use it seems to be a bit higher then just including our own module (which creates .heapsnapshot files which can be loaded in Chrome DevTools).
I would prefer something easier; just like Java does, it would be nice to have some kind of command line flag which automatically creates those heapdumps on out of memory (https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/clopts.html).
Any ideas on this?

@joyeecheung
Copy link
Member

@paulrutter There is an experimental JS API of llnode https://github.com/nodejs/llnode/blob/master/JSAPI.md which allows you to install it as a normal C++ Node.js addon and use the JS API to access info in the core dump (it's not yet completed, of course).

There is also https://github.com/nodejs/node-report which can dump a summary when the process crashes, probably a bit closer to the Java heapdump experience. The problem with heapdumps is that they only include information from the VM's managed heap, but sometimes the cause may come from the native layer, so it's sometimes inevitable to dig into there.

@joyeecheung
Copy link
Member

Also, node-report is being integrated into core nodejs/node#22712 the current implementation allows you to use command line flags to configure when the report will be generated.

@paulrutter
Copy link
Author

Thanks, that's very helpful.
I understand that a heapdump alone sometimes might not be enough, although for our usecases it usually is (we run pluggable JS code which usually is the cause for mem leaks).

I'll take a look into node-report to see if that's helpful 👍

@mmarchini
Copy link
Contributor

Another useful tool is https://github.com/v8/sampling-heap-profiler, which shows allocation per function call. This tool is also more suited for production scenarios because taking heapdumps is expensive (high memory consumption) and slow, but it shouldn't be a problem if the process keeps memory usage around 80MB.

@paulrutter
Copy link
Author

Thanks, that's another one not known to me until now.
Memory usage for generating heapdumps is not an issue in case of OoM, because the process is lost anyway. For on-demand heapdumps; those are created from another process, so memory consumption is not a risk for the process being analyzed (in node-oom-heapdump, other modules like v8-profiler run in the same process).

@mhdawson
Copy link
Member

mhdawson commented Oct 4, 2018

@paulrutter please let me know your experiences with node-report. Would like to hear about what worked well as well as other information that might be useful in node-report.

In my session coming up at NodeConfEU one of the demo's included is using nore-report to help when you have an OOM.

@paulrutter
Copy link
Author

Hi @mhdawson,

I tried node-report yesterday (the fatalerror.js example included in the source), and although this is definitely helpful information in case of an OoM, i still miss information about what the heap contained at the time.
If a heapdump (v8::Isolate::GetCurrent()->GetHeapProfiler()->TakeHeapSnapshot();) was created as well, that would give more insight. Of course, as @joyeecheung mentioned, this only shows the VM managed heap and no native info.

@mhdawson
Copy link
Member

mhdawson commented Oct 5, 2018

@paulrutter taking a heap snapshot is also one of the core diagnostics capabilities that I believe should be in core. We have a PR working to get node-report into core itself (in a bit different form). After that I want to look at the best practice for getting a heapdump.

@richardlau at one point I had thought node-report was going to enable triggering a heap dump as well but maybe that was https://github.com/RuntimeTools/appmetrics instead.

You can get a heapdump with the https://www.npmjs.com/package/heapdump module and also through the inspector APIs (I've not had time to validate the later myself quite)

@paulrutter
Copy link
Author

Thanks!

For the latter, we now created our own module called node-oom-heapdump, which works very well for us. But it would be nice to have this functionality in the core like you mentioned. Just a matter of time, i guess?

I could help on getting a pull request for node-report, if that would help?

@richardlau
Copy link
Member

@mhdawson I have a feeling it was appmetrics and that ended up using the heapdump module.

@mhdawson
Copy link
Member

mhdawson commented Oct 5, 2018

@paulrutter you mean a PR to add heapdump trigger support in node-report or something else?

@joyeecheung
Copy link
Member

By the way, the heap snapshot doesn't seem to work on larger heaps: https://bugs.chromium.org/p/chromium/issues/detail?id=826697

@mhdawson
Copy link
Member

mhdawson commented Oct 5, 2018

@paulrutter if you are interested we might collaborate on the path of having a heapdump solution in core. I'm quite interested in that but have just not had time to make progress. One thing people have mentioned is that it is possible through the inspector API so first step is to understand that. If its sufficient then documenting it as part of the diagnostics group's best practices and if not making the case for what we think should be in core. Adding @gireeshpunathil as another interested party.

@joyeecheung
Copy link
Member

joyeecheung commented Oct 5, 2018

also through the inspector APIs (I've not had time to validate the later myself quite)

@mhdawson That is certainly possible. This is the snippet I used to generate the heap snapshots for the screenshots in nodejs/node#23072

See snippet
'use strict';

const http2 = require('http2');
const inspector = require('inspector');
const session = new inspector.Session();
session.connect();
const fs = require('fs');

function createSnapshot() {
  let buf = '';
  session.on('HeapProfiler.addHeapSnapshotChunk', ({
    method, params: { chunk }
  }) => {
    console.log('addHeapSnapshotChunk', chunk.length);
    buf += chunk;
  });
  session.on('HeapProfiler.reportHeapSnapshotProgress', (progress) => {
    console.log('reportHeapSnapshotProgress', progress);
  });
  session.post('HeapProfiler.takeHeapSnapshot', {
    reportProgress: true
  }, () => {
    console.log('Writing snapshot');
    fs.writeFileSync('./heap.heapsnapshot', buf);
  });
}

const server = http2.createServer();
server.on('stream', (stream) => {
  stream.respondWithFile(__filename);
});
server.listen(0, () => {
  const client = http2.connect(`http://localhost:${server.address().port}`);
  const req = client.request();

  req.on('response', () => {
    createSnapshot();
  });

  req.resume();
  req.on('end', () => {
    client.close();
    server.close();
  });
  req.end();
});

@mhdawson
Copy link
Member

mhdawson commented Oct 5, 2018

@joyeecheung the key thing for me is that you should be able to trigger heapdumps without having change your code. What I thought was being suggested was to connect through the external inspector interface and request a heapdump. The potential issue with that is whether it is acceptable/practical to have that enabled/accessible security in production.

One other question, for your example above did you need to enable the inspector through a command line option or is simply requiring like you show enough?

@joyeecheung
Copy link
Member

@mhdawson Without code changes we will probably have to achieve this through certain IPC mechanisms, AFAIK user land tend to use signals for this (so does node-report).

The snippet above just needs to be run with the node executable without any flags to produce a heapdump.

@paulrutter
Copy link
Author

To be clear; we're talking about two separate things here:
1.creating a heapdump should be core functionality without having to depend on userland packages. I'm all in favor of this.
2. Creating a heapdump when an out of memory situation occurs, should be a core function as well (afaik), possibly behind a command line flag. This could use the functionality specified in bullet 1 to do so, with help of the SetOOMErrorHandler in V8.

I would be happy to collaborate on both topics. I could possibly get some time on this within the company i work for, because it benefits them as well.

Let me know if my summary checks out for you as well and if so, go from there.

@mmarchini
Copy link
Contributor

I like these ideas but I'm still concerned about performance issues and memory footprint while generating heap dumps. AFAIK, taking a heapdump of large heaps takes too much time, uses at least the same amount of memory already allocated for the heap and can't be done in a separate thread (e.g.: it is a blocking operation which will stop the event loop). These problems won't be a problem when dealing with smaller heaps, but most Node.js users will experience OOM only after they're using more than 1Gb (which is the default max-old-space-size setting).

I think we need to solve these problems before integrating this into core.

cc @hashseed @ofrobots @bmeurer any insights on this?

@paulrutter
Copy link
Author

That certainly is an issue for bullet 1, but not for 2. In case of OoM, blocking the event loop for a heapdump doesn't sound like an issue to me? The process is dead anyway.
Memory is a concern because it required roughly two times the configured heap size.

@joyeecheung
Copy link
Member

joyeecheung commented Oct 6, 2018

1.creating a heapdump should be core functionality without having to depend on user land packages. I'm all in favor of this.

This is already possible in JS land through the inspector API. if the request is for something that directly calls into heap_profiler->TakeHeapSnapshot() without creating an inspector session, there is already an internal API internalBinding('heap_utils').createHeapDump()that does this for testing heap dumps, I would suggest to open an issue in the nodejs/node repo to request moving that to user land and see if collaborators are comfortable with that.

For bullet 2 I would perfer something more general that allows users to configure actions in OOM other than just taking an heapdump - but again we can only find out how comfortable collaborators feel about this with an issue in the core repo, since that's where the code will land.

@paulrutter
Copy link
Author

paulrutter commented Oct 7, 2018

Ok, that is clarifying! That indeed is what i mean for bullet 1. Just an API for creating heapdumps. Will test next week and create an issue to move that to a more public API, or document it somewhere.

For bullet 2, making it configurable would make it more flexibele, i agree. Maybe an ability to hook into SetOOMErrorHandler from JS code would then be enough? Something like process.on('fatal_error') or something similar?

@joyeecheung
Copy link
Member

joyeecheung commented Oct 7, 2018

or document it somewhere.

It’s an internal API for testing so I believe documenting it (without moving it to the public API surface) is not really on the table.

@paulrutter
Copy link
Author

I created nodejs/node#23328 for creating heapdumps in core.
Let's see what the responses are before filing an issue for bullet 2.

@paulrutter
Copy link
Author

And the follow up issue here: nodejs/node#27552

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants