Memory leaks when using node-v10 and node-v12 #829
Comments
@artemp, @gabrielschulhof has done work to better handle cleanup, in particular for Worker threads. This change is in v13.x and higher, not 12.x and 10.x: nodejs/node@53ca0b9. I think that could explain what you are seeing. In the non-worker case the 'leak' would only be present when the Node.js process was shutting down. That would be reported by valgrind but not be a problem in real-world deployments. In addition to fixing it for workers, where it could have been a real-world issue, the change above also fixed it on shutdown.
@mhdawson - looks like nodejs/node@53ca0b9 is the one! I assume not, but are there any plans to backport these changes to node 10.x and/or 12.x? Cheers.
@artemp, no plans to backport, as in most cases it's not a real issue since Node.js is shutting down anyway, and it's been like that for addons since the beginning of time rather than being a regression at some point.
ok, thanks @mhdawson |
I'm having second thoughts on this issue; I think I closed it prematurely :) I have done more testing and I'm seeing unbounded memory growth in long-running processes which use AsyncWorker with node 10.
^ On the contrary, it looks like a defect. The memory doesn't get released by GC over time, and that leads to an out-of-memory condition in real-life deployments. From my testing, it looks like node-addon-api AsyncWorker can't be used with node < 13 in production because of memory leaks. Both Node 10 and 12 are considered production-ready: https://nodejs.org/en/about/releases/ If backporting is not an option, perhaps this should be documented somewhere?
@artemp thanks for re-opening. I'm now wondering if you are running into nodejs/node#35777. Which versions of 12.x and 10.x are you seeing the issue with? If they are versions that have the regression related to 35777, can you try with an earlier version?
Although I don't think that made it into 10.x, so maybe not related. Do you have a test case that, when run with a small heap, quickly shows it running out of memory?
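As an illustration of the kind of test case being asked for here, the sketch below (purely hypothetical and unrelated to node-addon-api; all names are invented) deliberately retains memory every iteration, so with a capped heap such as `node --max-old-space-size=8 leak.js` and enough iterations it will eventually abort with the same `JavaScript heap out of memory` error:

```js
// Hypothetical leak demo: every iteration appends a retained string, so old
// space can only grow. With a capped heap (e.g. --max-old-space-size=8) and a
// large enough loop count, V8 aborts with "JavaScript heap out of memory".
const retained = [];

function leakOnce(i) {
  // Unique strings defeat sharing; pushing them keeps them reachable forever.
  retained.push(`leaked-${i}-${"x".repeat(64)}`);
}

// A small count here so the demo itself doesn't crash; raise it (and lower
// --max-old-space-size) to force the OOM.
for (let i = 0; i < 1000; i++) leakOnce(i);
console.log(`retained ${retained.length} objects`);
```

A genuine per-iteration leak behaves like this under a small heap, which is why forcing an OOM is a useful discriminator versus memory that is merely reported as unfreed at shutdown.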
@mhdawson thanks for getting back.
I'm using
The leak is small and gets problematic only over time. Here is my attempt to show the issue using the unmodified example:

```js
const runWorker = require('../build/Release/napi-asyncworker-example-native');
const argv = process.argv.slice(2);

function MemoryLeak(i, count) {
  setTimeout(() => {
    if (i == count) {
      console.log(process.memoryUsage());
      console.log(require('v8').getHeapStatistics());
      return;
    }
    let result = runWorker.runSimpleAsyncWorker(0.001, (err, result) => {
      if (err) {
        console.log("SimpleAsyncWorker returned an error: ", err);
      } else {
        console.log(i + " SimpleAsyncWorker returned '" + result + "'.");
      }
      MemoryLeak(++i, count);
    });
  }, 0.1);
}

function main(args) {
  console.log(args);
  var count = Number(args[0]);
  console.log("testing count=", count, "..");
  let i = 0;
  MemoryLeak(i, count);
  console.log("done.");
}

main(argv);
```

Run the same script with node-10 and node-15.
After some time: not quite running out of heap memory, but it shows that node-10 accumulates some unreferenced memory over time compared to node-15.
I'm going to try reverting nodejs/node#35777 to see if it makes any difference.
@artemp thanks for the info and trying those things out. I was asking whether we can get to an OOM because, if nodejs/node@53ca0b9 fixes the reported memory leaks at shutdown, it would seem that the issue was related to the finalizers not being run at shutdown (which nodejs/node@53ca0b9 fixed). That would mean there are objects keeping the memory alive, and if there is a bug that means they can never be collected, we should eventually be able to force an OOM with a smaller heap using --max-old-space-size and enough iterations. I guess I should have also asked whether the case where you see the leak is using workers. In that case it would be expected, and the right answer might be to document that the limitation applies only when using workers.
Not having much luck triggering OOM with my simple script, yet.
yes ^.
Without nodejs/node@53ca0b9, when using Node.js WorkerThreads, objects which have not been collected when the worker terminates would result in the leak you mentioned. So the situation I could understand is: you repeatedly create a worker thread which … If we confirm that is the case, then I think we should update the documentation to identify the potential issue for 10.x and 12.x and recommend people use a pool of WorkerThreads (which is already advisable for other reasons) as opposed to repeatedly creating a new one.
@mhdawson - I didn't realise you were referring to https://nodejs.org/api/worker_threads.html (?). If that is the case, I'm not using Node.js WorkerThreads, sorry for the confusion. I think there are multiple issues which got conflated. I can now trigger OOMs reliably with node-v10 and

```diff
diff --git a/napi-asyncworker-example/node-addon-api/src/SimpleAsyncWorker.cc b/napi-asyncworker-example/node-addon-api/src/SimpleAsyncWorker.cc
index ff47d2f..e92b4e4 100644
--- a/napi-asyncworker-example/node-addon-api/src/SimpleAsyncWorker.cc
+++ b/napi-asyncworker-example/node-addon-api/src/SimpleAsyncWorker.cc
@@ -6,7 +6,7 @@ SimpleAsyncWorker::SimpleAsyncWorker(Function& callback, int runTime)
     : AsyncWorker(callback), runTime(runTime) {};
 void SimpleAsyncWorker::Execute() {
-    std::this_thread::sleep_for(std::chrono::seconds(runTime));
+    std::this_thread::sleep_for(std::chrono::milliseconds(runTime));
     if (runTime == 4) SetError ("Oops! Failed after 'working' 4 seconds.");
 };
```

and the script:

```js
const runWorker = require('../build/Release/napi-asyncworker-example-native');
const argv = process.argv.slice(2);

function MemoryLeak(i, count) {
  if (i == count) {
    console.log(process.memoryUsage());
    console.log(require('v8').getHeapStatistics());
    return;
  }
  if (i % 100 == 0)
    console.log("AsyncWork:", i);
  let result = runWorker.runSimpleAsyncWorker(10, (err, result) => {
    if (err) {
      console.log("SimpleAsyncWorker returned an error: ", err);
    } else {
      if (i % 100 == 0) console.log(i + " SimpleAsyncWorker returned");
    }
    MemoryLeak(++i, count);
  });
}

function main(args) {
  console.log(args);
  var count = Number(args[0]);
  console.log("testing count=", count, "..");
  let i = 0;
  MemoryLeak(i, count);
  console.log("done.");
}

main(argv);
```

node-v10.23.0:

```
/opt/node-v10.23.0/bin/node --trace-gc --max-old-space-size=4 ./test/MemoryLeak.js 100000
```
```
[5903:0x5608fe9b0dd0]       23 ms: Mark-sweep 2.5 (4.2) -> 2.1 (5.2) MB, 2.6 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure scavenge might not succeed
[5903:0x5608fe9b0dd0]       33 ms: Mark-sweep 2.6 (5.2) -> 2.5 (5.7) MB, 1.9 / 0.0 ms  (average mu = 0.815, current mu = 0.815) allocation failure scavenge might not succeed
[5903:0x5608fe9b0dd0]       41 ms: Mark-sweep 3.1 (5.7) -> 2.9 (8.2) MB, 2.1 / 0.0 ms  (average mu = 0.784, current mu = 0.746) allocation failure GC in old space requested
[5903:0x5608fe9b0dd0]       55 ms: Mark-sweep 3.5 (8.2) -> 3.3 (8.7) MB, 3.1 / 0.0 ms  (average mu = 0.780, current mu = 0.777) allocation failure GC in old space requested
[ '100000' ]
testing count= 100000 ..
AsyncWork: 0
done.
0 SimpleAsyncWorker returned
AsyncWork: 100
100 SimpleAsyncWorker returned
AsyncWork: 200
200 SimpleAsyncWorker returned
AsyncWork: 300
300 SimpleAsyncWorker returned
AsyncWork: 400
400 SimpleAsyncWorker returned
AsyncWork: 500
500 SimpleAsyncWorker returned
AsyncWork: 600
600 SimpleAsyncWorker returned
AsyncWork: 700
700 SimpleAsyncWorker returned
[5903:0x5608fe9b0dd0]     8474 ms: Mark-sweep 3.7 (8.7) -> 3.4 (5.7) MB, 6.5 / 0.1 ms  (average mu = 0.999, current mu = 0.999) allocation failure GC in old space requested
[5903:0x5608fe9b0dd0]     8477 ms: Mark-sweep 3.4 (5.7) -> 3.4 (5.7) MB, 2.0 / 0.0 ms  (average mu = 0.998, current mu = 0.079) allocation failure GC in old space requested
[5903:0x5608fe9b0dd0]     8479 ms: Mark-sweep 3.4 (5.7) -> 3.4 (5.7) MB, 1.7 / 0.0 ms  (average mu = 0.998, current mu = 0.155) last resort GC in old space requested
[5903:0x5608fe9b0dd0]     8481 ms: Mark-sweep 3.4 (5.7) -> 3.4 (5.7) MB, 2.0 / 0.0 ms  (average mu = 0.996, current mu = 0.021) last resort GC in old space requested

<--- Last few GCs --->

[5903:0x5608fe9b0dd0]     8477 ms: Mark-sweep 3.4 (5.7) -> 3.4 (5.7) MB, 2.0 / 0.0 ms  (average mu = 0.998, current mu = 0.079) allocation failure GC in old space requested
[5903:0x5608fe9b0dd0]     8479 ms: Mark-sweep 3.4 (5.7) -> 3.4 (5.7) MB, 1.7 / 0.0 ms  (average mu = 0.998, current mu = 0.155) last resort GC in old space requested
[5903:0x5608fe9b0dd0]     8481 ms: Mark-sweep 3.4 (5.7) -> 3.4 (5.7) MB, 2.0 / 0.0 ms  (average mu = 0.996, current mu = 0.021) last resort GC in old space requested

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x15df1d3dbf1d]
    1: StubFrame [pc: 0x15df1d394e11]
    2: StubFrame [pc: 0x15df1d408edf]
Security context: 0x0360fa99e6c1 <JSObject>
    3: formatWithOptions [0x1a9af85d5bf9] [util.js:162] [bytecode=0x1a9af85831b9 offset=804](this=0x0f5175d44ef9 <Object map = 0x1536356b6189>,inspectOptions=0x3c9dcf732b51 <Object map = 0x1536356c8879>,f=0x3c9dcf74fdc1 <String[10]: AsyncWork:>)
    4: arguments adaptor frame: 3->2
...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 0x5608fc6eb594 node::Abort() [/opt/node-v10.23.0/bin/node]
 2: 0x5608fc6eb5e2  [/opt/node-v10.23.0/bin/node]
 3: 0x5608fc918cba v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/opt/node-v10.23.0/bin/node]
 4: 0x5608fc918f85 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/opt/node-v10.23.0/bin/node]
 5: 0x5608fcd46bf6  [/opt/node-v10.23.0/bin/node]
 6: 0x5608fcd58f19 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/opt/node-v10.23.0/bin/node]
 7: 0x5608fcd1c5ab v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::Heap::RootListIndex, int, v8::internal::Object*, v8::internal::PretenureFlag) [/opt/node-v10.23.0/bin/node]
 8: 0x5608fcd2fb11 v8::internal::Factory::SetNumberStringCache(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::String>) [/opt/node-v10.23.0/bin/node]
 9: 0x5608fcd2fc01 v8::internal::Factory::NumberToString(v8::internal::Handle<v8::internal::Object>, bool) [/opt/node-v10.23.0/bin/node]
10: 0x5608fd01260c v8::internal::Runtime_NumberToStringSkipCache(int, v8::internal::Object**, v8::internal::Isolate*) [/opt/node-v10.23.0/bin/node]
11: 0x15df1d3dbf1d
Aborted (core dumped)
```

The same script runs happily under node-v15.0.1.
@artemp getting back to this has been on my TODO list, but other priorities have gotten in the way. Just an FYI that it is still on my list.
@mhdawson - no probs, thanks.
I recreated the OOMs with node-v10.23.0. I then tried with node-v12.18.3 and it seems to run fine.
I then tried increasing the heap from 4MB to 6MB when running with 10.x (4MB is pretty small):

```
node --trace-gc --max-old-space-size=6 ./test/MemoryLeak.js 100000
```
With the larger heap (6MB) it seems to run ok with 10.x, and from the gc output it seems to stabilise at:

```
[18680:0x4290900]   123269 ms: Mark-sweep 4.9 (7.2) -> 3.9 (7.2) MB, 2.0 / 0.3 ms  (average mu = 1.000, current mu = 1.000) allocation failure scavenge might not succeed
```

where the heap goes up to 4.9MB and then after each GC goes back down to 3.9MB.
I then looked at the Resident Set Size and Virtual Memory Size with `ps -eo vsz,rss,comm | grep node`. The virtual memory size seems to stay consistent, while the RSS increases and then drops down after each GC:

```
user1@user1-VirtualBox:~/newpull/io.js$ ps -eo vsz,rss,comm |grep node
598556 33896 node
user1@user1-VirtualBox:~/newpull/io.js$ ps -eo vsz,rss,comm |grep node
598556 33896 node
user1@user1-VirtualBox:~/newpull/io.js$ ps -eo vsz,rss,comm |grep node
598556 33112 node
user1@user1-VirtualBox:~/newpull/io.js$ ps -eo vsz,rss,comm |grep node
598556 33112 node
```
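As an alternative to polling `ps` from outside, the same numbers can be sampled from inside the process. A small sketch (the label and MB formatting are arbitrary choices, not part of the test case above):

```js
// Log RSS and heap usage from inside the process; process.memoryUsage()
// reports sizes in bytes, so convert to MB for readability.
function logMemory(label) {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(1);
  console.log(`${label}: rss=${mb(rss)}MB heapTotal=${mb(heapTotal)}MB heapUsed=${mb(heapUsed)}MB`);
  return { rss, heapTotal, heapUsed };
}

const snapshot = logMemory('sample');
```

Calling this on an interval in a long-running process gives the same RSS-vs-heap comparison as the external `ps` loop, with no shell access needed.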
I let it run for >15 mins and the GC/heap size still seems stable:

```
user1@user1-VirtualBox:~/newpull/io.js$ ps -p 18680 -o etime
  ELAPSED
    18:35
[18680:0x4290900]   985718 ms: Mark-sweep 4.9 (7.2) -> 3.9 (7.2) MB, 2.8 / 0.3 ms  (average mu = 1.000, current mu = 1.000) allocation failure scavenge might not succeed
```

as does the virtual memory and resident set size, which is still in the same range as after the first 5 mins. And in fact while I was typing this the test actually finished.
I'll start it up again and let it run for a bit longer to confirm the resident set size is not increasing, but I think the heap was just a bit too small for 10.x, as with a slightly larger heap it seems stable and I don't see the memory reported by the OS increasing.
At 300,000 iterations I now see:

```
[8068:0x27b6900]  3207525 ms: Mark-sweep 5.0 (7.2) -> 4.0 (7.2) MB, 2.9 / 0.2 ms  (average mu = 1.000, current mu = 1.000) allocation failure scavenge might not succeed
```

That's a 100k increase in heap size (and something similar in the RSS size), but if you take 100k over 300k iterations (since it was at 3.9 near the start), that is 0.33 bytes per iteration, so I don't think it's related to the iterations (you can't leak 0.33 bytes at a time :)); the memory used by the GC can just vary a bit. So after running an hour I don't think I see anything that would point to a memory leak.
And just to be sure I ran with 10.x:

```
user1@user1-VirtualBox:~/napi-mem/node-addon-examples/napi-asyncworker-example/node-addon-api$ node --version
```
@mhdawson thanks for looking into this. |
@artemp, for 10.x, provided you give it a 6MB or larger heap, it looks to me like it is running ok. My summary is that, from what I can see with the test case provided, 10, 12, and 14 all seem to be ok; the only difference is if you set a heap size of less than 6MB, which is extremely small.
@mhdawson OK, thanks again for investigating this issue, closing. |
Memory leak(s)
To reproduce
https://github.com/nodejs/node-addon-examples/tree/master/napi-asyncworker-example/node-addon-api
then run the following script (MemoryLeak.js)
node-v10.23.0 + node-addon-api-v3.0.2
looks interesting ^
The amount of "definitely lost" memory increases as the number of iterations increases, e.g. calling MemoryLeak(i, 100)
^^ node-v14.11.0 + node-addon-api-v3.0.2
With node >=13 (tested with node-v13.12.0 and node-v14.11.0) there are constant results from memcheck no matter how many iterations as above.
It looks like there are memory leaks related to node-addon-api in node-10 and/or the corresponding napi implementation (and also in node-v12.16.1).
Is this known behaviour? Did anyone come across similar issues?