
Memory leak when calling fs.write() in long synchronous loop #11289

Closed
aalexgabi opened this issue Feb 10, 2017 · 13 comments
Labels
fs Issues and PRs related to the fs subsystem / file system. question Issues that look for answers.

Comments

@aalexgabi

aalexgabi commented Feb 10, 2017

I think I found a memory leak when using fs.write() from a long synchronous loop. Here is the script I used to reproduce it:

var fs = require('fs')

var f = fs.openSync('/tmp/toto', 'w')
var line = 'hello\n'

for (var i = 0; i < 2000000; i++) {
  fs.write(f, line)
}

console.log('finished sync loop')

fs.fsync(f, function(){
  fs.close(f, function(){
    global.gc()
    console.log('closed')
  })
})


// Avoid script exit
setTimeout(function(){}, 600000) 

With node v0.12.9:

% node --expose-gc t4.js
finished sync loop
closed
^C
% 

After the process has closed the file, 1.1 GB is still used by the process:
[screenshot: process memory usage]

The file contains all the lines:

% wc -l toto 
2000000 toto

The file is 12 MB, so I would expect the node process to eventually take around 70 MB of memory, given that it takes 17 MB with an empty VM:

% ls -lh toto 
-rw-rw-r-- 1 alex alex 12M Feb 10 15:59 toto
% 

If I don't call the GC manually I get:
[screenshot: process memory usage]

With node v7.5.0:

% node --expose-gc t4.js

<--- Last few GCs --->

[12644:0x2dea9c0]    50672 ms: Mark-sweep 1403.9 (1473.8) -> 1403.9 (1473.8) MB, 1299.8 / 1.1 ms  allocation failure GC in old space requested
[12644:0x2dea9c0]    51970 ms: Mark-sweep 1403.9 (1473.8) -> 1403.9 (1442.8) MB, 1297.4 / 2.2 ms  last resort gc 
[12644:0x2dea9c0]    53382 ms: Mark-sweep 1403.9 (1442.8) -> 1403.9 (1442.8) MB, 1411.4 / 1.1 ms  last resort gc 


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x27bc2abc0d39 <JS Object>
    1: write [fs.js:~669] [pc=0x2e9d77e4b286](this=0x1cdf3f4ab721 <an Object with map 0x37f0aa419701>,fd=9,buffer=0x15b7a55fff89 <String[19]\: 0.6831274126925666\n>,offset=0x27bc2ab04311 <undefined>,length=0x27bc2ab04311 <undefined>,position=0x27bc2ab04311 <undefined>,callback=0x27bc2ab04311 <undefined>)
    2: arguments adaptor frame: 2->6
    3: /* anonymous */ [/tmp/t4.js:~1] [pc=0x2e9d77e50...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [node]
 2: 0x126389c [node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
 5: v8::internal::Factory::NewRawOneByteString(int, v8::internal::PretenureFlag) [node]
 6: v8::internal::String::SlowFlatten(v8::internal::Handle<v8::internal::ConsString>, v8::internal::PretenureFlag) [node]
 7: v8::internal::String::Flatten(v8::internal::Handle<v8::internal::String>, v8::internal::PretenureFlag) [node]
 8: v8::String::WriteUtf8(char*, int, int*, int) const [node]
 9: node::StringBytes::Write(v8::Isolate*, char*, unsigned long, v8::Local<v8::Value>, node::encoding, int*) [node]
10: 0x1288fe9 [node]
11: 0x2e9d77e47b2f
zsh: abort (core dumped)  node --expose-gc t4.js

I have also noticed that v7.5.0 is several times slower when writing to the file.

@mscdex
Contributor

mscdex commented Feb 10, 2017

I think this is probably expected. You're continuously queuing up write requests to the thread pool, which has a fixed number of threads available.

If you write synchronously (fs.writeSync()) instead, or limit the number of outstanding write requests (e.g. to the size of the thread pool, which is 4 by default IIRC), you should not run into this problem.
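
As an illustration only (this sketch is not from the original report; the maxInFlight cap and the pump loop are invented for the example), limiting the number of in-flight writes to roughly the thread pool size could look like this:

var fs = require('fs')

var f = fs.openSync('/tmp/toto', 'w')
var line = 'hello\n'
var total = 2000000
var maxInFlight = 4 // roughly the default libuv thread pool size
var started = 0
var inFlight = 0

function pump () {
  // Start new writes until the in-flight cap is reached or all writes are queued
  while (inFlight < maxInFlight && started < total) {
    started++
    inFlight++
    fs.write(f, line, function (err) {
      if (err) throw err
      inFlight--
      if (started === total && inFlight === 0) {
        fs.close(f, function () { console.log('finished') })
      } else {
        pump()
      }
    })
  }
}

pump()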

@mscdex mscdex added fs Issues and PRs related to the fs subsystem / file system. question Issues that look for answers. labels Feb 10, 2017
@aalexgabi
Author

aalexgabi commented Feb 10, 2017

@mscdex I understand that some structures may accumulate in memory as long as the buffer is not flushed, but note that the memory leak is still present AFTER everything has been written to the file and the file has been closed.

From the documentation:

Note that it is unsafe to use fs.write multiple times on the same file without waiting for the callback. For this scenario, fs.createWriteStream is strongly recommended.

What does "unsafe" mean here? Should the documentation be more specific?
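
(For reference, the stream-based pattern the documentation recommends would look roughly like the sketch below; the writeSome loop is just the standard 'drain' back-pressure idiom, not code taken from this issue.)

var fs = require('fs')

var out = fs.createWriteStream('/tmp/toto')
var line = 'hello\n'
var total = 2000000
var written = 0

function writeSome () {
  var ok = true
  // Keep writing until all lines are queued or the stream's internal buffer is full
  while (written < total && ok) {
    written++
    ok = out.write(line) // false means the stream asks us to back off
  }
  if (written < total) {
    out.once('drain', writeSome) // resume once the buffer has been flushed
  } else {
    out.end(function () { console.log('finished') })
  }
}

writeSome()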

@bnoordhuis
Member

Quite possibly a variation on #8871 (comment) (summary: memory fragmentation, not a memory leak), but it could also be #11077.

@Trott
Member

Trott commented Jul 26, 2017

Is this close-able at this point or should this remain open?

@bnoordhuis
Member

I don't think there is anything actionable. I'll close it out. Thanks for the bug report, @aalexgabi.

@jorangreef
Contributor

I think it's actionable if it's an issue with fragmentation.

@aalexgabi's gist should work with Node. Sure, it's pushing the edge with so many temporary buffers but it's a reasonable stress test. Node should be able to handle these kinds of edge cases without being fragile.

This could be fixed through a better allocator.

@bnoordhuis
Member

bnoordhuis commented Jul 26, 2017

There are two things here:

  1. The process runs out of memory because it queues up 2 million fs.write() requests.
  2. RSS doesn't go down after the test completes.

Point 1 is not a bug but a peculiarity: since the test case doesn't pass a callback to fs.write() (a deprecated calling pattern), node.js calls process.emitWarning() 2 million times, exacerbated by 2 million calls to Error.captureStackTrace(), which cause running time and memory usage to explode. If you add a callback, the test completes in seconds flat.

(Should process.emitWarning() do rate limiting? Maybe, maybe not; that's a complexity/commonality trade-off. I don't expect you would normally hit such pathological behavior in a real-world program.)

With a non-deprecated calling pattern you get #8871 (comment), i.e. memory fragmentation. You say it "could be fixed through a better allocator"; I say that's not our bug to fix. Take it up with glibc or experiment with MALLOC_MMAP_THRESHOLD_ and friends.

EDIT: To be clear, a custom allocator is not out of the question but for other reasons than as a workaround for a glibc deficiency.

@jorangreef
Contributor

  2. RSS doesn't go down after the test completes.

Yes, this is the thing.

Take it up with glibc or experiment with MALLOC_MMAP_THRESHOLD_ and friends.

Perhaps jemalloc would be another way forward.

EDIT: To be clear, a custom allocator is not out of the question but for other reasons than as a workaround for a glibc deficiency.

Antirez has some comments on moving to jemalloc from glibc a few years back: http://oldblog.antirez.com/post/everything-about-redis-24.html

The jemalloc affair is one of our most fortunate use of external code ever. If
you used to follow the Redis developments you know I'm not exactly the kind of
guy excited to link some big project to Redis without some huge gain. We don't
use libevent, our data structures are implemented in small .c files, and so
forth. 

But an allocator is a serious thing. Since we introduced the specially encoded
data types Redis started suffering from fragmentation. We tried different things
to fix the problem, but basically the Linux default allocator in glibc sucks
really, really hard. 

Including jemalloc inside of Redis (no need to have it installed in your
computer, just download the Redis tarball as usually and type make) was a huge
win. Every single case of fragmentation in real world systems was fixed by this
change, and also the amount of memory used dropped a bit.

Also, Jason Evans from 2011 (https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919):

The relation between active memory and RAM usage must be consistent. In other
words, practical bounds on allocator-induced fragmentation are critical.
Consider that if fragmentation causes RAM usage to increase by 1 GiB per day, an
application that is designed to leave only a little head room will fail within
days.

@bnoordhuis
Member

I'm not rabidly opposed to jemalloc (libc interop issues aside), but the "other reasons" I alluded to are about exploiting known facts about allocations; e.g., that most memory is allocated and freed on the main thread (no synchronization needed), or that memory for a fs request should come from a different bin because it stays around longer than the temp storage used for flattening a string. That kind of thing.

@aalexgabi
Author

Adding a callback:

fs.write(f, line, function() {})

Avoids the crash, but the leak/fragmentation is still there:
[screenshot: process memory usage]

@bnoordhuis
Member

@aalexgabi Yes, I mentioned that in #11289 (comment).

@machinetherapist

I have the same issue with writeFileSync.

@aalexgabi
Author

To be honest I'm disappointed with the closure of this issue.

The first "real world" example that comes to mind is a logger used by a service with a highly variable workload. This means that during an intense workload, if the CPU is faster than the disk for a given period of time, all the queued log write calls will generate leaks that will never be recovered during the lifetime of the service.

In a way, this means that node applications are "required" to be restarted/killed from time to time if there is no way to implement a back-pressure mechanism for handling IO writes.
