
Replaced glob with readdir-glob to be memory efficient #433

Merged: 1 commit, Jul 23, 2020

Conversation

@Yqnn (Contributor) commented Jul 22, 2020

As described in issue #422, node-glob is not designed to handle a huge number of files: it requires an amount of memory proportional to the number of matched files.

Why? Because it lists only the folders that appear in the pattern, and it has to remember every file it has found to ensure the same file is not emitted twice.
This approach makes sense when the pattern matches only a small proportion of the filesystem.
But since that is not the common case when creating an archive, it is more efficient to list all the files and then check whether each one matches the given pattern, as sketched below.
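
To illustrate the walk-then-match idea, here is a minimal sketch (not readdir-glob's actual code; it assumes the minimatch package, whose export shape varies between versions):

```js
const fs = require('fs');
const path = require('path');
const minimatch = require('minimatch'); // older export shape; newer versions use { minimatch }

// Recursively list every file under `root` and yield those whose path
// relative to `root` matches `pattern`. Nothing is accumulated, so
// memory stays bounded no matter how many files match.
function* walkAndMatch(root, pattern, dir = root) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      yield* walkAndMatch(root, pattern, full);
    } else if (minimatch(path.relative(root, full), pattern)) {
      yield path.relative(root, full);
    }
  }
}

// Example: for (const rel of walkAndMatch('/tmp', '*.txt')) console.log(rel);
```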

Advantages:

  • memory consumption is constant, regardless of the number of matched files
  • it's faster when the proportion of matching files is high

Drawbacks:

  • absolute glob patterns are not supported: archiver.glob('*.txt', {cwd: '/tmp'}) has to be used instead of archiver.glob('/tmp/*.txt') (see the sketch after this list)
  • it's slower when only a few files match in a large filesystem
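
For instance, an absolute pattern would be rewritten like this (a sketch; output.zip and /tmp are illustrative):

```js
const fs = require('fs');
const archiver = require('archiver');

const archive = archiver('zip');
archive.pipe(fs.createWriteStream('output.zip'));

// archive.glob('/tmp/*.txt');          // absolute pattern: no longer supported
archive.glob('*.txt', { cwd: '/tmp' }); // relative pattern + cwd instead
archive.finalize();
```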

This PR implements that approach by replacing glob with readdir-glob, which is memory-efficient.
It also pauses the glob stream while archiving is in progress, to keep memory usage stable.
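
Schematically, the pause/resume behaviour looks like this (a hedged sketch, not the PR's actual code; appendToArchive is a hypothetical stand-in for the archiving step, and the readdir-glob event and method names are taken from its README):

```js
const readdirGlob = require('readdir-glob');

// Hypothetical stand-in for the real archiving step; calls `done`
// once the entry has been fully written out.
function appendToArchive(absolutePath, done) {
  console.log('archiving', absolutePath);
  setImmediate(done);
}

const globber = readdirGlob('/tmp', { pattern: '*.txt' }); // illustrative root

globber.on('match', (match) => {
  globber.pause();                        // stop scanning while this entry is archived
  appendToArchive(match.absolute, () => {
    globber.resume();                     // pull the next match once done
  });
});
globber.on('error', (err) => console.error(err));
globber.on('end', () => console.log('done'));
```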

Maybe it would be better to offer this as a new option, or to replace only the directory() function.
Feel free to give feedback :)

@ctalkington merged commit a4c4507 into archiverjs:master on Jul 23, 2020
@melitus commented Jul 23, 2020

@Yqnn Is memory consumption also constant for archiver.append(file, { name: filename })?

@Yqnn (Contributor, Author) commented Jul 23, 2020

@Yqnn Is memory consumption also constant for archiver.append(file, { name: filename })?

Nope: if you use append(), the library has to remember all the appended files, so it's not possible to make it memory-efficient in that case.
You have to put a throttling mechanism in place on your side. I guess you could check whether archiver._queue.length is small enough before appending new files.
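
A minimal sketch of that throttling idea, assuming archiver's internal _queue exposes a length count as suggested above (it is an undocumented internal that may change between versions; maxQueued and the 50 ms poll interval are arbitrary):

```js
const fs = require('fs');
const archiver = require('archiver');

// _queue is an undocumented internal (an async.queue in recent versions,
// where length is a method rather than a property); handle both shapes.
function queuedCount(archive) {
  const q = archive._queue;
  if (!q) return 0;
  return typeof q.length === 'function' ? q.length() : q.length;
}

// Append buffers one by one, waiting whenever the internal queue grows
// past `maxQueued`, so memory use stays bounded.
async function appendWithThrottle(archive, entries, maxQueued = 64) {
  for (const { buffer, filename } of entries) {
    while (queuedCount(archive) >= maxQueued) {
      await new Promise((resolve) => setTimeout(resolve, 50)); // simple poll
    }
    archive.append(buffer, { name: filename });
  }
}

// Example usage (entries is an array of { buffer, filename }):
async function main(entries) {
  const archive = archiver('zip');
  archive.pipe(fs.createWriteStream('output.zip')); // illustrative path
  await appendWithThrottle(archive, entries);
  await archive.finalize();
}
```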

@melitus commented Jul 23, 2020

@Yqnn Thanks. But is there any alternative to append()? I am passing a buffer and filename to append().
