recursive option #24

schnittstabil · 2016-03-21T10:16:32Z

A recursive option for sindresorhus/cpy#10 (comment).

The procedure would be as follows:

glob patterns as usual
for each path in the result, execute an additional glob: var paths = glob(path + '{,/**}', opts)

if path is a file, then paths equals to [path]
if path is a directory, then paths contains path and all its files and folders with respect to opts (especially: ignore, dot and follow)

UltCombo · 2016-03-21T17:30:48Z

I'm not sure if I have the necessary context to review this PR, but I'm wondering: for the use case you have presented (cpy), wouldn't it be viable to just append {,/**} to the end of all glob patterns?

schnittstabil · 2016-03-21T19:27:10Z

@UltCombo The results may be the same, the runtimes are not, e.g.:

globby(['node_modules', '!node_modules'], {recursive: true})
// vs
globby(['node_modules{,/**}', '!node_modules{,/**}'])

@sindresorhus, @UltCombo About the use cases: I'm struggling with myself:
As dot defaults to false, should globby(['node_modules'], {recursive: true}) also include node_modules/.bin or not - what kind of behavior is expected by most users?

An alternative:

// typeof(recursive): boolean or object

globby(['node_modules'], {
  recursive: {
    dot: true,
    …
  }
})

UltCombo · 2016-03-21T19:56:00Z

Note that negation globs are always dot: true, this is a limitation of node-glob's ignore patterns (doc). That said, using the patterns ['node_modules{,/**}', '!node_modules{,/**}'] _should_ not traverse the node_modules directory (I'm pretty positive, but we should test to be sure).

And what I'm trying to say is, it would be nice if we could avoid adding so much complexity to this module. If the results are actually the same, and the perf is the same or better, I'd suggest to have the recursive: true option simply map the input globs appending ,{/**} to them. Then again, this might be an unnecessary abstraction that the caller (cpy) could handle on its own.

Apart from the complexity, another important point is that your code is instantiating a new glob object for each matched path, which likely adds a huge perf overhead to parse all these new generated glob patterns. Also, these new glob patterns are not sharing their caches (statCache, symlinks, realpathCache) which may result in a considerable perf loss depending on the use case. From the docs:

If you are running many glob operations, you can pass a Glob object as the options argument to a subsequent operation to shortcut some stat and readdir calls. At the very least, you may pass in shared symlinks, statCache, realpathCache, and cache options, so that parallel glob operations will be sped up by sharing information about the filesystem.

Although the (lack of) cache is easy to fix, I believe it would still not be worth to spawn a new glob instance for each matched path. Perhaps it is worth trying to implement something like the {,/**} logic I've mentioned above and compare the perf of both approaches.

Another general point about this feature: Did you consider "recursive" copies? I mean, copying a directory to a nested directory, e.g. if you copy the directory /a to /a/b/c, will it result in an infinite copy loop? Maybe it is worth trying to prevent that.
(Not really relevant to this repository, but keep it in mind when preparing the PR to cpy)

schnittstabil · 2016-03-21T20:37:15Z

['node_modules{,/**}', '!node_modules{,/**}'] should not traverse the node_modules directory (I'm pretty positive, but we should test to be sure).

To be honest, I have only tested this:

glob('node_modules', {ignore:[ 'node_modules' ]}, …)
// vs
glob('node_modules{,/**}', {ignore:[ 'node_modules{,/**}' ]}, …)

The first one was more than 7 times faster than the second (in the globby directory) - however I don't know if node_modules was traversed 😏

important point is that your code is instantiating a new glob object for each matched path,

That is true, but I didn't came up with a better, readable idea - In addition, the nested Promise.all sucks too 😒

Did you consider "recursive" copies?

Thank you, yes I did. There are also some issues with mutually recursively linked directories to handle. Therefore I'm planning to determine the source files first, and starting to copy after.

UltCombo · 2016-03-21T20:49:00Z

The first one was more than 7 times faster than the second (in the globby directory) - however I don't know if node_modules was traversed 😏

Well, that relative discrepancy is only useful when you pair it with the total execution time.

If the total execution time is just some nano/microseconds, then it is perfectly normal that the second pattern takes a few microseconds more to be parsed into a regex by the underlying minimatch library.

If the total execution times actually takes more than a few seconds, then it really means there is a perf issue with those patterns.

schnittstabil · 2016-03-22T05:57:54Z

Well, that relative discrepancy is only useful when you pair it with the total execution time.

You are very hard to convince, I like that - but I would not have wrote it, if it was negligible 😏

# install glob and some big dependencies like ava

$ time node -e "require('glob')('node_modules', {ignore:[ 'node_modules' ]}, console.log);"
null []

real    0m0.238s
user    0m0.200s
sys 0m0.028s

$ time node -e "require('glob')('node_modules{,/**}', {ignore:[ 'node_modules{,/**}' ]}, console.log);"
null []

real    0m4.020s
user    0m3.832s
sys 0m0.552s

Apart from that, I think a recursive option is not necessarily need anymore: sindresorhus/cpy#10 (comment)

UltCombo · 2016-03-22T06:14:43Z

You are very hard to convince, I like that - but I would not have wrote it, if it was negligible 😏

Hahah yeah, I like to be throughout with my reviews. Furthermore, I'm currently working 11h+ a day, so I'm saving time by offloading the burden of proof to you. 😛

Indeed, looks like {,/**} causes perf issues as well. That's odd, IIRC node-glob applies special treatment to ignore globs ending with /** as to not traverse inside the matched directories. Perhaps this special treatment is only applied when the pattern literally ends with /** (that is, {,/**} may not be recognized as valid for the optimization path in the current logic).

I'd like to see bench tests for glob('node_modules/**', {ignore:[ 'node_modules/**' ]}) and glob('node_modules{,/**}', {ignore:[ 'node_modules', 'node_modules/**' ]}), I believe both of these should get into the fast path. I think I should have more free time towards the end of the week, so I should be able to run such tests myself by then.

Well, in any case, if you'd like to land this PR sooner, feel free to work with other reviewers here. 😉

schnittstabil · 2016-03-22T06:24:29Z

I believe both of these should get into the fast path.

You are right, ignore:[ 'node_modules', 'node_modules/**' ] is much faster.

schnittstabil · 2016-03-30T16:38:52Z

Thank you, @UltCombo, you put me on the right track. It turned out that a recursive option wouldn't help much though.

UltCombo · 2016-03-30T17:05:10Z

Oh, nice!
Sorry for the delay, this was still in my list. Good to know you have found a better approach. 😃
Let me know if you need help reviewing this implementation in cpy or something else.

recursive option

a07ec37

schnittstabil mentioned this pull request Mar 22, 2016

Error when glob matches directory sindresorhus/cpy#10

Open

schnittstabil closed this Mar 30, 2016

sindresorhus mentioned this pull request May 18, 2016

#27

Closed

chopfitzroy mentioned this pull request Mar 26, 2017

Recursive glob patterns xojs/xo#201

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recursive option #24

recursive option #24

schnittstabil commented Mar 21, 2016

UltCombo commented Mar 21, 2016

schnittstabil commented Mar 21, 2016

UltCombo commented Mar 21, 2016

schnittstabil commented Mar 21, 2016

UltCombo commented Mar 21, 2016

schnittstabil commented Mar 22, 2016

UltCombo commented Mar 22, 2016

schnittstabil commented Mar 22, 2016

schnittstabil commented Mar 30, 2016

UltCombo commented Mar 30, 2016

recursive option #24

recursive option #24

Conversation

schnittstabil commented Mar 21, 2016

UltCombo commented Mar 21, 2016

schnittstabil commented Mar 21, 2016

UltCombo commented Mar 21, 2016

schnittstabil commented Mar 21, 2016

UltCombo commented Mar 21, 2016

schnittstabil commented Mar 22, 2016

UltCombo commented Mar 22, 2016

schnittstabil commented Mar 22, 2016

schnittstabil commented Mar 30, 2016

UltCombo commented Mar 30, 2016