Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some files are missing when doing glob #28

Open
chenxsan opened this issue Sep 1, 2018 · 10 comments
Open

Some files are missing when doing glob #28

chenxsan opened this issue Sep 1, 2018 · 10 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@chenxsan
Copy link

chenxsan commented Sep 1, 2018

I have this code:

              const searchPattern = '{post,page}/**/*.{md}'
              glob(searchPattern, {
                cwd: content,
                absolute: true
              })
                .then(files => {
                  console.log(files.length)
                  callback(null, files)
                })
                .catch(err => callback(err))

And I would expect the files for {post,page}/**/*.{md} = {post}/**/*.{md} + {page}/**/*.{md}, but it's not in my case.

I have 2 files for {page}/**/*.{md}, and 53 files for {post}/**/*.{md}, but only 33 for {post,page}/**/*.{md}. Am I doing something wrong here? The searchPattern just works fine under node-glob, fast-glob.

@chenxsan
Copy link
Author

chenxsan commented Sep 1, 2018

Here's the one causing problem https://github.com/terkelg/tiny-glob/blob/master/index.js#L34, I can remove it, and everything works as expected in my case. Also, it won't fail any tests.

@chenxsan
Copy link
Author

chenxsan commented Sep 1, 2018

I just put up a failing test here chenxsan@18f54bc

@chenxsan chenxsan mentioned this issue Sep 1, 2018
@terkelg
Copy link
Owner

terkelg commented Sep 2, 2018

Hi Sam! Thanks for having a look at the issue. Does it effect the benchmarks when you remove that line? It can also be a problem with the regex coming from globrex. Can you print the regex and file and test them?

The idea is that every path segment (aka dir name) is checked before tiny-glob starts crawling that directory. When globrex convert a glob it also break it into smaller regex segments for each folder/path segment. This is done so tiny-glob can check each directory and avoid spending time crawling unnecessary folders that never will result in any matches anyway. I suspect the regex for the glob {post,page} could be wrong.

@chenxsan
Copy link
Author

chenxsan commented Sep 3, 2018

Here's benchmark result after I removed that line:

glob x 13,438 ops/sec ±3.05% (83 runs sampled)
fast-glob x 25,485 ops/sec ±5.20% (76 runs sampled)
tiny-glob x 55,162 ops/sec ±6.97% (55 runs sampled)
Fastest is tiny-glob
┌───────────┬─────────────────────────┬────────────┬────────────────┐
│ Name      │ Mean time               │ Ops/sec    │ Diff           │
├───────────┼─────────────────────────┼────────────┼────────────────┤
│ glob      │ 0.00007441320916659413  │ 13,438.474 │ N/A            │
├───────────┼─────────────────────────┼────────────┼────────────────┤
│ fast-glob │ 0.00003923935167426461  │ 25,484.621 │ 89.64% faster  │
├───────────┼─────────────────────────┼────────────┼────────────────┤
│ tiny-glob │ 0.000018128419620750812 │ 55,162.006 │ 116.45% faster │
└───────────┴─────────────────────────┴────────────┴────────────────┘

@chenxsan
Copy link
Author

chenxsan commented Sep 3, 2018

Here's the lexer variable:

{ regex: /^(post|page)\/((?:[^\/]*(?:\/|$))*)([^\/]*)\.md$/,
  segments: [ /^(post|page)$/, /^((?:[^\/]*(?:\/|$))*)$/, /^([^\/]*)\.md$/ ],
  globstar: '/^((?:[^\\/]*(?:\\/|$))*)$/' }

And part of my directory structure:

$ tree ./post

image
All those .md files right under post are included while others inside subdirectory of post are filtered out.

So I just added a console.log(rgx, file) right before if (rgx && !rgx.test(file)) continue;, here's the printed result:

/^(post|page)$/ 'draft'
/^(post|page)$/ 'page'
/^((?:[^\/]*(?:\/|$))*)$/ 'about'
/^(post|page)$/ 'post'
/^([^\/]*)\.md$/ 'firefox-os'
/^([^\/]*)\.md$/ 'github-pages-custom-domain'
/^([^\/]*)\.md$/ 'markdown-and-table'
/^([^\/]*)\.md$/ 'srcset and sizes'
/^([^\/]*)\.md$/ 'telegram-scam-bitcoin'
...
...

Those are folders right under post, they should be walked into too. So the problem here might origin from the level value?

@terkelg
Copy link
Owner

terkelg commented Sep 3, 2018

What's going on with the directory names? Can you post the non-escaped strings for some of them?

@chenxsan
Copy link
Author

chenxsan commented Sep 3, 2018

Those're chinese characters.

image

But I don't think it matters. Folders with english names like firefox-os are ignored by glob too.

@chenxsan
Copy link
Author

chenxsan commented Sep 3, 2018

@terkelg Thought it's really difficult for you to understand my situation, I just created a new repo here https://github.com/chenxsan/tiny-glob-demo.

Please run npm install to install deps then run node index.js to check the results.

@terkelg
Copy link
Owner

terkelg commented Sep 3, 2018

Thanks a lot @chenxsan. I'll have a look at this when I get some spare time. I appreciate the help and information you provided

@pavelloz
Copy link

pavelloz commented Jul 29, 2019

Its possible that i have the same problem, but in different form.

I prepared repo with test case: git clone https://github.com/pavelloz/tg-testcase && npm i && node test.js
https://github.com/pavelloz/tg-testcase/

Shortcut:

Structure:

tinyglob-testcase|master ⇒ tree modules 
modules
└── test
    ├── private
    │   └── views
    │       └── pages
    │           └── mypage.liquid
    └── public
        └── views
            ├── pages
            │   └── page.liquid
            └── partials
                ├── data
                │   ├── one.liquid
                │   └── two.json
                └── hello.liquid

9 directories, 5 files

Code:

const tg = require('tiny-glob');

tg('**', {
  cwd: 'modules/test',
  filesOnly: true
}).then(files => {
  console.log('Non-filtered.', files.length);
  console.log(files);
});

tg('{private,public}/**', {
  cwd: 'modules/test',
  filesOnly: true
}).then(files => {
  console.log('Filtered by private/public (broken)', files.length);
  console.log(files);
});


tg('**/{private,public}/**', {
  cwd: 'modules/test',
  filesOnly: true
}).then(files => {
  console.log('Filtered, with workaround/hack applied.', files.length);
  console.log(files);
});

Results

tinyglob-testcase|master ⇒ node test.js 
Filtered by private/public (broken) 1
[ 'private/views/pages/mypage.liquid' ]
Non-filtered. 5
[
  'private/views/pages/mypage.liquid',
  'public/views/pages/page.liquid',
  'public/views/partials/data/one.liquid',
  'public/views/partials/data/two.json',
  'public/views/partials/hello.liquid'
]
Filtered, with workaround/hack applied. 5
[
  'private/views/pages/mypage.liquid',
  'public/views/pages/page.liquid',
  'public/views/partials/data/one.liquid',
  'public/views/partials/data/two.json',
  'public/views/partials/hello.liquid'
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants