Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: 🎸 maxMatches to limit matches and exit early #238

Closed
wants to merge 5 commits into from

Conversation

joscha
Copy link

@joscha joscha commented Nov 5, 2019

What is the purpose of this pull request?

This introduces a new option maxMatches which can be used to limit the
number of matches that fast-glob returns.

What changes did you make? (Give an overview)

  • Added a maxMatches option
  • Added early exits to the stream and sync interface

Closes: #225

This introduces a new option `maxMatches` which can be used to limit the
number of matches that fast-glob returns.

Closes: mrmlnc#225
@@ -165,6 +172,9 @@ export default class Settings {
public readonly globstar: boolean = this._getValue(this._options.globstar, true);
public readonly ignore: Pattern[] = this._getValue(this._options.ignore, [] as Pattern[]);
public readonly markDirectories: boolean = this._getValue(this._options.markDirectories, false);
// If 0 or negative maxMatches is given, we revert to infinite matches
// eslint-disable-next-line @typescript-eslint/strict-boolean-expressions
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this if I:

  • ignore negative and zero maxMatches
  • introduce a new class method that can bound maxMatches between 1 and Infinity
  • assert maxMatches to be gte 1 in the constructor

let me know what's preferred :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the first option looks good to me because this logic is already exist for the deep and concurrency options.


stream._write = (index: number, _enc, done) => {
if (options.maxMatches === matches) {
// This is not ideal because we are still passing patterns to write
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help would be appreciated here @mrmlnc - I think the current solution is okayish, however ideally we would backpressure the stream in for (let i = 0; i < filepaths.length; i++) { based on the async stream.write task. Happy to follow another route as well if you have ideas.

@mrmlnc mrmlnc self-requested a review November 6, 2019 12:09
@mrmlnc mrmlnc self-assigned this Nov 6, 2019
Copy link
Owner

@mrmlnc mrmlnc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some issues here that needs to work through.

If you abstract from fast-glob, the @nodelib/fs.walk package is not the solution to your problem? The fast-glob package built on this package. You can write your own shared filter.

@@ -26,6 +27,9 @@ export default class ReaderSync extends Reader<Entry[]> {
}

entries.push(entry);
if (options.maxMatches === ++matches) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is an incorrect place to filter by the number of entries found. The main problem is that only static patterns are processed at this place, which, by their nature (read as filepath), will always be in a single instance. There are two types of patterns in the fast-glob package: static and dynamic.

So, the most correct place to filter the entries — filters. You can use the unique filter as example. The logic is simple — you need to stop reading the directory in depth, as soon as the number of found entries exceeds a specified number in the maxMatches option. But even in this case we have a problem — we need to synchronize the number of found entries between filters (deep — directories, entries — directories, symbolic links and files). Right now I don't see a good solution for this problem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. It feels like the option fits the fast-glob package, but do you suggest I use nodelib/fs-walk instead?

Copy link
Owner

@mrmlnc mrmlnc Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your problem correctly (find the first file with some extension), then this package (@nodelib/fs.walk) seems to suit you more (just add filter function with includes). If you need patterns, then, yes, here you need fast-glob.

The filters I mentioned above need to be updated in fast-glob.

it('is ignored if less or equal than 1', (done) => {
const reader = getReader();
const readerOptions = getReaderOptions({
maxMatches: -1,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we need check here Infinity instead of -1, because -1 will be converted to the Infinity in the Settings.

it('is ignored if less or equal than 1', () => {
const reader = getReader();
const readerOptions = getReaderOptions({
maxMatches: -1,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same about -1 and Infinity.

@@ -165,6 +172,9 @@ export default class Settings {
public readonly globstar: boolean = this._getValue(this._options.globstar, true);
public readonly ignore: Pattern[] = this._getValue(this._options.ignore, [] as Pattern[]);
public readonly markDirectories: boolean = this._getValue(this._options.markDirectories, false);
// If 0 or negative maxMatches is given, we revert to infinite matches
// eslint-disable-next-line @typescript-eslint/strict-boolean-expressions
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the first option looks good to me because this logic is already exist for the deep and concurrency options.

@@ -37,6 +49,10 @@ export default class ReaderStream extends Reader<NodeJS.ReadableStream> {
};

for (let i = 0; i < filepaths.length; i++) {
if (stream.writableEnded) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The writableEnded property exist in the Stream only start with Node.js 12.9.0+. The fast-glob package supports Node.js 8+. This works now because writableEnded has undefined value.

import * as smoke from './smoke';

smoke.suite('Smoke → maxMatches', [
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this test does not work as expected. This behaviour related to the comment about wrong place for filtering.

You can add debug: true to the test and the console will contain the following lines:

  Smoke → maxMatches
FIXTURES                                  NODE_GLOB  FAST_GLOB
----------------------------------------  ---------  ---------
fixtures/file.md                          +          +
fixtures/first                            +          +
fixtures/first/file.md                    +          +
fixtures/first/nested                     +          +
fixtures/first/nested/directory           +          +
fixtures/first/nested/directory/file.md   +          +
fixtures/first/nested/file.md             +          +
fixtures/second                           +          +
fixtures/second/file.md                   +          +
fixtures/second/nested                    +          +
fixtures/second/nested/directory          +          +
fixtures/second/nested/directory/file.md  +          +
fixtures/second/nested/file.md            +          +

    √ pattern: 'fixtures/**/*' (sync) (17ms)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to run this one - it seemed to error on my machine

@joscha
Copy link
Author

joscha commented Nov 8, 2019

If you abstract from fast-glob, the @nodelib/fs.walk package is not the solution to your problem?

Just to clarify: if I use the streaming interface (https://github.com/mrmlnc/fast-glob/#stream) and I was only interested in the first match, using this code:

          const stream = fg.stream(glob);
          for await (const entry of stream) {
            return true; // first match ends the stream
          }

that would work, right?

@mrmlnc
Copy link
Owner

mrmlnc commented Nov 8, 2019

that would work, right?

Good catch! The stream will not be destroyed right now. And… this is a bug. Will be fixed in issue #239.

@joscha
Copy link
Author

joscha commented Nov 10, 2019

closing this in favour of #238 (comment)

@joscha joscha closed this Nov 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: early exit for sync implementation
2 participants