Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: coerce all paths to forward slashes as a config option #468

Closed
nicolas377 opened this issue Apr 26, 2022 · 24 comments
Closed

feat: coerce all paths to forward slashes as a config option #468

nicolas377 opened this issue Apr 26, 2022 · 24 comments

Comments

@nicolas377
Copy link

nicolas377 commented Apr 26, 2022

With the recent issues in glob v8 (see the conversation in #467), I thought it would be a good idea to add a config option that would coerce the inputted glob path to use forward slashes instead of just forcing users to convert their paths to forward slashes.

Some use cases allow users to predictably use forward slashes (e.x. using relative paths in glob calls), but most don't allow that.

I'm not sure of a name for the config option, I'm open to suggestions. Once I get approval from the owners, I'll go ahead and open a PR.

@nicolas377
Copy link
Author

Just clarifying, the config option would be disabled by default.

@isaacs
Copy link
Owner

isaacs commented Apr 27, 2022

How about this: glob('c:\\some\\path\\*.js', {windowsPaths: true})
or: glob('c:\\some\\path\\*.js', {pathSeparator: '\\'})

With the caveat that if you set windowsPaths: true (or pathSeparator:'\\') then there is no way to escape glob characters, meaning it is impossible to ever match a file literally named * or ?.

@nicolas377
Copy link
Author

I like pathSeparator, especially since it can be easily set to path.sep.

Most of my use cases for glob involve resolving paths to an absolute path, and then appending to that string. I'm not sure if my use case is common, but it seems like it would be common, and I think that pathSeparator would help a lot with that use case.

@ath0mas
Copy link

ath0mas commented Apr 28, 2022

I like pathSeparator with path.sep too,
but it may feel "strange" and be misleading for users to have to set it explicitly while the most common or expected implicit default value?

then there is no way to escape glob characters, meaning it is impossible to ever match a file literally named * or ?

or "better" be an option to explicitly disable this possibility, disableGlobCharMatch or so, to allow to match "like before" without any replace or option about the "path separator" thing?

@isaacs
Copy link
Owner

isaacs commented Apr 28, 2022

There really is no way to have glob work as expected, if the escape character and the path separator are the same character. How can it know whether foo\* should match "any files within the foo folder" or "a file named foo*?

Either windows users cannot escape glob chars (which they've complained about), or they have to use / as path separators (which has been the recommendation for a decade now).

@isaacs
Copy link
Owner

isaacs commented Apr 28, 2022

So you can explicitly opt in to using \ as your path separator (and not have escaping), or maybe we make both the path separator and the escape character configurable? But "just use the implicit default path separator, and also be a bash style glob" isn't logically coherent.

@juergba
Copy link

juergba commented Apr 28, 2022

On Windows the path separator is always escaped. So glob('c:\\some\\path\\*.js') should still work in v8.0.1, because it's unambiguous. I haven't tested though.

In your examples above, it would be: foo\\* which is equivalent to foo/*.

Edit: couldn't you just replace every \\ by / whenever the platform is Windows (eg. set by a new option pathSeparator), before processing the pattern? Then you have both, the \\ path separator and the escaping of glob chars.

'c:\\some\\path\\*.js'.replace(/\\/g, '/') => c:/some/path/*.js
'foo\*'.replace(/\\/g, '/') => foo*
'foo\\*'.replace(/\\/g, '/') => foo/*

@nicolas377
Copy link
Author

The issue that I was describing was when dealing with paths run through either path.join or path.resolve. Most other cases still work, and don't need changes.

@nicolas377
Copy link
Author

couldn't you just replace every \ by / whenever the platform is Windows

That's actually pretty close to the implementation I was thinking of. My idea of the implementation of pathSeparator would go something like this:

  1. If pathSeparator isn't a string value, carry on with the regular glob function.
  2. When it is, split the given glob pattern across pathSeparator.
  3. Rejoin the split array with the path separator minimatch expects. (currently /)

Here's a quick implementation:

const { pathSeparator } = opts;

if (typeof pathSeparator === "string") {
  pattern = pattern.split(pathSeparator).join("/");
}

Here's a couple things I don't like with that implementation:

  • It creates a new variable pathSeparator
    • can be condensed easily, I'm not sure how this project weighs the extra variable vs readability
    • if readability is preferred, I can make the implementation a lot more readable
      • i'll post an example if this turns out to be the case
  • It mutates the function parameter pattern
    • Not exactly the best practice, but it does what it needs to without having to mess with variable names throughout the rest of the program.

So, after all of this, I just have two questions for @isaacs:

  1. In this specific project, do you value readability or a cleaner variable namespace?
  2. Does this project follow the principles of functional programming?
    • Not all of them, but specifically, treating parameters as immutable

Once I have answers, I'll go ahead and link a draft PR to this.

@isaacs
Copy link
Owner

isaacs commented May 1, 2022

#468 (comment)

@juergba You're confusing "escaped in the JavaScript code" with "escaped in the string". For example:

console.log('c:\\foo\\bar\\*\\baz')
// prints: c:\foo\bar\*\baz

So that *, is it escaped? It's ambiguous. There's no way around that as long as the escape character and the path separator are the same.

Say you want to find all the files in a directory that have a ? in them. Today, like in bash, you could use the pattern: /path/to/dir/*\?*. But if you do glob('/path/to/dir/*\\?*'), and we swap all \ chars with /, then on windows, that would be: /path/to/dir/*/?*. Or, in english: "files in a subdirectory of /path/to/dir whose basename has 1 character followed by zero or more other characters". Not the same thing, not even close!

Edit: couldn't you just replace every \ by / whenever the platform is Windows (eg. set by a new option pathSeparator), before processing the pattern? Then you have both, the \ path separator and the escaping of glob chars.

This was exactly the problem with Glob v7 and before. It meant that there is almost no way to match filenames containing ?, *, [, ], and so on, because the escape character was interpreted as a path separator. That's why there's been a caveat in the docs for 10 years telling you not to do that.

One possible approach to consider (semver minor change):

  • Add two new options, escapeCharacter (default: \) and pathSeparator (default: /).
  • If escapeCharacter and pathSeparator are equal, throw an error. (Windows users can use \ as their path separator, but only if they also choose a different escape character.)
  • Do not replace anything in the string itself, just use the escape character and path separator throughout the codebase instead of the \\ and / literals. If you set \ as your path separator, then it's on you to make sure you don't also have / in there. (This is weird, right? Because both \ and / "work" on windows?)
  • So maybe the pathSeparator needs to be multiple characters, none of which can be the escapeCharacter? I'm honestly sure what is the best blend of "usable" and "sensible" here.

Another possible approach: if a glob pattern matches /^[a-z]:\\/, throw an error telling the windows user not to use \ as a path separator, because that makes posix glob pattern matching impossible to do correctly, with a link to the documentation explaining why. (Semver major change)

But "just allow \ to be both a path separator and the escape character" is not the right answer.

@nicolas377 Regarding PRs and readability vs whatever, in this or any of my repos, here are some good rules of thumb:

  • Make the change as surgical as possible. Change the minimum necessary to do what you're trying to do. (Larger cleanup/refactoring work should be separated from functionality changes.)
  • Follow the patterns of the surrounding code as best you can understand them. Ideally, no one should be able to tell it wasn't there the whole time unless they look at the commit history.
  • Ensure that all code you add is fully covered by tests.
  • Ensure that the test you add fails if you take out the code you add. (Ie, the test should be relevant, even if no additional coverage was required.)

This module is marked for rewriting once I get time to do it (maybe months, maybe years, idk, but it's on my list) so definitely try to keep any change as tight as possible ;)

@nicolas377
Copy link
Author

Alright! I'll work on this tomorrow, hopefully I'll have the PR ready for review tomorrow as well.

@isaacs
Copy link
Owner

isaacs commented May 2, 2022

Another point about code style: I try to follow Most Restrictive Production, so yes, use consts where possible, small functions, etc.

@isaacs
Copy link
Owner

isaacs commented May 6, 2022

Maybe another take on this, less general, but YAGNI, so maybe better: just let Windows users specify a windowsPathsNoEscapes: true option. When this option is set, all \ characters are path separators, and not escape characters, so glob patterns like **/file\[asdf\].\?.js will be interpreted in that scenario as **/file/[asdf/]./?/.js and would match /path/to/file/[asdf/]./x/.js, but would not match a file literally named /path/to/file[asdf].?.js or /path/to/filea.x.js

Then the code change is just if (options.windowsPathsNoEscapes === true) { pattern = pattern.split('\\').join('/') }, and it behaves the same on windows as on posix.

@isaacs
Copy link
Owner

isaacs commented May 6, 2022

Updating minimatch and everything to make the \ escape character be configurable, and splitting on more than one path separator, not really a good approach. Way too much complexity and room for gotchas.

isaacs added a commit that referenced this issue May 12, 2022
@isaacs
Copy link
Owner

isaacs commented May 12, 2022

@nicolas377 I don't know if you've had a chance to take a look at this, but the suggested fix in my last comment is pretty trivial. PR up at #470 if you want to review and see if you think it will solve the problem.

isaacs added a commit that referenced this issue May 12, 2022
@nicolas377
Copy link
Author

Sorry that I've abandoned this! I'm in the middle of exam prep right now, but I'll try and review that PR over the weekend.

@isaacs
Copy link
Owner

isaacs commented May 13, 2022

Sorry that I've abandoned this! I'm in the middle of exam prep right now, but I'll try and review that PR over the weekend.

No worries! I always interpret such intentions as aspirations, you don't owe your life to OSS ;) Good luck on your exams.

@marqroldan
Copy link

marqroldan commented May 14, 2022

Hello, I would like to ask if this was meant to be included in the 7.2.2 release? Two versions were released 8 hours ago, 8.0.3 and 7.2.2, the changelog was only updated for 8.0

if it is, can the default value be false for options.allowWindowsEscape (in 7.2) please

@nicolas377
Copy link
Author

Nothing user-facing should have changed while upgrading from 7.2.1 to 7.2.2.

@nicolas377
Copy link
Author

nicolas377 commented May 14, 2022

I do have a few ideas, here they are.

Here are two ways I want users to be able to support windows (while removing support for escaping literal glob chars, which can be a feature to be discussed in the future).

Keep in mind that all of these examples are just api calls that I would like to work. They also assume const glob = require("glob") and a path run through path.win32.resolve() stored in path.

const { sep } = require("node:path/win32");

glob(path, { pathSep: sep }, () => {});
glob(path, { supportWindowsPaths: true }, () => {});

Let me explain these two options (I'll implement them this weekend):

pathSep

pathSep should be a string that the user intends to use as their path separator.

Internally, pathSep should just perform a replacement on the inputted path, switching the given pathSep with either a hardcoded "/" or path.posix.sep. Here's a pseudocode implementation, assuming the passed options are at the object variable options and the passed pattern is available at the string variable pattern:

If options.pathSep is set to a string that is not empty:
  Replace all instances of options.pathSep in pattern with a forward slash.

supportWindowsPaths

supportWindowsPaths is intended to just be a switch to flip so that glob does all the work on the backend.

Here's a pseudocode implementation, assuming the passed options are at the object variable options and the passed pattern is available at the string variable pattern:

If options.pathSep isn't set, and options.windowsPaths is truthy, and windows is the current platform:
  Split the pattern into an array by a regex of path.win32.sep joined with path.posix.sep, and rejoin it with path.posix.resolve()

For right now, I'd close that PR, and I'll try to open up a PR this weekend. Expect a draft PR up today or tomorrow.

@juergba
Copy link

juergba commented May 14, 2022

Nothing user-facing should have changed while upgrading from 7.2.1 to 7.2.2.

v7.2.1 has never been published on npm, and from 7.2.0 to 7.2.2 their may be breaking changes on windows.

@nicolas377
Copy link
Author

Stay on 7.2.0 for right now, I'm gonna open a tracking issue for that.

@nicolas377

This comment was marked as resolved.

avin added a commit to avin/typeorm that referenced this issue May 15, 2022
There is a bug in latest 7.2.* version of glob package
isaacs/node-glob#468 (comment)
isaacs added a commit that referenced this issue May 16, 2022
isaacs added a commit that referenced this issue May 16, 2022
isaacs added a commit that referenced this issue May 16, 2022
isaacs added a commit that referenced this issue May 16, 2022
@nicolas377
Copy link
Author

nicolas377 commented May 21, 2022

@isaacs whenever you're ready, #475 and #476 are my implementations of this. I could condense them into one PR if that would be easier on the commit history.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants
@isaacs @ath0mas @marqroldan @juergba @nicolas377 and others