Linux filenames with slashes #89

Open
dazinator opened this issue Jun 23, 2022 · 3 comments

Comments

@dazinator
Owner

dazinator commented Jun 23, 2022

As mentioned on #88 by @aiurovet

I need to add test coverage and potentially solve an issue where, on Linux, a backslash in a file path is tokenized as a path separator and therefore matched as one, even though it is actually part of the file name. I don't think this is a problem for Windows file names, because slashes in either direction are illegal characters in a Windows file name and are therefore always interpreted as path separators.

This may be tricky to resolve, as DotNet.Glob does not inherently know where the string it is matching comes from. It could be a URL path, a Windows registry path, or a file system path, i.e. it is arbitrary.

However, on Linux, if you pass a file system path containing a backslash, it could catch you out when the glob pattern fails to match.

For example, the glob pattern foo*/bar would be expected to match the Linux file path foo\gh/bar. However, it currently doesn't match, because DotNet.Glob tokenises the backslash as a path separator, and * only matches characters within the current path segment. gh therefore ends up in its own segment and is not matched by any part of the glob pattern.

On the flip side, in all other scenarios, such as when the path is not a Linux file path, it may be perfectly valid to expect this behaviour and for the match to fail.

Therefore, I think there needs to be a setting that influences the parser and tells it whether to interpret \ as a path separator or as a literal.
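A minimal Python sketch of such a setting follows. It is not DotNet.Glob's API: glob_match and backslash_is_separator are hypothetical names, and patterns are assumed to use / as their only separator. The input path is split into segments, with \ counted as a separator only when the flag is set, and each segment is then matched with fnmatchcase, so * can never cross a segment boundary:

```python
import re
from fnmatch import fnmatchcase


def glob_match(pattern: str, path: str, backslash_is_separator: bool = True) -> bool:
    """Segment-by-segment glob match (hypothetical API, not DotNet.Glob's).

    The pattern is assumed to use "/" as its only separator. In the input path,
    "\\" splits segments only when backslash_is_separator is True.
    """
    separator = r"[/\\]" if backslash_is_separator else r"/"
    path_segments = re.split(separator, path)
    pattern_segments = pattern.split("/")
    if len(path_segments) != len(pattern_segments):
        return False
    # Each segment is separator-free, so "*" inside fnmatchcase cannot
    # cross a segment boundary.
    return all(fnmatchcase(seg, pat) for seg, pat in zip(path_segments, pattern_segments))


linux_path = "foo\\gh/bar"  # one directory named foo\gh, containing bar
print(glob_match("foo*/bar", linux_path))                                # False
print(glob_match("foo*/bar", linux_path, backslash_is_separator=False))  # True
```

Defaulting backslash_is_separator to True keeps today's behaviour, so callers passing Linux file paths would opt in explicitly rather than getting a silent behaviour change.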

I could consider defaulting this setting on Linux to treat \ literally. However, that would be a breaking change in behaviour, so it would need a semver major bump and appropriate notice. I think it is probably better not to enable this setting automatically, and instead to document that it should be set when passing Linux file paths to the library.

@aiurovet

Not quite. Even on Windows, foo\gh/bar (or foo\gh\bar, or foo/gh/bar) should not match the pattern foo*/bar, only foo**/bar. Please refer to the GNU documentation I mentioned already, the globstar section at https://www.gnu.org/software/bash/manual/html_node/The-Shopt-Builtin.html

I believe the main misunderstanding arises because most sites wrongly suggest that the wildcard * translates into the regexp .*. In fact, that is right for file names only; in general, * should not cross the path separator boundary. So * should translate into [^\/\\]* on Windows, and into [^\/]* on POSIX-compliant OSes.
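The three candidate translations of * can be compared with a short Python sketch. It hard-codes the pattern foo*/bar and matches its / literally, which is a simplification; the constant and function names are illustrative only:

```python
import re

# Three candidate translations of the glob "*" (illustrative constant names).
STAR_NAIVE = r".*"         # crosses separators; only correct for bare file names
STAR_WINDOWS = r"[^/\\]*"  # stops at "/" and at "\"
STAR_POSIX = r"[^/]*"      # stops at "/" only; "\" is an ordinary character


def matches_foo_star_bar(star: str, path: str) -> bool:
    """Match path against the glob foo*/bar using the given translation of "*"."""
    # re.escape keeps the literal parts safe; the "/" before "bar" is matched literally.
    regex = re.escape("foo") + star + "/" + re.escape("bar")
    return re.fullmatch(regex, path) is not None


for path in ["foo/gh/bar", "foo\\gh/bar", "fooX/bar"]:
    print(path, [matches_foo_star_bar(s, path) for s in (STAR_NAIVE, STAR_WINDOWS, STAR_POSIX)])
# foo/gh/bar [True, False, False]
# foo\gh/bar [True, False, True]
# fooX/bar [True, True, True]
```

Only the naive .* lets foo*/bar match foo/gh/bar; the POSIX translation is the one that accepts the Linux directory name foo\gh.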

@dazinator
Owner Author

> Even on Windows, foo\gh/bar (or foo\gh\bar, or foo/gh/bar) should not match the pattern foo*/bar, but only foo**/bar.

Yeah, I am saying the same thing, just in a longer way, so don't worry, there is no misunderstanding here. The only thing I will say is that I am not planning to remove the constraint that ** must be the only content of a segment, so with this library you wouldn't use foo**/bar, but foo/**/bar or foo*/**/bar.
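A Python sketch of that constraint, assuming / is the only separator and that a ** segment matches zero or more whole path segments (modelled on Bash's globstar behaviour). The function names are hypothetical, not the library's:

```python
from fnmatch import fnmatchcase


def validate(pattern: str) -> None:
    """Reject "**" unless it is the entire segment (the constraint described above)."""
    for seg in pattern.split("/"):
        if "**" in seg and seg != "**":
            raise ValueError(f"'**' must be a whole segment, got {seg!r}")


def match_segments(pats: list[str], segs: list[str]) -> bool:
    if not pats:
        return not segs
    head, rest = pats[0], pats[1:]
    if head == "**":
        # "**" consumes zero or more whole path segments.
        return any(match_segments(rest, segs[i:]) for i in range(len(segs) + 1))
    # Any other pattern segment must match exactly one path segment; within a
    # segment there is no "/", so "*" cannot cross a boundary.
    return bool(segs) and fnmatchcase(segs[0], head) and match_segments(rest, segs[1:])


def globstar_match(pattern: str, path: str) -> bool:
    validate(pattern)
    return match_segments(pattern.split("/"), path.split("/"))


print(globstar_match("foo/**/bar", "foo/gh/bar"))  # True
print(globstar_match("foo*/**/bar", "foo1/bar"))   # True
print(globstar_match("foo*/bar", "foo/gh/bar"))    # False
```

With this rule, globstar_match("foo**/bar", ...) raises ValueError instead of silently behaving like foo*/bar.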

@ericnewton76

ericnewton76 commented Feb 18, 2024

I think you could potentially add the concept of a glob matcher, e.g. a WindowsGlobMatcher, a LinuxPathGlobMatch, etc., each of which gives context about what the delimiters might be.

So, for example, a FileSystemGlobber would load either a WindowsGlob or a LinuxGlobber, enumerate the files in a directory, and yield return any for which glob.IsMatch returns true.
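A rough Python sketch of that idea: a hypothetical PathFlavour value carries the per-platform separators (standing in for WindowsGlob / LinuxGlobber), and a generator takes the place of C#'s yield return. To stay self-contained it filters any iterable of relative path strings (such as paths collected with os.walk) rather than walking a real directory:

```python
import re
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PathFlavour:
    """Hypothetical per-platform context: which characters separate segments."""
    name: str
    separator_regex: str


WINDOWS = PathFlavour("windows", r"[/\\]")  # both slashes are separators
LINUX = PathFlavour("linux", r"/")          # "\" is a legal file name character


class FileSystemGlobber:
    """Hypothetical globber; the pattern always uses "/" between segments."""

    def __init__(self, pattern: str, flavour: PathFlavour) -> None:
        self.pattern_segments = pattern.split("/")
        self.flavour = flavour

    def is_match(self, path: str) -> bool:
        segs = re.split(self.flavour.separator_regex, path)
        return len(segs) == len(self.pattern_segments) and all(
            fnmatchcase(s, p) for s, p in zip(segs, self.pattern_segments)
        )

    def filter(self, paths: Iterable[str]) -> Iterator[str]:
        # Lazily emit only the matching paths, like C#'s "yield return".
        for path in paths:
            if self.is_match(path):
                yield path


paths = ["foo\\gh/bar", "foo/gh/bar", "fooX/bar"]
print(list(FileSystemGlobber("foo*/bar", LINUX).filter(paths)))
print(list(FileSystemGlobber("foo*/bar", WINDOWS).filter(paths)))
```

The flavour is chosen once by the caller, who knows where the strings come from, which sidesteps the problem that the matcher itself cannot tell a Linux path from a URL or registry path.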

I actually have a use case with a list of locations in a database, where a globber would help users select the ones they want to operate on: for example, all locations that start with a ([a]*), or that have a as the first character and z as the last ([a]*[z]). Realistically, I'll just translate their spec string into a SQL LIKE pattern (so a% or a%z).
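Translating those simple cases into a LIKE pattern could look like the Python sketch below. It assumes only *, ? and single-character classes such as [a] appear, and it escapes literal % and _ for use with LIKE ... ESCAPE '\'. glob_to_like is a hypothetical helper:

```python
def glob_to_like(glob: str, escape: str = "\\") -> str:
    """Translate a simple glob into a SQL LIKE pattern (hypothetical helper).

    Supports "*" -> "%", "?" -> "_" and single-character classes such as "[a]".
    Literal "%", "_" and the escape character are escaped, so use the result
    with LIKE ... ESCAPE '\\'. Other classes like "[a-c]" raise ValueError.
    """

    def literal(ch: str) -> str:
        # Characters special to LIKE must be prefixed with the escape character.
        return escape + ch if ch in ("%", "_", escape) else ch

    out: list[str] = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch == "[":
            end = glob.find("]", i)
            if end != i + 2:
                raise ValueError(f"only single-character classes are supported: {glob[i:]!r}")
            out.append(literal(glob[i + 1]))
            i = end  # skip past the closing "]" (incremented below)
        else:
            out.append(literal(ch))
        i += 1
    return "".join(out)


print(glob_to_like("[a]*"))     # a%
print(glob_to_like("[a]*[z]"))  # a%z
```

Whether LIKE is case-sensitive depends on the database and its collation, so results may differ from a case-sensitive glob match.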


3 participants