PowerShell extended property BaseName for DirectoryInfo is inconsistent when there is an extension #21553
Comments
|
Where is the rule defined that a directory cannot have a file extension? A directory is still a file, and on POSIX systems the file system does not know about extensions at all; a file extension is purely down to an application's interpretation. On a Linux system you will find many directories ending with ".d" under /etc, and I would consider that a file extension. So extension is really in the eye of the beholder.
|
Yes, and in those cases the basename would exclude the ".d"; you cannot have it both ways. |
Yes, name != basename + extension
PS /etc> get-childitem rc* | Select-Object Name,BaseName,Extension
|
And where is that defined? (genuinely, it's not there) Posting more examples of what I'm describing as a bug isn't helping prove anything. Most languages either (A) don't provide a basename, allowing the application to be the beholder as it were, or (B) allow said application to provide the suffix explicitly. PowerShell has taken it upon itself to interpret the extension, and if the programmer decides to use this value, the API needs to be self-consistent. |
Sorry, I was just confirming what I was seeing, that the basename still had the .d on the end |
Ah, I thought that was an offered explanation; "Yes, [because] name isn't.." not "Yes, [I see in pwsh] name isn't.." But seriously, it's bizarre I can't find doco on |
This may make it hard to make any change other than document the behaviour. We don't have any idea how much existing code is relying on the existing behaviour. |
Don't let pwsh be another cmd.exe; we'll see what the devs say. No-one would be relying on this, they'd be compensating for it. |
$ gi ./foo.bar/ | ft Name, BaseName, Extension
Name BaseName Extension
---- -------- ---------
foo.bar foo.bar
$ gi ./foo.bar | ft Name, BaseName, Extension
Name BaseName Extension
---- -------- ---------
foo.bar foo.bar .bar
|
That's an interesting observation, and unfortunately not leverageable as a workaround (like for the example screenshot) :( |
The Microsoft build tools keep the trailing slash on directory names, so you don't need to append it when constructing full paths, eg
It has a couple advantages
However you do need to look for both '/' and '\' |
I'm unsure how that's relevant to an example that just wants to treat directories and files the same and have the output be predictable, nothing's trying to append paths or fetching items directly using known paths. For instance, trying to "clone" a directory by making symlinks whilst injecting into the new names:
Knowing |
PS /home/jborean> Get-Item $pwd | Get-Member -name BaseName
TypeName: System.IO.DirectoryInfo
Name MemberType Definition
---- ---------- ----------
BaseName ScriptProperty System.Object BaseName {get=$this.Name;}
PS /home/jborean> Get-Item $PSHome/pwsh | Get-Member -Name BaseName
TypeName: System.IO.FileInfo
Name MemberType Definition
---- ---------- ----------
BaseName ScriptProperty System.Object BaseName {get=if ($this.Extension.Length -gt 0){$this.Name.Remove($this.Name.Length - $this.Exte…
You can also use
PS /home/jborean> (Get-TypeData System.IO.FileInfo).Members.BaseName
GetScriptBlock SetScriptBlock IsHidden
-------------- -------------- --------
if ($this.Extension.Length -gt 0){$this.Name.Remove($this.Name.Length - $this.Extension.Length)}else{$this.Name} False
PS /home/jborean> (Get-TypeData System.IO.DirectoryInfo).Members.BaseName
GetScriptBlock SetScriptBlock IsHidden Name
-------------- -------------- -------- ----
$this.Name False BaseName |
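To make the asymmetry between the two script blocks above concrete, here is a minimal Python model of them. It is a sketch, not PowerShell's implementation: the function names are hypothetical, and `extension` assumes the "everything from the last period onward" rule that matches the Unix outputs quoted elsewhere in this thread.

```python
def extension(name: str) -> str:
    """Assumed rule: everything from the last period onward, or '' if none."""
    dot = name.rfind(".")
    return name[dot:] if dot >= 0 else ""


def file_base_name(name: str) -> str:
    """Models the FileInfo script block: strip the extension if there is one."""
    ext = extension(name)
    return name[: len(name) - len(ext)] if len(ext) > 0 else name


def directory_base_name(name: str) -> str:
    """Models the DirectoryInfo script block: BaseName is simply Name."""
    return name


if __name__ == "__main__":
    # "conf.d" is an illustrative directory-style name, not taken from the thread.
    for name in ["foo", "foo.bar", "conf.d", ".foo"]:
        ext = extension(name)
        as_file = file_base_name(name) + ext
        as_directory = directory_base_name(name) + ext
        print(f"{name!r}: file rule -> {as_file!r}, directory rule -> {as_directory!r}")
```

Under the file rule every name round-trips through BaseName + Extension; under the directory rule a name such as "conf.d" comes back as "conf.d.d", which is the inconsistency this issue describes.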
Let me attempt a summary:
To add to @jborean93's comment re discovery of ETS members: |
Do you know why non-alphanumeric characters, such as spaces and brackets, are permitted to form the |
I suggest that the definition of file extension is really simple and is just what follows the period in a file name. Even that definition is ambiguous if there are multiple periods. Given there are no restrictions on what may be part of the stem, likewise there does not need to be any restriction on the extension. A common example in the Microsoft world is using tildes at the end of filenames that are temporary. |
I think that's absurd, though. Consider
|
The concept of a valid extension does not exist. There are valid characters in a filename, and the extension is simply whatever follows the [last] period. Then there is also the historical definition of an 8.3 filename. Sure, there are common extensions, and there are extension mappings listed in the registry. Applications can register what they want. Have a look at
The general idea behind extensions is it helps you know how to handle files, whether you can or not. If you don't recognize an extension that is absolutely fine, it means you don't know how to handle the file. |
In UNIX, case is also important to certain applications. For instance, C++ compilers treat lower-case ".c" as a C file and ".C" as a C++ file. But that interpretation is down to the applications; there is no governing body allocating valid file extensions or deciding how to interpret them. Historically Apple (of course Apple) had a TYPE/CREATOR registry. The original Macintosh had no concept of file extensions, and the type of file was held in the directory entry for the file. E.g. TEXT was a text file, PICT was an image, APPL was an application program, etc. The equivalent of the Windows extensions mapping was why the Finder was called the Finder: it found the appropriate application for a file based on TYPE and CREATOR. You were supposed to apply to Apple for approval and to register your type and creator. |
Go to a Windows command prompt and type
then do the same in PowerShell. In the original command prompt, *.* will list all files, whether they have a period in the name or not, because that is how it worked on CP/M. |
Sorry, I am lost now. I don't know what you are wanting to achieve. If you are wanting to find the last period in a name then all you need is System.String.LastIndexOf rather than a regular expression, |
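As an illustration of the suggestion above, the following Python sketch compares a last-index approach with the regular expression from the issue description. Python's `str.rfind` stands in for `System.String.LastIndexOf`, and both function names are hypothetical. One caveat: `count=1` is passed to `sub` because Python would otherwise replace a second, non-empty match after an initial empty one (for example on ".foo"), which does not reflect how the pattern is meant to be read here.

```python
import re

# The pattern used in the issue description to describe the Extension value.
PATTERN = re.compile(r"^.*?(?=\.[^.]*$|$)")


def ext_by_last_index(name: str) -> str:
    """Extension found by locating the last period (rfind ~ LastIndexOf)."""
    dot = name.rfind(".")
    return name[dot:] if dot >= 0 else ""


def ext_by_regex(name: str) -> str:
    """Extension found by deleting everything before the last period."""
    # count=1: replace only the first (anchored) match.
    return PATTERN.sub("", name, count=1)


if __name__ == "__main__":
    # Sample names taken from the examples later in this thread, plus '.foo'.
    samples = ["foo", "foo.bar", "foo. bar.docx", "foo. bar", "foo. ", "foo.", ".foo"]
    for name in samples:
        print(f"{name!r}: {ext_by_last_index(name)!r} {ext_by_regex(name)!r}")
```

For these samples the two approaches agree, which supports the point that the regular expression is only a compact way of saying "from the last period onward".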
I don't think mentioning how extensions can be differentiated on letter-case or recognised at all is helpful in this context because those are already catered for in "my expected extension"™ (where we accept those characters and don't care how they're used).
I'm saying I don't think there's ever been a use case for wanting spaces, periods, and brackets in the extension, and would like to know if there exists precedent for this. cmd equating
I mean to keep this in the realm of extensions, there are
The regular expression handles this fine, but I'm just using it as a way to communicate rather than listing conditions in English, which would be cumbersome; the implementation isn't important. |
Think mechanism, not policy. The definition of a file extension as everything after the [last] period has worked for around 50 years. If you want to do something more esoteric, then absolutely fine, but put that in a different piece of code. Leave the existing mechanism that works as it is. |
The .NET implementation of the
Examples:
([System.IO.FileInfo[]] ('foo', 'foo.bar', 'foo. bar.docx', 'foo. bar', 'foo. ', 'foo.')).Extension | % { "[$_]" }
Output:
[]
[.bar]
[.docx]
[. bar]
[. ] # on Unix only: on Windows: []
[.] # on Unix only: on Windows: []
Note that the platform differences with respect to
# -> '[][.foo]'
[System.IO.FileInfo[]] '.foo' | % { '[{0}][{1}]' -f $_.BaseName, $_.Extension }
Note that the problem of an empty base name doesn't arise in .NET, as As for While PowerShell's own wildcard patterns indeed only return items whose name contains at least one |
That actually answered a different question I had, to boot: 'how can I pass a prospective path to the FS and get it validated/corrected without just trying it and catching an exception?'. I'll have a look at |
@Hashbrown777, note that a pitfall with casting (which simply translates into a constructor call behind the scenes) is that relative paths are then resolved against the process working directory, which usually differs from PowerShell's; that is, a fully robust cast would have to use |
Fortunately the rules of filenames are very simple. On Windows
And on UNIX
And to avoid the mentioned scenario of trailing spaces, use System.String.Trim(). Notice the path separator is not an invalid filename character. But only the file system can tell you if a particular volume/drive/directory is case sensitive or not. Seeing the code of a Chinese PowerShell project was an eye-opener, where not only the comments were in Chinese, but so were the file names and even function names, and it worked. |
For mere formal path validation, there's also
Finally, note that |
I note that on Linux
implements the Windows file system filtering convention, which is different from, say, My theory is that on Windows it is implemented by FindFirstFileW so the operating system does the filtering, but POSIX
this includes files without the period |
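The difference between the two filtering conventions can be sketched in Python. This is a deliberately simplified model: `fnmatchcase` stands in for shell-style globbing, and `windows_style_match` is a hypothetical helper that reduces the Windows convention to the single special case discussed here ("*.*" matches every name). The sample names are illustrative only.

```python
from fnmatch import fnmatchcase


def windows_style_match(name: str, pattern: str) -> bool:
    """Hypothetical model: '*.*' matches everything, otherwise plain globbing."""
    if pattern == "*.*":
        return True
    return fnmatchcase(name, pattern)


names = ["README", "notes.txt", "conf.d"]

# Glob semantics: the literal period in the pattern must appear in the name.
glob_hits = [n for n in names if fnmatchcase(n, "*.*")]

# Windows-style semantics (as modelled above): names without a period match too.
windows_hits = [n for n in names if windows_style_match(n, "*.*")]

if __name__ == "__main__":
    print("glob:   ", glob_hits)
    print("windows:", windows_hits)
```

With glob semantics "README" is excluded; with the Windows-style rule it is included, which is the behaviour observed above.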
Good point, @rhubarb-geek-nz - I had wrongly assumed that a platform-native system call would be used on Unix-like platforms. Yes, PowerShell defers to .NET ( PowerShell/src/System.Management.Automation/namespaces/FileSystemProvider.cs Lines 92 to 97 in 5efd627
The .NET APIs themselves default to the Windows behavior, albeit inconsistently; see: Specifically, the |
It seems that the original issue is that |
That's not an issue: it is by - to me sensible - design, and I don't think it will change, nor - in my estimation - should it. As such, I think the The real issue is the - to me dubious - PowerShell behavior of selectively ignoring the name extension in directory names in the - PowerShell-only -
|
Title changed from "System.IO.DirectoryInfo erroneously have Extension populated" to "BaseName for DirectoryInfo is inconsistent when there is an extension"
Thanks for calling that out. I would agree that it is inconsistent and a question of whether it's really a bucket 3 or not (and I do see you've done some initial research on this, thanks!). I've updated the title of this issue to reflect the core problem. Will tag for WG to discuss. |
WG discussed this. Although we agree that the design is not ideal, it was intentional when it was written and there are likely customers depending on this behavior. If we look at how unix systems define |
@SteveL-MSFT, while I can appreciate the concern about breaking things, note that the Just to clarify (which may help with documenting):
|
This issue has been marked as by-design and has not had any activity for 1 day. It has been closed for housekeeping purposes. |
Prerequisites
Steps to reproduce
This occurs on both Windows and Linux, starting at v5 all the way through to now.
<#System.IO.FileInfo#> | %{ $_.BaseName + $_.Extension } should always be equivalent to <#System.IO.FileInfo#>.Name
For gci -File this holds true.
For gci -Directory, although pwsh correctly has .BaseName always match .Name (folders cannot have extensions..), .Extension incorrectly matches .Name -replace '^.*?(?=\.[^.]*$|$)','' instead of always returning ""
Expected behavior
Actual behavior
Error details
No response
Environment data
Visuals