-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Joining a cloudpath and string representing an absolute path removes the bucket name #344
Comments
Thanks @pjbull for correcting the title! You are right and that's exactly what the issue is:
|
Thanks @chopeen for this issue and the PR with a proposed fix. Apologies for the holdup, but I'm not 100% confident that this is the "right" fix, though I understand your use case. We need to answer the question of whether a bucket or a cloud-provider is the equivalent of a drive. As things work now, usually the root level from cloudpathlib import CloudPath
cp = CloudPath("s3://bucket1/dir/file.txt")
cp
#> S3Path('s3://bucket1/dir/file.txt')
# reference a file in another bucket with the same client
cp2 = (cp / "/bucket2/dir/file.txt")
cp2
#> S3Path('s3://bucket2/dir/file.txt') It is a useful convention since, for example, it let us do things like list all the available buckets on a provider, which was a requested feature. I do think with either option I present below we'll want to keep at least special casing from cloudpathlib import CloudPath
list(CloudPath("s3://").iterdir())
#> [ S3Path('s3://bucket1'),
#> S3Path('s3://bucket2'),
#> ... ] That said, we're not consistent everywhere. For example, from pathlib import PureWindowsPath
wp = PureWindowsPath("C:/bucket1/dir/file.txt")
wp.parts
#> ('C:\\', 'bucket1', 'dir', 'file.txt')
list(wp.parents)
#> [ PureWindowsPath('C:/bucket1/dir'),
#> PureWindowsPath('C:/bucket1'),
#> PureWindowsPath('C:/')]
from cloudpathlib import CloudPath
cp = CloudPath("s3://bucket1/dir/file.txt")
cp.parts
#> ('s3://', 'bucket1', 'dir', 'file.txt')
list(cp.parents)
#> [S3Path('s3://bucket1/dir'), S3Path('s3://bucket1')]
cp.anchor
#> 's3://'
wp.anchor
#> 'C:\\'
cp.drive
#> 'bucket1'
wp.drive
#> 'C:' This example may also be a little misleading since the Please add a 👍 if you think that the bucket should be the drive and absolute paths are relative to the bucket and a 👎 if you think the provider should be the drive and absolute paths are relative to the provider.
It is worth noting, that this will also affect a number of additional properties and methods like Finally, either way we go this change would need a deprecation cycle and warnings since folks may rely on the existing inconsistent behavior. |
Thank you for a detailed response! For me,
|
I understand what you're saying, but I expect the opposite will be more confusing to other developers. Paths that start with >>> from pathlib import Path
>>> Path('/dir1') / '/dir2/file.txt'
PosixPath('/dir2/file.txt') I think that because of this any relative path (whether local or cloud) when represented as a string should not start with a |
Wow, that's surprising! I have just checked and it is documented, but I was not aware of it anyway. Now I understand why you'd like to keep Should I change my PR to only log a warning in files#diff-7fa444066bc9ac67a45a3ccd86ec9a23f16b3ce4a0c48752578fe51db3d80346R695, to explicitly communicate something like |
Thanks @chopeen, I think that we'll close that PR for now since we generally avoid emitting noisy warnings, and for the most part if someone hits this an exception will bubble up immediately. I'm going to leave this issue open though to focus in getting consistent on what we want the "drive" of a |
There are two errors in the output:
An error occurred (403) when calling the HeadObject operation: Forbidden
An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied
but neither of them explains the actual cause.
The text was updated successfully, but these errors were encountered: