Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipp.Path is not Path-like enough for open #110

Open
xtofl opened this issue Aug 9, 2023 · 4 comments
Open

zipp.Path is not Path-like enough for open #110

xtofl opened this issue Aug 9, 2023 · 4 comments

Comments

@xtofl
Copy link

xtofl commented Aug 9, 2023

I'm not sure if this is a problem with open not recognizing zipp.Path or with zipp.Path not implementing the right interface.

But we can observe a mysterious incompatibility between zipp.Path and standard library functions.

Disclaimer: I realize a more thorough type-analytical paper could probably be written about the relations between zipp.Path and pathlib.Path, describing Liskov's Substitution Principle and including some in-depth type-theory and category theoretical proofs, but I'm not there yet :).

Observed behavior:

>>> zipfile.is_zipfile("some-zip.zip")
True
>>> zipfile.is_zipfile(zipp.Path("some-zip.zip"))
Traceback (most recent call last):
  Cell In[10], line 1
    zipfile.is_zipfile(zipp.Path("some-zip.zip"))
  File /usr/lib/python3.8/zipfile.py:208 in is_zipfile
    with open(filename, "rb") as fp:
TypeError: expected str, bytes or os.PathLike object, not Path

>>> pathlib.Path("some-zip.zip")
PosixPath('some-zip.zip')
>>> pathlib.Path(zipp.Path("some-zip.zip"))
Traceback (most recent call last):
  Cell In[13], line 1
    pathlib.Path(zipp.Path("some-zip.zip"))
  File /usr/lib/python3.8/pathlib.py:1042 in __new__
    self = cls._from_parts(args, init=False)
  File /usr/lib/python3.8/pathlib.py:683 in _from_parts
    drv, root, parts = self._parse_args(args)
  File /usr/lib/python3.8/pathlib.py:667 in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not Path

Expected behavior:

>>> zipfile.is_zipfile(zipp.Path("some-zip.zip"))
True
>>> pathlib.Path(zipp.Path("some-zip.zip"))
PosixPath('some-zip.zip')
@jaraco
Copy link
Owner

jaraco commented Sep 18, 2023

Thanks for the report, especially laying out your observed and expected behavior!

It's true that zipp.Path cannot be properly path-like. Consider zipp.Path("some-zip.zip", "foo/bar/"). Such a zipp.Path cannot be path-like because it has no direct representation it the file system. Consider also a pure in-memory zipfile zipp.Path(in_memory_zip), which also wouldn't have a file system representation. Only in the narrow case of the root of the zipp.Path when the underlying zipfile is filesystem backed could it be considered equivalent to the zipfile itself.

It's conceivable that zipp.Path could implement __fspath__ in such cases where the root zip file is a filesystem-backed file and where the Path object is at the root and error otherwise. Such an approach would plausibly allow the expected behaviors.

In light of these other use-cases, does that change your expectations at all?

What is the use-case that led you to have this expectation (why might someone else have this same expectation)?

@xtofl
Copy link
Author

xtofl commented Oct 2, 2023

I have code that expects a Path, and will accept zip files and regular folders alike (think .docx files or their unzipped folders). We're handling some kind of proprietary recordings that are represented in either way.

In case the code receives a path that points to a zip file (and not a folder), it will continue with

def transparent_unzip(recording: Path) -> Path:
  if zipfile.is_zipfile(recording):
    return zipp.Path(recording)
  else:
    return recording

def handle(folder_or_zip: Path):
  recording = transparent_unzip(folder_or_zip)
  schema = (recording / "marker").read_text()
  ...

Out-of Scope Expectation

cannot be path-like because it has no direct representation it the file system

This statement would imply that e.g. files in /tmp cannot be accessed like other files (/tmp normally has an in-memory file system mounted onto it). Same goes for /proc etc... So I think we have to disregard that: the hierarchy can, at least conceptually, be navigated deeply into the zip file - even into a nested zip file if need be.

It is navigating out of the zip (/path/to/a.zip/../a.zip is a zip file, too) file that could pose big problems, and I'm definitely not going that far. Nor the /path/to/outer.zip/with/inner.zip (which is a little easier).

Just Good Enough Scope Expectation

So yes, zipp.Path object would be 'path-like' when pointing to the root would solve my use case.

@xtofl
Copy link
Author

xtofl commented Oct 17, 2023

Today I bumped onto another occurrence of the same behavior: extracting an image from a zip file:

import zipp
root = zipp.Path("/path/to.zip")

import PIL.Image
with Image.open(root / "image.png") as img:
  ...

This Image.open assumes a path-like object, but zipp.Path is not. As a consequence, the library treats the object like a file object.

@jaraco
Copy link
Owner

jaraco commented Oct 17, 2023

Today I bumped onto another occurrence of the same behavior

This use-case wouldn't be solved by making the root object Path-like.

What could zipp.Path("/path/to.zip", "image.png") possibly represent to Image.open() to allow it to function? There's no path on the file system that contains "image.png". In this case, PIL should add support for Traversable objects, at which point Image.open() could load the contents from the item in the zipfile.

How does this new expectation affect your thinking about the original concern?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants