Caching partial content #246

Open
shreyasminocha opened this issue Feb 21, 2021 · 5 comments
Comments

@shreyasminocha

    data = self.api.get(
        file_url,
        headers={'range': f'bytes={start}-{end}'},
        stream=True
    )

    [cachecontrol.controller] Status code 206 not in (200, 203, 300, 301)

Would love to be able to cache partial content.
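
For context, a cache for partial content would first need to know which bytes a 206 response actually carries. Servers report this in the standard `Content-Range` response header. Below is a minimal sketch of parsing that header; the function name `parse_content_range` is hypothetical and is not part of cachecontrol or requests, and the sketch only handles the `bytes first-last/total` form.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "bytes 0-99/1234" or "bytes 0-99/*" (total length unknown).
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (first, last, total) from a Content-Range header value.

    `last` is inclusive, as in HTTP. `total` is None when the server sends "*".
    Returns None for anything this sketch does not understand, including the
    "bytes */1234" form used for unsatisfiable ranges.
    """
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    # Reject ranges that are inverted or that run past the declared total.
    if first > last or (total is not None and last >= total):
        return None
    return first, last, total
```

With requests, the value would come from something like `response.headers.get("Content-Range")` on a 206 response.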

@yarikoptic

I guess it is that time of the year: I was just referred to cachecontrol in my quest for a caching HTTP proxy with range request support. I got inspired by seeing how rclone does it with (seemingly, I didn't look inside) sparse files that hold the already fetched parts.
Our use case: a FUSE file system on top of git/git-annex (datalad) repositories, where we know the HTTP URLs for the files' content but do not want to fetch entire files (could be TBs) just to access small portions of a file (e.g. metadata): datalad/datalad#4003 (comment)
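
The sparse-file idea could be sketched roughly as follows. This is an illustration only, not how rclone or cachecontrol is implemented: the class `SparseFileStore` and its methods are hypothetical, the index of fetched ranges lives only in memory (a real store would persist it), and whether the unwritten gaps actually occupy no disk space depends on the file system.

```python
from typing import List, Optional, Tuple


class SparseFileStore:
    """Hypothetical on-disk store for the fetched byte ranges of one remote file."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Sorted, merged list of (start, stop) intervals, stop exclusive.
        self.ranges: List[Tuple[int, int]] = []
        open(path, "ab").close()  # create the file if it does not exist yet

    def write(self, offset: int, data: bytes) -> None:
        """Write a fetched chunk at its offset and record the range as present."""
        if not data:
            return
        with open(self.path, "r+b") as f:
            f.seek(offset)  # seeking past the end leaves a hole in the file
            f.write(data)
        self._add(offset, offset + len(data))

    def _add(self, start: int, stop: int) -> None:
        """Merge the new interval with any overlapping or adjacent ones."""
        merged = []
        for s, e in self.ranges:
            if e < start or s > stop:
                merged.append((s, e))  # disjoint, keep unchanged
            else:
                start, stop = min(s, start), max(e, stop)
        merged.append((start, stop))
        self.ranges = sorted(merged)

    def read(self, offset: int, length: int) -> Optional[bytes]:
        """Return the bytes if the whole range was fetched before, else None."""
        if length <= 0:
            return b""
        stop = offset + length
        if not any(s <= offset and stop <= e for s, e in self.ranges):
            return None  # cache miss: at least part of the range is a hole
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

A caller would check `read()` first and, on `None`, issue a range request for the missing bytes and pass the result to `write()`.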

@hexagonrecursion
Contributor

Caching big files is not one of cachecontrol's strong suits at the moment. See #238. I'm working towards improving the situation (#240), but progress is slow: I want to branch by abstraction, but my latest PR (#247) is stuck in the pipe.

At the moment, the API that abstracts the storage in cachecontrol (on master) looks like this:

class Cache:
    def get(self, key: str) -> bytes:
        ...

    def set(self, key: str, value: bytes, expires=None) -> None:
        ...

    def delete(self, key: str) -> None:
        ...

    def close(self) -> None:
        ...

The key is derived from the URL and there is only one key per cached request. As you can see, caching big files will require changing the API: you can't store the entire contents of a file in a single bytes instance because it would take too much RAM. Caching partial content will require changes to this API too, since one URL would have to map to several byte ranges.
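
To make the partial-content half of that concrete, here is one possible shape for a range-aware interface. It is a sketch for discussion only: `RangeCache`, `set_range` and `get_range` are hypothetical names that do not exist in cachecontrol, and because it still keeps everything in memory it does nothing for the RAM problem with big files.

```python
from typing import Dict, List, Optional, Tuple


class RangeCache:
    """Hypothetical range-aware store; NOT part of cachecontrol."""

    def __init__(self) -> None:
        # key -> sorted list of (start, data) segments that never overlap or touch
        self._segments: Dict[str, List[Tuple[int, bytes]]] = {}

    def set_range(self, key: str, start: int, data: bytes) -> None:
        """Store `data` at byte offset `start`, merging with neighbouring segments."""
        if not data:
            return
        new_start, new_data = start, data
        kept: List[Tuple[int, bytes]] = []
        for seg_start, seg_data in self._segments.get(key, []):
            seg_end = seg_start + len(seg_data)
            new_end = new_start + len(new_data)
            if seg_end < new_start or seg_start > new_end:
                kept.append((seg_start, seg_data))  # disjoint, keep unchanged
                continue
            # Overlapping or adjacent: build one merged segment, new bytes win.
            merged_start = min(seg_start, new_start)
            buf = bytearray(max(seg_end, new_end) - merged_start)
            buf[seg_start - merged_start:seg_end - merged_start] = seg_data
            buf[new_start - merged_start:new_end - merged_start] = new_data
            new_start, new_data = merged_start, bytes(buf)
        kept.append((new_start, new_data))
        kept.sort(key=lambda seg: seg[0])
        self._segments[key] = kept

    def get_range(self, key: str, start: int, end: int) -> Optional[bytes]:
        """Return bytes start..end (inclusive, as in HTTP Range) or None on a miss."""
        for seg_start, seg_data in self._segments.get(key, []):
            if seg_start <= start and end < seg_start + len(seg_data):
                return seg_data[start - seg_start:end - seg_start + 1]
        return None

    def delete(self, key: str) -> None:
        self._segments.pop(key, None)


cache = RangeCache()
cache.set_range("https://example.com/file", 0, b"hello")
cache.set_range("https://example.com/file", 5, b" world")
print(cache.get_range("https://example.com/file", 3, 7))   # b'lo wo'
print(cache.get_range("https://example.com/file", 8, 20))  # None
```

The key stays one-per-URL, as in the current API; only the value changes from a single bytes instance to a set of byte ranges.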

Would you like to join forces and discuss possible solutions to both problems? Keep in mind that the current holdup is @ionrock rather than a shortage of time on my part.

@hexagonrecursion
Contributor

If you have a practical problem that you want to solve ASAP I suggest dropping ionrock/cachecontrol in favor of a caching proxy that already implements partial content and large file support.

@shreyasminocha
Author

> I suggest dropping ionrock/cachecontrol in favor of a caching proxy that already implements partial content and large file support.

@hexagonrecursion any suggestions? There don't seem to be a lotta options.

@hexagonrecursion
Contributor

@shreyasminocha Sorry. I have no clue either.
