Select which files to delete in huggingface-cli delete-cache #2219

Open
sealad886 opened this issue Apr 11, 2024 · 2 comments
Labels
CLI enhancement New feature or request

Comments

@sealad886
Contributor

Is your feature request related to a problem? Please describe.
The download tools can all fetch just part of a repo, either dynamically or through configuration, because a repo often contains more files than you need (e.g. .pt, .safetensors). Once files are downloaded, though, you can't delete only part of the repo from your cache. If you downloaded a huge volume of files by accident (or because you didn't know what you were doing), the only option right now is to delete the whole repo and then re-download only what you want in your cache.

Describe the solution you'd like
An option, using the existing tools, to select which files within a given cached repo should be deleted.

Describe alternatives you've considered
Manually manipulating files in the cache has gone... let's say "poorly" in the past. I've had to scrap my entire cache more than once doing that.

Additional context
n/a

@Wauplin Wauplin added enhancement New feature or request CLI labels Apr 11, 2024
@Wauplin Wauplin changed the title huggingface-cli delete-cache should have more granular control to delete only parts of cached repos instead of all-or-nothing Select which files to delete in huggingface-cli delete-cache Apr 11, 2024
@Wauplin
Contributor

Wauplin commented Apr 11, 2024

This is a valid feature request indeed. At the moment we can only delete specific revisions (usually main for the whole repo, or outdated revisions if there is more than one revision). To delete individual files we would need to update the UI / the API. Maybe it's fine to just have it for 1 repo at a time: huggingface-cli delete-cache <repo_id> --delete=... or huggingface-cli delete-cache <repo_id> --keep=... (where delete/keep are glob patterns).
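
As a rough illustration of the delete/keep glob idea above, here is a minimal sketch of the selection logic only. It assumes the cached file names of one repo are already available as a plain list; `select_files_to_delete` and its `delete` / `keep` parameters are hypothetical names chosen to mirror the proposed flags, not an existing huggingface_hub API, and nothing is actually removed from disk.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_files_to_delete(
    filenames: Iterable[str],
    delete: Optional[List[str]] = None,
    keep: Optional[List[str]] = None,
) -> List[str]:
    """Return the file names that a --delete or --keep selection would remove.

    Hypothetical helper: `delete` and `keep` are lists of glob patterns and
    are treated as mutually exclusive, like the two proposed flags.
    """
    if (delete is None) == (keep is None):
        raise ValueError("Pass exactly one of `delete` or `keep`.")
    if delete is not None:
        # Remove every file that matches at least one delete pattern.
        return sorted(f for f in filenames if any(fnmatchcase(f, p) for p in delete))
    # Remove every file that matches none of the keep patterns.
    return sorted(f for f in filenames if not any(fnmatchcase(f, p) for p in keep))


# Made-up file names standing in for one cached repo.
cached = ["config.json", "model.safetensors", "pytorch_model.bin", "tokenizer.json"]

by_delete = select_files_to_delete(cached, delete=["*.bin"])
by_keep = select_files_to_delete(cached, keep=["*.safetensors", "*.json"])
print(by_delete, by_keep)
```

Both calls select the same single file here, which shows that the two flags are two ways of expressing one selection. Note that `fnmatchcase` lets `*` match path separators too, so a real implementation might want stricter matching for nested files.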

Would you like to open a PR for it? :)

@sealad886
Contributor Author

Yeah, for sure I can. This would be my first time doing anything more than very superficial dev on a project like this (on GH, that is). Please be gentle :)
I can have a think through the design of it first. Hopefully I'll have some downtime from other side projects to work on this. Off-the-cuff thoughts:

  • delete-cache doesn't take arguments or flags in other contexts, so introducing that might be confusing.
  • delete-cache also is going down the road of using that TUI system for multiple selection, which would be confusing / difficult to do nested.
  • Perhaps a new subcommand like prune-cache or trim-cache or trim-repo... or repo-cache. I like repo-cache because you could then extend it with more tools that would be specific to managing the cache for a single repo.

Going with repo-cache, use case would be something like:

> huggingface-cli repo-cache delete --all repo_id               # same functionality as delete-cache, but quick to delete just the one repo
> huggingface-cli repo-cache delete --include (glob) repo_id    # use same glob format as the download --include flag; mark files to include in the delete action
> huggingface-cli repo-cache delete --exclude (glob) repo_id    # same as --include, but opposite
> huggingface-cli repo-cache delete --file-list (list of files to remove) repo_id
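
To make the proposed command shape concrete, here is a minimal argparse sketch of how those four invocations could be parsed. This is a standalone illustration under my own assumptions, not the actual huggingface-cli parser: `build_parser` is a hypothetical name, the four selection flags are assumed to be mutually exclusive, and each flag takes one value per occurrence (repeatable) so that the trailing repo_id is not swallowed by the option.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `huggingface-cli repo-cache delete ...`."""
    parser = argparse.ArgumentParser(prog="huggingface-cli")
    commands = parser.add_subparsers(dest="command", required=True)

    repo_cache = commands.add_parser("repo-cache", help="Manage the cache of a single repo.")
    actions = repo_cache.add_subparsers(dest="action", required=True)

    delete = actions.add_parser("delete", help="Delete all or part of a cached repo.")
    # Assumption: exactly one way of selecting files per invocation.
    selection = delete.add_mutually_exclusive_group(required=True)
    selection.add_argument("--all", action="store_true", help="Delete the whole cached repo.")
    selection.add_argument("--include", action="append", metavar="GLOB",
                           help="Glob of files to delete (repeatable).")
    selection.add_argument("--exclude", action="append", metavar="GLOB",
                           help="Glob of files to keep (repeatable).")
    selection.add_argument("--file-list", action="append", metavar="FILE",
                           help="Exact file name to delete (repeatable).")
    delete.add_argument("repo_id", help="Repo whose cache should be trimmed.")
    return parser


# "org/model" is a placeholder repo id used only for this example.
args = build_parser().parse_args(
    ["repo-cache", "delete", "--include", "*.bin", "org/model"]
)
print(args.repo_id, args.include)
```

Using `action="append"` rather than `nargs="+"` is a deliberate choice in this sketch: with `nargs="+"`, argparse would greedily consume `repo_id` as one more glob and then complain that the positional is missing.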

I'll have a proper sit down at some point soon and see what's what.

Are there timelines I should be aware of that you know of?
