Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve http_get #1954

Merged
merged 3 commits into from
Jan 8, 2024
Merged

Conversation

Wauplin
Copy link
Contributor

@Wauplin Wauplin commented Jan 4, 2024

Fix #1942 (and in general should make this part more robust, it's not the first time we need to arrange this 😬).

When downloading a file, a progress bar is displayed. The title of the progress bar shows the filename of the file currently being downloaded. This displayed filename is either taken from the URL for non-LFS files or from the Content-disposition header if hosted on a CDN -based on a regex. The PR changes this by adding a new -optional- parameter to http_get to pass the filename explicitly. This will definitely fix the issue by avoiding to "guess" the filename (#1942 is about an encoding issue for files like 日本語/でこんにちは.jpg).

The parameter is made optional because cached_file (deprecated) cannot provide a filename. In general, the new parameter will be used in all hf_hub_download => i.e. all the time.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link

codecov bot commented Jan 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (66f8ea6) 82.40% compared to head (222ed47) 82.43%.

❗ Current head 222ed47 differs from pull request most recent head f30ff6c. Consider uploading reports for the commit f30ff6c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1954      +/-   ##
==========================================
+ Coverage   82.40%   82.43%   +0.02%     
==========================================
  Files          66       66              
  Lines        8140     8135       -5     
==========================================
- Hits         6708     6706       -2     
+ Misses       1432     1429       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

resume_size: float = 0,
headers: Optional[Dict[str, str]] = None,
expected_size: Optional[int] = None,
filename: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe display_filename to emphasize that it's not the filename where the file is going to be saved at?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Removed in 6724e6c.

@Wauplin Wauplin merged commit fc2be44 into main Jan 8, 2024
16 checks passed
@Wauplin Wauplin deleted the 1942-fix-displayed-name-when-downloading-file branch January 8, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arbled characters when using snapshot_download and hf_hub_download to download files with cn/jp/kr names
3 participants