Add HfFileSystemStreamFile #1967

Merged
merged 10 commits into from
Feb 6, 2024

Conversation

lhoestq (Member) commented Jan 11, 2024

This allows faster file streaming, which is useful for streaming WebDatasets, for example.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

codecov bot commented Jan 11, 2024

Codecov Report

Attention: 31 lines in your changes are missing coverage. Please review.

Comparison is base (b79882a) 82.55% compared to head (acca803) 82.20%.
Report is 18 commits behind head on main.

❗ Current head acca803 differs from pull request most recent head 1c47722. Consider uploading reports for the commit 1c47722 to get more accurate results

Files Patch % Lines
src/huggingface_hub/hf_file_system.py 24.39% 31 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1967      +/-   ##
==========================================
- Coverage   82.55%   82.20%   -0.36%     
==========================================
  Files          66       66              
  Lines        8159     8187      +28     
==========================================
- Hits         6736     6730       -6     
- Misses       1423     1457      +34     


Wauplin (Contributor) left a comment

Thanks for opening the PR @lhoestq. I left some comments, mostly to be sure what is an implementation decision and what is required by fsspec. In general, when there is a doubt, I'd prefer to have an explicit comment in the code for our future selves (I feel fsspec is quite opinionated on some things).

Also, would it be possible to update the Usage part of this guide to showcase how to use HfFileSystemStreamFile and why?

Thanks in advance!
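
A minimal sketch of the kind of usage example requested above, not taken from the PR or the guide. It assumes that passing `block_size=0` to `fs.open` is what selects the forward-only streaming file (worth checking against the merged `HfFileSystem` code). `iter_stream_chunks` is a hypothetical helper name, and `fs` can be any fsspec-style filesystem object.

```python
from typing import Iterator


def iter_stream_chunks(fs, path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield successive chunks of `path`, reading strictly forward.

    `fs` is any fsspec-style filesystem exposing `open(path, mode, block_size=...)`.
    """
    # Assumption: block_size=0 requests an unbuffered, forward-only stream
    # (believed to map to HfFileSystemStreamFile in huggingface_hub).
    with fs.open(path, "rb", block_size=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            yield chunk
```

With `huggingface_hub` installed, `fs` would be an `HfFileSystem()` instance and `path` would point at a file in a repo, for instance a `.tar` shard of a WebDataset. The file is only read front to back, so this fits sequential formats such as tar but not code that needs `seek`.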

mariosasko (Contributor) left a comment

A comment from me :)

lhoestq (Member, Author) commented Jan 26, 2024

Added annotations, a retry when streaming, and closing of the connection :)
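
To illustrate what "a retry when streaming" plus "close connection" can look like in general, here is a hedged sketch that is not the PR's code. It assumes a hypothetical `open_at(offset)` callable that opens a fresh connection starting at a byte offset (over HTTP this would typically be a `Range` request). `ResumingStream` and all of its names are made up for the example.

```python
from typing import BinaryIO, Callable, Optional


class ResumingStream:
    """Forward-only reader that reconnects once if a read fails."""

    def __init__(self, open_at: Callable[[int], BinaryIO]) -> None:
        self._open_at = open_at  # hypothetical: opens the source at a byte offset
        self._raw: Optional[BinaryIO] = None
        self.loc = 0  # number of bytes delivered to the caller so far

    def read(self, length: int = -1) -> bytes:
        if self._raw is None:
            self._raw = self._open_at(self.loc)
        try:
            data = self._raw.read(length)
        except OSError:
            # Drop the broken connection and retry once from the same offset.
            self._raw.close()
            self._raw = self._open_at(self.loc)
            data = self._raw.read(length)
        self.loc += len(data)
        return data

    def close(self) -> None:
        # Release the underlying connection explicitly.
        if self._raw is not None:
            self._raw.close()
            self._raw = None

    def __enter__(self) -> "ResumingStream":
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```

Tracking `loc` is what makes the retry safe: the new connection starts exactly where the caller left off, so no bytes are repeated or skipped. A second failure on the retried read is deliberately allowed to propagate.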

mariosasko (Contributor) left a comment

Thanks! One nit.

EDIT:

Maybe let's also add a test.

lhoestq (Member, Author) commented Jan 30, 2024

Adding a test was useful 🙃 I found two bugs and fixed them

Wauplin (Contributor) left a comment

Thanks for iterating on this PR @lhoestq! Looks good to me except for a failing test. Can you have a look at it please? (or remove the new line if not relevant). Thanks in advance!

Wauplin (Contributor) commented Feb 6, 2024

Thanks for fixing the test @lhoestq. Failing tests are now unrelated. Let's merge this :)

Wauplin merged commit 244e3ef into main on Feb 6, 2024
13 of 16 checks passed
Wauplin deleted the add-HfFileSystemStreamFile branch on February 6, 2024, 17:27
4 participants