Implement download_artifact #5448

Open
wants to merge 16 commits into
base: master
from


porink0424
Contributor

porink0424 commented May 15, 2024

Motivation

Optuna already provides upload_artifact, but it had no counterpart for retrieving an artifact. This PR introduces a new API, download_artifact, as that counterpart. For symmetry with upload_artifact, download_artifact takes file_path as an argument.

Description of the changes

  • Added download_artifact
  • Added test for download_artifact
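What download_artifact does can be sketched as a chunked copy from an artifact store to a local file. This is a minimal, hypothetical illustration, not the PR's implementation: `InMemoryArtifactStore`, `download_artifact_sketch`, and the `BUFFER_SIZE` value are assumptions, and the store's `open_reader`/`write` methods are only modeled on Optuna's artifact-store interface. The copy loop mirrors the `reader.read(BUFFER_SIZE)` excerpt reviewed in this thread.

```python
import io
import os
import tempfile
from typing import BinaryIO, Dict

# Hypothetical chunk size; the value used in the PR is not shown in this thread.
BUFFER_SIZE = 1024 * 1024


class InMemoryArtifactStore:
    """Hypothetical in-memory stand-in for an artifact store.

    Its open_reader/write methods are modeled on Optuna's artifact-store
    interface, but artifacts are kept in a dict instead of a real backend.
    """

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def open_reader(self, artifact_id: str) -> BinaryIO:
        return io.BytesIO(self._blobs[artifact_id])

    def write(self, artifact_id: str, content_body: BinaryIO) -> None:
        self._blobs[artifact_id] = content_body.read()


def download_artifact_sketch(
    artifact_store: InMemoryArtifactStore, artifact_id: str, file_path: str
) -> None:
    """Copy the artifact `artifact_id` from the store to `file_path`."""
    with artifact_store.open_reader(artifact_id) as reader, open(file_path, "wb") as writer:
        # The loop holds at most BUFFER_SIZE bytes of the artifact at a time.
        while True:
            chunk = reader.read(BUFFER_SIZE)
            if not chunk:  # An empty bytes object signals end of stream.
                break
            writer.write(chunk)


store = InMemoryArtifactStore()
store.write("artifact-1", io.BytesIO(b"1.0 2.0"))
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "downloaded.txt")
    download_artifact_sketch(store, "artifact-1", path)
    with open(path, "rb") as f:
        print(f.read())  # b'1.0 2.0'
```

As with upload_artifact, the caller chooses the local file_path, so the function needs no knowledge of where the artifact will be used.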

Member

toshihikoyanase left a comment


Thank you for your great proposal. I've actually been waiting for this feature.

Let me share some early comments.

optuna/artifacts/_download.py
tests/artifacts_tests/test_download_artifact.py
@porink0424
Contributor Author

porink0424 commented May 17, 2024

@toshihikoyanase
Thank you for your comments! I have addressed them.

Member

toshihikoyanase left a comment


Thanks for your update!
I added two suggestions and a note.

optuna/artifacts/_download.py
optuna/artifacts/_download.py
Comment on lines +37 to +41
```python
for i, artifact_id in enumerate(artifact_ids):
    dummy_downloaded_file = str(tmp_path / f"dummy_downloaded_{i}.txt")
    download_artifact(artifact_store, artifact_id, dummy_downloaded_file)
    with open(dummy_downloaded_file, "r") as f:
        assert f.read() == f"{study.trials[i].params['x']} {study.trials[i].params['y']}"
```
Member


I'm not sure, but could we use temporary files or directories to keep the environment clean?
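One way to keep a test run from leaving files behind, sketched under assumptions: wrap the downloads in Python's `tempfile.TemporaryDirectory`, which deletes the directory and everything in it when the `with` block exits, even if an assertion fails. `check_download_roundtrip`, `payloads`, and `fake_download` are hypothetical names; `fake_download` only stands in for download_artifact so the example runs without Optuna.

```python
import os
import tempfile
from typing import Callable, List


def check_download_roundtrip(
    download: Callable[[int, str], None], expected_contents: List[str]
) -> str:
    """Hypothetical test helper that leaves no files behind.

    Each artifact is downloaded into a TemporaryDirectory, which is removed
    together with its contents when the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        for i, expected in enumerate(expected_contents):
            path = os.path.join(tmp_dir, f"dummy_downloaded_{i}.txt")
            download(i, path)
            with open(path, "r") as f:
                assert f.read() == expected
    # Return the (now deleted) path so callers can confirm the cleanup.
    return tmp_dir


payloads = ["0.5 1.5", "2.0 3.0"]


def fake_download(i: int, path: str) -> None:
    # Stand-in for download_artifact: writes the i-th payload to `path`.
    with open(path, "w") as f:
        f.write(payloads[i])


leftover = check_download_roundtrip(fake_download, payloads)
print(os.path.exists(leftover))  # False
```

The trade-off versus pytest's tmp_path fixture is that cleanup happens immediately, at the cost of losing the files for post-mortem inspection when a test fails.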

Contributor Author


The dummy files do remain under tmp_path, so the machine running this test is not left completely clean.
Since I implemented this test referring to test_upload_artifact, which does not keep the environment clean as well, we may need to modify both tests if we do not want this situation.

Member


Sorry for the delayed response.
Leftover temporary files can sometimes complicate git commands, so to minimize operational mistakes we may prefer not to leave them behind.

> Since I implemented this test referring to test_upload_artifact, which does not keep the environment clean as well, we may need to modify both tests if we do not want this situation.

Thank you for pointing that out. Let's keep this PR focused on the download tests; we can update the other test files in follow-up PRs.

Contributor Author


According to this, pytest by default retains the temporary directories of the last 3 test sessions, and we can prevent it from keeping them by setting tmp_path_retention_policy.
So adding the single line tmp_path_retention_policy = "none" to the [tool.pytest.ini_options] section of pyproject.toml would resolve this problem. Should I include this fix in this PR?

Member


My understanding is that there are two possible concerns, and neither is an actual problem here, so I think this PR is fine as is.
The first is that temporary files could complicate git management; however, tmp_path is created under the system temporary directory, so it does not affect git. (https://docs.pytest.org/en/7.3.x/how-to/tmp_path.html#the-default-base-temporary-directory)
The second is that temporary files could affect other test runs; however, tmp_path is unique to each test function invocation, so this is not a problem either. (https://docs.pytest.org/en/7.3.x/reference/reference.html#tmp-path)

Member


Ah, I misunderstood. These test cases employ the tmp_path fixture instead of creating temporary files in the test cases. Thanks for your explanation.

@nabenabe0928
Collaborator

Could you add an explanation of download_artifact to tutorial/20_recipes/012_artifact_tutorial.py?

Collaborator

nabenabe0928 left a comment


Thank you for the PR, I left some comments!
Could you please take another look?

optuna/artifacts/_download.py
optuna/artifacts/_download.py
optuna/artifacts/_download.py
```python
# Excerpt from the copy loop: read and write BUFFER_SIZE bytes at a time until EOF.
chunk = reader.read(BUFFER_SIZE)
if not chunk:
    break
writer.write(chunk)
```
Collaborator


Although guaranteeing file safety is not critical here, since Optuna itself does not use the downloaded file, I have a concern about file locking.
As a first step, could we check whether file_path already exists when opening it?

Contributor Author


@nabenabe0928
Thank you for your comments!
Do you mean that we should check whether a file already exists at file_path and raise an error if it does?

Collaborator


I think that would be kinder to users.
However, whether to emit a warning or raise an error deserves closer discussion.
For reference, os.makedirs also checks whether the target already exists.
https://docs.python.org/ja/3/library/os.html#os.makedirs

Contributor Author


I have added a check for whether a file already exists at file_path.
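The thread does not show how that check was implemented. As a hedged sketch, one option is to open the destination with mode `"xb"` (exclusive creation): `open()` raises `FileExistsError` if the file already exists, and because the existence check and the creation happen in a single call, there is no gap between them as there would be with a separate `os.path.exists()` test. `write_new_file` is a hypothetical helper name, not part of Optuna.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def write_new_file(reader: BinaryIO, file_path: str) -> None:
    """Hypothetical helper: copy `reader` into a file that must not exist yet.

    Mode "xb" creates the file exclusively; if file_path already exists,
    open() raises FileExistsError before anything is written, so an existing
    file is never truncated.
    """
    with open(file_path, "xb") as writer:
        shutil.copyfileobj(reader, writer)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "artifact.txt")
    write_new_file(io.BytesIO(b"first"), path)
    try:
        write_new_file(io.BytesIO(b"second"), path)
    except FileExistsError:
        print("refused to overwrite")
    with open(path, "rb") as f:
        print(f.read())  # b'first'
```

This mirrors the os.makedirs behavior mentioned above: with its default `exist_ok=False`, os.makedirs raises `FileExistsError` when the target exists. Whether an existing file should instead only trigger a warning is the open design question raised in this thread.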


codecov bot commented May 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.77%. Comparing base (181d65f) to head (6b27239).
Report is 155 commits behind head on master.

Current head 6b27239 differs from pull request most recent head 52542a9

Please upload reports for the commit 52542a9 to get more accurate results.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #5448      +/-   ##
==========================================
+ Coverage   89.52%   89.77%   +0.24%
==========================================
  Files         194      196       +2
  Lines       12626    12582      -44
==========================================
- Hits        11303    11295       -8
+ Misses       1323     1287      -36
```


porink0424 and others added 4 commits May 29, 2024 16:29
```
@@ -139,6 +139,13 @@ def objective(trial: optuna.Trial) -> float:

print(content)

Also, you can easily download the artifact as a file using `download_artifact` function, instead of using the artifact module:
```
Member


I have two suggestions.

  1. Could you remove the code that uses artifact_store.open_reader(), since it is not a public API? It was only a workaround for the lack of download_artifact().
  2. Could you link download_artifact to its API documentation, as shown below?
Suggested change:
```diff
-Also, you can easily download the artifact as a file using `download_artifact` function, instead of using the artifact module:
+Also, you can easily download the artifact as a file using :func:`~optuna.artifacts.download_artifact` function, instead of using the artifact module:
```

Contributor Author


Thank you for your review! I have removed open_reader() from the tutorial.


5 participants