Implement `download_artifact` #5448

porink0424 · 2024-05-15T09:35:44Z

Motivation

Optuna already provides upload_artifact, but it had no counterpart for retrieving artifacts. This PR introduces a new API, download_artifact, as that counterpart. For symmetry with upload_artifact, download_artifact takes file_path as an argument, which specifies where the downloaded artifact is written.

Description of the changes

Added download_artifact
Added a test for download_artifact
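The interface described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the merged code: `InMemoryArtifactStore` is a hypothetical stand-in that only mimics the `write()` and `open_reader()` methods of an artifact store, and the existence check and `shutil.copyfileobj` streaming mirror changes made later in this PR.

```python
import io
import os
import shutil
from typing import BinaryIO, Dict


class InMemoryArtifactStore:
    """Hypothetical stand-in for an artifact store that keeps artifacts in a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def write(self, artifact_id: str, content_body: BinaryIO) -> None:
        # Store the full payload under the given ID.
        self._data[artifact_id] = content_body.read()

    def open_reader(self, artifact_id: str) -> BinaryIO:
        # Return a fresh binary stream so every download starts at offset 0.
        return io.BytesIO(self._data[artifact_id])


def download_artifact(
    artifact_store: InMemoryArtifactStore, artifact_id: str, file_path: str
) -> None:
    """Sketch: write the artifact identified by artifact_id to file_path."""
    # Refuse to overwrite an existing file instead of silently clobbering it.
    if os.path.exists(file_path):
        raise FileExistsError(f"File already exists: {file_path}")
    # Stream the content in chunks rather than reading it all into memory.
    with artifact_store.open_reader(artifact_id) as reader, open(file_path, "wb") as writer:
        shutil.copyfileobj(reader, writer)
```

Taking `file_path` as the destination keeps the call shape parallel to `upload_artifact`, which also takes a `file_path`.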

toshihikoyanase

Thank you for your great proposal. I've actually been waiting for this feature.

Let me share my early comments.

optuna/artifacts/_download.py

tests/artifacts_tests/test_download_artifact.py

porink0424 · 2024-05-17T01:36:50Z

@toshihikoyanase
Thank you for your comments! I have addressed them.

toshihikoyanase

Thanks for your update!
I added two suggestions and a note.

optuna/artifacts/_download.py

toshihikoyanase · 2024-05-17T02:29:01Z

tests/artifacts_tests/test_download_artifact.py

+    for i, artifact_id in enumerate(artifact_ids):
+        dummy_downloaded_file = str(tmp_path / f"dummy_downloaded_{i}.txt")
+        download_artifact(artifact_store, artifact_id, dummy_downloaded_file)
+        with open(dummy_downloaded_file, "r") as f:
+            assert f.read() == f"{study.trials[i].params['x']} {study.trials[i].params['y']}"


I'm not sure, but can we use temporary files/directories to keep the environment clean?

Dummy files do remain in tmp_path, so the machine running this test is not left completely clean.
Since I implemented this test based on test_upload_artifact, which also does not keep the environment clean, we may need to modify both tests if we want to avoid this.

Sorry for the delayed response.
Leftover temporary files can sometimes complicate running git commands, so we might want to avoid leaving them behind to minimize operational mistakes.


Thank you for pointing it out. Within the scope of this PR, let us focus solely on the download tests; we can work on the other files in follow-up PRs.

According to pytest's documentation, pytest by default keeps the temporary base directories of the last 3 runs (tmp_path_retention_count), and we can stop it from retaining them by setting tmp_path_retention_policy.
So, adding a single line, tmp_path_retention_policy = "none", to the [tool.pytest.ini_options] section of pyproject.toml would resolve this problem. Should I include this fix in this PR?

My understanding is that there are two concerns, neither of which is a problem, so I think this PR is fine as is.
The first is that temporary files could complicate git management, but since tmp_path is created under the system temporary directory, outside the repository, it does not affect git. (https://docs.pytest.org/en/7.3.x/how-to/tmp_path.html#the-default-base-temporary-directory)
The second is that temporary files could affect other test runs, but since tmp_path is unique to each test function invocation, this is not a problem. (https://docs.pytest.org/en/7.3.x/reference/reference.html#tmp-path)

Ah, I misunderstood. These test cases employ the tmp_path fixture instead of creating temporary files in the test cases. Thanks for your explanation.
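The two cleanup behaviors discussed in this thread can be contrasted with the standard library. pytest's `tmp_path` gives each test invocation its own directory under the system temporary base, and pytest prunes old base directories according to its retention settings; `tempfile.TemporaryDirectory`, by contrast, deletes its directory as soon as the `with` block exits. The sketch below illustrates the latter; `write_and_read_back` is a hypothetical helper standing in for a download-then-verify step like the one in the test above.

```python
import os
import tempfile
from typing import Tuple


def write_and_read_back(content: str) -> Tuple[str, str]:
    """Hypothetical helper: write content into a throwaway directory and read it back.

    Returns the text that was read back and the path of the directory that was used.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        file_path = os.path.join(tmp_dir, "dummy_downloaded_0.txt")
        with open(file_path, "w") as f:
            f.write(content)
        with open(file_path, "r") as f:
            read_back = f.read()
    # Leaving the with-block deletes tmp_dir and everything inside it.
    return read_back, tmp_dir
```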

nabenabe0928 · 2024-05-21T03:25:37Z

Could you add an explanation of download_artifact to tutorial/20_recipes/012_artifact_tutorial.py?

nabenabe0928

Thank you for the PR, I left some comments!
Could you please take another look?

optuna/artifacts/_download.py

nabenabe0928 · 2024-05-21T03:42:30Z

optuna/artifacts/_download.py

+            chunk = reader.read(BUFFER_SIZE)
+            if not chunk:
+                break
+            writer.write(chunk)
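The loop in this hunk is the usual chunked-copy pattern: it holds at most `BUFFER_SIZE` bytes in memory at a time, however large the artifact is. A later commit in this PR swaps it for `shutil.copyfileobj`, which runs an equivalent read/write loop internally. In the sketch below, `BUFFER_SIZE = 1024 * 1024` is an assumed value (the PR's actual constant is not shown here), and in-memory streams stand in for the artifact reader and the output file.

```python
import io
import shutil
from typing import BinaryIO

BUFFER_SIZE = 1024 * 1024  # Assumed chunk size (1 MiB), for illustration only.


def copy_in_chunks(reader: BinaryIO, writer: BinaryIO) -> None:
    """Copy reader to writer while holding at most BUFFER_SIZE bytes in memory."""
    while True:
        chunk = reader.read(BUFFER_SIZE)
        if not chunk:  # An empty bytes object signals end of stream.
            break
        writer.write(chunk)


def copy_with_shutil(reader: BinaryIO, writer: BinaryIO) -> None:
    """Same behavior via the standard-library helper, using the same chunk size."""
    shutil.copyfileobj(reader, writer, BUFFER_SIZE)
```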


Guaranteeing file safety here is not critical, because Optuna itself does not use the downloaded file, but I have a concern about file locking here.
Could we, first of all, check whether file_path already exists before opening it?

@nabenabe0928
Thank you for your comments!
Do you mean that we should check whether a file exists at file_path, and raise an error if it does?

I think that is friendlier to users.
But whether to emit a warning or raise an error deserves closer discussion.
For reference, os.makedirs checks whether the target already exists and, by default, raises FileExistsError:
https://docs.python.org/ja/3/library/os.html#os.makedirs

I have added a check for whether a file already exists at file_path.
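On the race concern raised here, one option worth noting is Python's exclusive-creation mode `"xb"`: `open()` then fails with `FileExistsError` if the path already exists, so the check and the file creation happen in a single step rather than as a separate `os.path.exists()` call followed by `open()`. This is an alternative technique, not necessarily what the PR adopted, and `save_bytes_exclusively` is a hypothetical helper name.

```python
import shutil
from typing import BinaryIO


def save_bytes_exclusively(reader: BinaryIO, file_path: str) -> None:
    """Hypothetical helper: stream reader into file_path, never overwriting.

    Mode "xb" creates the file and raises FileExistsError if it already exists,
    avoiding the gap between an os.path.exists() check and a later open().
    """
    with open(file_path, "xb") as writer:
        shutil.copyfileobj(reader, writer)
```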

optuna/artifacts/_download.py

codecov · 2024-05-24T05:51:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.77%. Comparing base (181d65f) to head (6b27239).
Report is 155 commits behind head on master.

❗ Current head 6b27239 differs from pull request most recent head 52542a9

Please upload reports for the commit 52542a9 to get more accurate results.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5448      +/-   ##
==========================================
+ Coverage   89.52%   89.77%   +0.24%     
==========================================
  Files         194      196       +2     
  Lines       12626    12582      -44     
==========================================
- Hits        11303    11295       -8     
+ Misses       1323     1287      -36
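The percentages in this table follow from the line counts: line coverage is hits divided by tracked lines. The quick check below assumes there are no partially covered lines, which is consistent with Hits + Misses = Lines in both columns; rounding to two decimals is also an assumption, one that happens to reproduce the reported figures.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals."""
    return round(100 * hits / lines, 2)


base = {"lines": 12626, "hits": 11303, "misses": 1323}
head = {"lines": 12582, "hits": 11295, "misses": 1287}

for name, counts in (("master", base), ("#5448", head)):
    # With no partial lines, every tracked line is either a hit or a miss.
    assert counts["hits"] + counts["misses"] == counts["lines"]
    print(name, coverage_percent(counts["hits"], counts["lines"]))
```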


tests/artifacts_tests/test_download_artifact.py

optuna/artifacts/_download.py

Co-authored-by: Shuhei Watanabe <47781922+nabenabe0928@users.noreply.github.com>

Co-authored-by: Naoto Mizuno <gobou522@gmail.com>

c-bata · 2024-05-29T08:02:02Z

tutorial/20_recipes/012_artifact_tutorial.py

@@ -139,6 +139,13 @@ def objective(trial: optuna.Trial) -> float:

    print(content)

+Also, you can easily download the artifact as a file using `download_artifact` function, instead of using the artifact module:


I have two suggestions.

Could you remove the code that uses artifact_store.open_reader(), since it is not a public API? It was only a workaround for the lack of download_artifact().

Could you add a link from download_artifact to its API documentation, as shown below?

Suggested change

Also, you can easily download the artifact as a file using `download_artifact` function, instead of using the artifact module:

Also, you can easily download the artifact as a file using :func:`~optuna.artifacts.download_artifact` function, instead of using the artifact module:

Thank you for your review! I have removed open_reader() from the tutorial.

Co-authored-by: c-bata <contact@c-bata.link>

porink0424 added 3 commits May 15, 2024 18:31

Implement download_artifact

acf5cf8

Add test_download_artifact

2cfb4a4

Apply formatter

547d638

toshihikoyanase requested changes May 15, 2024

optuna/artifacts/_download.py (outdated, resolved)

tests/artifacts_tests/test_download_artifact.py (outdated, resolved)

porink0424 added 2 commits May 17, 2024 10:13

Make download_artifact public

e820369

Import public functions instead of private ones in test

765d34e

toshihikoyanase reviewed May 17, 2024

porink0424 added 2 commits May 17, 2024 12:42

Add download_artifact to the Artifact reference

b238652

Impl streaming to keep memory usage small in download_artifact

e0188e6

eukaryo assigned not522 and nabenabe0928 May 20, 2024

nabenabe0928 requested changes May 21, 2024

nabenabe0928 reviewed May 21, 2024

optuna/artifacts/_download.py (outdated, resolved)

porink0424 added 3 commits May 24, 2024 12:45

Add explanation about download_artifact in 012_artifact_tutorial

0109dd4

Remove unnecessary line breaks and modify comments

732fdb6

Apply blackdoc

6b27239

not522 reviewed May 29, 2024

tests/artifacts_tests/test_download_artifact.py (outdated, resolved)

c-bata reviewed May 29, 2024

optuna/artifacts/_download.py (outdated, resolved)

porink0424 and others added 4 commits May 29, 2024 16:29

Update optuna/artifacts/_download.py

e76a105

Co-authored-by: Shuhei Watanabe <47781922+nabenabe0928@users.noreply.github.com>

Update tests/artifacts_tests/test_download_artifact.py

55c7582

Co-authored-by: Naoto Mizuno <gobou522@gmail.com>

Use shutil.copyfileobj instead of looping manually

e4d9349

Add file existence check before downloading

36f0622

c-bata reviewed May 29, 2024

porink0424 and others added 2 commits May 29, 2024 17:14

Update tutorial/20_recipes/012_artifact_tutorial.py

4618a3a

Co-authored-by: c-bata <contact@c-bata.link>

Remove open_reader() and use download_artifact()

52542a9

porink0424 commented May 15, 2024 (edited)

toshihikoyanase left a comment

porink0424 commented May 17, 2024 (edited)

toshihikoyanase left a comment

toshihikoyanase May 17, 2024

porink0424 May 17, 2024

toshihikoyanase May 24, 2024

porink0424 May 24, 2024

not522 May 29, 2024

toshihikoyanase May 31, 2024

nabenabe0928 commented May 21, 2024

nabenabe0928 left a comment

nabenabe0928 May 21, 2024

porink0424 May 24, 2024

nabenabe0928 May 24, 2024

porink0424 May 29, 2024

codecov bot commented May 24, 2024 (edited)

c-bata May 29, 2024

porink0424 May 29, 2024
