Implement download_artifact
#5448
base: master
Thank you for your great proposal. I've actually been waiting for this feature.
Let me share my early comment.
@toshihikoyanase |
Thanks for your update!
I added two suggestions and a note.
for i, artifact_id in enumerate(artifact_ids):
    dummy_downloaded_file = str(tmp_path / f"dummy_downloaded_{i}.txt")
    download_artifact(artifact_store, artifact_id, dummy_downloaded_file)
    with open(dummy_downloaded_file, "r") as f:
        assert f.read() == f"{study.trials[i].params['x']} {study.trials[i].params['y']}"
I'm not sure, but can we use temporary files/directories to keep the environment clean?
Dummy files actually remain on `tmp_path`, so the machine where this test is run would not be kept completely clean.
Since I implemented this test referring to `test_upload_artifact`, which does not keep the environment clean as well, we may need to modify both tests if we do not want this situation.
Sorry for the delayed response.
Temporary files can sometimes complicate the execution of git commands. So, we might not want to leave them to minimize operational mistakes.
> Since I implemented this test referring to `test_upload_artifact`, which does not keep the environment clean as well, we may need to modify both tests if we do not want this situation.
Thank you for pointing it out. For the scope of this PR, let us solely concentrate on the tests of downloading, and we can work on the other files in follow-up PRs.
According to this, pytest keeps 3 temporary directories by default, and we can prevent pytest from saving temporary directories by setting `tmp_path_retention_policy`.
So, adding just one line `tmp_path_retention_policy = "none"` to `pyproject.toml` would resolve this problem. Should I include this fix in this PR?
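For reference, the one-line setting proposed above might look like the fragment below. This is only a sketch: it assumes the project configures pytest through the `[tool.pytest.ini_options]` table of `pyproject.toml` and runs a pytest version that supports `tmp_path_retention_policy` (7.3 or later).

```toml
[tool.pytest.ini_options]
# Assumption: remove every tmp_path directory after the run, regardless of test outcome.
tmp_path_retention_policy = "none"
```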
My understanding is that there are two concerns, neither of which is a problem, so I think this PR is fine as is.
The first is that temporary files complicate git management, but since `tmp_path` is created in a system temporary directory, it does not affect git. (https://docs.pytest.org/en/7.3.x/how-to/tmp_path.html#the-default-base-temporary-directory)
The second is that the temporary files affect other test runs, but since `tmp_path` is unique for each function invocation, this is not a problem. (https://docs.pytest.org/en/7.3.x/reference/reference.html#tmp-path)
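The two points above can be illustrated with a small test that writes only inside the directory pytest hands to it. This is a minimal sketch, not the PR's test: `write_dummy_file` and the test name are hypothetical, and the file contents are placeholders.

```python
from pathlib import Path


def write_dummy_file(directory: Path, index: int, content: str) -> Path:
    """Write `content` to a per-index file under `directory` and return its path."""
    path = directory / f"dummy_downloaded_{index}.txt"
    path.write_text(content)
    return path


def test_dummy_files_stay_inside_tmp_path(tmp_path: Path) -> None:
    # pytest injects `tmp_path`, a fresh directory unique to this test invocation,
    # so nothing is written into the repository working tree.
    for i in range(3):
        path = write_dummy_file(tmp_path, i, f"{i} {i * 2}")
        assert path.parent == tmp_path
        assert path.read_text() == f"{i} {i * 2}"
```

Because every file is created under `tmp_path`, neither `git status` nor other test functions see these files.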
Ah, I misunderstood. These test cases employ the `tmp_path` fixture instead of creating temporary files in the test cases. Thanks for your explanation.
Could you add an explanation of |
Thank you for the PR, I left some comments!
Could you please take another look?
optuna/artifacts/_download.py
Outdated
chunk = reader.read(BUFFER_SIZE)
if not chunk:
    break
writer.write(chunk)
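For context, the loop under review is the usual pattern for streaming a binary reader into a writer without loading the whole artifact into memory. The sketch below is a self-contained illustration, not the PR's code: `copy_in_chunks` is a hypothetical helper, and the `BUFFER_SIZE` value is an assumption because the excerpt does not show the constant's definition.

```python
import io
from typing import BinaryIO

BUFFER_SIZE = 64 * 1024  # assumed value; the excerpt does not show the real constant


def copy_in_chunks(reader: BinaryIO, writer: BinaryIO, buffer_size: int = BUFFER_SIZE) -> int:
    """Copy everything from `reader` to `writer` and return the number of bytes copied."""
    total = 0
    while True:
        chunk = reader.read(buffer_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        writer.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the artifact reader and the destination file.
source = io.BytesIO(b"x" * 150_000)
destination = io.BytesIO()
copied = copy_in_chunks(source, destination)
```

The standard library's `shutil.copyfileobj` performs the same kind of chunked copy and could replace the hand-written loop.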
Although it is not important to guarantee the file safety here because the downloaded file will not be used by Optuna, I have a concern about the file lock here.
Could we, first of all, check whether the `file_path` already exists at `open`?
@nabenabe0928
Thank you for your comments!
Do you mean that we should check if a file exists at `file_path`, and raise an error if it does?
I think that is kinder.
But whether it should be a warning or an error needs further discussion.
At least, `os.makedirs` checks whether the target already exists, for example.
https://docs.python.org/ja/3/library/os.html#os.makedirs
I have added a check for whether the file already exists.
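One way such a check could look is sketched below. It is an illustration under assumptions, not the code added in this PR: `write_new_file` is a hypothetical helper, and raising `FileExistsError` (rather than emitting a warning) is just one of the two options discussed above.

```python
import os
import tempfile


def write_new_file(file_path: str, data: bytes) -> None:
    """Write `data` to `file_path`, refusing to overwrite an existing file."""
    if os.path.exists(file_path):
        raise FileExistsError(f"File already exists: {file_path}")
    with open(file_path, "wb") as writer:
        writer.write(data)


with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "artifact.bin")
    write_new_file(target, b"first")
    try:
        write_new_file(target, b"second")  # the second write must be rejected
        overwritten = True
    except FileExistsError:
        overwritten = False
    with open(target, "rb") as f:
        kept = f.read()
```

Opening the file with mode `"xb"` is an alternative: Python then raises `FileExistsError` itself, and the existence check and file creation happen in a single step.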
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5448      +/-   ##
==========================================
+ Coverage   89.52%   89.77%   +0.24%
==========================================
  Files         194      196       +2
  Lines       12626    12582      -44
==========================================
- Hits        11303    11295       -8
+ Misses       1323     1287      -36
☔ View full report in Codecov by Sentry.
Co-authored-by: Shuhei Watanabe <47781922+nabenabe0928@users.noreply.github.com>
Co-authored-by: Naoto Mizuno <gobou522@gmail.com>
@@ -139,6 +139,13 @@ def objective(trial: optuna.Trial) -> float:

    print(content)

Also, you can easily download the artifact as a file using `download_artifact` function, instead of using the artifact module:
I have two suggestions.
- Could you remove the code that uses `artifact_store.open_reader()` since it is not a public API? It was just a workaround for `download_artifact()`.
- Could you add a link on the `download_artifact` to the API doc like below?
- Also, you can easily download the artifact as a file using `download_artifact` function, instead of using the artifact module:
+ Also, you can easily download the artifact as a file using :func:`~optuna.artifacts.download_artifact` function, instead of using the artifact module:
Thank you for your review! I have removed `open_reader()` from the tutorial.
Co-authored-by: c-bata <contact@c-bata.link>
Motivation
Although `upload_artifact` existed, its counterpart did not exist. This PR introduces a new API `download_artifact` as the counterpart. In order to have symmetry with `upload_artifact`, `download_artifact` has an interface that specifies `file_path` as an argument.
Description of the changes
download_artifact
download_artifact