New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add blob.uploadfrom(inputstream) #162
Conversation
Codecov Report
@@ Coverage Diff @@
## master googleapis/java-core#162 +/- ##
============================================
+ Coverage 63.32% 63.60% +0.27%
- Complexity 529 543 +14
============================================
Files 30 30
Lines 4723 4770 +47
Branches 451 428 -23
============================================
+ Hits 2991 3034 +43
- Misses 1571 1576 +5
+ Partials 161 160 -1
Continue to review full report at Codecov.
|
This is the first part of the feature. I'm going to add similar functionality to the Storage interface as a separate pull request:
|
try (InputStream input = Files.newInputStream(path)) { | ||
return uploadFrom(input, options); | ||
} catch (IOException e) { | ||
throw new StorageException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IOException should be thrown directly here. It should not be converted to a runtime exception (and StorageException should probably not be a runtime exception)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this PR.
I agree with all your comments, except ones related to IOException handling.
There are many places in the storage code where it is wraping to StorageException, that allows user to have a single handler for all kind of exceptions. I'm not sure that IOException deserves a dedicated handler.
@frankyn recommended extending Storage interface with upload functionality. I'm going to implement this in a separate PR.
Storage already has create(BlobInfo, InputStream)
method which doesn't throw IOException, so adding create(BlobInfo, Path)
which throws IOException in addition to StorageException will not look logically. I'm afraid people will write their own utility methods to wrap unnecessary exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StorageException is a huge mistake. It should never have been a runtime exception. Consistency is not a sufficient reason to propagate this broken design to new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing IOException for a few methods will not fix the problem with existing design but make it only worse.
I agree, BaseServiceException should be checked for better design. Does it mean that we cannot extend functionality unless https://github.com/googleapis/java-core/issues/57 is fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't fix the existing design, but it won't make the problem worse. Adding new methods that also make this mistake will make the problem worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/test/java/com/google/cloud/storage/BlobTest.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/test/java/com/google/cloud/storage/BlobTest.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/test/java/com/google/cloud/storage/BlobTest.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/test/java/com/google/cloud/storage/it/ITStorageTest.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/test/java/com/google/cloud/storage/it/ITStorageTest.java
Outdated
Show resolved
Hide resolved
/** | ||
* Uploads the given content to the storage using specified write channel and the given | ||
* buffer size. The default buffer size is 15 MiB. Larger buffer sizes may improve the upload | ||
* performance but require more memory. It could cause OutOfMemoryError or add significant garbage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to give the user the option to set the buffer size? Do we provide sufficient tools and documentation to enable someone to do this?
It would be better to simply pick the best buffer size directly. If we're uncertain, lets leave this method out until we are, since it's easier to add it in later than take it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we really need to give the user the option to play with the buffer size. There is no optimal value for the buffer size. By default, the buffer size is equal to the default Write Channel chunk size (15 MiB), but for uploading large files increasing the buffer will give significant performance benefit. Some statistics of uploading 1.5GB file with various buffer sizes:
15MiB: 162 seconds
30MiB: 149 seconds
75MiB: 141 seconds
100MiB: 140 seconds
150MiB: 139 seconds
300MiB: 138 seconds
On the other hand, one may need to upload hundreds file in parallel, this may require reducing the buffer size to avoid running out of memory. So, giving such flexibility will be certainly a plus.
Do we provide sufficient tools and documentation to enable someone to do this?
I'm afraid no. There was a request to provide a sort of such documentation: #40 which is closed as duplicate of the current one (add Blob.uploadFrom())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case this PR needs to include such documentation. I've seen from experience that tunable I/O buffer size parameters are very effective foot-seeking missiles. I'd estimate that somewhere above 80% of invocations of this method will decrease performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll add notice, that this method is for those who want to experiment with the upload time and the amount of the consumed memory.
* collection overhead. Buffer sizes which are less than 256 KiB are not allowed, they will be | ||
* treated as 256 KiB. | ||
* | ||
* <p>This method does not close either InputStream or WriterChannel</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These details can be moved to the previous method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous one is intentionally made package private, it will be invoked from the StorageImpl class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about the difference between upload and uploadFrom, so probably users will be too. Stocking with the public methods, I don't see any particular difference between the names upload
and uploadFrom
. I suggest making both upload
.
Also, why does one return a blob and one doesn't. Both should or neither.
Since I am confused, you might have good reasons for this distinction, but in that case let's make sure the docs and the method names make these reasons obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning a Blob instance after upload has completed requires an extra GET request which is not always needed. So, upload
works faster than uploadFrom
, especially on small files.
I'm going to document this difference in the new method: Storage.upload(BlobInfo, Path)
uploadFrom
is named after downloadTo
(as requested in the issue).
upload
is static and its name is different. Would it be more clear to use writeContnet
instead?
/**
* Writes the given content to the storage using specified write channel and the given
* buffer size.
*/
public static void Blob.writeContent(InputStream content, WriterChannel writer, int bufferSize)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure.The distinction between the names uploadFrom and upload seems very hazy. Honestly, if I had to guess I'd expect uploadFrom was static rather than the other way around.
Do we actually need both of these? Can we delete the one that makes an extra GET request?
Consider this case:
- First request to upload a blob succeeds.
- GET request fails.
- Exception thrown
It won't be obvious to the user that the content was successfully upload to the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a good example, quite rare but still valid.
To inform user of successful upload I re-throw an exception with the attendant message.
Do we actually need both of these? Can we delete the one that makes an extra GET request?
Yes, I believe we need both. If blob.uploadFrom()
returns void
, then blob
instance will become useless right after upload. As #149 demonstrates, it's not obvious how to get the updated blob object, so returning the updated updated would be helpful.
* collection overhead. Buffer sizes which are less than 256 KiB are not allowed, they will be | ||
* treated as 256 KiB. | ||
* | ||
* <p>This method does not close either InputStream or WriterChannel</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure.The distinction between the names uploadFrom and upload seems very hazy. Honestly, if I had to guess I'd expect uploadFrom was static rather than the other way around.
Do we actually need both of these? Can we delete the one that makes an extra GET request?
Consider this case:
- First request to upload a blob succeeds.
- GET request fails.
- Exception thrown
It won't be obvious to the user that the content was successfully upload to the server.
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
@@ -260,6 +263,89 @@ public void downloadTo(Path path) { | |||
downloadTo(path, new BlobSourceOption[0]); | |||
} | |||
|
|||
/** | |||
* Uploads the given file path to this blob using specified blob write options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably just don't understand how this API works, but this sounds off to me. I'd expect a static method that uploads the path to a new Blob and returns it. What does it mean to "upload to this blob"? I'm missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The storage hierarchy is the following:
Class BlobId: A BlobId object includes the name of the containing bucket, the blob's name and possibly the blob's generation. If getGeneration() is null the identifier refers to the latest blob's generation.
Class BlobInfo: encapsulate BlobId and set of properties.
Class Blob extends BlobInfo and adds a number of methods to modify the blob like update, copyTo, downloadTo, etc. Objects of this class are immutable. Operations that modify the blob return a new object.
New Blob.uploadFrom()
methods will allow those who already have a blob instance to upload some content to it as simply as blob = blob.uploadFrom(myFile);
I'd expect a static method that uploads the path to a new Blob and returns it.
A static method would require an extra parameter storage
, which is not very convenient.
As I said previously I'm going to extend the Storage interface with new methods:
void upload(BlobInfo blobInfo, Path path, BlobWriteOption... options);
void upload(BlobInfo blobInfo, InputStream content, BlobWriteOption... options);
public Blob create(BlobInfo blobInfo, Path path, BlobWriteOption... options);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what's throwing me: the object is immutable but has methods to modify it? Why?
This seems to be the existing API, but it's still extremely wonky. After you call uploadFrom (or downloadTo) how does the new Blob bject returned differ from the original Blob object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can consider a Blob instance like a pointer to {bucket, name, generation} and methods to operate with that pointer. Each modification of the blob causes the generation change, so existing pointer becomes outdated and the Blob instance useless(Immutability does not allow to modify the generation in place). A new object returned by uploadFrom has the new generation value pointing to the latest blob version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if the docs said that, or the classes were named accurately. but c'est la vie.
My immediate question is what exactly has changed in the object? What field has been modified and how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new Blob objects are constructed from data received from GCS.
In case of uploadFrom only 'generation' and 'size' will be changed. (These fields are inherited from the BlobInfo class.)
uploadFrom uses WriteChannel, so it has to perform a separate GET request to obtain the fresh data.
upload method performs the RPC request and obtain the data to construct new object from the response.
} | ||
} | ||
|
||
static void uploadFrom(InputStream input, WriteChannel writer) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method will be invoked from the StorageImpl class
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Outdated
Show resolved
Hide resolved
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
Reverts #162 Need to revert this for now. The issue is that the [Blob constructor is private](https://github.com/googleapis/java-storage/blob/1f53baa969331a94b5a73319df59711157ef2307/google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java#L545) and it requires creating the object as an [empty object before uploading with an InputStream](https://github.com/googleapis/java-storage/pull/162/files#diff-8c18a40b13cc5c5d90547c273470f5eeR2965-R2966). The implementation should be derived from the Storage class instead of Blob class to reduce the number of requests. cc: @dmitry-fa, could you recreate the PR? I haven't prioritized this, apologies for the delay.
Fixes googleapis/java-core#62