Storage.writer(...) breaks the idea of 'generation' #2476

Closed
kohsuke opened this issue Mar 28, 2024 · 2 comments
Labels
api: storage Issues related to the googleapis/java-storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

kohsuke commented Mar 28, 2024

Is your feature request related to a problem? Please describe.
I have an app that uploads a chunk of data to GCS, loads it into BigQuery, and then deletes the file from GCS. It does this many, many times over a period of time. Pseudocode below:

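while (true) {
  // upload one chunk of data to GCS
  try (var w = storage.writer(blobInfo)) {
    writeTo(w);
  }
  // load the uploaded object into BigQuery
  loadtoBQ();
  // delete the object now that it has been loaded
  storage.delete(blobInfo.getBlobId());
}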

I noticed that some delete calls sporadically fail with 503 Service Unavailable. The error message suggests these errors are transient and that I should retry. Looking at the GCS storage library code, I noticed a built-in retry mechanism that transparently handles this kind of situation: it treats delete as an idempotent operation if a generation-match precondition is given (see HttpRetryAlgorithmManager.getForObjectsDelete()). That makes sense!

Except there's no way to actually capture the generation of my write. I can see that, internally, the writer returned is a BlobWriteChannel whose storageObject property represents the BlobInfo of the newly written blob, including the generation. That is how a method like Storage.create() can reliably return the Blob object that represents the state at the point of creation. But there seems to be no way to access the same information through the writer(...) methods. I consider this a library design problem.
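
To illustrate why a generation-match precondition makes a delete safe to retry, here is a minimal sketch using a hypothetical in-memory stand-in for a bucket. FakeBucket, GenerationDeleteDemo, and the simulated 503 are assumptions made for illustration only; nothing here calls google-cloud-storage, and real generations are not a simple counter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a bucket; not part of google-cloud-storage. */
class FakeBucket {
  private final Map<String, Long> generations = new HashMap<>();
  private long nextGeneration = 1; // simplified: a counter instead of a real generation number
  /** Number of upcoming deletes that take effect but whose response is "lost" (simulated 503). */
  int lostResponses = 0;

  /** Writes (or overwrites) an object and returns its new generation. */
  long write(String name) {
    long generation = nextGeneration++;
    generations.put(name, generation);
    return generation;
  }

  /** Returns the live generation of the object, or null if it does not exist. */
  Long currentGeneration(String name) {
    return generations.get(name);
  }

  /**
   * Deletes the object. If ifGenerationMatch is non-null, the delete only applies
   * when the live generation equals it. Returns whether anything was deleted.
   */
  boolean delete(String name, Long ifGenerationMatch) {
    Long live = generations.get(name);
    boolean matches = live != null && (ifGenerationMatch == null || live.equals(ifGenerationMatch));
    if (matches) {
      generations.remove(name);
    }
    if (lostResponses > 0) {
      lostResponses--;
      throw new IllegalStateException("503 Service Unavailable (simulated)");
    }
    return matches;
  }
}

class GenerationDeleteDemo {
  /** Returns true if a newer upload survives a blind retry of an earlier delete. */
  static boolean secondUploadSurvives(boolean guarded) {
    FakeBucket bucket = new FakeBucket();
    long first = bucket.write("chunk");
    Long precondition = guarded ? Long.valueOf(first) : null;
    bucket.lostResponses = 1;
    try {
      bucket.delete("chunk", precondition);
    } catch (IllegalStateException e) {
      // The delete took effect, but the caller only saw an error.
      bucket.write("chunk"); // a concurrent writer uploads new data under the same name
      bucket.delete("chunk", precondition); // blind retry of the original delete
    }
    return bucket.currentGeneration("chunk") != null;
  }

  public static void main(String[] args) {
    System.out.println("with generation match: " + secondUploadSurvives(true)); // prints true
    System.out.println("without precondition: " + secondUploadSurvives(false)); // prints false
  }
}
```

With the precondition, the retried delete is a no-op once the generation it names is gone, so repeating it cannot remove newer data. Without it, the retry removes whatever currently lives under that name.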

Describe the solution you'd like
Storage.writer(...) should return a subtype of WriteChannel that can return Blob after its close method is invoked.
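
A rough sketch of the shape being requested, using only the JDK: a channel that makes the result of the write available once it has been closed. ResultChannel, UploadResult, and ResultChannelDemo are hypothetical names invented for this illustration, the generation is passed in rather than assigned by a server, and none of this is the library's actual API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

/** Hypothetical metadata describing a finished upload. */
final class UploadResult {
  final String name;
  final long generation;
  final int size;

  UploadResult(String name, long generation, int size) {
    this.name = name;
    this.generation = generation;
    this.size = size;
  }
}

/** Hypothetical channel that buffers bytes in memory and publishes a result on close. */
final class ResultChannel implements WritableByteChannel {
  private final String name;
  private final long generation; // assumption: supplied by the caller, not by a server
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final CompletableFuture<UploadResult> result = new CompletableFuture<>();
  private boolean open = true;

  ResultChannel(String name, long generation) {
    this.name = name;
    this.generation = generation;
  }

  @Override
  public int write(ByteBuffer src) throws ClosedChannelException {
    if (!open) {
      throw new ClosedChannelException();
    }
    int n = src.remaining();
    byte[] chunk = new byte[n];
    src.get(chunk);
    buffer.write(chunk, 0, n);
    return n;
  }

  @Override
  public boolean isOpen() {
    return open;
  }

  @Override
  public void close() {
    if (!open) {
      return; // closing twice has no further effect
    }
    open = false;
    result.complete(new UploadResult(name, generation, buffer.size()));
  }

  /** Completes when the channel is closed. */
  CompletableFuture<UploadResult> getResult() {
    return result;
  }
}

class ResultChannelDemo {
  static UploadResult upload(String name, long generation, String payload) {
    ResultChannel channel = new ResultChannel(name, generation);
    CompletableFuture<UploadResult> result = channel.getResult();
    try (WritableByteChannel w = channel) {
      w.write(ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result.join(); // already complete, because the channel has been closed
  }

  public static void main(String[] args) {
    UploadResult r = upload("chunk", 7L, "hello");
    System.out.println(r.name + " generation=" + r.generation + " size=" + r.size);
  }
}
```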

Describe alternatives you've considered
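Call Storage.get(BlobId) after the write is done to obtain a fresh Blob from GCS separately. This risks a race condition: another writer could overwrite the object between the upload and the lookup, so the lookup may return a generation that is not the one I wrote.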
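
The race can be sketched with a toy model in which the generation is a simple counter. FakeObject and LookupRaceDemo are hypothetical names for illustration, and real generation numbers are not sequential like this.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-object store that tracks only the live generation. */
class FakeObject {
  private final AtomicLong liveGeneration = new AtomicLong(0);

  /** Overwrites the object and returns the generation created by this write. */
  long write() {
    return liveGeneration.incrementAndGet();
  }

  /** Separate metadata lookup: returns whatever generation is live right now. */
  long lookupGeneration() {
    return liveGeneration.get();
  }
}

class LookupRaceDemo {
  /** Returns {generation reported by the write, generation seen by a later lookup}. */
  static long[] writeThenLookup(FakeObject object, Runnable interleaved) {
    long fromWrite = object.write();
    interleaved.run(); // stands in for another client acting between the two calls
    long fromLookup = object.lookupGeneration();
    return new long[] {fromWrite, fromLookup};
  }

  public static void main(String[] args) {
    FakeObject object = new FakeObject();
    // Another writer overwrites the object between our write and our lookup.
    long[] seen = writeThenLookup(object, object::write);
    // prints "write reported generation 1, lookup saw 2"
    System.out.println("write reported generation " + seen[0] + ", lookup saw " + seen[1]);
  }
}
```

A delete guarded by the looked-up generation would target the other writer's object, which is why the generation needs to come from the write itself.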

Additional context
#691 appears to be somewhat related, in the sense that it also wants additional information beyond WriteChannel.

@BenWhitehead
Collaborator

In the next release (mid-April; we're currently in a code freeze for Cloud Next) we will have a new experimental API that gives access to the object resulting from the upload.

This new API is called BlobWriteSession and can be dropped in with minimal change.

Your example would change to the following:

Storage storage = StorageOptions.http().build().getService();

BlobInfo info = BlobInfo.newBuilder("bucket", "object").build();

while (true) {
  // new experimental API
  BlobWriteSession session = storage.blobWriteSession(info); // create the upload session for the object
  ApiFuture<BlobInfo> resultInfo = session.getResult(); // a Future for the object created when the WritableByteChannel below is closed
  try (WritableByteChannel w = session.open()) { // open the channel for writing
    writeTo(w); // write to the channel the same as before
  }
  // get the object with generation from the future
  BlobInfo gen1 = resultInfo.get(5, TimeUnit.SECONDS);

  // issue the delete operation, now with a generation
  storage.delete(gen1.getBlobId());
}

We decided not to change WriteChannel or Storage#writer, because we also have other features that are configured on StorageOptions and influence the way BlobWriteSessions work. When using the new BlobWriteSession with its default settings, it performs the same retried resumable uploads that Storage#writer does.

While this is a new @BetaApi and could theoretically experience breaking changes, the default settings are very unlikely to change (I'm ~98% confident they will stay as they are in the next release). The main possibility of breaking changes lies in some of the other settings that can change the type of upload performed.

@BenWhitehead
Collaborator

Version 2.37.0 was released last week with the necessary plumbing for this code sample to work. libraries-bom should be released this week if you use that for version resolution.
