
feat: BlobWriteChannelV2 - same throughput less GC #2110

Merged: 3 commits, Jul 13, 2023

Conversation

@BenWhitehead (Collaborator) commented Jul 11, 2023

Use stable buffer allocation with laziness.

Leverage new JsonResumableSession to provide more robustness and easier separation of concerns compared to BlobWriteChannel

  • rename blobWriteChannel.ser.properties to the correct blobReadChannel.ser.properties

### Runtime improvements

Throughput is on par with the existing v1 implementation; however, GC impact is lighter with the new implementation.

Below is the summary of the GC improvement between v1 and v2.

These GC numbers were collected while uploading 4096 randomly sized objects, from 128KiB..2GiB across 16 concurrent threads, using a default chunkSize of 16MiB.

| metric                          | unit   |           v1 |           v2 | % decrease |
|---------------------------------|--------|-------------:|-------------:|-----------:|
| gc.alloc.rate                   | MB/sec |     2240.056 |     1457.731 |     34.924 |
| gc.alloc.rate.norm              | B/op   | 955796726217 | 638403730507 |     33.207 |
| gc.churn.G1_Eden_Space          | MB/sec |     1597.009 |     1454.304 |      8.936 |
| gc.churn.G1_Eden_Space.norm     | B/op   | 681418424320 | 636902965248 |      6.533 |
| gc.churn.G1_Old_Gen             | MB/sec |      691.877 |       11.316 |     98.364 |
| gc.churn.G1_Old_Gen.norm        | B/op   | 295213237398 |   4955944331 |     98.321 |
| gc.churn.G1_Survivor_Space      | MB/sec |        0.004 |        0.002 |     50.000 |
| gc.churn.G1_Survivor_Space.norm | B/op   |      1572864 |       786432 |     50.000 |
| gc.count                        | counts |         1670 |         1319 |     21.018 |
| gc.time                         | ms     |        15936 |         9527 |     40.217 |

Overall allocation rate is decreased, while Old_Gen use is almost entirely eliminated.

```
openjdk version "11.0.18" 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Debian-1deb11u1, mixed mode, sharing)

-Xms12g -Xmx12g
```

All other Java parameters are defaults.

@BenWhitehead BenWhitehead added do not merge Indicates a pull request not ready for merge, due to either quality or timing. owlbot:ignore instruct owl-bot to ignore a PR labels Jul 11, 2023
@BenWhitehead BenWhitehead requested review from a team as code owners July 11, 2023 22:55
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: storage Issues related to the googleapis/java-storage API. labels Jul 11, 2023
```diff
-public BlobWriteChannel writer(URL signedURL) {
+public StorageWriteChannel writer(URL signedURL) {
   // TODO: is it possible to know if a signed url is configured to have a constraint which makes
   // it idempotent?
```
Member:

The signed URL in this case creates a resumable upload session, which is idempotent per session.

Creating a signed URL that can only be used once would require https://cloud.google.com/storage/docs/xml-api/reference-headers#xgoogifgenerationmatch with value 0, but it's possible that a session is created but not completed when this precondition is checked, which can't be helped.

Collaborator Author:

If I'm understanding correctly, that if-generation-match header would be part of the signed URL and not something our client could be aware of. Meaning, our client doesn't have visibility into any precondition which might apply to the request.

Member:

Oof, this is an interesting case: if a user generates a v4 signed URL and adds this header as a signed header, then when our client tries to create a resumable upload it will fail, because the request lacks the header with the expected value.

Our client code could extract the header/value pairs from the signed URL, but that sounds like a feature request.
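For illustration only (not code from this PR), extracting parameters from a signed URL's query string could look like the following minimal sketch. The class and method names are hypothetical, and real v4 signed URLs may carry preconditions as signed headers rather than query parameters:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SignedUrlParams {
  // Parse query parameters from a signed URL into a map with lowercased keys.
  // Hypothetical helper for detecting precondition parameters such as
  // x-goog-if-generation-match; not part of the java-storage client.
  static Map<String, String> queryParams(URI uri) {
    Map<String, String> params = new HashMap<>();
    String query = uri.getRawQuery();
    if (query == null) {
      return params;
    }
    for (String pair : query.split("&")) {
      int eq = pair.indexOf('=');
      String k = eq < 0 ? pair : pair.substring(0, eq);
      String v = eq < 0 ? "" : pair.substring(eq + 1);
      params.put(
          URLDecoder.decode(k, StandardCharsets.UTF_8).toLowerCase(),
          URLDecoder.decode(v, StandardCharsets.UTF_8));
    }
    return params;
  }

  public static void main(String[] args) {
    URI u = URI.create(
        "https://storage.googleapis.com/bucket/obj"
            + "?X-Goog-Signature=abc&x-goog-if-generation-match=0");
    Map<String, String> p = queryParams(u);
    System.out.println(p.get("x-goog-if-generation-match")); // prints 0
  }
}
```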

```java
ByteRangeSpec rangeSpec = ByteRangeSpec.explicit(cumulativeByteCount, newFinalByteOffset);
if (available % ByteSizeConstants._256KiB == 0) {
  header = HttpContentRange.of(rangeSpec);
} else {
```
Member:

Wouldn't this cause issues for customers that write data as it comes in (possibly less than 256KiB), where the buffer is filled over multiple writes and a final flush happens when the writer falls out of scope, or on an explicit flush() or close()?

Collaborator Author:

Potentially, if this class were used standalone; however, customers do not interact with this class directly.

We have the following pipeline leading to use of this class (bash pipeline shorthand):

```
BlobWriteChannelV2 | BaseStorageWriteChannel | DefaultBufferedWritableByteChannel | ApiaryUnbufferedWritableByteChannel
                     ^-- buffer allocation and alignment happens here
```
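The alignment rule that buffered layer enforces can be sketched independently of the PR's actual classes: accept arbitrary-sized writes, flush only in 256KiB multiples, and carry any remainder until close. `AlignedBuffer` and its method names are illustrative, not the real implementation:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class AlignedBuffer {
  static final int ALIGNMENT = 256 * 1024; // 256 KiB, the resumable-upload quantum

  private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
  private final ByteArrayOutputStream flushed = new ByteArrayOutputStream();

  // Accept an arbitrary-sized write; flush only the 256KiB-aligned prefix.
  void write(ByteBuffer src) {
    byte[] bytes = new byte[src.remaining()];
    src.get(bytes);
    pending.write(bytes, 0, bytes.length);
    int available = pending.size();
    int aligned = (available / ALIGNMENT) * ALIGNMENT;
    if (aligned > 0) {
      byte[] all = pending.toByteArray();
      flushed.write(all, 0, aligned);                    // stands in for a PUT of an aligned range
      pending.reset();
      pending.write(all, aligned, all.length - aligned); // carry the unaligned remainder
    }
  }

  // close() may flush an unaligned tail, finalizing the upload.
  void close() {
    byte[] tail = pending.toByteArray();
    flushed.write(tail, 0, tail.length);
    pending.reset();
  }

  int flushedBytes() { return flushed.size(); }

  int pendingBytes() { return pending.size(); }

  public static void main(String[] args) {
    AlignedBuffer b = new AlignedBuffer();
    b.write(ByteBuffer.allocate(300 * 1024));
    // 300 KiB in: 256 KiB flushed, 44 KiB carried
    System.out.println(b.flushedBytes() + " flushed, " + b.pendingBytes() + " pending");
  }
}
```

Only close() (or an explicit final flush) ever sends an unaligned range, which matches the review answer above: intermediate flushes stay 256KiB-aligned regardless of how small the incoming writes are.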

```java
byteBuffers(100, 1_000),
byteBuffers(1_000, 10_000),
byteBuffers(10_000, 100_000),
byteBuffers(100_000, 1_000_000)))
```
Member:

Pinged you through chat, but what's the rationale for using multiple buffers versus one? A few reasons come to mind (it's easier to allocate smaller contiguous memory buffers; individual buffers can be passed as-is to transport writes), and I'm wondering if there are others.

Collaborator Author:

One big one for us specifically: between the buffered and unbuffered layers, this enables us to reduce (and in some cases completely remove) copies between the ByteBuffer provided to our write method and any underlying location it needs to be propagated to.

A codified test case that illustrates this minimized copying can be seen here:

```
 * 0                                                            61
 * data:    |------------------------------------------------------------|
 * 16 (16)         32 (16)         48 (13)         61
 * writes:  |---------------|---------------|---------------|------------|
 * 3  6  9  12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
 * flushes: |--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--||
```

In that test, flushes happen every 3 bytes while we're doing 16-byte writes. To flush, we don't need to copy into our buffer; we can simply take a 3-byte slice of the provided ByteBuffer and flush that. If there are any outstanding bytes, they are enqueued. The next time write is called, a single call can flush both our buffer and the slice from the new ByteBuffer.
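The zero-copy slice described above relies on standard `java.nio.ByteBuffer` view semantics: `duplicate()` and `slice()` share the backing storage rather than copying it. A minimal sketch (names are illustrative, not from the PR):

```java
import java.nio.ByteBuffer;

public class SliceFlush {
  // Return a view of the next n bytes of src without copying,
  // advancing src's position past them. duplicate() and slice()
  // share src's backing storage; no bytes are moved.
  static ByteBuffer sliceNext(ByteBuffer src, int n) {
    ByteBuffer view = src.duplicate();     // shares the backing array
    view.limit(view.position() + n);       // bound the view to n bytes
    src.position(src.position() + n);      // consume from the original
    return view.slice();
  }

  public static void main(String[] args) {
    ByteBuffer data = ByteBuffer.wrap("0123456789ABCDEF".getBytes());
    ByteBuffer first3 = sliceNext(data, 3); // a 3-byte view, no copy made
    System.out.println(first3.remaining()); // prints 3
    System.out.println(data.remaining());   // prints 13
  }
}
```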

Base automatically changed from put-file to main July 13, 2023 20:06
@BenWhitehead BenWhitehead removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 13, 2023
@BenWhitehead BenWhitehead merged commit 1b52a10 into main Jul 13, 2023
18 checks passed
@BenWhitehead BenWhitehead deleted the blob-write-channel-v2 branch July 13, 2023 23:20