
Fix S3 storage driver to support R2 Multipart upload #3940

Draft: tpoxa wants to merge 1 commit into main

Conversation


@tpoxa tpoxa commented Jun 12, 2023

Add an extra call to flushPart to avoid losing pending bytes when multipartcombinesmallpart is false.

Fixes: #3873

Signed-off-by: Maksym Trofimenko <maksym@container-registry.com>
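
A minimal sketch of the change's shape, pieced together from the discussion below (toy types, and a slice standing in for S3's UploadPart; the real diff is in the commit):

```go
package main

import "fmt"

// Toy model of the driver's two write buffers (hypothetical types; the real
// ones live in the S3 storage driver). flushPart uploads readyPart, promotes
// pendingPart into its place, and, per this patch, calls itself once more so
// the promoted bytes are uploaded too rather than dropped.
type writer struct {
	readyPart, pendingPart []byte
	parts                  [][]byte // stands in for the uploaded S3 parts
}

func (w *writer) flushPart() error {
	if len(w.readyPart) == 0 && len(w.pendingPart) == 0 {
		return nil // nothing to write; the extra call below bottoms out here
	}
	w.parts = append(w.parts, w.readyPart) // stands in for S3 UploadPart
	w.readyPart, w.pendingPart = w.pendingPart, nil
	return w.flushPart() // the extra call this patch adds
}

func main() {
	w := &writer{readyPart: []byte("ready"), pendingPart: []byte("pending")}
	if err := w.flushPart(); err != nil {
		panic(err)
	}
	fmt.Println(len(w.parts)) // 2: the pending bytes become a part of their own
}
```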
@milosgajdos milosgajdos requested a review from squizzi June 12, 2023 21:21
@Vad1mo commented Jun 14, 2023

I verified the patch, and it works as expected:

Here is an example config for R2:

```yaml
storage:
  s3:
    region: auto
    bucket: a-bucket-name-as-set-in-R2
    regionendpoint: https://126dc445e8778171899b2bd6XXXXXXX.r2.cloudflarestorage.com
    multipartcombinesmallpart: false
```

@milosgajdos (Member)

This introduces a recursion... which makes me think... I haven't properly dived into the code, but the fact that this never cropped up even in the 2.8 branch/release (

```go
func (w *writer) flushPart() error {
	if len(w.readyPart) == 0 && len(w.pendingPart) == 0 {
		// nothing to write
		return nil
	}
	if len(w.pendingPart) < int(w.driver.ChunkSize) {
		// closing with a small pending part
		// combine ready and pending to avoid writing a small part
		w.readyPart = append(w.readyPart, w.pendingPart...)
		w.pendingPart = nil
	}
	partNumber := aws.Int64(int64(len(w.parts) + 1))
	resp, err := w.driver.S3.UploadPart(&s3.UploadPartInput{
		Bucket:     aws.String(w.driver.Bucket),
		Key:        aws.String(w.key),
		PartNumber: partNumber,
		UploadId:   aws.String(w.uploadID),
		Body:       bytes.NewReader(w.readyPart),
	})
	if err != nil {
		return err
	}
	w.parts = append(w.parts, &s3.Part{
		ETag:       resp.ETag,
		PartNumber: partNumber,
		Size:       aws.Int64(int64(len(w.readyPart))),
	})
	w.readyPart = w.pendingPart
	w.pendingPart = nil
	return nil
}
```
) makes this a bit peculiar... given this introduces recursion... could writer be broken somehow? I think I'd prefer fixing that 🤔

@tpoxa (Author) commented Jun 14, 2023

It seems that the driver never worked correctly with the option multipartcombinesmallpart: false during multipart upload. Pending bytes are never flushed at the end, so the entire push fails with an offset mismatch error.

It looks like recursion, but flushPart doesn't run if the ready and pending bytes are empty:

```go
if len(w.readyPart) == 0 && len(w.pendingPart) == 0 {
```

It does the trick.
But yes, I agree. It looks a bit off.
I am keen to consider other approaches...

Thanks

@milosgajdos (Member)

@tpoxa would it be possible for you to write a test that causes the S3 storage driver tests to fail? We can take it from there, I reckon 🤔

@milosgajdos (Member) commented Jul 17, 2023

Circling back to this quickly. Is there any reason why you are setting this param to false?

```yaml
multipartcombinesmallpart: false
```

@Vad1mo commented Jul 17, 2023

R2 expects all parts to be the same size; the only exception is the last part.

@milosgajdos (Member) commented Oct 2, 2023

We've recently merged a rework of the S3 push path: #4066.

I wonder if that fixes the issue this PR attempts to address.

@pditommaso

I'm currently experiencing the multipart upload issue with R2 using distribution v2.8.3, so this PR still looks valid. Any plans to get it merged?

Vad1mo added a commit to container-registry/harbor that referenced this pull request Nov 20, 2023
@milosgajdos (Member)

@pditommaso do you experience that issue when using the distribution/distribution:edge image? The 2.8.x image is way behind main.

@pditommaso

We tested this branch, but it looks like something weird is still happening:

time="2023-11-17T14:47:12.495776284Z" level=error msg="response completed with error" err.code="blob unknown" err.detail="sha256:e4c58958181a5925816faa528ce959e487632f4cfd192f8132f71b32df2744b4" err.message="blob unknown to registry" go.version=go1.20.10 http.request.host="localhost:5000" http.request.id=3f5d56e8-b528-4739-b1b2-84a33f75fecf http.request.method=HEAD http.request.remoteaddr="172.17.0.1:58870" http.request.uri="/v2/library/ubuntu/blobs/sha256:e4c58958181a5925816faa528ce959e487632f4cfd192f8132f71b32df2744b4" http.request.useragent="docker/24.0.2 go/go1.20.4 git-commit/659604f kernel/5.4.0-1025-aws os/linux arch/amd64 UpstreamClient(Docker-Client/24.0.2 \\(linux\\))" http.response.contenttype=application/json http.response.duration=170.45707ms http.response.status=404 http.response.written=157 instance.id=e7579883-1e4a-4730-a1cd-488f9b08b1d4 service=registry vars.digest="sha256:e4c58958181a5925816faa528ce959e487632f4cfd192f8132f71b32df2744b4" vars.name=library/ubuntu version=v2.7.0-2216-g4b4747a8

and Docker continues to retry it:

```
9a5e3a1e8fb7: Pushed
b5ec8a11a02d: Retrying in 7 seconds
c6c4c25cabf9: Pushed
6dfab66a44d5: Pushed
ab18cb8eb197: Retrying in 1 second
```

@milosgajdos (Member)

Hmm, interesting. Something still seems broken when multipartcombinesmallpart is set to false.

I am still unconvinced by this patch, though; maybe @corhere has better ideas.

@corhere (Collaborator) commented Dec 15, 2023

I'm also unconvinced by the patch. If the problem is that pending bytes aren't flushed when committing/closing the writer, why not just flush twice when committing or closing the writer? Why double up all flushes?
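
For contrast, a sketch of that alternative, leaving flushPart exactly as quoted earlier in the thread (no recursion) and assuming a simplified Commit as a stand-in for the driver's Commit/Close:

```go
// Flush twice only when the client finishes the upload: the first call
// uploads readyPart and promotes pendingPart into its place, the second
// call drains those promoted bytes. Every intermediate flush stays single.
func (w *writer) Commit() error {
	if err := w.flushPart(); err != nil {
		return err
	}
	return w.flushPart()
}
```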

Also, this branch needs a rebase.

(@milosgajdos I totally called it.)

@milosgajdos (Member)

> I'm also unconvinced by the patch. If the problem is that pending bytes aren't flushed when committing/closing the writer, why not just flush twice when committing or closing the writer? Why double up all flushes?
>
> (@milosgajdos I totally called it.)

I did think about that comment at the time I was "cleaning up" that code, of course.

My understanding of the pending buffer has always been as follows:

  1. We only ever flush parts when the client explicitly asks for it via Commit or Close.
  2. Unless the client has explicitly asked to Commit / Close, while it keeps writing we maintain the "overflow" data (the bytes that haven't fit into the ready buffer) in the pending buffer, but only up to chunkSize, at which point we finally give up and do a forced flush. I suspect the reason is so we don't eventually run out of memory by creating yet another buffer if the client isn't committing: in other words, pending is distribution giving the client a bit of leeway? 🤷‍♂️

If we were flushing as soon as a writer buffer fills up, the client "loses control" over the flushes, i.e. Commit essentially becomes almost obsolete, other than completing the multipart upload? Note that this is my understanding and I might be missing something here; I'm always happy to learn and understand this better.

Now, flushing that pending buffer when *combinesmallpart is set to false seems "broken", because we only do it when *combinesmallpart is set to true (plus when some other conditions are met, such as the pending data being smaller than chunkSize).

I think I might be missing something here, as I always do (as I said, I'd like to understand this better), but based on the OP reporter:

> R2 expects all parts to be the same size; the only exception is the last part.

I feel like the potential fix should maybe be done in Commit / Close 🤔

@corhere (Collaborator) commented Dec 17, 2023

> 1. We only ever flush parts when the client explicitly asks for it via Commit or Close.

We are flushing as soon as both buffers fill up:

```go
// we filled up pending buffer, flush
if w.pending.Len() == w.pending.Cap() {
	if err := w.flush(); err != nil {
		return n, err
	}
}
```

> If we were flushing as soon as a writer buffer fills up, the client "loses control" over the flushes, i.e. Commit essentially becomes almost obsolete, other than completing the multipart upload?

Yes. Buffer management is an implementation detail of the driver. It's why flush is not exported.

I just did some digital archaeology to discover the reason why there are two buffers, only to find what I was looking for... in the doc comment of type writer, which still exists in HEAD! 🤦

```go
// writer attempts to upload parts to S3 in a buffered fashion where the last
// part is at least as large as the chunksize, so the multipart upload could be
// cleanly resumed in the future. This is violated if Close is called after less
// than a full chunk is written.
```

Well, that is not entirely unreasonable. The minimum upload part size is 5 MiB on Amazon S3. If only 1 MiB is buffered when Close is called, the driver can't just discard the buffer. (The current bug notwithstanding.) And if it uploads the buffer as an upload part, the upload would not be resumable. The driver currently solves this problem by making sure it can never get into that situation: by only flushing half the buffer in Write, it always has enough data in reserve to create a large enough part on Close.
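
A toy walk-through of that reserve strategy (sizes in MiB and a hypothetical 5 MiB chunk size; this simulates the behaviour described above, it is not driver code):

```go
package main

import "fmt"

const chunkSize = 5 // MiB; also S3's minimum size for a non-final part

func main() {
	var ready, pending, parts []int
	write := func(mib int) {
		for i := 0; i < mib; i++ {
			if len(ready) < chunkSize {
				ready = append(ready, 1)
			} else {
				pending = append(pending, 1)
			}
			// both buffers full: flush only the ready half, keep the rest
			if len(pending) == chunkSize {
				parts = append(parts, len(ready))
				ready, pending = pending, nil
			}
		}
	}
	write(12)
	// Close: combine ready and pending into the final part.
	parts = append(parts, len(ready)+len(pending))
	fmt.Println(parts) // [5 7]: the reserve keeps the final part >= chunkSize
}
```

With fewer than chunkSize bytes written in total, the final part necessarily comes out under 5 MiB, which is exactly the "violated if Close is called after less than a full chunk is written" case from the doc comment.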

Unfortunately this means multipartcombinesmallpart: false (added fairly recently, in #3556) cannot easily be fixed. The buffer contents need to go somewhere on Close. Uploading a part smaller than 5 MiB when flushing on Close does not immediately blow up, but it will prevent the write from being resumed. Therefore calling flush more times is not the correct fix. The buffered data would have to be stashed in a separate temporary object until the upload is resumed. As that is going to be a pretty large chunk of work to implement, I think the best course of action would be to remove the multipartcombinesmallpart option for v3.0.0 and document that we are incompatible with Cloudflare R2 at this time.

@corhere (Collaborator) left a comment

While this change does manage to make multipartcombinesmallpart: false less broken, resuming uploads is broken when that storage option is set.

@milosgajdos (Member)

> We are flushing as soon as both buffers fill up.

That's what I said in point 2:

> ...while it keeps writing we maintain the "overflow" data (the bytes that haven't fit into the ready buffer) in the pending buffer, but only up to chunkSize, at which point we finally give up and do a forced flush...

Anyways, thanks for the git archaeology. Come to think of it, I vaguely remember someone bringing this up (resumable uploads and how distribution handles them).

I agree, we need to remove that option and mention in the docs that those types of uploads are not supported at the moment.

Successfully merging this pull request may close these issues:

Multipart upload issues with Cloudflare R2