
@uppy/aws-s3-multipart: increase priority of abort and complete #4542

Merged · 4 commits into transloadit:main · Jul 6, 2023

Conversation

@schonert (Contributor) commented Jul 4, 2023

Closes #4541
Closes #4326

Wraps the final upload requests, `abortMultipartUpload` and `completeMultipartUpload`, in a separate queue, preventing the race condition where completed uploads would stall if new uploads filled up the queue.

I considered simply rewriting the call sites (`onAbort` etc.) to bypass the queue pattern, but this seemed like a less intrusive approach that still allows a more permanent solution to evolve down the road. A rough sketch of the idea is below.
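Roughly, the idea looks like this (a minimal sketch, not the actual diff; `RateLimitedQueue` is uppy's internal queue from `@uppy/utils`, and `options.*` are the plugin's existing options):

```js
import { RateLimitedQueue } from '@uppy/utils/lib/RateLimitedQueue'

// A small dedicated queue for the finalization requests, so a fully
// occupied upload queue can never starve complete/abort.
const finalizationQueue = new RateLimitedQueue(1)

const completeUpload = finalizationQueue.wrapPromiseFunction(
  options.completeMultipartUpload,
)
const abortUpload = finalizationQueue.wrapPromiseFunction(
  options.abortMultipartUpload,
)
```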

Also, you should consider adding some sort of format config (e.g. Prettier) 😄

@Murderlon requested a review from aduh95 on Jul 4, 2023 07:25
@Murderlon changed the title from “AWS-S3-Multipart: Add abort and complete to different queue” to “@uppy/aws-s3-multipart: add abort and complete to different queue” on Jul 4, 2023
@aduh95 (Member) left a comment:


I don't think that's the correct fix: we would be doubling the number of requests made in parallel, making `limit` very hard to document and confusing for users. I think we should instead change the priority of those requests, on the same queue.

@schonert (Contributor, Author) commented Jul 5, 2023

> I think we should instead change the priority of those requests, on the same queue.

@aduh95 I don't think this will solve the race condition?
A completion request will still have to wait for a spot in the queue – which may be running at full capacity. So even if it has the highest priority, it will have to wait for an opening to appear.

@Murderlon (Member):

> So even if it has the highest priority, it will have to wait for an opening to appear.

But that doesn't matter, right? Nothing goes wrong if we don't call finish immediately after the last part request. We simply want to make sure it isn't postponed for too long. Furthermore, the idea of a priority queue is that, if you manage it well, it behaves in the order you want. It simply has to be the highest priority at the right time.

@aduh95 (Member) commented Jul 5, 2023

> A completion request will still have to wait for a spot in the queue – which may be running at full capacity. So even if it has the highest priority, it will have to wait for an opening to appear.

It's also true with this PR: unless the limit is `Infinity`, the request will need to wait in the queue until there's a spot available – albeit it's far more likely that a spot will be available most of the time, since it's a separate queue. But if you want the request to fire immediately, we should not use a queue at all.
Like @Murderlon said, it's probably OK if the request has to wait a bit; it's a necessary tradeoff IMO.

@schonert (Contributor, Author) commented Jul 6, 2023

> But that doesn't matter, right?

Well, it matters if you're aiming for a snappy upload experience while uploading tons of large files. In our case we need to generate thumbnails and smaller versions of large videos, so the faster we can get the go signal from the uploader, the better.

Having two files idling at 100% seems unnecessary, in my opinion.
I understand the priority of simplicity and documentation. I'll finish up the PR and move on to patching this up for our use case.

> I think we should instead change the priority of those requests, on the same queue.

What should the priority be? `Infinity`?

@schonert (Contributor, Author) commented Jul 6, 2023

@aduh95 Reading through the code, I noticed that if there are multiple handlers with `Infinity`, the last one added would be the most important.
Since we won't find anything greater than `Infinity`, the index will always be -1.

```js
// Find the first queued handler with a strictly lower priority…
const index = this.#queuedHandlers.findIndex((other) => {
  return handler.priority > other.priority
})
if (index === -1) {
  // …none found: append to the end of the queue.
  this.#queuedHandlers.push(handler)
} else {
  // …otherwise insert just before it, keeping the queue sorted by priority.
  this.#queuedHandlers.splice(index, 0, handler)
}
```

We could set `abortMultipartUpload` and `completeMultipartUpload` to `priority: Infinity` and change `uploadPartBytes` to something like `priority: 99999999`. This should ensure completion requests are served first?

Also, should we consider a first-in-first-served approach?
Changing line 107 to `return handler.priority >= other.priority` should do the trick.

@schonert (Contributor, Author) commented Jul 6, 2023

@aduh95 Okay, now I doubt my ability to comprehend the queue logic 😅 Disregard my last comment – it's already first-in-first-served. So setting them to `Infinity` would do the trick.

@Murderlon changed the title from “@uppy/aws-s3-multipart: add abort and complete to different queue” to “@uppy/aws-s3-multipart: increase priority of abort and complete” on Jul 6, 2023
@Murderlon (Member):

Thanks for the changes. @schonert how have you tested this to confirm it works for you?

@aduh95 (Member) commented Jul 6, 2023

> We could set `abortMultipartUpload` and `completeMultipartUpload` to `priority: Infinity` and change `uploadPartBytes` to something like `priority: 99999999`. This should ensure completion requests are served first?

`uploadPartBytes` priority is set to `Infinity` because, once the signature has been generated, we want the upload to start right away to avoid the risk of the signature expiring – which is arguably worse than having an upload stuck at 100% for a few minutes. However, because the `fetchSignature` priority is the default 0, and `uploadPartBytes` depends on it, any request added to the queue with a priority > 0 would still execute first.

> What should the priority be? `Infinity`?

So, because completion and abort requests do not expire, contrary to `uploadPartBytes`, `Infinity` is not the best priority for them; `priority: 1` would suffice. It would still behave the same as `Infinity` – they need to wait for ongoing requests to complete, then start right away (until we introduce requests with priority 2 or something, but I don't see why we would do that) – and it would decrease the risk of clients on slow connections getting “Request has expired” errors.
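Concretely, the merged change amounts to something like this (a sketch, assuming these wrapper field names; the real wiring lives in the plugin's `HTTPCommunicationQueue`, with `requests` as the shared `RateLimitedQueue`):

```js
// Completion and abort jump ahead of queued priority-0 fetchSignature
// requests, but still respect the overall `limit`.
this.#abortMultipartUpload = requests.wrapPromiseFunction(
  options.abortMultipartUpload, { priority: 1 },
)
this.#completeMultipartUpload = requests.wrapPromiseFunction(
  options.completeMultipartUpload, { priority: 1 },
)
// Parts keep priority Infinity so a signed URL is used before it can expire.
this.#uploadPartBytes = requests.wrapPromiseFunction(
  options.uploadPartBytes, { priority: Infinity },
)
```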

@schonert (Contributor, Author) commented Jul 6, 2023

> Thanks for the changes. @schonert how have you tested this to confirm it works for you?

@Murderlon this is identical to my previous monkey-patched solution; it works as expected. I'm afraid it will not solve the race conditions I initially set out to solve.

@aduh95 makes sense. Updated the pull request 👍

@aduh95 (Member) commented Jul 6, 2023

> I'm afraid it will not solve the race conditions I initially set out to solve.

I wouldn't call that a race condition, but I get where you're coming from. I'm afraid we would need to come up with a completely new queuing model to address that. In most cases, the only requests that need to be limited are `uploadPartBytes` and therefore `fetchSignature`; all the other requests are likely HTTP/2 anyway, so limiting them does not do much good, and they could skip the queue altogether. I'm afraid we couldn't make that the default behavior, though, as it could be quite breaking for folks whose backend uses HTTP/1.1 (limited to 6 concurrent connections to the same domain).

Note that in most cases chunks are 5 MiB, so you can expect the complete/abort request to be sent after at most the time it takes your client to upload 5 MiB.

@Murderlon merged commit 8604178 into transloadit:main on Jul 6, 2023
15 checks passed
@Murderlon (Member):

Thanks!

github-actions bot mentioned this pull request Jul 6, 2023
github-actions bot added a commit that referenced this pull request Jul 6, 2023
| Package                | Version | Package                | Version |
| ---------------------- | ------- | ---------------------- | ------- |
| @uppy/aws-s3           |   3.2.1 | @uppy/golden-retriever |   3.1.0 |
| @uppy/aws-s3-multipart |   3.4.1 | @uppy/status-bar       |   3.2.1 |
| @uppy/companion        |   4.6.0 | @uppy/tus              |   3.1.2 |
| @uppy/companion-client |   3.2.0 | @uppy/xhr-upload       |   3.3.1 |
| @uppy/core             |   3.3.0 | uppy                   |  3.11.0 |

- @uppy/companion: fix infinite recursion in uploader test (Mikael Finstad / #4536)
- @uppy/xhr-upload: export `Headers` type (Masum ULU / #4549)
- @uppy/aws-s3-multipart: increase priority of abort and complete (Stefan Schonert / #4542)
- @uppy/aws-s3: fix remote uploads (Antoine du Hamel / #4546)
- meta: use `corepack yarn` instead of `npm` to launch E2E (Antoine du Hamel / #4545)
- @uppy/aws-s3-multipart: fix upload retry using an outdated ID (Antoine du Hamel / #4544)
- @uppy/status-bar: remove throttled component (Artur Paikin / #4396)
- @uppy/aws-s3-multipart: fix Golden Retriever integration (Antoine du Hamel / #4526)
- examples/aws-nodejs: merge multipart and non-multipart examples (Antoine du Hamel / #4521)
- @uppy/companion: bump semver from 7.3.7 to 7.5.3 (dependabot[bot] / #4529)
- @uppy/aws-s3-multipart: add types to internal fields (Antoine du Hamel / #4535)
- examples/aws-nodejs: update README (Antoine du Hamel / #4534)
- examples/aws-nodejs: showcase an example without preflight requests (Antoine du Hamel / #4516)
- @uppy/aws-s3-multipart: fix pause/resume (Antoine du Hamel / #4523)
- @uppy/status-bar: fix ETA when Uppy recovers its state (Antoine du Hamel / #4525)
- @uppy/aws-s3-multipart: fix resume single-chunk multipart uploads (Antoine du Hamel / #4528)
- @uppy/companion: fix part listing in s3 (Antoine du Hamel / #4524)
- example/aws-php: make it forward-compatible with the next Uppy major (Antoine du Hamel / #4522)
- @uppy/golden-retriever: refactor to modernize the codebase (Antoine du Hamel / #4520)
- examples/aws-nodejs: upgrade to AWS-SDK v3 (Antoine du Hamel / #4515)
- @uppy/companion: implement refresh for authentication tokens (Mikael Finstad / #4448)
- @uppy/aws-s3-multipart: disable pause/resume for remote uploads in the UI (Artur Paikin / #4500)
- @uppy/tus: retry on 423 HTTP error code (Antoine du Hamel / #4512)
@schonert (Contributor, Author) commented Jul 6, 2023

> I'm afraid we couldn't make that the default behavior, though.

Agreed – and I see the headache of documenting how a limit of 5 is not actually a limit of 5.

A way to modify this behavior would have been great, though. Extending `HTTPCommunicationQueue` and replacing `companionCommunicationQueue` required quite a few dirty tricks.

> Note that in most cases chunks are 5 MiB, so you can expect the complete/abort request to be sent after at most the time it takes your client to upload 5 MiB.

We've gone ahead and increased our chunks to between 50 and 100 MB, depending on the actual size and type of the file. We found that the larger chunk size improved performance both while calculating the checksum and while uploading. (A sketch of how we tune this is below.)
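For reference, the chunk size can be tuned via the plugin's `getChunkSize(file)` option; a minimal sketch, with illustrative thresholds:

```js
import Uppy from '@uppy/core'
import AwsS3Multipart from '@uppy/aws-s3-multipart'

const MiB = 1024 * 1024

const uppy = new Uppy().use(AwsS3Multipart, {
  // …companionUrl or custom signing functions go here…
  getChunkSize (file) {
    // Use larger parts for larger files; the 50/100 MiB values are illustrative.
    return file.size > 1024 * MiB ? 100 * MiB : 50 * MiB
  },
})
```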

Murderlon added a commit that referenced this pull request Jul 10, 2023
* main: (24 commits)
  Add missing pt-BR locales for ImageEditor plugin (#4558)
  Release: uppy@3.11.0 (#4550)
  @uppy/companion: fix infinite recursion in uploader test (#4536)
  @uppy/xhr-upload: export `Headers` type (#4549)
  @uppy/aws-s3-multipart: increase priority of abort and complete (#4542)
  @uppy/aws-s3: fix remote uploads (#4546)
  meta: use `corepack yarn` instead of `npm` to launch E2E (#4545)
  @uppy/aws-s3-multipart: fix upload retry using an outdated ID (#4544)
  @uppy/status-bar: remove throttled component (#4396)
  @uppy/aws-s3-multipart: fix Golden Retriever integration (#4526)
  examples/aws-nodejs: merge multipart and non-multipart examples (#4521)
  @uppy/companion: bump semver from 7.3.7 to 7.5.3 (#4529)
  @uppy/aws-s3-multipart: add types to internal fields (#4535)
  examples/aws-nodejs: update README (#4534)
  examples/aws-nodejs: showcase an example without preflight requests (#4516)
  @uppy/aws-s3-multipart: fix pause/resume (#4523)
  @uppy/status-bar: fix ETA when Uppy recovers its state (#4525)
  @uppy/aws-s3-multipart: fix resume single-chunk multipart uploads (#4528)
  @uppy/companion: fix part listing in s3 (#4524)
  example/aws-php: make it forward-compatible with the next Uppy major (#4522)
  ...
Successfully merging this pull request may close these issues:

- AWS-S3-Multipart rate limit
- S3 multipart only showing files in bucket after all upload