
"Timed out waiting for socket" when attempting to upload multiple large files via Companion #3640

Closed
thurinus opened this issue Apr 13, 2022 · 16 comments · Fixed by #3797 or #4150

@thurinus

Hi all,

I'm using Uppy/Companion to offer file uploads from remote sources, and am hitting "Timed out waiting for socket connection" reliably in the following use case:

  1. Connect to any remote source (Google Drive or DropBox in my case) that has a bunch of large files. I have 10 video files at 300+MB each.
  2. Attempt to upload all 10 files at the same time.
  3. Notice that 3 files start uploading.
  4. After 60 seconds pass, the server reports "Timed out waiting for socket connection" for the other 7 files.
  5. The 3 files that were uploading complete after a while, and progress for the bulk upload hangs. No UI error is reported for the failed 7 files, nor is an option presented to retry.

I have Tus and AwsS3Multipart set up as mutually exclusive destinations, with the limit option set to 3 for both. The timeout occurs whichever one is used, so it's destination-agnostic.
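
For context, a minimal sketch of that setup (the URLs and the flag choosing between destinations are assumptions, not our actual code):

import Uppy from '@uppy/core'
import Tus from '@uppy/tus'
import AwsS3Multipart from '@uppy/aws-s3-multipart'

const useTus = true // hypothetical flag: picks which of the two destinations is active

const uppy = new Uppy()
if (useTus) {
  uppy.use(Tus, { endpoint: 'https://tus.example.com/files/', limit: 3 })
} else {
  uppy.use(AwsS3Multipart, { companionUrl: 'https://companion.example.com', limit: 3 })
}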

Debugging

It looks like Companion attempts to assign sockets (apparently limited by the limit config for the destination) to every file the moment they're added to the upload batch, and starts a timeout for those that miss out on the first attempt. If no socket frees up in time (default 60000ms), the timeout error will be thrown on the server for any file still waiting on a socket. This leaves the UI in a seemingly frozen state, with no indication of a problem or offer to retry.

Probable expected behavior

Perhaps socket assignment should only be attempted as sockets free up? The current approach works for many smaller files, but easily breaks if the files are too large or the connection is slow at any given time.
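
Purely to illustrate the suggestion (a sketch, not Companion's actual internals): hand out an upload slot only when one is genuinely free, instead of starting a fixed timer for every queued file.

// Illustrative slot queue: queued files simply wait for a free slot, with no timeout.
class UploadSlotQueue {
  constructor (maxSlots) {
    this.maxSlots = maxSlots
    this.active = 0
    this.waiting = []
  }

  // Resolves only once a slot is actually available.
  async acquire () {
    if (this.active < this.maxSlots) {
      this.active += 1
      return
    }
    await new Promise((resolve) => { this.waiting.push(resolve) })
    this.active += 1
  }

  // Frees a slot and wakes up the next waiter, if any.
  release () {
    this.active -= 1
    const next = this.waiting.shift()
    if (next) next()
  }
}

Each upload would acquire() before the provider connection is opened and release() when it finishes, so nothing ever sits on a 60-second timer waiting for a socket.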

What I've tried

  • Leaving the limit unset on the destination configs would let all files be assigned sockets immediately, but that workaround isn't desirable, since the limit exists to keep upload rates manageable.
  • Setting streamingUpload to true (sketched below), based on the similar problem described in #3098 (Uploading from Google Drive hangs after a while), didn't fix the problem.
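
For reference, the streamingUpload attempt amounts to this Companion option (a sketch; the other required Companion options are omitted):

const companionOptions = {
  // ...provider credentials, secret, filePath, etc.
  streamingUpload: true, // tried per #3098; did not fix the timeout here
}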

Thanks for any help!

@mifi
Contributor

mifi commented Apr 19, 2022

Hi! Actually we recently introduced that timeout to prevent memory leaks. Are you using self-hosted companion or Transloadit hosted?

If you are using self-hosted, could you try increasing clientSocketConnectTimeout (or the env variable COMPANION_CLIENT_SOCKET_CONNECT_TIMEOUT) to something higher, say 30 minutes, and see if that helps?
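
A sketch of both ways to raise the timeout (the 30-minute value is only an example; other required Companion options omitted):

// Self-hosted Companion, programmatic options:
const companionOptions = {
  // ...provider credentials, secret, filePath, etc.
  clientSocketConnectTimeout: 30 * 60 * 1000, // 30 minutes instead of the default 60000 ms
}

// Standalone Companion, environment variable equivalent:
// COMPANION_CLIENT_SOCKET_CONNECT_TIMEOUT=1800000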

Or maybe we should prevent Uppy from sending any POST requests in the first place, unless it can also start a socket connection at the same time.

@thurinus
Author

thurinus commented Apr 19, 2022

Thanks for picking this up mifi!

We're currently self-hosted. I can try increasing clientSocketConnectTimeout to alleviate the problem somewhat for now, but that isn't very scalable, since it would still break if the files are large enough or the connection slow enough that an upload takes more than 30 minutes.

EDIT: I tried upping clientSocketConnectTimeout to 30 minutes, and I got this at 90 seconds:

RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

Holding off the requests until sockets are available sounds like a better approach. Third-party sources probably wouldn't like having many connections sitting idle anyway.

How does Transloadit handle this use case? Do you limit file/batch sizes, file counts, etc?

@thurinus
Author

thurinus commented May 5, 2022

Hi @mifi, are there plans to modify this behavior? Thanks for any update!

@mifi
Contributor

mifi commented May 5, 2022

Thanks for your research into this. I think what is happening is that once /:providerName/get/:id is called from Uppy, companion will keep the connection to the upstream provider open, but the provider will time out the request after 90 sec due to no data flowing yet (data will not start flowing until the uppy client connects to the socket).

More specifically:

  1. Press the upload button in the UI for 10 files from google drive
  2. Uppy calls /:providerName/get/:id on all 10 files instantly
  3. Companion will initiate all 10 uploads from google drive and leave the 10 connections towards google drive open
  4. uppy will send websocket requests for 3 files to companion
  5. companion will start streaming data from 3 of the open google drive connections
  6. these 3 files take over 90 seconds to download/upload
  7. the 7 pending google drive connections from companion time out with RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed (roughly illustrated below)
  8. the 3 files complete but the rest have failed
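
A rough illustration of why step 7 happens (not Companion's actual code; the URL and timings are placeholders): a response stream that is opened but never read keeps the upstream connection idle, and the remote end closes it once its idle timeout passes.

const got = require('got') // assuming got here; any streaming HTTP client behaves similarly

// The connection to the provider opens as soon as the stream is created…
const downloadStream = got.stream('https://provider.example.com/some-large-file')

// …but nothing reads from it until the Uppy client's websocket connects.
// If that takes longer than the provider's idle timeout (~90 s in this case),
// the request fails with a RequestTimeout-style error.
setTimeout(() => {
  downloadStream.pipe(process.stdout) // too late if the remote already closed the connection
}, 120000)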

On a side note, I find it odd that you're getting the error RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. from Google Drive. From my googling, that error seems to be S3-specific. I don't know why Google Drive would be using a competitor's (AWS) service. Or maybe it's Dropbox reporting this error?

These observations still make me think that this issue should be fixed in the Uppy client, by making Uppy only send the /:providerName/get/:id request once it is actually finished with the previous files in the queue. I didn't work much on the rate limiting but I'll have a look. Maybe @Murderlon or @aduh95 can provide some insights.

@rcunning

Hi @mifi - thanks for trying to rally around getting this issue fixed. Unfortunately, we need Companion working soon for this case, or we are going to have to move off Uppy for these crossload services. Our use case is crossloading large video files, several of them at a time. This used to work just fine in uppy+companion 2.4.0, but broke in this way when we upgraded to 3.5.0 (not sure about the other 3.x versions). Rolling back to 2.x is not an option for us.

One thing I noticed is that the same issue happens, with the same error, on the example demo. However, I also noticed a note in that example:
[screenshot of a note from the example]
I am assuming that this note is meant just for the example, and not for uppy+companion in general. If not, please let us know.

Do you have an estimate of when this issue might get addressed? I have got to assume that crossloading many large files is a core use case for Uppy. Again, if not, let us know.

@Murderlon
Member

@rcunning just discussed this with the team; we'll take a look. It's not immediately obvious to me where the fix is, so I can't give an estimate as of now, but we're tackling it this week.

I am assuming that this note is meant just for the example, and not for uppy+companion in general

Yes, it's a note to pair with optional restrictions.

@rcunning

@Murderlon Thank you for responding. Digging into the code a little, it appears that limiting concurrency here would solve the problem. Each time .download() is called it opens a socket. However, I think the best solution would allow you to set a maximum number of concurrent downloads from each provider; I'm not sure whether it needs to be per user or global for the process.

@rcunning

Hi @Murderlon - any movement on this? I did a little experimentation and it appears that the Drive and Dropbox APIs both limit per-user concurrency to 3. It seems a fix in upload.js to limit concurrency to 3 for each req.companion.providerToken should solve the problem, ideally with configurable concurrency per provider. I suspect Box and OneDrive have a similar limitation, although I did not test those.
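
A sketch of the kind of band-aid I mean (illustrative only, not the actual upload.js code; it assumes the p-limit package, and the handler names are made up): keep one small concurrency limiter per req.companion.providerToken and run each download through it.

const pLimit = require('p-limit') // assumption: p-limit (or any similar limiter) is available

const limiters = new Map() // providerToken -> limiter capped at 3 concurrent downloads

function limiterFor (providerToken, maxConcurrent = 3) {
  if (!limiters.has(providerToken)) {
    limiters.set(providerToken, pLimit(maxConcurrent))
  }
  return limiters.get(providerToken)
}

// Hypothetical use inside the upload route:
// const limit = limiterFor(req.companion.providerToken)
// await limit(() => uploader.download())

A real patch would also need to evict limiters for tokens that are no longer in use, and ideally make the per-provider number configurable.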

@Murderlon
Member

From my understanding it's not that simple. We have exponential backoff on the client, which uses the client's limit. The websockets go through the rate limiter, hence only (for instance) 3 requests will be used at a time. But Uppy did open 10 websocket connections beforehand. AFAIK the fix is establishing the websocket connection on the client with respect to the rate limiter.

@Murderlon
Member

Just did a pairing session with @arturi and @aduh95 but we couldn't get it to work.

We request a server token for the websocket connection instantly here:

const res = await client.post(file.remote.url, {
  ...file.remote.body,
  endpoint: opts.endpoint,
  uploadUrl: opts.uploadUrl,
  protocol: 'tus',
  size: file.data.size,
  headers: opts.headers,
  metadata: file.meta,
})
this.uppy.setFileState(file.id, { serverToken: res.token })
await this.connectToServerSocket(this.uppy.getFile(file.id))

But in connectToServerSocket we actually use the socket with the rate limiter:

queuedRequest = this.requests.run(() => {
  socket.open()
  if (file.isPaused) {
    socket.send('pause', {})
  }
  // … (rest of the queued request elided)

We can't define the socket connection inside the rate limiter, because it returns a queuedRequest after it's done, which is needed for the events (such as .on('error')) to work. A bit of a chicken/egg problem: what comes first? We tried putting the post for the server token in the rate limiter too, but that didn't work.

For now, the best way to deal with this is setting a very high timeout for sockets. A proper fix requires a different approach which we still need to figure out. Simply limiting to 3 on the server, as per your suggestion, only works if the client coincidentally has the same limit, which is not a good solution.

@Murderlon Murderlon self-assigned this May 25, 2022
@Murderlon Murderlon added the AWS S3 (Plugin that handles uploads to Amazon AWS S3) and Tus (Resumable file uploading via tus.io and Uppy Tus plugin) labels May 25, 2022
@rcunning

@Murderlon Ok, thanks for the update.... indeed, a bit more complex than I thought. I appreciate you trying to solve it the right way.

We went ahead and implemented a band-aid patch to limit concurrency by req.companion.providerToken, which seems to be working well. However, we would love the issue solved correctly so we can stay up to date and not have to be patching.

@mifi
Contributor

mifi commented May 27, 2022

I tried to find a solution, but I'm struggling to understand the rate limiter code and how it works with promises.
I think what we need to do is to somehow put client.post inside the rate-limited operation in connectToServerSocket (this.requests.run(() => ...)), but before the socket gets created (roughly as sketched after the list below). Because this code is duplicated in multiple places, we need to implement the change in:

  • aws-s3
  • aws-s3-multipart
  • tus
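
Very roughly, the ordering change I mean, as a self-contained sketch with generic names (not the actual plugin code): the token request moves inside the rate-limited slot, and the websocket opens immediately afterwards.

// `queue.run(fn)` is assumed to invoke `fn` only once one of the N concurrency
// slots is free, similar to what the client's rate limiter does for uploads.
async function startRemoteUpload (queue, { requestServerToken, openSocket }) {
  return queue.run(async () => {
    // 1. Ask Companion to start the upstream download only once we own a slot…
    const token = await requestServerToken() // e.g. the client.post shown earlier
    // 2. …then connect the websocket right away, so the upstream connection
    //    never sits idle waiting for a client that is still queued.
    return openSocket(token)
  })
}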

I think setting a very high timeout for sockets is not a viable solution, as demonstrated above, because the connection to Google Drive will be terminated by them after 90 seconds of no activity, and probably similarly for other providers.

@Murderlon
Member

I think what we need to do is to somehow put client.post inside the rate limited operation

This is what we tried, but it didn't work, and it makes things conceptually very hard to grasp and maintain. But there might not be another solution, so I suppose we'll have to try to make it work.

@lavisht22

I am still facing this issue. I am on the latest version of Uppy, trying to upload 1000+ images from Google Drive to S3. When I set the limit to 20, the upload doesn't go beyond 15%, but when I set it to 3, the upload sort of hangs around 80%. I am not able to upload the entire dataset at once. Any help would be really appreciated.

Example logs on companion server:

companion: 2022-10-12T08:13:32.178Z [error]  Error: Timed out waiting for socket connection
at Timeout.<anonymous> (/var/app/current/node_modules/@uppy/companion/lib/server/Uploader.js:387:32)
at listOnTimeout (internal/timers.js:557:17)
at processTimers (internal/timers.js:500:7)

@mifi mifi reopened this Oct 13, 2022
@aduh95
Member

aduh95 commented Oct 13, 2022

@lavisht22 what versions of Uppy and Companion are you using?

@lavisht22

@lavisht22 what versions of Uppy and Companion are you using?

@aduh95 I am using "@uppy/companion": "^4.0.3" and "@uppy/core": "^3.0.2".

@Murderlon Murderlon assigned aduh95 and unassigned mifi Oct 17, 2022
aduh95 added a commit to aduh95/uppy that referenced this issue Oct 18, 2022
aduh95 added a commit to aduh95/uppy that referenced this issue Oct 19, 2022
Fixes: transloadit#3640
Co-authored-by: Merlijn Vos <merlijn@soverin.net>
aduh95 added a commit that referenced this issue Oct 19, 2022
Fixes: #3640
Co-authored-by: Merlijn Vos <merlijn@soverin.net>
aduh95 added a commit that referenced this issue Oct 19, 2022
Fixes: #3640
Co-authored-by: Merlijn Vos <merlijn@soverin.net>
(cherry picked from commit f752855)
Labels
AWS S3 (Plugin that handles uploads to Amazon AWS S3) · Bug · Tus (Resumable file uploading via tus.io and Uppy Tus plugin)