Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Companion: Use GET instead of HEAD for getURLMeta + Cut off length of file names #3048

Merged
merged 11 commits into from Sep 30, 2021

Conversation

mifi
Copy link
Contributor

@mifi mifi commented Jul 26, 2021

Fixes #3034

also rewrite to async/await

mifi and others added 6 commits July 29, 2021 20:36
Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
and abort request immediately upon response headers received
#3034 (comment)
or else we get
"MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size" in tus / S3
@mifi
Copy link
Contributor Author

mifi commented Aug 30, 2021

I have now completely changed strategy in this PR:

1. Change getURLMeta to use GET instead of HEAD

will now use a GET and then immediately abort the request once headers have been received, as discussed in #3034

2. Cut off length of file names (tus meta)

Or else we get and error from tus / S3 when uploading long URL's (e.g. signed S3 URLs).
Example url:

https://s3.eu-west-1.amazonaws.com/mifi.no/android-chrome-192x192.png?response-content-disposition=attachment&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEG8aDmFwLXNvdXRoZWFzdC0xIkcwRQIgBxJXjwpFbnwa2fD7RsxR1c9NLZhcjhh%2F2yHbpN2LvkwCIQC0X%2FKGhg9RED3QJQ%2F8Cm7JOzMAE%2FlYSq6Uh9xWxi48YSqwAgio%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAMaDDY3NjAxNjYzNDg0NCIMnUg5bmuUbivS7VsKKoQCVqJcw1QFNDxpjPClS9MsbyaIJN6iaDWzfalYC0T2mrtM2MDEKsEA8YecMjx17SPw9GmQRmh1ouC2%2FGGzHcykUWd8F9uZJjoVbixAR25yqHBLS3iLzLPuu5vU%2B%2FDCZ78fa39QQUri%2BUYFyFXA2p2MS2RQxqZWJx29z%2B567jaduYofifV9eghhZn%2FwFRlJcu%2FdPVtf57yEzwwAAkyk4iDbaelUljFuIobOkcMDnTitSGfuNBi3QVzGjRWiCOiAUt7pOUSM3NMKuZm4wpVR80y%2FxDifnN9UTg8hxs5MyB8jL1utHYPFVo6sSL9vmC8NoOsbKG8%2FDy%2F1QEWi9k7SdkyY6Qjxmrgwib%2BuiQY63wEzxT7tZ75xzKc%2BbOatA0W9TA6ND2HXVB%2F%2FlW3HZpgSNX5AkTMWSow5iTox5RpMgfs%2B6tljxOWVpkzo1KlWxXZ2c2mbRTJ6glVJhDAirSdUPVEmOUuwqY9PiH8wEPwurdYkkP5hpdPrF6qPpKDXFIfnY%2B7F1r4w5smBtT65gJBuTACfZmPElnkGorkJfgib8pkb4VOv7stY%2FMqfHiydWKC4EaG%2F5CPJOSCBQ2de7Uuz82ue0pIcDLMyQVfuhtkd6lV6FZZnP%2FQHtDA1WZlAowZLorXSIm4cxx7mWU9VlUJs&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210829T161548Z&X-Amz-SignedHeaders=host&X-Amz-Expires=300&X-Amz-Credential=ASIAZ2ZN3HPONWUGXSFY%2F20210829%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Signature=541ba4723f4f830c2381c2bb89e309e5322823db9912821d4a5eb1e3e8f87405

This URL is 1.3kb long, and gives this error from companion (coming from tus/s3):

companion: 2021-08-29T16:18:53.652Z [error] uploader.tus.error Error: tus: unexpected response while creating upload, originated from request (method: POST, url: https://tusd.tusdemo.net/files/, response code: 500, response text: s3store: unable to create multipart upload:
MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size
	status code: 400, request id: NHHB3ZBQ68BB8NKY, host id: XumFy5PZIT26MJUQCroOkhEdFhfwVCU/BcHirFPafqFhSDM68hsazowBUETEGCHxZriQxSw4a58=
, request id: 2eeaf7f4-471f-417a-a960-2031ae86e777)

According to S3 documentation max length for the whole metadata object is 2kb: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html

So I cut off file name metadata to 500 bytes in this PR. After cutting off filename we don't get this error anymore.

@mifi mifi requested review from aduh95 and Murderlon August 30, 2021 14:44
@aduh95
Copy link
Member

aduh95 commented Aug 30, 2021

Not sure the async/await improves the readability here, given the fact that's only one await. Rest of the PR looks good to me.

@mifi
Copy link
Contributor Author

mifi commented Aug 30, 2021

IMO the advantages of async/await are:

  • it's easier to add more async calls in the future if needed
  • if we in the future start to use async functions as route handlers in express, then we can just remove the async IIFE and remove the try/catch block and let express handle errors automatically.
  • I think reading sequential code is easier than callbacked code, and if an error is thrown inside a callback, it could easily become an unhandled rejection and the request will hang forever

@mifi mifi changed the title WIP: Companion: only send HEAD request if size is unknown Companion: Use GET instead of HEAD for getURLMeta + Cut off length of file names Aug 30, 2021
@aduh95
Copy link
Member

aduh95 commented Aug 31, 2021

  • I think reading sequential code is easier than callbacked code, and if an error is thrown inside a callback, it could easily become an unhandled rejection and the request will hang forever

I agree that sequential code is easier to read, but I don't think it applies here: The previous code was not using a callback API, but a Promise one. Also it was sequential, and wasn't using an IIFE.
Also, if an error inside an async function, it could as easily become an unhandled rejection and the request will also hang forever, I don't see how it's an improvement.

Counter proposal: make the get function async. IIRC Express doesn't care if we return a Promise or anything in its callback, so we can skip the IIFE.

@mifi
Copy link
Contributor Author

mifi commented Aug 31, 2021

I agree that sequential code is easier to read, but I don't think it applies here: The previous code was not using a callback API, but a Promise one. Also it was sequential, and wasn't using an IIFE.

the previous code did indeed have a callback, inside .then:

  reqUtil.getURLMeta(req.body.url, !debug)
    .then(({ size }) => {

I think avoiding nesting is generally preferrable, but I agree that in this case the added IIFE also gives more nesting so my change doesn't give much advantage.

Also, if an error inside an async function, it could as easily become an unhandled rejection and the request will also hang forever, I don't see how it's an improvement.

Agree. I'm just so used to using express-async-handler with express, and in that case any async rejections will be caught automatically and responded.

Counter proposal: make the get function async. IIRC Express doesn't care if we return a Promise or anything in its callback, so we can skip the IIFE.

Good idea, I'll do that.

Comment on lines -33 to +34
expect(time).toBeGreaterThanOrEqual(50)
expect(time).toBeLessThan(100)
expect(time).toBeGreaterThanOrEqual(30)
expect(time).toBeLessThan(70)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh using Date API is probably not a great idea for this kind of tests, it could use a refactor to performance.now(). But a change for another time, it should not block this PR.

@mifi mifi merged commit 876f833 into main Sep 30, 2021
@mifi mifi deleted the companion-optional-head branch September 30, 2021 10:27
HeavenFox pushed a commit to docsend/uppy that referenced this pull request Jun 27, 2023
… file names (transloadit#3048)

* rewrite to async/await

* Only fetch size (HEAD) if needed transloadit#3034

* Update packages/@uppy/companion/src/server/controllers/url.js

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

* Change HEAD to GET in getURLMeta

and abort request immediately upon response headers received
transloadit#3034 (comment)

* fix lint

* fix lint

* cut off length of file names

or else we get
"MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size" in tus / S3

* try to fix flaky test

* remove iife and cleanup code a bit

* fix lint by reordering code

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Companion's URL controller should avoid getting the URL metadata if it already knows the file size
2 participants