
@uppy/aws-s3-multipart Replace functionality lost in v2 by batching with prepareUploadParts #3881

Closed
kevin-west-10x opened this issue Jul 12, 2022 · 7 comments
Labels: AWS S3, 💬 Discussion, Feature

@kevin-west-10x
Contributor

The breaking change in v2 of @uppy/aws-s3-multipart that replaces prepareUploadPart with prepareUploadParts loses some fine-grained functionality, which in my case prevents me from upgrading from v1. I need this functionality in order to calculate MD5 hashes of the parts I'm preparing so they can be validated on S3. The two changes I can see that caused an issue are:

  1. The file data for the part(s) being prepared is no longer passed. This was fixed in “Add chunks back to prepareUploadParts, indexed by partNumber” (#3520) with help from @Murderlon.
  2. The headers passed to S3 in the return payload are now shared among all of the batched parts. Because of this, I can no longer set the Content-MD5 of each individual part.

Being able to batch-create the presigned URLs isn't a bad option to have, but it would be nice if this functionality could be restored in some way.
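
To make the use case concrete, here is a rough sketch of the kind of per-part signing this is about; md5Base64 and fetchPresignedUrl are placeholder helpers, and the signature is illustrative rather than Uppy's exact v1 API:

// Hypothetical per-part signer: one chunk in, one URL plus its own Content-MD5 out.
// `md5Base64` and `fetchPresignedUrl` are stand-ins for user-supplied code.
async function signSinglePart(file, { uploadId, key, number, chunk }) {
  const hash = await md5Base64(chunk); // base64-encoded MD5 of just this part
  const url = await fetchPresignedUrl({ uploadId, key, partNumber: number });
  // S3 verifies each part against this header, so it has to differ per part.
  return { url, headers: { "Content-MD5": hash } };
}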

@Murderlon
Member

Current return value of prepareUploadParts:

{
  "presignedUrls": {
    "1": "https://bucket.region.amazonaws.com/path/to/file.jpg?partNumber=1&...",
    "2": "https://bucket.region.amazonaws.com/path/to/file.jpg?partNumber=2&...",
    "3": "https://bucket.region.amazonaws.com/path/to/file.jpg?partNumber=3&..."
  },
  "headers": { "some-header": "value" }
}
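
For reference, a minimal sketch of an implementation that produces this shape, assuming the current partNumbers option and a hypothetical getSignedUrl helper on the implementer's side:

// Sketch of the current (v2) batched shape; `getSignedUrl` is a stand-in
// for whatever signing endpoint or SDK call the implementer uses.
async function prepareUploadParts(file, { uploadId, key, partNumbers }) {
  const presignedUrls = {};
  for (const number of partNumbers) {
    presignedUrls[number] = await getSignedUrl({ uploadId, key, partNumber: number });
  }
  // A single headers object is shared by all batched parts, which is why a
  // per-part Content-MD5 cannot be expressed with this return value.
  return { presignedUrls, headers: { "some-header": "value" } };
}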

Proposal:

function prepareUploadParts(file, { uploadId, key, parts }) {
  const presignedRequests = parts.map((part) => {
    const { number, chunk } = part;
    const url = `https://bucket.s3.us-east-2.amazonaws.com/test/upload/multitest.dat?partNumber=${number}&...`;
    const hash = calculateMD5(chunk);
    return new Request(url, { headers: { "Content-MD5": hash } });
  });

  return { presignedRequests };
}

Changes:

  • partNumbers and chunks have now been merged into parts (Array<{ number: number, chunk: Blob }>). I think they were separate to prevent breaking changes, but they belong together.
  • Return presignedRequests as an array of Requests.
    • It felt a bit off to make it "1": { url, headers }. Making headers indexed by part also felt a bit off, but it's possible, I suppose.
    • This way we 'use the platform'.
    • The downside is that a Request also lets you set things like the method, and perhaps some other things that shouldn't happen. We could leave it up to the implementer not to mess that up, or always override those things (see the sketch after this list).
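
To sketch what "always override those things" could look like on the consuming side (not actual plugin code), the uploader could re-wrap each returned Request, forcing the method and attaching the chunk body before handing it to fetch:

// Sketch of how the plugin could consume the returned Requests while
// guarding against implementers changing things like the method.
// Assumes presignedRequests is in the same order as parts.
async function uploadParts(presignedRequests, parts) {
  return Promise.all(presignedRequests.map((request, i) => {
    // Re-wrap the Request: keep its URL and headers, but force PUT and
    // attach the chunk as the body so those can't be misconfigured.
    const safeRequest = new Request(request, { method: "PUT", body: parts[i].chunk });
    return fetch(safeRequest);
  }));
}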

Curious about your thoughts @mifi @aduh95 @arturi @martin-brennan

@Murderlon added the 💬 Discussion and AWS S3 labels and removed the Triage label on Jul 13, 2022
@Murderlon self-assigned this on Jul 13, 2022
@Murderlon added the 3.0 label on Jul 13, 2022
@daghendrik

Thanks for moving this forward, @Murderlon and @kevin-west-10x. It will be very useful for us as well.

If possible, it would be awesome to also support the other/additional checksums that AWS now accepts as integrity checks on S3 (https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/).
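
Purely to illustrate what such a checksum looks like on the client side (how Uppy should support it is a separate question, later tracked in #3885): a SHA-256 digest of a chunk can be computed with SubtleCrypto and sent base64-encoded in an x-amz-checksum-sha256 header, assuming the multipart upload was created with the matching checksum algorithm:

// Sketch only: base64-encoded SHA-256 of a chunk, usable as the value of an
// `x-amz-checksum-sha256` header (assumes the backend created the multipart
// upload with ChecksumAlgorithm set to SHA256).
async function sha256Base64(chunk) {
  const digest = await crypto.subtle.digest("SHA-256", await chunk.arrayBuffer());
  return btoa(String.fromCharCode(...new Uint8Array(digest)));
}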

@Murderlon
Member

> If possible, it would be awesome to also support the other/additional checksums that AWS now accepts as integrity checks on S3 (https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/)

I'm not immediately sure what it takes for us to support those. Probably better to tackle that as a separate issue. Would you be willing to investigate a bit and create an issue with pointers on what needs to change in the codebase?

@daghendrik

@Murderlon : Sure, added an issue: #3885

@mifi
Contributor

mifi commented Jul 16, 2022

sounds good, but should file be part of the object arg? and does it need to return an object?

one concern is maybe some users need to do asynchronous calculations in order to get the hash. so maybe the function should be async?

what about this:

async function prepareUploadParts({ file, uploadId, key, parts }) {
  return parts.map((part) => {
    const { number, chunk } = part;
    const url = `https://bucket.s3.us-east-2.amazonaws.com/test/upload/multitest.dat?partNumber=${number}&...`;
    const hash = calculateMD5(chunk);
    return new Request(url, { headers: { "Content-MD5": hash } });
  });
}
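
If the hash calculation itself is asynchronous (the concern above), the same shape just needs a Promise.all around the map; a small variant of the sketch, still using the hypothetical calculateMD5:

// Variant of the sketch above for an asynchronous hash function.
async function prepareUploadParts({ file, uploadId, key, parts }) {
  return Promise.all(parts.map(async ({ number, chunk }) => {
    const url = `https://bucket.s3.us-east-2.amazonaws.com/test/upload/multitest.dat?partNumber=${number}&...`;
    const hash = await calculateMD5(chunk); // e.g. hashed off the main thread
    return new Request(url, { headers: { "Content-MD5": hash } });
  }));
}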

@Murderlon
Member

> sounds good, but should file be part of the object arg? and does it need to return an object?

We could do that too, yes.

> one concern is maybe some users need to do asynchronous calculations in order to get the hash. so maybe the function should be async?

prepareUploadParts is already async and will stay that way.

@Murderlon
Member

Closed in #3895
