Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tus resumes unrelated uploads, causing conflicts or lost metadata #5019

Closed
2 tasks done
defnull opened this issue Mar 25, 2024 · 6 comments · Fixed by #5080
Closed
2 tasks done

Tus resumes unrelated uploads, causing conflicts or lost metadata #5019

defnull opened this issue Mar 25, 2024 · 6 comments · Fixed by #5080
Labels

Comments

@defnull
Copy link

defnull commented Mar 25, 2024

Initial checklist

  • I understand this is a bug report and questions should be posted in the Community Forum
  • I searched issues and couldn’t find anything (or linked relevant results below)

Link to runnable example

No response

Steps to reproduce

1.Create two Uppy instances (A and B) with different id and meta settings, but the same Tus endpoint. Those instances should be on the same origin domain, but can be on different pages.
2. Upload a file to A.
3. Upload the same file to B.

Or simply upload the same file twice, but with different metadata the second time.

Expected behavior

The second upload should trigger a new Tus upload with corresponding metadata. For example, by taking the id of the Uppy instance and the upload metadata into account when calculating the Tus fingerprint. Alternatively, there should be a hook to control either the File.id or the Tus fingerprint for new files.

Actual behavior

The second upload will have the same File.id (and thus Tus fingerprint) as the first upload. As a result, tus-js-client will immediately report the file as successfully uploaded and Uppy will emit an upload-success event. In the back-end however only one tus file exists and that file has the metadata of the first upload. This breaks applications that need to support duplicate uploads, for example to allow the same file being uploaded to different folders or with different metadata.

The onBeforeFileAdded hook cannot be used to change the File.id as other code that depends on it (e.g. Golden Retriever isGhost detection) runs before that hook. The Uppy ID is not part of the File.id, so two different Uppy instances will create the same file IDs and conflict with each other.

The Uppy Tus uploader overrides the fingerprint option passed to tus-js-client, and the hard-coded implementation only takes File.id and Tus.options.endpoint into account. There is currently no way to change the fingerprint used to detect previous uploads and allow duplicate uploads with the Tus uploader.

@defnull defnull added the Bug label Mar 25, 2024
@Murderlon
Copy link
Member

Hi, did you try to simply pass your own fingerprint option? From the docs:

All options are passed to tus-js-client and we document the ones here that are required, added, or changed. This means you can also pass functions like onAfterResponse.
We recommended taking a look at the API reference from tus-js-client to know what is supported.


This breaks applications that need to support duplicate uploads, for example to allow the same file being uploaded to different folders or with different metadata.

To me it sounds a lot better to handle duplicate uploads on the server and prevent all your users from having to upload everything twice. That's double their bandwidth and more chances of it going wrong or out of sync. With tus this should be trivial with hooks and events, which our official servers have.

@defnull
Copy link
Author

defnull commented Mar 25, 2024

Hi, did you try to simply pass your own fingerprint option?

Yes. It is unconditionally overwritten by the Uppy Tus uploader plugin, so setting that option for the plugin has no effect. See

uploadOptions.fingerprint = getFingerprint(file)

This breaks applications that need to support duplicate uploads, for example to allow the same file being uploaded to different folders or with different metadata.

To me it sounds a lot better to handle duplicate uploads on the server and prevent all your users from having to upload everything twice. That's double their bandwidth and more chances of it going wrong or out of sync. With tus this should be trivial with hooks and events, which our official servers have.

Currently the back-end controls uploads via tusd emitted events and trusts those events. Metadata is used to decide in which 'bucket' a file is uploaded, and this has to happen before the first byte is transferred. So, two uploads with different 'bucket' metadata absolutely have to trigger two Tus uploads. This is why I would like to add this information to File.id (or at least Tus fingerprint).

@defnull
Copy link
Author

defnull commented Mar 25, 2024

This is not just about this particular use-case by the way, it may affect other applications that depend on metadata in Tus uploads as well. Tus supports metadata natively and accepts and persists it along with the upload binary data. Metadata is part of the upload artifact. Files with different metadata should not be considered equal and silently skipped, or there should at least be a way to control which metadata is part of an upload identity (e.g. by overriding Tus fingerprints or the generation of File.id).

@defnull
Copy link
Author

defnull commented Mar 26, 2024

I experimented a bit and think that it would not be enough to be able to control the Tus fingerprint, because other components (most notably GoldenRetriever) also assume that File.id is a globally unique identifier. So, to avoid conflicts when using multiple Uppy instances on the same domain with different configuration, the getSafeFileId function should take the Uppy instance ID into account by default. To fully support duplicate uploads, users should be able to replace or extend the ID generation on a per-Instance basis.

I solved this for myself by replacing a single line:

diff --git a/packages/@uppy/core/src/Uppy.ts b/packages/@uppy/core/src/Uppy.ts
index f813b2242..502ba08fb 100644
--- a/packages/@uppy/core/src/Uppy.ts
+++ b/packages/@uppy/core/src/Uppy.ts
@@ -911,7 +911,7 @@ export class Uppy<M extends Meta, B extends Body> {
     const fileType = getFileType(file)
     const fileName = getFileName(fileType, file)
     const fileExtension = getFileNameAndExtension(fileName).extension
-    const id = getSafeFileId(file)
+    const id = getSafeFileId(file).replace("uppy-", `uppy-${this.getID()}`)
 
     const meta = file.meta || {}
     meta.name = fileName

This is obviously not a real solution, just a hotfix. A better way would be to have a configurable hook that either adds to, or completely replaces the ID generated by getSafeFileId.

@Murderlon
Copy link
Member

@Acconut what do you think of this?

@Acconut
Copy link
Member

Acconut commented Apr 8, 2024

In an ideal world, duplicate uploads of the same file would be avoided, but in practice this is not always easily possible, so I think this should be addressed. The easiest (and sensible) solution seems to be adding the uppy ID into the file ID. tus-js-client puts the fingerprints into the global localStorage, where collisions can appear leading to the described behavior. If we can embed information about the corresponding uppy instance into the file ID and thus fingerprint, this should work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants