
Move attached files between assetstores #3467

Open
manthey opened this issue Sep 15, 2023 · 3 comments
@manthey (Member) commented Sep 15, 2023

We have a function to move files between assetstores, but it doesn't support moving attached files, only files that are owned directly by an item.

Probably the correct behavior is to modify the upload record in `moveFileToAssetstore` to reflect the attached status.

Further, there is a major performance penalty when moving files because we build up each upload chunk by repeatedly concatenating the small downloaded blocks. We should instead keep a list of these blocks and join them once per upload chunk rather than iteratively concatenating binary strings.
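To make the cost concrete, here is a standalone sketch (plain Python, not Girder code; the block and chunk counts are illustrative) contrasting repeated concatenation with a single join:

```python
# Illustrative sizes only: ~64 KB download blocks, 512 blocks per upload chunk.
BLOCK = b'x' * 65536
N = 512


def concat_repeatedly():
    # Each += allocates a new bytes object and copies everything accumulated
    # so far, so the total bytes copied grows quadratically with N.
    chunk = b''
    for _ in range(N):
        chunk += BLOCK
    return chunk


def join_once():
    # Collect references in a list, then copy each block exactly once.
    chunks = []
    for _ in range(N):
        chunks.append(BLOCK)
    return b''.join(chunks)
```

Both produce identical output; only the amount of memory traffic differs.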

@willdunklin (Contributor) commented

To quote my comment from the import-tracker plugin PR (DigitalSlideArchive/import-tracker#19 (comment)):

That looks good. I've been able to essentially recreate a version of `moveFileToAssetstore` for attached files. Using the lower-level APIs also exposes control over the file metadata, so maintaining the `created` field is handled automatically (by the `upload.update(...)` line).

```python
import io

from girder.models.file import File
from girder.models.item import Item
from girder.models.upload import Upload


def move_meta_file(file, assetstore):
    parent = Item().findOne({'_id': file['attachedToId']})

    # Read the whole file into memory via the download generator.
    chunk = None
    try:
        for data in File().download(file, headers=False)():
            if chunk is not None:
                chunk += data
            else:
                chunk = data
    except Exception as e:
        return {'error': f'Exception downloading file: {e}'}

    # Re-upload into the destination assetstore, keeping the attachment.
    upload = Upload().uploadFromFile(
        obj=io.BytesIO(chunk), size=file['size'], name=file['name'],
        parentType=file['attachedToType'], parent=parent,
        mimeType=file['mimeType'], attachParent=True, assetstore=assetstore)
    # Carry over the original file's metadata (e.g. the created field).
    upload.update({k: v for k, v in file.items() if k != 'assetstoreId'})
    upload = File().save(upload)

    return upload
```

This is modeled after the large_image plugin's code for manipulating attached files: https://github.com/girder/large_image/blob/master/girder/girder_large_image/rest/tiles.py#L1538-L1547

I believe this version addresses the chunking issue you mentioned, correct? Or am I misinterpreting the bottleneck at hand here?

@manthey (Member, Author) commented Sep 18, 2023

The chunking issue is the concatenation in this loop:

```python
chunk = None
try:
    for data in File().download(file, headers=False)():
        if chunk is not None:
            chunk += data
        else:
            chunk = data
```

Specifically, the download iterator yields something on the order of 64 KB at a time. We don't want to read the whole file before uploading (since it could be arbitrarily large), but the Girder code defaults to an upload chunk size of 32 MB, which means we might concatenate blocks 512 times. Each concatenation creates a new bytes object and copies all the data accumulated so far (and is slow about it). Better would be to collect the blocks in a list:

```python
chunks = []
for data in File().download(file, headers=False)():
    chunks.append(data)
    if sum(len(chunk) for chunk in chunks) >= chunkSize:
        uploadchunk = b''.join(chunks)
        upload = self.handleChunk(
            upload, RequestBodyStream(io.BytesIO(uploadchunk), len(uploadchunk)))
        progress.update(increment=len(uploadchunk))
        chunks = []

if len(chunks):
    uploadchunk = b''.join(chunks)
    upload = self.handleChunk(
        upload, RequestBodyStream(io.BytesIO(uploadchunk), len(uploadchunk)))
    progress.update(increment=len(uploadchunk))
```

Though the repeated code bothers me, and it might be better to keep a running tally of lengths rather than calling `sum` on every block.
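One way to address both points, the repeated flush code and the per-block `sum`, is a small generator that batches blocks by a running byte count (a hypothetical `batch_chunks` helper, sketched here outside of Girder):

```python
def batch_chunks(source, chunk_size):
    """Yield joined batches of at least chunk_size bytes from source.

    source is any iterable of bytes blocks (e.g. a download generator).
    A final short batch is yielded if any data remains.
    """
    chunks = []
    total = 0  # running tally instead of calling sum() per block
    for data in source:
        chunks.append(data)
        total += len(data)
        if total >= chunk_size:
            yield b''.join(chunks)
            chunks = []
            total = 0
    if chunks:
        yield b''.join(chunks)
```

With this, the upload loop collapses to a single `for uploadchunk in batch_chunks(...)` body containing one `handleChunk` call.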

@willdunklin (Contributor) commented

Got it. I made a PR with a modified version of your code for the speedup, and it performs noticeably better in comparison. I'm now working on handling attached files on top of those changes.
