I've been trying to use localsync through the API: calling `downloadFolderRecursive` with sync metadata to prevent re-downloading unchanged files, as expected. The code in girder_client is pretty simple:
So in my client I called `gc.loadLocalMetadata(local_folder)`, `gc.downloadFolderRecursive(parent_id, local_folder, sync=True)`, and `gc.saveLocalMetadata(local_folder)`, but it would not work: it kept re-downloading every file every time, whether or not the `.girder_metadata` file was present.
The problem lies in the definition of `gc.downloadFolderRecursive()`, in the test for syncing:
```python
def downloadFolderRecursive(self, folderId, dest, sync=False):
    """
    Download a folder recursively from Girder into a local directory.

    :param folderId: Id of the Girder folder or resource path to download.
    :type folderId: ObjectId or Unix-style path to the resource in Girder.
    :param dest: The local download destination.
    :type dest: str
    :param sync: If True, check if item exists in local metadata cache
        and skip download provided that metadata is identical.
    :type sync: bool
    """
    offset = 0
    folderId = self._checkResourcePath(folderId)
    while True:
        folders = self.get('folder', parameters={
            'limit': DEFAULT_PAGE_LIMIT,
            'offset': offset,
            'parentType': 'folder',
            'parentId': folderId
        })
        for folder in folders:
            local = os.path.join(dest, self.transformFilename(folder['name']))
            os.makedirs(local, exist_ok=True)
            self.downloadFolderRecursive(folder['_id'], local, sync=sync)
        offset += len(folders)
        if len(folders) < DEFAULT_PAGE_LIMIT:
            break
    offset = 0
    while True:
        items = self.get('item', parameters={
            'folderId': folderId,
            'limit': DEFAULT_PAGE_LIMIT,
            'offset': offset
        })
        for item in items:
            _id = item['_id']
            self.incomingMetadata[_id] = item
            # --------> HERE <--------
            if sync and _id in self.localMetadata and item == self.localMetadata[_id]:
                continue
            self.downloadItem(item['_id'], dest, name=item['name'])
        offset += len(items)
        if len(items) < DEFAULT_PAGE_LIMIT:
            break
```
The test `item == self.localMetadata[_id]` will always fail, since the `downloadStatistics` metadata field has changed on the server. We would like to check that the id, name, last modification time, etc. haven't changed, but the ever-changing download and request counters cause this test to never be True.
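To see why, here is a minimal stand-alone sketch of the comparison (the field values are hypothetical, but the shape matches what the item endpoint returns):

```python
# Two snapshots of the same unchanged item: only the server-side
# download counters differ, yet dict equality compares every key.
local = {'_id': 'abc', 'name': 'scan.nii', 'updated': '2023-01-01',
         'downloadStatistics': {'completed': 3, 'requested': 3}}
incoming = {'_id': 'abc', 'name': 'scan.nii', 'updated': '2023-01-01',
            'downloadStatistics': {'completed': 4, 'requested': 4}}

print(incoming == local)   # the sync test: False, so the item is re-downloaded

# Dropping the volatile field from both sides makes the test pass.
local.pop('downloadStatistics', None)
incoming.pop('downloadStatistics', None)
print(incoming == local)   # True: the download would be skipped
```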
I ended up recoding the `gc.downloadFolderRecursive()` method as follows, which works well:
```python
def downloadFolderRecursive(self, folderId, dest, sync=False):
    """
    Download a folder recursively from Girder into a local directory.

    :param folderId: Id of the Girder folder or resource path to download.
    :type folderId: ObjectId or Unix-style path to the resource in Girder.
    :param dest: The local download destination.
    :type dest: str
    :param sync: If True, check if item exists in local metadata cache
        and skip download provided that metadata is identical.
    :type sync: bool
    """
    offset = 0
    folderId = self._checkResourcePath(folderId)
    while True:
        folders = self.get('folder', parameters={
            'limit': DEFAULT_PAGE_LIMIT,
            'offset': offset,
            'parentType': 'folder',
            'parentId': folderId
        })
        for folder in folders:
            local = os.path.join(dest, self.transformFilename(folder['name']))
            os.makedirs(local, exist_ok=True)
            self.downloadFolderRecursive(folder['_id'], local, sync=sync)
        offset += len(folders)
        if len(folders) < DEFAULT_PAGE_LIMIT:
            break
    offset = 0
    while True:
        items = self.get('item', parameters={
            'folderId': folderId,
            'limit': DEFAULT_PAGE_LIMIT,
            'offset': offset
        })
        for item in items:
            # --------> THERE <--------
            # Remove the 'downloadStatistics' field, which changes on the
            # server at every download and would otherwise force a
            # re-download of everything.
            item.pop('downloadStatistics', None)
            _id = item['_id']
            if _id in self.localMetadata:
                self.localMetadata[_id].pop('downloadStatistics', None)
            self.incomingMetadata[_id] = item
            if sync and _id in self.localMetadata and item == self.localMetadata[_id]:
                continue
            self.downloadItem(item['_id'], dest, name=item['name'])
        offset += len(items)
        if len(items) < DEFAULT_PAGE_LIMIT:
            break
```
That way, the `downloadStatistics` field is also removed from the local `.girder_metadata` file, as it is of no interest for my use; keeping it there while still excluding it from the sync test would also be straightforward.
I hope this helps, and that it is not a misunderstanding of what localsync is expected to do; don't hesitate to tell me if this is all wrong.
Cheers.
I believe this is simply incorrect behavior that you've discovered. Rather than removing fields, we should just identify some known set of fields that we can use to indicate a change in the item. The `updated` field is probably the easiest.
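A minimal sketch of that approach: instead of whole-dict equality, compare only a known set of fields that actually indicates change. The exact field set and the `itemsMatch` helper name are illustrative, not part of girder_client; `updated` alone may be enough.

```python
# Fields that indicate a real change to the item; everything else
# (e.g. server-side download counters) is ignored.
SYNC_FIELDS = ('_id', 'name', 'updated')

def itemsMatch(incoming, cached):
    """Return True when the cached item can be considered up to date."""
    return all(incoming.get(f) == cached.get(f) for f in SYNC_FIELDS)

# The sync test in downloadFolderRecursive would then become:
#   if sync and _id in self.localMetadata and itemsMatch(item, self.localMetadata[_id]):
#       continue

cached = {'_id': 'abc', 'name': 'scan.nii', 'updated': '2023-01-01',
          'downloadStatistics': {'completed': 3}}
fresh = dict(cached, downloadStatistics={'completed': 4})
print(itemsMatch(fresh, cached))   # True: statistics churn no longer forces a re-download
```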