Parallel upload doesn't work on Windows (gets stuck) #1321

Open
ZhuJiaBCM opened this issue Aug 17, 2023 · 8 comments
Labels
os-windows Issues which are specific to MS Windows OS

Comments

@ZhuJiaBCM

Hello, I had problems uploading .nwb files to my dandiset. All files passed validation and no errors showed up, but the upload got stuck for days. That is not reasonable for a small amount of test data (~6 GB). Can someone please help? Thanks.
Jia Zhu (jiaz@bcm.edu)

@ZhuJiaBCM
Author

Are there any data transfer experts on the support team?

@CodyCBakerPhD
Contributor

I can finally see the issue 😅

I believe this turned out to be an issue with parallel uploading (which is the default), correct? If --jobs was set to 1 then upload did not get stuck?

@ZhuJiaBCM
Author

ZhuJiaBCM commented Aug 26, 2023 via email

@yarikoptic
Member

Could you please check, whenever you get a chance, whether parallel uploads work after setting the env var DANDI_CACHE=ignore?

@yarikoptic changed the title from "Long timeout on dandni upload" to "Parallel upload doesn't work on Windows (gets stuck)" on Oct 20, 2023
@yarikoptic added the os-windows label (Issues which are specific to MS Windows OS) on Oct 20, 2023
@yarikoptic
Member

Dear @ZhuJiaBCM, could you please update us: did you have a chance to try uploading "in parallel" but with DANDI_CACHE=ignore set?
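
For anyone wanting to reproduce the two diagnostic runs discussed so far (a serial upload with --jobs 1, and the default parallel upload with DANDI_CACHE=ignore), something along these lines might help. It is only a sketch: `build_upload_invocation` and `run_upload` are hypothetical helper names, the only options taken from this thread are `--jobs` and `DANDI_CACHE=ignore`, and it assumes the `dandi` executable is on PATH and that the script is started from inside the dandiset folder.

```python
import os
import subprocess


def build_upload_invocation(jobs=None, ignore_cache=False, base_env=None):
    """Return (argv, env) for one diagnostic `dandi upload` run.

    jobs=None leaves the default (parallel) behaviour untouched;
    ignore_cache=True sets DANDI_CACHE=ignore for the child process only.
    """
    # Copy the environment so the calling process is not modified.
    env = dict(os.environ if base_env is None else base_env)
    if ignore_cache:
        env["DANDI_CACHE"] = "ignore"
    argv = ["dandi", "upload"]
    if jobs is not None:
        argv += ["--jobs", str(jobs)]
    return argv, env


def run_upload(jobs=None, ignore_cache=False):
    """Run the upload (hypothetical helper; requires dandi to be installed)."""
    argv, env = build_upload_invocation(jobs=jobs, ignore_cache=ignore_cache)
    return subprocess.run(argv, env=env, check=True)
```

Calling `run_upload(jobs=1)` would correspond to the serial run, and `run_upload(ignore_cache=True)` to the parallel run with the cache ignored. Passing the variable through `env` rather than setting it globally keeps the two runs independent of each other.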

@ZhuJiaBCM
Author

ZhuJiaBCM commented Feb 13, 2024 via email

@yarikoptic
Member

yarikoptic commented Feb 14, 2024

Thank you @ZhuJiaBCM. Do you have log files from those stuck and normal sessions to share (maybe privately, just in case; they can be emailed to debian AT oneukrainian DOT com)?

edit: for now also added con/fscacher#90

@yarikoptic
Member

Thank you for sharing the logs!

Looking at them, in the "stuck" case I see paths in an ...\out\... folder, outside of the dataset folder:

❯ grep 'function get_digest' *stuck.log | sed -e 's,.* ,,g'
conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL105_20221106_allData.nwb
conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL105/sub-BAYLORNL105_ses-20221106_ecephys+ogen.nwb')
...
That is, those are the ones for which we got to use cached value(s) but never even progressed to upload:
❯ grep 'function get_digest' *stuck.log | sed -e 's,.*[\\/],,g' -e "s,'.*,,g" -e "s,\r,,g" | while read f; do echo -e "\nFILE <$f>"; grep -e "get_digest.*$f" *stuck.log; grep -e "$f.*ing upload" *stuck.log; done

FILE <BAYLORNL105_20221106_allData.nwb>
2024-02-13T12:03:04-0600 [DEBUG   ] fscacher.cache 17620:20688 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL105_20221106_allData.nwb

FILE <sub-BAYLORNL105_ses-20221106_ecephys+ogen.nwb>
2024-02-13T12:03:04-0600 [DEBUG   ] fscacher.cache 17620:20688 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL105/sub-BAYLORNL105_ses-20221106_ecephys+ogen.nwb')
2024-02-13T12:03:22-0600 [DEBUG   ] dandi 17620:20688 sub-BAYLORNL105/sub-BAYLORNL105_ses-20221106_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:06:22-0600 [DEBUG   ] dandi 17620:20688 sub-BAYLORNL105/sub-BAYLORNL105_ses-20221106_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL105_20221105_allData.nwb>
2024-02-13T12:03:08-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL105_20221105_allData.nwb

FILE <sub-BAYLORNL105_ses-20221105_ecephys+ogen.nwb>
2024-02-13T12:03:08-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL105/sub-BAYLORNL105_ses-20221105_ecephys+ogen.nwb')
2024-02-13T12:03:23-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL105/sub-BAYLORNL105_ses-20221105_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:06:04-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL105/sub-BAYLORNL105_ses-20221105_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL105_20221103_allData.nwb>
2024-02-13T12:03:09-0600 [DEBUG   ] fscacher.cache 17620:27812 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL105_20221103_allData.nwb

FILE <sub-BAYLORNL105_ses-20221103_ecephys+ogen.nwb>
2024-02-13T12:03:09-0600 [DEBUG   ] fscacher.cache 17620:27812 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL105/sub-BAYLORNL105_ses-20221103_ecephys+ogen.nwb')
2024-02-13T12:03:23-0600 [DEBUG   ] dandi 17620:27812 sub-BAYLORNL105/sub-BAYLORNL105_ses-20221103_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:06:21-0600 [DEBUG   ] dandi 17620:27812 sub-BAYLORNL105/sub-BAYLORNL105_ses-20221103_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL106_20221103_allData.nwb>
2024-02-13T12:03:13-0600 [DEBUG   ] fscacher.cache 17620:25712 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL106_20221103_allData.nwb

FILE <sub-BAYLORNL106_ses-20221103_ecephys+ogen.nwb>
2024-02-13T12:03:13-0600 [DEBUG   ] fscacher.cache 17620:25712 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL106/sub-BAYLORNL106_ses-20221103_ecephys+ogen.nwb')
2024-02-13T12:03:30-0600 [DEBUG   ] dandi 17620:25712 sub-BAYLORNL106/sub-BAYLORNL106_ses-20221103_ecephys+ogen.nwb: Beginning upload

FILE <BAYLORNL105_20221102_allData.nwb>
2024-02-13T12:03:13-0600 [DEBUG   ] fscacher.cache 17620:1616 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL105_20221102_allData.nwb

FILE <sub-BAYLORNL105_ses-20221102_ecephys+ogen.nwb>
2024-02-13T12:03:13-0600 [DEBUG   ] fscacher.cache 17620:1616 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL105/sub-BAYLORNL105_ses-20221102_ecephys+ogen.nwb')
2024-02-13T12:03:32-0600 [DEBUG   ] dandi 17620:1616 sub-BAYLORNL105/sub-BAYLORNL105_ses-20221102_ecephys+ogen.nwb: Beginning upload

FILE <BAYLORNL106_20221104_allData.nwb>
2024-02-13T12:06:11-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL106_20221104_allData.nwb

FILE <sub-BAYLORNL106_ses-20221104_ecephys+ogen.nwb>
2024-02-13T12:06:11-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL106/sub-BAYLORNL106_ses-20221104_ecephys+ogen.nwb')
2024-02-13T12:06:20-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL106/sub-BAYLORNL106_ses-20221104_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:07:58-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL106/sub-BAYLORNL106_ses-20221104_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL106_20221106_allData.nwb>
2024-02-13T12:06:29-0600 [DEBUG   ] fscacher.cache 17620:20688 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL106_20221106_allData.nwb

FILE <sub-BAYLORNL106_ses-20221106_ecephys+ogen.nwb>
2024-02-13T12:06:29-0600 [DEBUG   ] fscacher.cache 17620:20688 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL106/sub-BAYLORNL106_ses-20221106_ecephys+ogen.nwb')
2024-02-13T12:06:41-0600 [DEBUG   ] dandi 17620:20688 sub-BAYLORNL106/sub-BAYLORNL106_ses-20221106_ecephys+ogen.nwb: Beginning upload

FILE <BAYLORNL106_20221105_allData.nwb>
2024-02-13T12:06:30-0600 [DEBUG   ] fscacher.cache 17620:27812 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL106_20221105_allData.nwb

FILE <sub-BAYLORNL106_ses-20221105_ecephys+ogen.nwb>
2024-02-13T12:06:30-0600 [DEBUG   ] fscacher.cache 17620:27812 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL106/sub-BAYLORNL106_ses-20221105_ecephys+ogen.nwb')
2024-02-13T12:06:45-0600 [DEBUG   ] dandi 17620:27812 sub-BAYLORNL106/sub-BAYLORNL106_ses-20221105_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:08:43-0600 [DEBUG   ] dandi 17620:27812 sub-BAYLORNL106/sub-BAYLORNL106_ses-20221105_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL106_20221107_allData.nwb>
2024-02-13T12:08:38-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL106_20221107_allData.nwb

FILE <sub-BAYLORNL106_ses-20221107_ecephys+ogen.nwb>
2024-02-13T12:08:38-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL106/sub-BAYLORNL106_ses-20221107_ecephys+ogen.nwb')
2024-02-13T12:08:49-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL106/sub-BAYLORNL106_ses-20221107_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:09:57-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL106/sub-BAYLORNL106_ses-20221107_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL107_20230716_allData.nwb>
2024-02-13T12:08:47-0600 [DEBUG   ] fscacher.cache 17620:27812 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL107_20230716_allData.nwb

FILE <sub-BAYLORNL107_ses-20230716_ecephys+ogen.nwb>
2024-02-13T12:08:47-0600 [DEBUG   ] fscacher.cache 17620:27812 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL107/sub-BAYLORNL107_ses-20230716_ecephys+ogen.nwb')
2024-02-13T12:08:55-0600 [DEBUG   ] dandi 17620:27812 sub-BAYLORNL107/sub-BAYLORNL107_ses-20230716_ecephys+ogen.nwb: Beginning upload

FILE <BAYLORNL107_20230717_allData.nwb>
2024-02-13T12:10:00-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL107_20230717_allData.nwb

FILE <sub-BAYLORNL107_ses-20230717_ecephys+ogen.nwb>
2024-02-13T12:10:00-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL107/sub-BAYLORNL107_ses-20230717_ecephys+ogen.nwb')
2024-02-13T12:10:07-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL107/sub-BAYLORNL107_ses-20230717_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:11:23-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL107/sub-BAYLORNL107_ses-20230717_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL107_20230718_allData.nwb>
2024-02-13T12:11:27-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL107_20230718_allData.nwb

FILE <sub-BAYLORNL107_ses-20230718_ecephys+ogen.nwb>
2024-02-13T12:11:27-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL107/sub-BAYLORNL107_ses-20230718_ecephys+ogen.nwb')
2024-02-13T12:11:34-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL107/sub-BAYLORNL107_ses-20230718_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:13:04-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL107/sub-BAYLORNL107_ses-20230718_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL109_20230712_allData.nwb>
2024-02-13T12:13:09-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL109_20230712_allData.nwb

FILE <sub-BAYLORNL109_ses-20230712_ecephys+ogen.nwb>
2024-02-13T12:13:09-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL109/sub-BAYLORNL109_ses-20230712_ecephys+ogen.nwb')
2024-02-13T12:13:19-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL109/sub-BAYLORNL109_ses-20230712_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:14:36-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL109/sub-BAYLORNL109_ses-20230712_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL109_20230714_allData.nwb>
2024-02-13T12:14:42-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL109_20230714_allData.nwb

FILE <sub-BAYLORNL109_ses-20230714_ecephys+ogen.nwb>
2024-02-13T12:14:42-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL109/sub-BAYLORNL109_ses-20230714_ecephys+ogen.nwb')
2024-02-13T12:14:50-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL109/sub-BAYLORNL109_ses-20230714_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:15:57-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL109/sub-BAYLORNL109_ses-20230714_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL112_20230628_allData.nwb>
2024-02-13T12:16:07-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL112_20230628_allData.nwb

FILE <sub-BAYLORNL112_ses-20230628_ecephys+ogen.nwb>
2024-02-13T12:16:07-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL112/sub-BAYLORNL112_ses-20230628_ecephys+ogen.nwb')
2024-02-13T12:16:23-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL112/sub-BAYLORNL112_ses-20230628_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:18:32-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL112/sub-BAYLORNL112_ses-20230628_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL112_20230701_allData.nwb>
2024-02-13T12:18:38-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL112_20230701_allData.nwb

FILE <sub-BAYLORNL112_ses-20230701_ecephys+ogen.nwb>
2024-02-13T12:18:38-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL112/sub-BAYLORNL112_ses-20230701_ecephys+ogen.nwb')
2024-02-13T12:18:50-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL112/sub-BAYLORNL112_ses-20230701_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:20:32-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL112/sub-BAYLORNL112_ses-20230701_ecephys+ogen.nwb: Completing upload

FILE <BAYLORNL112_20230702_allData.nwb>
2024-02-13T12:20:38-0600 [DEBUG   ] fscacher.cache 17620:2924 Calling memoized version of <function get_digest at 0x000001EF73F93430> for C:\Data\NWB conversion\dataset_ALM_mini_module_NP2\out\BAYLORNL112_20230702_allData.nwb

FILE <sub-BAYLORNL112_ses-20230702_ecephys+ogen.nwb>
2024-02-13T12:20:38-0600 [DEBUG   ] fscacher.cache 17620:2924 Running original <function get_digest at 0x000001EF73F93430> on WindowsPath('C:/Data/NWB conversion/dataset_ALM_mini_module_NP2/000895/sub-BAYLORNL112/sub-BAYLORNL112_ses-20230702_ecephys+ogen.nwb')
2024-02-13T12:20:47-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL112/sub-BAYLORNL112_ses-20230702_ecephys+ogen.nwb: Beginning upload
2024-02-13T12:21:53-0600 [DEBUG   ] dandi 17620:2924 sub-BAYLORNL112/sub-BAYLORNL112_ses-20230702_ecephys+ogen.nwb: Completing upload

But in the subsequent "successful" run there is not a single /out/ path:

❯ grep out *-ok.log
❯ 
❯ grep allData *-ok.log
❯ 
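
The per-file grep loop used on the stuck log above can also be sketched in Python, which may be easier to run on Windows. This is a rough equivalent, not part of dandi or fscacher: the function names (`summarize`, `cached_without_upload`, `started_not_completed`) are made up for illustration, and the regular expressions assume the exact log line layout quoted in this comment.

```python
import re
from collections import defaultdict

# fscacher lines: a cache hit ("Calling memoized version") or a
# recomputation ("Running original") of get_digest, followed by the path.
DIGEST_RE = re.compile(
    r"(Calling memoized version of|Running original) "
    r"<function get_digest at \w+> (?:for|on) (.*)$"
)
# dandi lines: "<asset path>: Beginning upload" / "Completing upload".
UPLOAD_RE = re.compile(r"dandi \d+:\d+ (.+): (Beginning|Completing) upload\s*$")


def basename(raw):
    """File name from a Windows/POSIX path or a WindowsPath('...') repr."""
    raw = raw.strip()
    m = re.match(r"\w*Path\('(.*)'\)$", raw)
    if m:
        raw = m.group(1)
    return re.split(r"[\\/]", raw)[-1]


def summarize(lines):
    """Map each file name to the ordered list of events logged for it."""
    events = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        m = DIGEST_RE.search(line)
        if m:
            kind = "cached" if m.group(1).startswith("Calling") else "computed"
            events[basename(m.group(2))].append(kind)
            continue
        m = UPLOAD_RE.search(line)
        if m:
            events[basename(m.group(1))].append(m.group(2).lower())
    return dict(events)


def cached_without_upload(events):
    """Files whose digest came from the cache and that never began upload."""
    return sorted(
        name for name, seen in events.items()
        if "cached" in seen and "beginning" not in seen
    )


def started_not_completed(events):
    """Files with a 'Beginning upload' line but no 'Completing upload' line."""
    return sorted(
        name for name, seen in events.items()
        if "beginning" in seen and "completing" not in seen
    )
```

A possible use would be `events = summarize(open(log_path, encoding="utf-8", errors="replace"))`, where `log_path` is a placeholder for the stuck log, followed by printing the two lists. The second helper goes slightly beyond the grep loop: it lists uploads that began but have no matching completion line in the log.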

What are those \out\ paths with the allData suffix, how did they end up here outside of the dandiset folder, and why are they not present in the later successful upload? Did you perhaps use organize with symlinks?
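
One way to check the symlink hypothesis locally is to list entries inside the dandiset folder whose real location is somewhere else. The snippet below is a minimal sketch using only the Python standard library; `entries_resolving_outside` is a hypothetical helper name, and it only detects symbolic links that `os.path.realpath` can resolve (hard links would not show up this way).

```python
import os
from pathlib import Path


def entries_resolving_outside(root):
    """Return (path, real_path) pairs for files and directories under root
    whose resolved location lies outside root, e.g. symlinks to an out folder."""
    root_real = Path(os.path.realpath(root))
    found = []
    # os.walk does not descend into symlinked directories by default,
    # so each link is reported once and there is no risk of cycles.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            real = Path(os.path.realpath(path))
            if root_real not in real.parents:
                found.append((path, real))
    return found
```

Running `entries_resolving_outside(dandiset_dir)` on the 000895 folder (with `dandiset_dir` as a placeholder for its location) and getting a non-empty list would suggest that the organized files point back at files elsewhere, such as the \out\ folder.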
