file I/O race conditions when running ctest in parallel #110

Open
bcdarwin opened this issue Jul 22, 2020 · 8 comments

Comments

@bcdarwin

Possibly not too severe ... ?

31/52 Test #36: minc2-large-attribute-100k .......***Failed    0.00 sec
/build/source/libsrc2/volume.c:236 (from MINC): Unable to create file '3D_image_a.mnc'
Error reported on line #113, create_3D_image: 0
1 error reported
Creating 3D image with attribute 100000 ! (3D_image_a.mnc)

      Start 39: minc2-dimension-test
32/52 Test #37: minc2-large-attribute-1m .........***Failed    0.00 sec
/build/source/libsrc2/volume.c:236 (from MINC): Unable to create file '3D_image_a.mnc'
Error reported on line #113, create_3D_image: 0
1 error reported
Creating 3D image with attribute 1000000 ! (3D_image_a.mnc)

@gdevenyi
Contributor

"make test" in libminc with the develop-1.9.18 (HDF5 1.10.6) superbuild passes all tests. Can you please compare your HDF5 build config to: https://github.com/BIC-MNI/minc-toolkit-v2/blob/develop-1.9.18/cmake-modules/BuildHDF5.cmake

Thanks.

@bcdarwin
Author

Looking at libhdf5.settings, the main difference seems to be the use of -O3, but I haven't verified this yet ...

@gdevenyi
Contributor

Any updates on this? My autobuild Docker images are having issues with a couple of the HDF5 runs, and I'm wondering if it's related:
BIC-MNI/build_packages#14

@bcdarwin
Author

bcdarwin commented Oct 5, 2020

After disabling parallel building and bumping some dependencies, it now fails as follows instead (possibly more evidence that this is a race condition or memory corruption):

37/50 Test #46: minc2-valid-test .................***Failed    0.02 sec
/build/source/libsrc2/volume.c:1399 (from MINC): Unable to open file '/build/source/build/testdir/3D_minc2.mnc'
Error reported on line #20, can't open input: -1
/build/source/libsrc2/volume.c:1399 (from MINC): Unable to open file '/build/source/build/testdir/3D_minc2_int.mnc'
Error reported on line #20, can't open input: -1
min -32768.000000 max 32767.000000
min -340282346638528859811704183484516925440.000000 max 340282346638528859811704183484516925440.000000
min 0.000000 max 255.000000
38/50 Test #45: minc2-slice-test .................   Passed    0.02 sec

@gdevenyi
Contributor

gdevenyi commented Oct 6, 2020

Interesting, any chance you could throw together a reproducer in a Docker container or such so we can play with it?

Does it error if you run the tests a second time? An strace might be helpful as well.

@vfonov
Member

vfonov commented Oct 6, 2020

There are two tests in CMakeLists.txt that use files with the same names:

add_minc_test(minc2-slice-test            minc2-slice-test
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2.mnc
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2_int.mnc
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2_float.mnc
                                          )

add_minc_test(minc2-valid-test            minc2-valid-test
                                          ${CMAKE_CURRENT_BINARY_DIR}/2D_minc2.mnc
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2.mnc
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2_int.mnc
                                          ${CMAKE_CURRENT_BINARY_DIR}/3D_minc2_float.mnc
                                          ${CMAKE_CURRENT_BINARY_DIR}/4D_minc2.mnc
                                          )

So if the tests are executed in parallel, they will conflict over these files.
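One way to stop CTest from running these two tests at the same time is sketched below under assumptions: it relies on add_minc_test registering each test under its first argument (the ctest output above, e.g. `Test #46: minc2-valid-test`, suggests it does), and the lock name `minc2-3D-files` is a hypothetical label. CTest's RESOURCE_LOCK test property guarantees that tests listing the same lock never run concurrently, even under `ctest -j`.

```cmake
# Sketch: serialize the two tests that share the 3D_minc2*.mnc files.
# "minc2-3D-files" is a hypothetical lock name; any string works, as long as
# every test that touches these files lists the same one.
set_tests_properties(minc2-slice-test minc2-valid-test
                     PROPERTIES RESOURCE_LOCK minc2-3D-files)
```

This only prevents overlap; it does not order the tests. If one test had to create files that another reads, CTest's DEPENDS or FIXTURES_SETUP/FIXTURES_REQUIRED properties would be the way to express that ordering.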

@vfonov
Member

vfonov commented Oct 6, 2020

Also,


add_minc_test(minc2-large-attribute-10k   minc2-large-attribute 10000)
add_minc_test(minc2-large-attribute-100k  minc2-large-attribute 100000)
add_minc_test(minc2-large-attribute-1m    minc2-large-attribute 1000000)
Each of these creates a file with the same name, 3D_image_a.mnc.
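A minimal sketch of serializing the three large-attribute runs with CTest's RESOURCE_LOCK property, assuming the test names match the first add_minc_test argument (as the `minc2-large-attribute-100k` and `minc2-large-attribute-1m` failures in the ctest output suggest); `minc2-large-attribute-file` is a hypothetical lock name. Giving each run a distinct output file name would also remove the clash, but that requires changing the test program itself.

```cmake
# Sketch: all three runs write 3D_image_a.mnc, so allow at most one at a time.
# "minc2-large-attribute-file" is a hypothetical lock name.
set_tests_properties(minc2-large-attribute-10k
                     minc2-large-attribute-100k
                     minc2-large-attribute-1m
                     PROPERTIES RESOURCE_LOCK minc2-large-attribute-file)
```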

@bcdarwin changed the title from "tests minc2-large-attribute-100k and minc2-large-attribute-1m fail with HDF5 1.10.x but not 1.8.x" to "file I/O race conditions when running ctest in parallel" on Oct 6, 2020
@bcdarwin
Author

bcdarwin commented Oct 6, 2020

Thanks Vlad! Looks like running the tests sequentially fixes things.
