Certain timesteps cause the model to crash with gfortran/openmpi on scu15 #850

Open · bena-nasa opened this issue Nov 2, 2023 · 1 comment

bena-nasa (Collaborator) commented Nov 2, 2023

While tracking issue #847, I ran into something else that seems to warrant its own issue.

When you choose the single-moment physics (in this case at c90), the default timestep is 450 s, but if you choose the two-moment physics, the default timestep is 1800 s.
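
For context, here is roughly where that timestep lives in a run directory. This is just a sketch assuming the heartbeat is set via HEARTBEAT_DT in CAP.rc; the exact file and key name may differ by tag:

```
# CAP.rc excerpt (assumed layout) -- model heartbeat in seconds
HEARTBEAT_DT: 450     # 450 s case, which runs with gfortran/openmpi
#HEARTBEAT_DT: 1800   # 1800 s (two-moment default), which hits the segfault described below
```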

I am finding that when you run the model (using the same version specified in #847) with that longer timestep, it crashes in the dynamics with a segmentation fault on the first timestep when using gfortran and openmpi on scu17. This happens with both the release and debug gfortran builds and is independent of which microphysics you have chosen. With the shorter 450 s timestep, the model runs with gfortran. Unfortunately I'm not getting much useful traceback:

 TR::e90
 TR::Rn222
 TR::CH3I
 Real*8 Resource Parameter: PSDRY:98305.000000, (default value)
 Global Area=   510064471910262.25
[borga169:30008:0:30008] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe4)
[borga169:30009:0:30009] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe4)
[borga169:30006:0:30006] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe4)
==== backtrace (tid:  30008) ====
 0  /usr/lib64/libucs.so.0(ucs_handle_error+0xe4) [0x2abcc7d51da4]
 1  /usr/lib64/libucs.so.0(+0x2210c) [0x2abcc7d5210c]
 2  /usr/lib64/libucs.so.0(+0x222c2) [0x2abcc7d522c2]
 3  /lib64/libpthread.so.0(+0x11ce0) [0x2abca1941ce0]
 4  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/openmpi/mca_pml_ob1.so(+0x1852c) [0x2abccc22452c]
 5  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/openmpi/mca_pml_ob1.so(+0x1ad2c) [0x2abccc226d2c]
 6  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x7f) [0x2abcc623396f]
 7  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/openmpi/mca_btl_vader.so(+0x4def) [0x2abcc6233def]
 8  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/libopen-pal.so.40(opal_progress+0x2c) [0x2abcbba9b16c]
 9  /gpfsm/dswdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/libmpi.so.40(ompi_request_default_wait+0x45) [0x2abcbaa88c75]
10  /gpfsm/dswdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/libmpi.so.40(PMPI_Wait+0x52) [0x2abcbaacc2c2]
11  /gpfsm/dswdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/libmpi_mpifh.so.40(mpi_wait+0x31) [0x2abcba820a51]
12  /gpfsm/dswdev/bmauer/models/geosgcm_moistbug/GEOSgcm/install-debug-gfortran/bin/../lib/libfms_r8.so(__mpp_mod_MOD_mpp_sync_self+0x101a) [0x2abcb2ba709c]
13  /gpfsm/dswdev/bmauer/models/geosgcm_moistbug/GEOSgcm/install-debug-gfortran/bin/../lib/libfms_r8.so(__mpp_domains_mod_MOD_mpp_complete_group_update_r4+0x62bb) [0x2abcb2ea8ff0]
bena-nasa changed the title from "Certain timesteps cause the model to crash" to "Certain timesteps cause the model to crash with gfortran/openmpi on scu15" on Nov 2, 2023

mathomp4 (Member) commented Nov 2, 2023

Hmm. That backtrace points at a possible culprit.

Ben, if you have a chance, can you try a build with -DFV_PRECISION=R4? That would probably crush poor MOM6, but I wonder if having the dual r4+r8 FMS is causing issues.
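
Roughly what I have in mind, assuming the usual out-of-source CMake workflow for GEOSgcm (the paths, install prefix, and job count below are placeholders):

```
# hypothetical reconfigure/rebuild with single-precision FV dynamics
cmake -B build-r4 -S . -DCMAKE_INSTALL_PREFIX=../install-r4 -DFV_PRECISION=R4
cmake --build build-r4 --target install -j 8
```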
