Some results of restarts are not bitwise exact #1072

Closed
lroberts36 opened this issue May 10, 2024 · 4 comments · Fixed by #1073

@lroberts36
Collaborator

lroberts36 commented May 10, 2024

How I ended up here is a longer story, but today I was looking at restarting the advection example. On current develop in a build directory, if I run

rm *hdf*
rm parthinput*
cp ../tst/regression/test_suites/advection_outflow/parthinput.advection_outflow .
echo " " >> parthinput.advection_outflow
echo "<parthenon/output1>" >> parthinput.advection_outflow
echo "file_type = rst" >> parthinput.advection_outflow
echo "dt = 0.05" >> parthinput.advection_outflow
./example/advection/advection-example -i parthinput.advection_outflow parthenon/job/problem_id=gold
./example/advection/advection-example -r gold.out1.00009.rhdf parthenon/job/problem_id=silver
h5diff -c gold.out1.final.rhdf silver.out1.final.rhdf

the output I get from the h5diff is

attribute: <WallTime of </Info>> and <WallTime of </Info>>
1 differences found
attribute: <File of </Input>> and <File of </Input>>
4397 differences found
dataset: </advected> and </advected>
1550 differences found
dataset: </one_minus_sqrt_one_minus_advected_sq_12> and </one_minus_sqrt_one_minus_advected_sq_12>
1385 differences found
dataset: </one_minus_sqrt_one_minus_advected_sq_37> and </one_minus_sqrt_one_minus_advected_sq_37>
1385 differences found

If I switch to restarting from gold.out1.00005.rhdf I get

attribute: <WallTime of </Info>> and <WallTime of </Info>>
1 differences found
attribute: <File of </Input>> and <File of </Input>>
4396 differences found
dataset: </advected> and </advected>
3987 differences found
dataset: </one_minus_sqrt_one_minus_advected_sq_12> and </one_minus_sqrt_one_minus_advected_sq_12>
3663 differences found
dataset: </one_minus_sqrt_one_minus_advected_sq_37> and </one_minus_sqrt_one_minus_advected_sq_37>
3663 differences found

These diffs are much larger than machine precision (I have to raise the h5diff tolerance to 1.e-5 before the dataset differences go away), but they are small compared to the magnitude of the solution. What is strange is that if I restart from gold.out1.0000[0-4].rhdf or gold.out1.0000[6-8].rhdf, I get bitwise agreement between the gold and silver final outputs. This behavior seems to persist back to at least the most recent release. To me, this looks like a bug.
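To make the distinction between "bitwise exact" and "within tolerance" concrete, here is a minimal Python sketch. It is not part of the reproduction above and does not read HDF5 files; the helper names (`bits`, `count_differences`) and the sample values are made up for illustration. It counts how many entries of two arrays differ bit for bit, and how many still differ once an absolute tolerance is allowed.

```python
import struct


def bits(x: float) -> bytes:
    """Return the raw IEEE-754 double-precision bytes of x."""
    return struct.pack("<d", x)


def count_differences(a, b, tol=0.0):
    """Count entries that differ bitwise (tol == 0) or by more than tol."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if tol == 0.0:
        # Bitwise comparison: any change in the stored bits counts.
        return sum(bits(x) != bits(y) for x, y in zip(a, b))
    # Tolerance comparison: only absolute differences above tol count.
    return sum(abs(x - y) > tol for x, y in zip(a, b))


# Hypothetical data: one entry off by 1e-7, one off at round-off level.
gold = [1.0, 0.5, 0.25, 0.125]
silver = [1.0, 0.5 + 1e-7, 0.25, 0.125 + 1e-16]

print(count_differences(gold, silver))             # bitwise: 2
print(count_differences(gold, silver, tol=1e-12))  # above round-off: 1
print(count_differences(gold, silver, tol=1e-5))   # loose tolerance: 0
```

A difference that only disappears at a tolerance far above round-off, as in the second entry here, points to a change in the computation itself rather than floating-point noise.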

@lroberts36 lroberts36 added the bug Something isn't working label May 10, 2024
@lroberts36
Collaborator Author

lroberts36 commented May 10, 2024

Ok, if I set parthenon/mesh/derefine_count=1, at least these two restarts give bitwise exact results. It looks like MeshRefinement::DereferenceCount() is never called during output, so this state must not get saved. (Also, I think that function is inappropriately named; it really should be DerefinementCount.)
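The following toy model sketches why an unsaved counter of this kind could break restart reproducibility. It is an assumption-laden illustration, not Parthenon code: it assumes a block is derefined only after being flagged for `threshold` consecutive cycles, and the function `run`, its arguments, and the flag pattern are all hypothetical.

```python
def run(flags, threshold, start=0, counter=0):
    """Return the cycle at which a block derefines, or None if it never does.

    flags[i] is True when the block is flagged for derefinement on cycle i.
    The block derefines once it has been flagged on `threshold` consecutive
    cycles (assumed behavior, for illustration only).
    """
    for cycle in range(start, len(flags)):
        counter = counter + 1 if flags[cycle] else 0
        if counter >= threshold:
            return cycle
    return None


# Hypothetical history: the block is flagged from cycle 3 onward.
flags = [False] * 3 + [True] * 17

gold = run(flags, threshold=10)                       # uninterrupted run
lost = run(flags, threshold=10, start=8, counter=0)   # restart, counter reset
kept = run(flags, threshold=10, start=8, counter=5)   # restart, counter restored
early = run(flags, threshold=10, start=2, counter=0)  # restart before any flag

print(gold, lost, kept, early)  # 12 17 12 12

# With a threshold of 1 the counter carries no history, so a reset is harmless.
print(run(flags, threshold=1), run(flags, threshold=1, start=2, counter=0))  # 3 3
```

In this model a restart reproduces the original run only when the counter happened to be zero at the restart point or the threshold is 1. That is at least consistent with some restart files giving bitwise agreement while others do not.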

@Yurlungur
Collaborator

Ok, so it sounds like the solution is that DereferenceCount(), which should really be DerefinementCount, should be called on output. Maybe we pack it with the other mesh structure vars in Info?

@Yurlungur
Collaborator

Also good catch!

@lroberts36
Collaborator Author

I have a PR in progress that fixes this issue. I could either push that, or continue down the path of also fixing a similar issue with sparse restarts (the deallocation counter is not saved there), but the sparse restart issue seems to be known.
