2023.07.13 Meeting Notes

Agenda

LR

AMR for non-cell-centered fields is ready for review (no IO yet) with 3 PRs
- Morton indexing
- Ownership model
- prolong/restrict in one with new generalized operators
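
For readers unfamiliar with Morton indexing, here is a minimal Python sketch of the general idea. It is not the implementation in the PR; the function name `morton3d` and the 21-bit default are illustrative assumptions. Interleaving the bits of a block's logical (x, y, z) location gives a single integer key, and sorting blocks by that key follows a Z-order curve, which tends to keep spatially nearby blocks close together in the ordering.

```python
def morton3d(ix: int, iy: int, iz: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of (ix, iy, iz) into one Morton key.

    Bit b of ix lands at position 3*b, of iy at 3*b + 1, of iz at 3*b + 2.
    (Hypothetical helper for illustration only.)
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


# Order the 8 blocks of a 2x2x2 mesh along the Z-order curve.
blocks = [(x, y, z) for z in range(2) for y in range(2) for x in range(2)]
ordered = sorted(blocks, key=lambda loc: morton3d(*loc))
```
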
Showed movies of OT and MHD rotor with AMR!!! (using potential based formulation)
Question for downstream codes: what to do about EMF correction when doing Athena++-style CT
- machinery in place, but not used/tested yet (ownership model applies, similar to cell centered flux correction)
- might need to separate out flux correction step from boundary comm
- other items to be discussed: what should be communicated/corrected (only fine/coarse, but not same-same?)

BP

chased down bug (in KHARMA) when creating new containers (should data be copied or not)
created PR for PEP1

JM

Riot is now on parthenon/develop
Riot now entirely based on sparse packs
- some quality of life improvements should be available upstream shortly
will also push custom load balancing upstream
Q: what about BiCGStab?
- may or may not be updated (currently lives on separate branch)
- LR more interested in pushing for Multigrid rather than BiCGStab
- BP has backport, will open PR

PM

also worked on riot <-> develop
bug reported last time (fine/coarse round-off error when run on different ranks)
- problem went away after changing the new comm task
- not sure why it went away (maybe because of local/nonlocal versus any comm, but that doesn't explain the round-off error level)
will create PR to add CI machinery to cover multiple ranks and pack sizes

FG

fixing INCITE runs on Frontier
working with co-design summer school students (found issue with reflective bounds in phydro)
looking at spherical/cylindrical coordinates, early PR expected next week
got AthenaPK compiled on Chicoma (also related to https://github.com/lanl/phoebus/issues/70)

BW

debugging various Ascent issues
- slice perpendicular to y axis when running on GPUs
- ghost zones
got a couple of open WIP PRs

PG

still tracking down IO performance issues on Frontier
discovered that our chunking strategy is not optimal
working on a best-practice solution and looking for external input (from people with more expert knowledge)
Question on extra variables for restart (rst) outputs: no objection (though the parameter name should probably differ from the one used for normal outputs)

AJ

Results from load balancing work over past months:
Can now assign arbitrary blocks to arbitrary ranks
Test setup (30k timesteps, 16^3 blocks, 512 ranks, spherical blast with Phoebus, so work per block varies)
Implemented different load balancing and comm locality policies
- contiguous (good locality, poor load balance, given varying work per block)
- longest processing time (good load balance, poor locality)
- contiguous-improved (dynamic programming to evaluate balance)
- contiguous-improved-iterative (iteratively improve on previous solution)
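
To make the trade-off between these policies concrete, here is a minimal Python sketch under stated assumptions; it is not the code from this work. All function names are hypothetical. The contiguous policy is assumed to split the block list into equal-count chunks, longest processing time is the standard greedy heuristic, and `contiguous_dp` is one plausible reading of "contiguous-improved": a dynamic program that picks contiguous cut points minimizing the maximum per-rank load.

```python
import heapq


def contiguous_equal_count(costs, nranks):
    """Contiguous chunks with (nearly) equal block counts, ignoring cost."""
    n = len(costs)
    bounds = [(r * n) // nranks for r in range(nranks + 1)]
    return [list(range(bounds[r], bounds[r + 1])) for r in range(nranks)]


def longest_processing_time(costs, nranks):
    """Greedy LPT: most expensive remaining block goes to the least loaded rank."""
    heap = [(0.0, r) for r in range(nranks)]
    assign = [[] for _ in range(nranks)]
    for b in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, r = heapq.heappop(heap)
        assign[r].append(b)
        heapq.heappush(heap, (load + costs[b], r))
    return assign


def contiguous_dp(costs, nranks):
    """Contiguous split minimizing the maximum rank load (O(nranks * n^2))."""
    n = len(costs)
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    inf = float("inf")
    # best[k][i]: minimal max load when the first i blocks go to k ranks.
    best = [[inf] * (n + 1) for _ in range(nranks + 1)]
    cut = [[0] * (n + 1) for _ in range(nranks + 1)]
    best[0][0] = 0.0
    for k in range(1, nranks + 1):
        for i in range(n + 1):
            for j in range(i + 1):
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i], cut[k][i] = cand, j
    # Backtrack the cut points from the last rank to the first.
    assign, i = [], n
    for k in range(nranks, 0, -1):
        j = cut[k][i]
        assign.append(list(range(j, i)))
        i = j
    return assign[::-1]


def max_load(costs, assign):
    """Load of the most heavily loaded rank, which bounds the step time."""
    return max(sum(costs[b] for b in blocks) for blocks in assign)


# Made-up per-block costs: two expensive blocks followed by four cheap ones.
costs = [4, 4, 1, 1, 1, 1]
naive = contiguous_equal_count(costs, 2)
lpt = longest_processing_time(costs, 2)
dp = contiguous_dp(costs, 2)
```

For this toy input, LPT reaches the best balance but scatters blocks across ranks, while the dynamic-programming split stays contiguous and still improves on the equal-count split.
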
Currently, load imbalance per block is ~30-40%. Gains should be much higher with larger imbalance.
Will look at comparing to Riot LB (see above)
Other interesting outcomes
- (de)refinements oscillate (and are quite costly), so reducing the number of derefinements improves performance
- For the given setup, even the compute load per block evolves with time, which is not naturally captured by standard LB. Enforcing LB helps in reducing runtime.
Next
- additional input decks
- GPU vs CPU