
2023.12.14 Meeting Notes


Agenda

  • Individual/group updates
  • IO
  • Review non-WIP PRs

Individual/group updates

LR

  • working on performance improvements for multigrid (MG)
  • caching PackDescriptors for boundary packs significantly improved performance (this was a surprising bottleneck)
    • for downstream codes: make sure the descriptors are static (especially for larger numbers of variables); see the sketch after this list
    • BP: might this be related to FlagCollections (which seem to be slow)?
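
A minimal sketch of the caching pattern described above, assuming the sparse-pack interface (`MakePackDescriptor`/`GetPack`); the task and the "density" variable are purely illustrative:

```c++
#include <string>
#include <vector>

#include <parthenon/package.hpp>

using namespace parthenon;

// Hypothetical task operating on MeshData.
TaskStatus MyTask(MeshData<Real> *md) {
  auto *pmesh = md->GetMeshPointer();
  // Function-local static: the descriptor is built exactly once instead of
  // on every task invocation (rebuilding it each call was the surprising
  // bottleneck mentioned above).
  static auto desc = MakePackDescriptor(
      pmesh->resolved_packages.get(), std::vector<std::string>{"density"});
  auto pack = desc.GetPack(md);
  // ... use pack ...
  return TaskStatus::complete;
}
```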

BP

  • chasing gremlins in MPI, e.g.:
    • fewer than 3 meshblocks in any direction result in issues (like magnetic field divergence errors)
    • not 1:1 reproducible
  • trying to bisect
  • sounds related to the issues PG is seeing

JM

  • small improvements here and there, e.g.:
  • index splits (for easier/faster hierarchical parallelism via vectorization); PR is open (see the sketch after this list)
    • advantage over TeamMDRange is more flexible control over which indices are fused
  • added machinery for a correctness check in the parthenon-vibe benchmark
  • non-cell-centered IO
  • refactoring Phoebus for more modern Parthenon use
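
Since the IndexSplit PR is still open, here is only a generic Kokkos sketch of the underlying idea rather than the actual Parthenon interface: the slow (b,k,j) indices are fused into one outer team loop while the fastest index i stays contiguous for vectorization.

```c++
#include <Kokkos_Core.hpp>

// Fuse (b,k,j) into the outer (team) index and vectorize over i. Index
// splitting generalizes this by giving the user control over which indices
// end up fused in the outer loop vs. the inner one.
void FusedLoop(Kokkos::View<double ****> du, Kokkos::View<double ****> u) {
  const int nb = u.extent_int(0), nk = u.extent_int(1);
  const int nj = u.extent_int(2), ni = u.extent_int(3);
  using policy = Kokkos::TeamPolicy<>;
  Kokkos::parallel_for(
      "fused_bkj", policy(nb * nk * nj, Kokkos::AUTO),
      KOKKOS_LAMBDA(const policy::member_type &member) {
        // Recover (b, k, j) from the fused outer index.
        int idx = member.league_rank();
        const int b = idx / (nk * nj);
        idx -= b * nk * nj;
        const int k = idx / nj;
        const int j = idx - k * nj;
        // The innermost index stays contiguous, so this loop vectorizes.
        Kokkos::parallel_for(
            Kokkos::TeamVectorRange(member, ni),
            [&](const int i) { du(b, k, j, i) = u(b, k, j, i); });
      });
}
```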

PD

  • working on a new Parthenon-based code with curvilinear coordinates
    • more Athena++-esque
    • user-defined prolongation/restriction/flux correction etc. worked well
  • has a use case for "just a flux"
    • face fields come with too much boilerplate (and implicitly enable additional machinery)
    • might be fixed with a small new metadata flag (see the sketch after this list)
    • JM/LR: might be worth refactoring the flux correction machinery down the line (e.g., make fluxes face variables with dependencies)
    • looks like our implicit use of ghost cells for flux fields is not necessarily intuitive
      • needs to be documented
      • there might be a gotcha in the current WIP IndexSplit machinery when using face- and edge-centered data in the same loop as cell-centered fields
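
To illustrate the boilerplate question above, a minimal sketch of how such fields are declared today (the flags are existing Parthenon metadata flags; the field names are made up). The hypothetical "just a flux" flag would provide face-located storage without the extra machinery of either option below:

```c++
#include <parthenon/package.hpp>

using namespace parthenon;

void DeclareFields(StateDescriptor *pkg) {
  // Today: WithFluxes on a cell-centered field allocates face fluxes and
  // implicitly enables the flux-correction machinery.
  pkg->AddField("density",
                Metadata({Metadata::Cell, Metadata::Independent,
                          Metadata::FillGhost, Metadata::WithFluxes}));

  // Current workaround: a standalone face field, which brings boilerplate
  // and machinery (boundary exchange etc.) that a bare flux does not need.
  pkg->AddField("my_flux",
                Metadata({Metadata::Face, Metadata::Derived,
                          Metadata::OneCopy}));
}
```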

PG

  • encountered MPI issues (timeouts) with few blocks per rank (say one or two) with a recent version of Parthenon
    • is somewhat reliably reproducible
    • unclear where this stems from; no more detailed debugging yet
  • still tracing IO issues with HDF5, which have now become a roadblock for upcoming simulations
    • ADIOS2 seems to be very performant (on Orion): quick testing allowed writing 4.5 TB of data in 0.85 seconds (from 512 nodes)
    • will look into new output based on openPMD with ADIOS2 backend
  • worked on large scale viz of INCITE sims
    • needed some custom XDMF pre-processing to reduce the data to a volume that ParaView could handle

IO

  • see above (openPMD/ADIOS2)
  • we should ensure we can ship it as a submodule (for ease of use)
  • also need to ensure that analysis pipelines (especially Python-based ones) easily interface with those outputs
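
A minimal openPMD-api (C++) sketch of what such an output could look like, with the ADIOS2 backend selected via the .bp extension; the dataset name and extents are illustrative only:

```c++
#include <vector>

#include <openPMD/openPMD.hpp>

int main() {
  using namespace openPMD;
  // ".bp" selects the ADIOS2 backend; "%T" expands to the iteration number.
  Series series("parthenon_out_%T.bp", Access::CREATE);

  std::vector<double> rho(64 * 64, 1.0);  // one block's worth of dummy data
  auto comp =
      series.iterations[0].meshes["density"][RecordComponent::SCALAR];
  comp.resetDataset(Dataset(Datatype::DOUBLE, {64, 64}));
  comp.storeChunk(rho, {0, 0}, {64, 64});  // offset, extent
  series.flush();  // perform the actual write
  return 0;
}
```

Files written this way can be read back from Python via the openpmd-api bindings, which should help with the analysis-pipeline requirement above.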

Next meeting: 4 Jan

-> people should think about ideas/approaches for a Gordon Bell submission

Idea collection for next developer meeting

  • unify/pick packing machinery <-> ID-based packing
  • best practices on performance relevant parameters (and more generally block sizes/work per device)