2020.05.06 Meeting Notes

Galen Shipman edited this page May 20, 2020 · 1 revision

Agenda

  • Individual/group updates
  • Milestones
  • Coordinate system abstraction #150
  • Quick run-through of outstanding non-WIP pull request reviews
  • Discuss variable pack caching

Next Two Week Goals

All of these are in service of hydro on a static grid on a single GPU.

  • Container iterator work will be completed
  • Josh will take a crack at coordinate systems
  • Phil will use that as a baseline for the Kokkos-ification of Parthenon

Athena milestone(s)

Target: Magnetized AGN feedback on energy deposition in the ICM on GPUs

Requires:

  • AMR on Cartesian coordinates
    • CUDA streams (Parthenon)
    • Parallel execution of tasks (Parthenon)
    • Flux correction (Parthenon)
  • Magnetic fields (application)
    • Face fields (Parthenon)
    • Edges (Parthenon)
  • AGN prescription (application)
    • Feedback abstraction in Parthenon?
  • Support simulation restarts (Parthenon)
  • Adiabatic EOS (application)
  • Static grav. potential/acceleration field (application)
  • Cooling (application)
  • Passive scalars (application)

Steps towards first milestone

(all steps include performance tuning and profiling)

  1. Hydro on a static grid on a single GPU (using a single MeshBlock covering the entire Mesh)
  2. as above plus MPI (using multiple MeshBlocks for the Mesh and one MeshBlock per GPU and process, i.e., make boundary comm work with GPUs)
  3. as above but with multiple MeshBlocks per GPU/process (i.e., make task management and overlapping CUDA streams work)
  4. as above but with AMR
  5. add MHD (effectively redo 1.-4. but with Faces and Edges)

LANL Milestones

Target: ICF problem with hydro, gray-rad diffusion, simple burn, on a 3D Cartesian mesh with AMR.

Requires:

  • AMR on Cartesian coordinates
    • CUDA streams (Parthenon)
    • Parallel execution of tasks (Parthenon)
  • Multi-Material Hydro
    • Sparse data (Parthenon)
  • Global Solve
    • Some form of bulk-synchronous tasks (Parthenon)

Target: Integration with legacy code

  • Bulk Synchronous Task Integration
    • Some form of bulk-synchronous tasks (Parthenon)

Notes

Athena Update

Started on the Kokkos-ification. Some outstanding questions remain around coordinates. In principle, advection can be run on a uniform grid on GPUs with Parthenon. Confident the code will be running on GPUs by the end of the week.

Forrest added some nested parallelism abstractions. He is also looking into Container Iterators. He has some concerns about their performance, but he thinks it's fine to merge the change and take another look at performance later.

LANL Physics

Josh worked on the variable pack design. He wants to improve getting variables and fluxes. There is still some work to be done: he wants to use the same mechanism for combining variables for AMR, and it can also be used for boundary sharing.

Josh dropped the Coordinates class and baked in Cartesian coordinates. We'll discuss more later.

Josh is working on a document called "Application Design with Parthenon", meant to be a guide to the intended usage of Parthenon.

Jonah is working on caching for variable packs, based on indications that amortizing pack creation improves performance.
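A minimal sketch of the caching idea under discussion: reuse a pack when the same set of variables is requested again, so the (expensive) pack construction is amortized across cycles. The names `VariablePack` and `PackCache` here are illustrative stand-ins, not Parthenon's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a variable pack; in Parthenon this would bundle
// device views of the requested variables.
struct VariablePack {
  std::vector<std::string> vars;
};

// Cache packs keyed by the sorted list of variable names, so repeated
// requests for the same set reuse the pack instead of rebuilding it.
class PackCache {
 public:
  const VariablePack &Get(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());  // order-insensitive key
    auto it = cache_.find(names);
    if (it == cache_.end()) {
      ++builds_;  // pack actually constructed only on a cache miss
      it = cache_.emplace(names, VariablePack{names}).first;
    }
    return it->second;
  }
  int builds() const { return builds_; }

 private:
  std::map<std::vector<std::string>, VariablePack> cache_;
  int builds_ = 0;
};
```

Requesting `{"rho", "v"}` and later `{"v", "rho"}` builds the pack only once; a real implementation would also have to invalidate the cache when the variable set or mesh changes.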

LANL CS

Joshua Brown got regression testing merged into Parthenon, based on the calculate-pi example. He also added an option to disable building the examples, but got a lot of feedback on it, so more time needs to be spent there. He is also trying to loop back to the index-space PR.

Jonas discovered nvcc_wrapper bugs with the latest CMake in combination with IBM XL. Andrew and Jonas have been working toward better support for LANL's Darwin Power9 nodes.

Milestone Discussion

We're largely in alignment on the core technical work that needs to be done. Primarily, it's about getting performance on GPUs.

We need a kernel that packs all faces of all variables in a single launch, in order to drive down our kernel-launch overhead. Boundary communications work largely the same way. In the best case, on Sierra, MPI will go GPU-to-GPU.

Phil points out that for AMR there were 56 individual buffers for a single boundary exchange. With all the launch overhead, that yields something like double the execution time of the kernels themselves.
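The fused-packing idea can be sketched as follows: describe every (variable, buffer) copy on the host up front, then walk all of them in one loop, which on the device would be one kernel launch instead of one launch per buffer. `CopyDesc` and `PackAll` are hypothetical names for illustration, not Parthenon's API.

```cpp
#include <cassert>
#include <vector>

// Host-side description of one boundary-buffer copy.
struct CopyDesc {
  const double *src;  // source field data
  double *dst;        // destination communication buffer
  int n;              // number of elements to copy
};

// One fused "kernel": a single loop (standing in for a single Kokkos/CUDA
// launch) walks all (variable, buffer) copies, instead of paying one kernel
// launch per buffer -- e.g. 56 launches per AMR boundary exchange.
void PackAll(const std::vector<CopyDesc> &copies) {
  for (const auto &c : copies) {     // outer: which buffer
    for (int i = 0; i < c.n; ++i) {  // inner: elements of that buffer
      c.dst[i] = c.src[i];
    }
  }
}
```

On a GPU the two loops would typically be flattened into one index range so all copies proceed concurrently within the single launch.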

LANL needs a feature that allows operating on multiple blocks at once in a bulk-synchronous manner. The two main targets are global solves and integration with legacy, bulk-synchronous codes. There may be some prior art in Athena's multi-grid solver.
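A minimal sketch of what "bulk-synchronous over multiple blocks" could look like: independent per-block tasks run first, then an implicit sync point, then a step that sees all blocks at once (e.g. one reduction of a global solve). `Block` and `RunBulkSynchronous` are illustrative assumptions, not an existing Parthenon interface.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical per-block state; a real block would hold field data.
struct Block {
  double residual = 0.0;
};

// Run a task independently on every block, then perform a global step that
// requires all per-block work to be complete (the bulk-synchronous part).
double RunBulkSynchronous(std::vector<Block> &blocks,
                          const std::function<void(Block &)> &per_block) {
  for (auto &b : blocks) per_block(b);  // independent per-block tasks
  // Implicit synchronization point: all blocks are done before this line.
  double total = 0.0;
  for (const auto &b : blocks) total += b.residual;  // global reduction
  return total;
}
```

A legacy bulk-synchronous code would slot in at the same sync point, operating on all blocks between two task phases.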

Concrete Tasks

First step: AMR on Cartesian coordinates. Tasks:

  • Performance Tuning
  • Support for restart

Down the road, tasks:

  • Faces
  • Edges

Cartesian Coordinates Discussion

Current issue: everything needs to be an inline function to run on device. The Coordinates class has a bunch of virtual methods, which adds indirection to every coordinate call. It also adds a ton of scratch arrays to each MeshBlock, but they can't be used because of the parallel operation on the GPU, so scratch space needs to be allocated on a more thread-local basis.

Few different approaches:

  1. ??? missed this
  2. Forget about scratch arrays - add generic functions that compute various coordinate information inline.
  3. Assume constant Cartesian coordinates and make the application map to alternate coordinate systems.

It was decided to go with generic inline routines for now; under the hood, we can later change that to either computing from stored coordinate information or switching to a fixed grid.
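A sketch of the chosen approach: coordinate quantities for a uniform Cartesian grid computed inline from a few scalars, with no virtual dispatch and no stored scratch arrays, so the functions can run directly inside device kernels. The names `UniformCartesian`, `x1v`, and `dx1` are illustrative assumptions, not Parthenon's actual interface.

```cpp
#include <cassert>

// Inline coordinate routines for a uniform Cartesian grid: everything is
// computed on the fly from x1min and dx, so there is no indirection and
// nothing to allocate per MeshBlock.
struct UniformCartesian {
  double x1min;  // left edge of the block in x1
  double dx;     // uniform cell width
  // Cell-center coordinate of cell i, computed inline.
  inline double x1v(int i) const { return x1min + (i + 0.5) * dx; }
  // Cell width; constant on a uniform Cartesian grid.
  inline double dx1(int /*i*/) const { return dx; }
};
```

Swapping in stored coordinate arrays or another coordinate system later would only change the bodies of these routines, not their call sites.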
