
2020.07.29 Meeting Notes

Andrew Gaspar edited this page Jul 29, 2020 · 5 revisions

Agenda

  • Individual/group updates

  • New name for "master" branch (Philipp)

  • Versioning scheme (Philipp)

  • Meeting time (Galen/Philipp)

  • Hands-on performance analysis with Nvidia (Galen)

  • Long term performance regression/archive (Philipp)

  • Quick run-through of outstanding non-WIP pull request reviews

Group Updates

LANL CS

Met with Max Katz from Nvidia on Monday to briefly discuss some Parthenon profiling work. Got some good first-blush feedback from him and shared issue https://github.com/lanl/parthenon/issues/221.

We would like to meet with him next week.

TODO: @pgrete to send me the Spectrum MPI variable that allows serial runs of MPI-built executables.

CI on Darwin is now running; the code still needs to be reviewed. Also looking into some issues with submodules.

LANL Physics

Jonah has been investigating AMR.

Non-static member functions can now be used as tasks.

Joshua has been looking into how to implement operator splits in a task list.
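One common way to express an operator split as an ordered task list is Strang splitting: each step runs operator A for dt/2, operator B for dt, then A for dt/2 again. The sketch below is a minimal Python illustration under that assumption, not Parthenon's task API; the task names, `strang_tasks`, `run_task_list`, and the two toy decay operators are all hypothetical. Because the two scalar decay operators commute, the split result equals the exact solution, which makes the task ordering easy to check.

```python
import math

# Hypothetical toy operators: each is the exact update for du/dt = -rate * u.
def decay_a(u: float, dt: float, rate: float = 1.0) -> float:
    """Operator A: exact decay with rate 1.0 over dt."""
    return u * math.exp(-rate * dt)

def decay_b(u: float, dt: float, rate: float = 2.0) -> float:
    """Operator B: exact decay with rate 2.0 over dt."""
    return u * math.exp(-rate * dt)

def strang_tasks(dt: float):
    """Hypothetical task list for one Strang-split step: A(dt/2), B(dt), A(dt/2)."""
    return [
        ("A half step", lambda u: decay_a(u, 0.5 * dt)),
        ("B full step", lambda u: decay_b(u, dt)),
        ("A half step", lambda u: decay_a(u, 0.5 * dt)),
    ]

def run_task_list(tasks, u: float) -> float:
    """Run tasks in order; each task consumes the previous task's output."""
    for _name, task in tasks:
        u = task(u)
    return u
```

For example, `run_task_list(strang_tasks(0.1), 1.0)` matches `math.exp(-0.3)` to rounding error. For operators that do not commute, the same task ordering gives a second-order accurate split rather than an exact one.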

Sriram has started working on a restart capability using HDF5. Branch: feature/restart

Galen notes that in the future we should investigate an M:N mapping of ranks to HDF5 dump files. Blocks would allow a mapping that is more granular than per-rank.
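An M:N mapping means N writers share M output files instead of one file per rank. The sketch below shows one simple, hypothetical scheme (contiguous groups of ranks per file, spread as evenly as possible); the function name `file_for_rank` and the grouping policy are assumptions for illustration, not a design the group has agreed on.

```python
def file_for_rank(rank: int, n_ranks: int, n_files: int) -> int:
    """Hypothetical M:N mapping: assign contiguous groups of ranks to each file.

    Ranks are split as evenly as possible across n_files; the first
    (n_ranks % n_files) files each receive one extra rank.
    """
    if not (1 <= n_files <= n_ranks):
        raise ValueError("need 1 <= n_files <= n_ranks")
    if not (0 <= rank < n_ranks):
        raise ValueError("rank out of range")
    base, extra = divmod(n_ranks, n_files)
    # Ranks below `boundary` live in the first `extra` files, which hold base + 1 ranks each.
    boundary = extra * (base + 1)
    if rank < boundary:
        return rank // (base + 1)
    return extra + (rank - boundary) // base
```

With 10 ranks and 3 files this yields files `[0, 0, 0, 0, 1, 1, 1, 2, 2, 2]`. Because mesh blocks are finer-grained than ranks, the same function could be applied to global block IDs instead of rank IDs.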

AthenaPK

Jim is looking into strategies for reducing memory footprint. Flux arrays may be turned off for some variables.

Phil is debugging the thread+stream PR. Scratch pad array allocation is broken for multithreaded codes; Christian Trott is investigating.

Performance regression tests are now working.

Gold standard files are now published as GitHub releases so that everyone can update them. They are downloaded and extracted automatically. On machines without internet access, copying the archive over manually works just fine.

Forrest has been working on the scaling of the buffer packing routines, with a promising start for small mesh block sizes.
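Buffer packing copies the cells a neighbor needs (for example, a slab of cells next to a block face) into one contiguous buffer before communication, and unpacking writes that buffer into the receiver's ghost zones. The serial Python sketch below only illustrates the index mapping; the x-fastest memory layout and the names `idx`, `pack_x_slab`, and `unpack_x_slab` are assumptions, not Parthenon's routines. In a Kokkos implementation the triple loop would typically become a parallel kernel.

```python
from typing import List

def idx(i: int, j: int, k: int, nx: int, ny: int) -> int:
    """Flat index into a 3D array stored with i (x) varying fastest (assumed layout)."""
    return i + nx * (j + ny * k)

def pack_x_slab(data: List[float], nx: int, ny: int, nz: int,
                i_start: int, i_end: int) -> List[float]:
    """Copy all cells with i in [i_start, i_end) into a contiguous send buffer."""
    buf = []
    for k in range(nz):
        for j in range(ny):
            for i in range(i_start, i_end):
                buf.append(data[idx(i, j, k, nx, ny)])
    return buf

def unpack_x_slab(buf: List[float], data: List[float], nx: int, ny: int, nz: int,
                  i_start: int, i_end: int) -> None:
    """Write a received buffer into the cells with i in [i_start, i_end), in pack order."""
    n = 0
    for k in range(nz):
        for j in range(ny):
            for i in range(i_start, i_end):
                data[idx(i, j, k, nx, ny)] = buf[n]
                n += 1
```

For instance, packing the last two x-layers of one block and unpacking them into the first two x-layers (the ghost zone) of its neighbor uses the same loop order on both sides, so no extra index metadata has to be sent.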

Discussion

master branch + Versioning Scheme

Will rename "master" branch to "stable".

TODO @agaspar: Will help @pgrete close https://github.com/lanl/parthenon/pull/226

Meeting Time with Nvidia

Aim for 9 AM on Monday, otherwise Friday.

Nvidia Performance Analysis

Target problems:

  • Performance regression test: no MPI, uniform grid, very basic
  • Kokkos implementation of the buffer packing routine. Target platform: RZAnsel. Max Katz has pretty good domain knowledge of this problem.

Long term performance regression/archive

Phil found an online tool that can be pointed at a repository holding performance data over time. @pgrete will send out a link to it.

We're in agreement that we want:

  • A dashboard of some sort showing performance over time
  • Performance analysis on pull requests
  • Power9 + Volta100 run
  • Skylake + AVX512 run
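For performance analysis on pull requests, one simple approach is to compare a new timing against the stored history for the same test on the same machine (for example, separate histories for the Power9 + Volta100 and Skylake + AVX512 runs above). The sketch below assumes such per-machine histories exist and uses the median with a 5% tolerance; the function name `is_regression` and both thresholds are illustrative assumptions, not an agreed policy.

```python
from statistics import median
from typing import Sequence

def is_regression(history: Sequence[float], new_time: float,
                  tolerance: float = 0.05) -> bool:
    """Return True if new_time is more than `tolerance` slower than the median of history.

    history: previous wall-clock timings (seconds) for one test on one machine.
    tolerance: allowed fractional slowdown, e.g. 0.05 means 5%.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(history)  # median is less sensitive to one noisy run than the mean
    return new_time > baseline * (1.0 + tolerance)
```

A dashboard would plot the same history over time; this check only answers whether a single pull request run should be flagged.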

Public API

There's general agreement that we'd like to be more explicit about our public API vs. private API. However, the scope of this effort is not yet clear.

It may be difficult and time-consuming to disentangle the public APIs from the private APIs now, but we can do better going forward.

Discussion will continue here: https://github.com/lanl/parthenon/issues/239
