Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add OpenPMD support #1050

Open
wants to merge 47 commits into
base: develop
Choose a base branch
from
Open

WIP: Add OpenPMD support #1050

wants to merge 47 commits into from

Conversation

pgrete
Copy link
Collaborator

@pgrete pgrete commented Apr 12, 2024

PR Summary

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary API changes at the top of the PR (plus optionally with an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

Comment on lines 272 to 275
for (auto &pmb : pm->block_list) {
// TODO(pgrete) check if we should skip the suffix for level 0
const auto level = pmb->loc.level() - pm->GetRootLevel();
const std::string &mesh_record_name = var_name + "_lvl" + std::to_string(level);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing level

Comment on lines 120 to 125
for (auto &pmb : pm->block_list) {
// TODO(pgrete) check if we should skip the suffix for level 0
// TODO(pgrete) ask LR why this is not mirrored from writing
// const auto level = pmb->loc.level() - pm->GetRootLevel();
const auto level = pmb->loc.level();
const std::string &mesh_record_name = var_name + "_lvl" + std::to_string(level);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading level

Copy link
Collaborator

@Yurlungur Yurlungur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many comments below. However, because this was such a big lift, I am of the opinion that we should merge even a not feature complete or not perfect contribution so that it doesn't start to drift away from develop. And the full feature complete capability can be created via later PRs. So I'm approving.

fill_derived = false # whether to fill one-copy test vars

<parthenon/output1>
file_type = openpmd
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does openpmd always output everything?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It currently mimics restart output (with support for additional explicit variables), but this doesn't need to remain that way.

CMakeLists.txt Outdated Show resolved Hide resolved
@@ -118,6 +118,12 @@ class Params {
return it->second;
}

const Mutability &GetMutability(const std::string &key) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +360 to +361
// TODO(reviewers) Why was the loop originally there? Does the direct Get causes
// issue?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure---it may have just been crazy code that no one tried to change.

@@ -241,6 +243,45 @@ std::vector<int> ComputeIDsAndFlags(Mesh *pm) {
});
}

template <typename T>
std::vector<T> FlattendedLocalToGlobal(Mesh *pm, const std::vector<T> &data_local) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this function does. Is it actually doing an MPI all-to-all to build up a global data vector? Is this something we ever want to do?

src/outputs/parthenon_opmd.cpp Outdated Show resolved Hide resolved
it.setAttribute("blub", host_vec);

for (const auto &[key, pkg] : pm->packages.AllPackages()) {
// WriteAllParams<bool>(pkg, &it); // check why this (vector of bool) doesn't work
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just std::vector<bool> that doesn't work? That's probably because std::vector<bool> has a particular template specialization in the stl that is weirdly incompatible with generic code. Probably need to write a pointer-based array of bool instead.

Comment on lines +316 to +318
// TODO(pgrete) check var name standard compatiblity
// e.g., description: names of records and their components are only allowed to contain
// the characters a-Z, the numbers 0-9 and the underscore _
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is broken by PR #1043

Comment on lines +344 to +348
// TODO(pgrete) adjust for single prec output
// openPMD::Datatype dtype = openPMD::determineDatatype<Real>();
using OutT =
Real; // typename std::conditional<WRITE_SINGLE_PRECISION, float, Real>::type;
std::vector<OutT> tmp_data(var_size_max * num_blocks_local);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder

Comment on lines +474 to +489
for (int t = 0; t < Nt; ++t) {
for (int u = 0; u < Nu; ++u) {
for (int v = 0; v < Nv; ++v) {
const auto [record_name, comp_name] =
OpenPMDUtils::GetMeshRecordAndComponentNames(vinfo, comp_idx, level);
auto mesh_comp = it.meshes[record_name][comp_name];

const auto comp_offset = tmp_offset;
for (int k = kb.s; k <= kb.e; ++k) {
for (int j = jb.s; j <= jb.e; ++j) {
for (int i = ib.s; i <= ib.e; ++i) {
tmp_data[tmp_offset] = static_cast<OutT>(out_var_h(t, u, v, k, j, i));
tmp_offset++;
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this should be possible to merge with the higher-order functions in utils

# Given that the shapes are guaranteed to match (follow the check above)
# we can load chunks from both files.
# Note that we have to go over chunks as data might be sparse on disk so
# loading the entire record will contain gargabe in sparse places.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gargabe -> garbage

@BenWibking
Copy link
Collaborator

BenWibking commented May 17, 2024

In terms of packaging the OpenPMD dependency, is the user expected to install OpenPMD themselves to get the Python package?

If I don't have OpenPMD installed externally, it uses FetchContent to get the dependencies needed for the C++ build, but it complains that the Python module is not installed:

build git:(pgrete/pmd-output) cmake .. -DPARTHENON_ENABLE_ASCENT=ON -DAscent_DIR=$ASCENT_HOME/install/ascent-develop/lib/cmake/ascent -DCMAKE_BUILD_TYPE=Release -DPython3_FIND_STRATEGY=LOCATION -DPARTHENON_ENABLE_OPENPMD=ON -DPARTHENON_ENABLE_HDF5=OFF

-- Using the single-header code from /Users/benwibking/parthenon/build/_deps/openpmd-src/share/openPMD/thirdParty/json/single_include/
-- nlohmann-json: Using INTERNAL version '3.9.1'
-- toml11: Using INTERNAL version '3.7.1'
-- HDF5 C compiler wrapper is unable to compile a minimal HDF5 program.
CMake Warning at /opt/homebrew/Cellar/cmake/3.29.3/share/cmake/Modules/FindHDF5.cmake:761 (message):
  HDF5 found for language C is not parallel but previously found language is
  parallel.
Call Stack (most recent call first):
  build/_deps/openpmd-src/CMakeLists.txt:320 (find_package)
-- Found MPI_C: /opt/homebrew/Cellar/open-mpi/5.0.3_1/lib/libmpi.dylib (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found ADIOS2: /opt/homebrew/lib/cmake/adios2/adios2-config.cmake (found suitable version "2.10.0", minimum required is "2.7.0") found components: C CXX MPI

openPMD build configuration:
  library Version: 0.15.2
  openPMD Standard: 1.1.0
  C++ Compiler: AppleClang 15.0.0.15000100
    /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++

  Installation: OFF

  Build Type: Release
  Library: static
  CLI Tools: OFF
  Examples: OFF
  Testing: OFF
  Invasive Tests: OFF
  Internal VERIFY: ON
  Build Options:
    MPI: ON
    HDF5: ON
    ADIOS1: OFF
    ADIOS2: ON
    PYTHON: OFF
    CUDA_EXAMPLES: OFF

-- Setting default Kokkos CXX standard to 17
-- The project name is: Kokkos
-- Using internal gtest for testing
-- Using -std=c++17 for C++17 standard as feature
-- Built-in Execution Spaces:
--     Device Parallel: NoTypeDefined
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
--
-- Architectures:
-- Using internal desul_atomics copy
-- Kokkos Devices: SERIAL, Kokkos Backends: SERIAL
-- Using Kokkos source from Parthenon submodule at /Users/benwibking/parthenon/external/Kokkos
-- Gold standard is up-to-date (version 22). No download required.
-- Building unit tests.
-- Building integration tests.
-- Building performance tests.
-- Building regression tests.
CMake Error at cmake/PythonModuleCheck.cmake:44 (message):
  Required python module(s) openpmd_api not found.
Call Stack (most recent call first):
  cmake/TestSetup.cmake:32 (python_modules_found)
  tst/regression/CMakeLists.txt:158 (include)

If I do have openpmd-api installed externally, it doesn't complain, but it still uses the built-in openPMD module for the C++ build, which (I think) could lead to an inconsistency.

@BenWibking
Copy link
Collaborator

Also, we now have two very confusingly named CMake options for I/O: PARTHENON_ENABLE_OPENPMD and PARTHENON_DISABLE_HDF5. I think they should both be PARTHENON_ENABLE_ for consistency.

@BenWibking
Copy link
Collaborator

BenWibking commented May 17, 2024

I am getting a CMake configuration error when trying this:

CMake Error at build/_deps/openpmd-src/CMakeLists.txt:809 (target_link_libraries):
  Target "openPMD" links to:

    adios2::cxx11_mpi

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

-- Generating done (0.1s)
CMake Generate step failed.  Build files cannot be regenerated correctly.

Is this some weird macOS issue? Or maybe I'm using a very new CMake version?

Edit: ok, it turns out this is a weird Conda environment issue. You have to explicitly request MPI support when installing openpmd-api from Conda:

$ conda install -c conda-forge "openpmd-api=*=mpi_openmpi*"

TL;DR: I think this is a good reason to ensure that the internal OpenPMD build also installs the Python module.

@BenWibking
Copy link
Collaborator

I am getting regression test failures on my Mac laptop:

The following tests FAILED:
	 53 - regression_test:advection_performance (Failed)
	 54 - regression_mpi_test:advection_performance (Failed)
	 55 - regression_test:particle_leapfrog (Failed)
	 56 - regression_mpi_test:particle_leapfrog (Failed)
	 57 - regression_test:particle_leapfrog_outflow (Failed)
	 58 - regression_mpi_test:particle_leapfrog_outflow (Failed)
	 59 - regression_test:restart_opmd (Failed)
	 60 - regression_mpi_test:restart_opmd (Failed)

They all have mysterious errors like this:

Command to execute driver
/Users/benwibking/parthenon/build/example/advection/advection-example -i /Users/benwibking/parthenon/tst/regression/test_suites/restart_opmd/parthinput.restart parthenon/job/problem_id=gold

*****************************************************************
Subprocess error message
*****************************************************************

b'[Bens-MacBook-Pro:01358] *** Process received signal ***
[Bens-MacBook-Pro:01358] Signal: Abort trap: 6 (6)
[Bens-MacBook-Pro:01358] Signal code:  (0)
[Bens-MacBook-Pro:01358] [ 0] 0   libsystem_platform.dylib            0x000000019a9b3584 _sigtramp + 56
[Bens-MacBook-Pro:01358] [ 1] 0   libsystem_pthread.dylib             0x000000019a982c20 pthread_kill + 288
[Bens-MacBook-Pro:01358] [ 2] 0   libsystem_c.dylib                   0x000000019a88fa30 abort + 180
[Bens-MacBook-Pro:01358] [ 3] 0   libsystem_malloc.dylib              0x000000019a79fdc4 malloc_vreport + 896
[Bens-MacBook-Pro:01358] [ 4] 0   libsystem_malloc.dylib              0x000000019a7a3430 malloc_report + 64
[Bens-MacBook-Pro:01358] [ 5] 0   libsystem_malloc.dylib              0x000000019a7bd494 find_zone_and_free + 528
[Bens-MacBook-Pro:01358] [ 6] 0   libopen-rte.40.dylib                0x0000000107e5ffd0 orte_init + 436
[Bens-MacBook-Pro:01358] [ 7] 0   libmpi.40.dylib                     0x0000000106041c38 ompi_mpi_init + 928
[Bens-MacBook-Pro:01358] [ 8] 0   libmpi.40.dylib                     0x0000000105fb0c60 MPI_Init + 136
[Bens-MacBook-Pro:01358] [ 9] 0   advection-example                   0x0000000104ab2020 _ZN9parthenon16ParthenonManager16ParthenonInitEnvEiPPc + 68
[Bens-MacBook-Pro:01358] [10] 0   advection-example                   0x000000010491a488 main + 500
[Bens-MacBook-Pro:01358] [11] 0   dyld                                0x000000019a5fa0e0 start + 2360
[Bens-MacBook-Pro:01358] *** End of error message ***
'

*****************************************************************
Error detected while running subprocess command
*****************************************************************

Traceback (most recent call last):
  File "/Users/benwibking/parthenon/tst/regression/utils/test_case.py", line 245, in Run
    proc = subprocess.run(run_command, check=True, stdout=PIPE, stderr=STDOUT)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/Users/benwibking/parthenon/build/example/advection/advection-example', '-i', '/Users/benwibking/parthenon/tst/regression/test_suites/restart_opmd/parthinput.restart', 'parthenon/job/problem_id=gold']' died with <Signals.SIGABRT: 6>.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/benwibking/parthenon/tst/regression/run_test.py", line 203, in <module>
    main(**vars(args))
  File "/Users/benwibking/parthenon/tst/regression/run_test.py", line 85, in main
    test_manager.Run()
  File "/Users/benwibking/parthenon/tst/regression/utils/test_case.py", line 256, in Run
    raise TestManagerError(
utils.test_case.TestManagerError:
Return code -6 from command '/Users/benwibking/parthenon/build/example/advection/advection-example -i /Users/benwibking/parthenon/tst/regression/test_suites/restart_opmd/parthinput.restart parthenon/job/problem_id=gold'

    Start 60: regression_mpi_test:restart_opmd
8/8 Test #60: regression_mpi_test:restart_opmd ................***Failed    2.22 sec

@BenWibking
Copy link
Collaborator

I propose installing the OpenPMD python package from the submodule using, e.g.:

find_package(Python COMPONENTS Interpreter REQUIRED)

if(DEFINED ENV{VIRTUAL_ENV} OR DEFINED ENV{CONDA_PREFIX})
  set(_pip_args)
else()
  set(_pip_args "--user")
endif()

execute_process(COMMAND ${Python_EXECUTABLE} -m pip install ${_pip_args} -e ${CMAKE_CURRENT_LIST_DIR})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants