Using caliper above threaded invocation #1011

gshipman · 2024-03-01T00:59:25Z

@jonahm-LANL and I discussed this this week. I'm looking for a way to mark caliper regions that are above threaded dispatch of tasks.
I'm currently directly instrumenting tasks within the benchmark, see: https://github.com/gshipman/parthenon/blob/d565f7810727b87c16a52979c0c0f7fee060c461/benchmarks/burgers/burgers_package.cpp#L55

If there is a place in the task dispatch machinery where I can add a caliper begin/end region and perhaps capture the name / symbol of the function that will be dispatched, or some other application meaningful metadata to properly name the region that might be one approach. Of course I would be measuring the invocation start/finish which may differ from execution start/finish, depending on the way the machinery works.

Any pointers? Thx.

Yurlungur · 2024-03-01T01:13:51Z

The thought I had when we were discussing earlier in the week was you could instrument the functions that are added to the task list like here:
https://github.com/gshipman/parthenon/blob/d565f7810727b87c16a52979c0c0f7fee060c461/benchmarks/burgers/burgers_driver.cpp#L105

For exmaple, you could replace

    auto update = tl.AddTask(avg_data, UpdateIndependentData<MeshData<Real>>, mc0.get(),
                             mdudt.get(), beta * dt, mc1.get());

with

    auto update = tl.AddTask(avg_data, 
                              [=](MeshData<Real> *a, MeshData<Real> *b, Real w, MeshData<Real> *c) { 
                                  CALI_CXX_MARK_FUNCTION;
                                  PARTHENON_INSTRUMENT
                                  UpdateIndependentData(a, b, w, c);
                                  }, 
                              mc0.get(), mdudt.get(), beta * dt, mc1.get());

I'm not sure if what Calibper gets out of this will be useful... but you could also replace the anonymous function with a named one via a macro instead of a lambda, for example.

This would let you insert the instrumentation at the tasking level for any task you wanted to instrument.

gshipman · 2024-03-01T04:26:36Z

Thx @Yurlungur I will give this a whirl. Since the task registration site is within the benchmark code I can use CALI_MARK_BEGIN/END as in:

auto update = tl.AddTask(avg_data, 
                              [=](MeshData<Real> *a, MeshData<Real> *b, Real w, MeshData<Real> *c) { 
                                  CALI_MARK_BEGIN("UpdateIndependentData"); 
                                  PARTHENON_INSTRUMENT
                                  UpdateIndependentData(a, b, w, c);
                                  CALI_MARK_END("UpdateIndependentData"); 
                                  }, 
                              mc0.get(), mdudt.get(), beta * dt, mc1.get());

gshipman · 2024-03-01T23:04:45Z

@Yurlungur Take a look here: https://github.com/gshipman/parthenon/blob/2f42a75dd1bae6a92b53ee93d0931f18a3846ad2/benchmarks/burgers/burgers_driver.cpp#L100

I'm getting a build error:


In file included from /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:21:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.hpp:20:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/parthenon/driver.hpp:18:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/application_input.hpp:22:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/bvals/boundary_conditions.hpp:22:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/meshblock_data.hpp:24:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/sparse_pack_base.hpp:30:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/state_descriptor.hpp:29:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/metadata.hpp:30:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/prolong_restrict/prolong_restrict.hpp:31:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/bvals/comms/bvals_in_one.hpp:29:
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:382:18: error: cannot initialize return object of type 'TaskStatus' with an rvalue of type 'void'
          return func(std::forward<Args>(args)...);
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:217:7: note: in instantiation of function template specialization 'parthenon::TaskList::AddUserTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
      AddUserTask(dep, std::forward<Args>(args)...);
      ^
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:207:12: note: in instantiation of function template specialization 'parthenon::TaskList::AddTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
    return AddTask(TaskQualifier::normal, dep, std::forward<Args>(args)...);
           ^
/usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:100:19: note: in instantiation of function template specialization 'parthenon::TaskList::AddTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
    auto flx = tl.AddTask(none,
                  ^
6 warnings and 1 error generated.

Yurlungur · 2024-03-01T23:11:33Z

oops my bad. Try this:

    auto flx = tl.AddTask(none,
			  [=](MeshData<Real> *a) {
			    CALI_MARK_BEGIN("CalculateFluxes");
			    auto status = burgers_package::CalculateFluxes(a);
  			    CALI_MARK_END("CalculateFluxes");
                            return status;
			  }, mc0.get());

Yurlungur · 2024-03-01T23:11:48Z

Tasks need to return a TaskStatus enum. Forgot about that.

gshipman · 2024-03-05T00:34:01Z

@Yurlungur , Looks like adding the CALI_MARK_BEGIN/END within AddTask is still problematic, I think that is running in a separate thread dispatched in TaskRegion::Execute().. So I moved CALI_MARK_BEGIN/END to here: https://github.com/gshipman/parthenon/blob/418c5d0b9a683ac88ff609f7246654af4aa6285c/src/tasks/tasks.hpp#L416
This allows a successful run. Is my observation correct (that the task is dispatched within tasks.hpp on a separate thread)?
If so, I need a way at getting at an identified for the task, ideally a string that can be generated during task initialization. I'd need a little guidance on how to do that.

Thanks!

gshipman · 2024-03-05T16:49:40Z

@Yurlungur I've done a bit more work, I've added text names to tasks, and can successfully "caliper" around task enqueue. See:
https://github.com/gshipman/parthenon/blob/4be8412180e21ee1e68a60700ebd3cf3f18cc179/src/tasks/tasks.hpp#L440
But this isn't what we want, and it might not be possible to get what I want, I need to capture the task "start" and "end" from the main thread. I'm thinking this is going to be difficult without having the main thread polling a semaphore for start/stop or something. Thoughts?

gshipman · 2024-03-05T16:50:16Z

Adding @daboehme

Yurlungur · 2024-03-05T19:23:39Z

@gshipman Ah, yes @jdolence recently rewrote the tasking infrastructure to use threads... however I think during normal operation there's still actually only one thread, as the rest of Parthenon isn't yet thread safe. I think tasks are all executed in serial on a given MPI rank---with the note that Kokkos loops may be non-blocking if you're on an accelerator. @jdolence should confirm.

I don't think you want to wrap around task enqueue as that is just telling the tasking machinery to put the task into the list of tasks to do work on. It doesn't actually access the task. Or is that what you're trying to time? The tasking infrastructure vs the actual work of the task?

Yurlungur · 2024-03-05T19:24:34Z

I guess I don't understand why timing within an individual task is problematic? At least if the code is run with threading disabled?

daboehme · 2024-03-05T19:41:28Z

I guess I don't understand why timing within an individual task is problematic? At least if the code is run with threading disabled?

It's problematic if the code creates/destroys a new OS thread for each task. Caliper keeps some per-thread data around until program exit so it'll keep adding memory. However If the code runs without threading or with a fixed thread pool there shouldn't be a problem annotating at the task level.

Yurlungur · 2024-03-05T19:42:23Z

@jdolence should confirm but I'm pretty sure the thread pool is of fixed size.

gshipman · 2024-03-05T20:17:42Z

I don't think you want to wrap around task enqueue as that is just telling the tasking machinery to put the task into the list of tasks to do work on. It doesn't actually access the task. Or is that what you're trying to time? The tasking infrastructure vs the actual work of the task?

@Yurlungur correct, I wouldn't want to capture enqueue, that was just to make sure I could get something if I was in the parent thread and include the task name in the caliper.

@jdolence, I'm running on Crossroads / Roci, should I just disable threading entirely? I'm not explicitly requesting threading, here is what cmake finds:

-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
...
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=c++17 for C++17 standard as feature
-- Built-in Execution Spaces:
--     Device Parallel: NoTypeDefined
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
-- 
-- Architectures:
-- Found TPLLIBDL: /usr/include  
-- Using internal desul_atomics copy
-- Kokkos Devices: SERIAL, Kokkos Backends: SERIAL
-- Using Kokkos source from Parthenon submodule at /usr/projects/eap/users/gshipman/parthenon-cali/external/Kokkos

Yurlungur self-assigned this Mar 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using caliper above threaded invocation #1011

Using caliper above threaded invocation #1011

gshipman commented Mar 1, 2024

Yurlungur commented Mar 1, 2024

gshipman commented Mar 1, 2024

gshipman commented Mar 1, 2024

Yurlungur commented Mar 1, 2024

Yurlungur commented Mar 1, 2024

gshipman commented Mar 5, 2024

gshipman commented Mar 5, 2024

gshipman commented Mar 5, 2024

Yurlungur commented Mar 5, 2024

Yurlungur commented Mar 5, 2024

daboehme commented Mar 5, 2024 •

edited

Yurlungur commented Mar 5, 2024

gshipman commented Mar 5, 2024

Using caliper above threaded invocation #1011

Using caliper above threaded invocation #1011

Comments

gshipman commented Mar 1, 2024

Yurlungur commented Mar 1, 2024

gshipman commented Mar 1, 2024

gshipman commented Mar 1, 2024

Yurlungur commented Mar 1, 2024

Yurlungur commented Mar 1, 2024

gshipman commented Mar 5, 2024

gshipman commented Mar 5, 2024

gshipman commented Mar 5, 2024

Yurlungur commented Mar 5, 2024

Yurlungur commented Mar 5, 2024

daboehme commented Mar 5, 2024 • edited

Yurlungur commented Mar 5, 2024

gshipman commented Mar 5, 2024

daboehme commented Mar 5, 2024 •

edited