Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using caliper above threaded invocation #1011

Open
gshipman opened this issue Mar 1, 2024 · 13 comments
Open

Using caliper above threaded invocation #1011

gshipman opened this issue Mar 1, 2024 · 13 comments
Assignees

Comments

@gshipman
Copy link
Collaborator

gshipman commented Mar 1, 2024

@jonahm-LANL and I discussed this this week. I'm looking for a way to mark caliper regions that are above threaded dispatch of tasks.
I'm currently directly instrumenting tasks within the benchmark, see: https://github.com/gshipman/parthenon/blob/d565f7810727b87c16a52979c0c0f7fee060c461/benchmarks/burgers/burgers_package.cpp#L55

If there is a place in the task dispatch machinery where I can add a caliper begin/end region and perhaps capture the name / symbol of the function that will be dispatched, or some other application meaningful metadata to properly name the region that might be one approach. Of course I would be measuring the invocation start/finish which may differ from execution start/finish, depending on the way the machinery works.

Any pointers? Thx.

@Yurlungur Yurlungur self-assigned this Mar 1, 2024
@Yurlungur
Copy link
Collaborator

The thought I had when we were discussing earlier in the week was you could instrument the functions that are added to the task list like here:
https://github.com/gshipman/parthenon/blob/d565f7810727b87c16a52979c0c0f7fee060c461/benchmarks/burgers/burgers_driver.cpp#L105

For exmaple, you could replace

    auto update = tl.AddTask(avg_data, UpdateIndependentData<MeshData<Real>>, mc0.get(),
                             mdudt.get(), beta * dt, mc1.get());

with

    auto update = tl.AddTask(avg_data, 
                              [=](MeshData<Real> *a, MeshData<Real> *b, Real w, MeshData<Real> *c) { 
                                  CALI_CXX_MARK_FUNCTION;
                                  PARTHENON_INSTRUMENT
                                  UpdateIndependentData(a, b, w, c);
                                  }, 
                              mc0.get(), mdudt.get(), beta * dt, mc1.get());

I'm not sure if what Calibper gets out of this will be useful... but you could also replace the anonymous function with a named one via a macro instead of a lambda, for example.

This would let you insert the instrumentation at the tasking level for any task you wanted to instrument.

@gshipman
Copy link
Collaborator Author

gshipman commented Mar 1, 2024

Thx @Yurlungur I will give this a whirl. Since the task registration site is within the benchmark code I can use CALI_MARK_BEGIN/END as in:

auto update = tl.AddTask(avg_data, 
                              [=](MeshData<Real> *a, MeshData<Real> *b, Real w, MeshData<Real> *c) { 
                                  CALI_MARK_BEGIN("UpdateIndependentData"); 
                                  PARTHENON_INSTRUMENT
                                  UpdateIndependentData(a, b, w, c);
                                  CALI_MARK_END("UpdateIndependentData"); 
                                  }, 
                              mc0.get(), mdudt.get(), beta * dt, mc1.get());

@gshipman
Copy link
Collaborator Author

gshipman commented Mar 1, 2024

@Yurlungur Take a look here: https://github.com/gshipman/parthenon/blob/2f42a75dd1bae6a92b53ee93d0931f18a3846ad2/benchmarks/burgers/burgers_driver.cpp#L100

I'm getting a build error:


In file included from /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:21:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.hpp:20:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/parthenon/driver.hpp:18:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/application_input.hpp:22:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/bvals/boundary_conditions.hpp:22:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/meshblock_data.hpp:24:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/sparse_pack_base.hpp:30:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/state_descriptor.hpp:29:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/metadata.hpp:30:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/prolong_restrict/prolong_restrict.hpp:31:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/bvals/comms/bvals_in_one.hpp:29:
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:382:18: error: cannot initialize return object of type 'TaskStatus' with an rvalue of type 'void'
          return func(std::forward<Args>(args)...);
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:217:7: note: in instantiation of function template specialization 'parthenon::TaskList::AddUserTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
      AddUserTask(dep, std::forward<Args>(args)...);
      ^
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:207:12: note: in instantiation of function template specialization 'parthenon::TaskList::AddTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
    return AddTask(TaskQualifier::normal, dep, std::forward<Args>(args)...);
           ^
/usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:100:19: note: in instantiation of function template specialization 'parthenon::TaskList::AddTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
    auto flx = tl.AddTask(none,
                  ^
6 warnings and 1 error generated.

@Yurlungur
Copy link
Collaborator

oops my bad. Try this:

    auto flx = tl.AddTask(none,
			  [=](MeshData<Real> *a) {
			    CALI_MARK_BEGIN("CalculateFluxes");
			    auto status = burgers_package::CalculateFluxes(a);
  			    CALI_MARK_END("CalculateFluxes");
                            return status;
			  }, mc0.get());

@Yurlungur
Copy link
Collaborator

Tasks need to return a TaskStatus enum. Forgot about that.

@gshipman
Copy link
Collaborator Author

gshipman commented Mar 5, 2024

@Yurlungur , Looks like adding the CALI_MARK_BEGIN/END within AddTask is still problematic, I think that is running in a separate thread dispatched in TaskRegion::Execute().. So I moved CALI_MARK_BEGIN/END to here: https://github.com/gshipman/parthenon/blob/418c5d0b9a683ac88ff609f7246654af4aa6285c/src/tasks/tasks.hpp#L416
This allows a successful run. Is my observation correct (that the task is dispatched within tasks.hpp on a separate thread)?
If so, I need a way at getting at an identified for the task, ideally a string that can be generated during task initialization. I'd need a little guidance on how to do that.

Thanks!

@gshipman
Copy link
Collaborator Author

gshipman commented Mar 5, 2024

@Yurlungur I've done a bit more work, I've added text names to tasks, and can successfully "caliper" around task enqueue. See:
https://github.com/gshipman/parthenon/blob/4be8412180e21ee1e68a60700ebd3cf3f18cc179/src/tasks/tasks.hpp#L440
But this isn't what we want, and it might not be possible to get what I want, I need to capture the task "start" and "end" from the main thread. I'm thinking this is going to be difficult without having the main thread polling a semaphore for start/stop or something. Thoughts?

@gshipman
Copy link
Collaborator Author

gshipman commented Mar 5, 2024

Adding @daboehme

@Yurlungur
Copy link
Collaborator

@gshipman Ah, yes @jdolence recently rewrote the tasking infrastructure to use threads... however I think during normal operation there's still actually only one thread, as the rest of Parthenon isn't yet thread safe. I think tasks are all executed in serial on a given MPI rank---with the note that Kokkos loops may be non-blocking if you're on an accelerator. @jdolence should confirm.

I don't think you want to wrap around task enqueue as that is just telling the tasking machinery to put the task into the list of tasks to do work on. It doesn't actually access the task. Or is that what you're trying to time? The tasking infrastructure vs the actual work of the task?

@Yurlungur
Copy link
Collaborator

I guess I don't understand why timing within an individual task is problematic? At least if the code is run with threading disabled?

@daboehme
Copy link

daboehme commented Mar 5, 2024

I guess I don't understand why timing within an individual task is problematic? At least if the code is run with threading disabled?

It's problematic if the code creates/destroys a new OS thread for each task. Caliper keeps some per-thread data around until program exit so it'll keep adding memory. However If the code runs without threading or with a fixed thread pool there shouldn't be a problem annotating at the task level.

@Yurlungur
Copy link
Collaborator

@jdolence should confirm but I'm pretty sure the thread pool is of fixed size.

@gshipman
Copy link
Collaborator Author

gshipman commented Mar 5, 2024

I don't think you want to wrap around task enqueue as that is just telling the tasking machinery to put the task into the list of tasks to do work on. It doesn't actually access the task. Or is that what you're trying to time? The tasking infrastructure vs the actual work of the task?

@Yurlungur correct, I wouldn't want to capture enqueue, that was just to make sure I could get something if I was in the parent thread and include the task name in the caliper.

@jdolence, I'm running on Crossroads / Roci, should I just disable threading entirely? I'm not explicitly requesting threading, here is what cmake finds:

-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
...
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=c++17 for C++17 standard as feature
-- Built-in Execution Spaces:
--     Device Parallel: NoTypeDefined
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
-- 
-- Architectures:
-- Found TPLLIBDL: /usr/include  
-- Using internal desul_atomics copy
-- Kokkos Devices: SERIAL, Kokkos Backends: SERIAL
-- Using Kokkos source from Parthenon submodule at /usr/projects/eap/users/gshipman/parthenon-cali/external/Kokkos

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants