Using caliper above threaded invocation #1011
Comments
The thought I had when we were discussing earlier in the week was that you could instrument the functions that are added to the task list like here: For example, you could replace auto update = tl.AddTask(avg_data, UpdateIndependentData<MeshData<Real>>, mc0.get(),
mdudt.get(), beta * dt, mc1.get()); with auto update = tl.AddTask(avg_data,
[=](MeshData<Real> *a, MeshData<Real> *b, Real w, MeshData<Real> *c) {
CALI_CXX_MARK_FUNCTION;
PARTHENON_INSTRUMENT
UpdateIndependentData(a, b, w, c);
},
mc0.get(), mdudt.get(), beta * dt, mc1.get()); I'm not sure if what Caliper gets out of this will be useful... but you could also replace the anonymous function with a named one via a macro instead of a lambda, for example. This would let you insert the instrumentation at the tasking level for any task you wanted to instrument. |
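A minimal sketch of the "named one via a macro" idea, assuming nothing about Parthenon's or Caliper's internals. `INSTRUMENTED_TASK`, `MarkBegin`, `MarkEnd`, `EventLog`, `UpdateData` and the `TaskStatus` enum here are hypothetical stand-ins so the example compiles on its own; in a real build the two marker functions would be replaced by `CALI_MARK_BEGIN` / `CALI_MARK_END`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class TaskStatus { complete, incomplete };  // hypothetical stand-in

// Stand-ins for the Caliper markers: they only record what they are given.
inline std::vector<std::string> &EventLog() {
  static std::vector<std::string> log;
  return log;
}
inline void MarkBegin(const char *name) {
  assert(name != nullptr);
  EventLog().push_back(std::string("begin:") + name);
}
inline void MarkEnd(const char *name) {
  assert(name != nullptr);
  EventLog().push_back(std::string("end:") + name);
}

// Hypothetical macro (use at namespace scope): defines a named functor
// Instrumented_<func> that forwards to func inside a region named after it
// and passes the task's return value through unchanged.
#define INSTRUMENTED_TASK(func)                          \
  struct Instrumented_##func {                           \
    template <typename... Args>                          \
    auto operator()(Args &&...args) const {              \
      MarkBegin(#func);                                  \
      auto status = func(std::forward<Args>(args)...);   \
      MarkEnd(#func);                                    \
      return status;                                     \
    }                                                    \
  }

// Hypothetical task: out += w * in.
TaskStatus UpdateData(double *out, const double *in, double w) {
  *out += w * (*in);
  return TaskStatus::complete;
}
INSTRUMENTED_TASK(UpdateData);
```

With this in place, `Instrumented_UpdateData{}` could be passed wherever the lambda was passed. A templated task such as `UpdateIndependentData<MeshData<Real>>` would likely need a small non-template forwarding function first, since the macro pastes the function name into an identifier.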
Thx @Yurlungur I will give this a whirl. Since the task registration site is within the benchmark code I can use CALI_MARK_BEGIN/END as in: auto update = tl.AddTask(avg_data,
[=](MeshData<Real> *a, MeshData<Real> *b, Real w, MeshData<Real> *c) {
CALI_MARK_BEGIN("UpdateIndependentData");
PARTHENON_INSTRUMENT
UpdateIndependentData(a, b, w, c);
CALI_MARK_END("UpdateIndependentData");
},
mc0.get(), mdudt.get(), beta * dt, mc1.get()); |
@Yurlungur Take a look here: https://github.com/gshipman/parthenon/blob/2f42a75dd1bae6a92b53ee93d0931f18a3846ad2/benchmarks/burgers/burgers_driver.cpp#L100 I'm getting a build error:
|
oops my bad. Try this: auto flx = tl.AddTask(none,
[=](MeshData<Real> *a) {
CALI_MARK_BEGIN("CalculateFluxes");
auto status = burgers_package::CalculateFluxes(a);
CALI_MARK_END("CalculateFluxes");
return status;
}, mc0.get()); |
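The lambda above has to store the status so that the end marker runs before the return. A scope guard avoids that bookkeeping when a task has several return paths. The sketch below is illustrative only: `ScopedRegion`, `EventLog`, `CalculateFluxesStub` and the `TaskStatus` enum are hypothetical stand-ins that record events instead of calling Caliper.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class TaskStatus { complete, incomplete, fail };  // hypothetical stand-in

// Records region events; stands in for Caliper so the sketch has no dependencies.
inline std::vector<std::string> &EventLog() {
  static std::vector<std::string> log;
  return log;
}

// Opens a region on construction and closes it on destruction, so every
// return path of the enclosing function closes the region.
class ScopedRegion {
 public:
  explicit ScopedRegion(std::string name) : name_(std::move(name)) {
    assert(!name_.empty());
    EventLog().push_back("begin:" + name_);
  }
  ~ScopedRegion() { EventLog().push_back("end:" + name_); }
  ScopedRegion(const ScopedRegion &) = delete;
  ScopedRegion &operator=(const ScopedRegion &) = delete;

 private:
  std::string name_;
};

// Hypothetical task with two return paths; both close the region.
TaskStatus CalculateFluxesStub(const std::vector<double> *data) {
  ScopedRegion region("CalculateFluxes");
  if (data == nullptr || data->empty()) return TaskStatus::fail;
  return TaskStatus::complete;
}
```

The status is returned directly and the guard's destructor emits the end event afterwards, so no temporary `status` variable is needed.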
Tasks need to return a |
@Yurlungur, looks like adding the CALI_MARK_BEGIN/END within AddTask is still problematic. I think that is running in a separate thread dispatched in TaskRegion::Execute(), so I moved CALI_MARK_BEGIN/END to here: https://github.com/gshipman/parthenon/blob/418c5d0b9a683ac88ff609f7246654af4aa6285c/src/tasks/tasks.hpp#L416 Thanks! |
@Yurlungur I've done a bit more work, I've added text names to tasks, and can successfully "caliper" around task enqueue. See: |
Adding @daboehme |
@gshipman Ah, yes @jdolence recently rewrote the tasking infrastructure to use threads... however I think during normal operation there's still actually only one thread, as the rest of Parthenon isn't yet thread safe. I think tasks are all executed in serial on a given MPI rank---with the note that Kokkos loops may be non-blocking if you're on an accelerator. @jdolence should confirm. I don't think you want to wrap around task |
I guess I don't understand why timing within an individual task is problematic? At least if the code is run with threading disabled? |
It's problematic if the code creates/destroys a new OS thread for each task. Caliper keeps some per-thread data around until program exit, so it'll keep adding memory. However, if the code runs without threading or with a fixed thread pool, there shouldn't be a problem annotating at the task level. |
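To make the difference concrete, here is a rough model of that bookkeeping, not Caliper's actual implementation. The assumption is a tool that creates one record per OS thread the first time that thread is annotated and never frees it; `PerThreadRecord`, `AnnotatedTask` and both `RecordsWith...` functions are hypothetical names for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts per-thread records ever created; they are modeled as never freed.
std::atomic<int> g_thread_records{0};

struct PerThreadRecord {
  PerThreadRecord() { ++g_thread_records; }
};

// Stand-in for an annotated task: the first call on each OS thread
// creates that thread's record.
void AnnotatedTask() {
  thread_local PerThreadRecord record;
  (void)record;
}

// One new OS thread per task: the record count grows with the task count.
int RecordsWithThreadPerTask(int num_tasks) {
  g_thread_records = 0;
  for (int i = 0; i < num_tasks; ++i) {
    std::thread t(AnnotatedTask);
    t.join();
  }
  return g_thread_records.load();
}

// Fixed pool: workers pull task indices from a shared counter, so the
// record count is bounded by the number of workers.
int RecordsWithFixedPool(int num_tasks, int num_workers) {
  assert(num_workers > 0);
  g_thread_records = 0;
  std::atomic<int> next{0};
  std::vector<std::thread> workers;
  for (int w = 0; w < num_workers; ++w) {
    workers.emplace_back([&next, num_tasks] {
      while (next.fetch_add(1) < num_tasks) AnnotatedTask();
    });
  }
  for (auto &t : workers) t.join();
  return g_thread_records.load();
}
```

With 16 tasks, the thread-per-task variant creates 16 records, while a pool of 4 workers creates at most 4 regardless of how many tasks run.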
@jdolence should confirm but I'm pretty sure the thread pool is of fixed size. |
@Yurlungur correct, I wouldn't want to capture enqueue, that was just to make sure I could get something if I was in the parent thread and include the task name in the caliper. @jdolence, I'm running on Crossroads / Roci, should I just disable threading entirely? I'm not explicitly requesting threading, here is what cmake finds:
|
@jonahm-LANL and I discussed this this week. I'm looking for a way to mark caliper regions that are above threaded dispatch of tasks.
I'm currently directly instrumenting tasks within the benchmark, see: https://github.com/gshipman/parthenon/blob/d565f7810727b87c16a52979c0c0f7fee060c461/benchmarks/burgers/burgers_package.cpp#L55
If there is a place in the task dispatch machinery where I can add a Caliper begin/end region and perhaps capture the name / symbol of the function that will be dispatched, or some other application-meaningful metadata to properly name the region, that might be one approach. Of course, I would be measuring the invocation start/finish, which may differ from execution start/finish, depending on the way the machinery works.
Any pointers? Thx.
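One way to picture the request is the sketch below, which stores a text name with each task and opens the region in the dispatch loop itself. It is not Parthenon's task list: `NamedTaskList`, its hooks and the `TaskStatus` enum are hypothetical, and the hooks stand in for whatever would forward to Caliper's begin/end markers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class TaskStatus { complete, incomplete, fail };  // hypothetical stand-in

// Hypothetical task list that keeps a name next to each task and wraps
// execution (not enqueue) in begin/end hooks.
class NamedTaskList {
 public:
  using Hook = std::function<void(const std::string &)>;
  using Task = std::function<TaskStatus()>;

  NamedTaskList(Hook begin, Hook end)
      : begin_(std::move(begin)), end_(std::move(end)) {}

  void AddTask(std::string name, Task task) {
    assert(task && "task must be callable");
    tasks_.push_back({std::move(name), std::move(task)});
  }

  // Runs tasks serially on the calling thread; returns how many completed.
  int Execute() const {
    int completed = 0;
    for (const auto &entry : tasks_) {
      begin_(entry.name);  // region opens where the task actually runs
      const TaskStatus status = entry.task();
      end_(entry.name);
      if (status == TaskStatus::complete) ++completed;
    }
    return completed;
  }

 private:
  struct Entry {
    std::string name;
    Task task;
  };
  Hook begin_, end_;
  std::vector<Entry> tasks_;
};
```

Because the hooks fire inside `Execute`, the region brackets execution rather than registration. If a dispatcher ran each task on another thread, the hooks would have to be called on that thread to keep the same property.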