Introduce edm::async(), and use it in CUDA and Alpaka modules #44901
base: master
cms-bot internal usage
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44901/40168
A new Pull Request was created by @makortel for master. It involves the following packages:
@cmsbuild, @makortel, @Dr15Jones, @smuzaffar, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here
enable gpu
@cmsbuild, please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3cb6f/39230/summary.html
Comparison Summary
GPU Comparison Summary
FWCore/Concurrency/src/async.cc
#include "FWCore/Concurrency/interface/async.h"

namespace edm::impl {
  WaitingThread::WaitingThread() { thread_ = std::thread(&WaitingThread::threadLoop, this); }
Could you set a meaningful thread name, so it's easier to identify this thread pool in a GDB trace?
For example:
- WaitingThread::WaitingThread() { thread_ = std::thread(&WaitingThread::threadLoop, this); }
+ WaitingThread::WaitingThread() {
+   thread_ = std::thread(&WaitingThread::threadLoop, this);
+   pthread_setname_np(thread_.native_handle(), "edm async pool");
+ }
Or even something more elaborate with a `static constexpr` name, and a check that its length is 15 or less.
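A minimal sketch of that more elaborate variant, assuming Linux with glibc (where `pthread_setname_np()` takes a thread handle and a name of at most 15 characters plus the terminating NUL). The `sketch` namespace, `kThreadName`, and `startNamedThreadAndReadName()` are hypothetical names used only for illustration and are not part of the PR:

```cpp
#include <pthread.h>

#include <cassert>
#include <future>
#include <string>
#include <string_view>
#include <thread>

namespace sketch {
  // Hypothetical constant; the string literal guarantees NUL termination.
  inline constexpr std::string_view kThreadName = "edm async pool";
  // Linux thread names are limited to 16 bytes including the terminating NUL.
  static_assert(kThreadName.size() <= 15, "thread name must be 15 characters or less");

  // Starts a thread, names it from the outside (as the suggested constructor
  // does), reads the name back, and returns it. Returns an empty string if
  // setting the name failed.
  inline std::string startNamedThreadAndReadName() {
    std::promise<void> release;
    std::future<void> released = release.get_future();
    // Keep the thread alive until the name has been set and read back.
    std::thread worker([&released] { released.wait(); });

    char buffer[16] = {};
    if (pthread_setname_np(worker.native_handle(), kThreadName.data()) == 0) {
      pthread_getname_np(worker.native_handle(), buffer, sizeof(buffer));
    }

    release.set_value();
    worker.join();
    return buffer;
  }
}  // namespace sketch
```

The `static_assert` turns an over-long name into a compile-time error, rather than a runtime `ERANGE` return value from `pthread_setname_np()` that is easy to ignore.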
Thanks for the suggestion, I added the name. While verifying the behavior by running the unit test in gdb I discovered many threads (~8) with the `edm async pool` name. Further investigation showed that the test ended up creating two `WaitingThread`s (which I can believe), but the use of `global_control` to set the allowed parallelism to 1 led the call to `onetbb::task_arena::enqueue()` in `edm::WaitingTaskWithArenaHolder::doneWaiting()` to create a new TBB-controlled thread that inherited the `edm async pool` name (and all subsequent TBB threads created by that thread inherited the name too). When I set the allowed parallelism to 2 (which is now done in the test), I saw 2 threads with the `edm async pool` name.

While nearly all production jobs are configured to use multiple threads, maybe it would be time to look more into trying to give names to the TBB threads etc. I'll open a separate issue on that. Here: #44912
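The name inheritance described above can be reproduced without TBB. The following is a small sketch, assuming Linux, where a newly created thread starts with the name of the thread that created it. The helper names `currentThreadName()` and `nameSeenByChildThread()` are hypothetical and exist only for this illustration:

```cpp
#include <pthread.h>

#include <cassert>
#include <string>
#include <thread>

namespace sketch {
  // Returns the name of the calling thread (empty string on failure).
  inline std::string currentThreadName() {
    char buffer[16] = {};
    pthread_getname_np(pthread_self(), buffer, sizeof(buffer));
    return buffer;
  }

  // Names a "parent" thread, lets it spawn a "child" thread that never sets
  // its own name, and returns the name the child observes for itself.
  inline std::string nameSeenByChildThread(const char* parentName) {
    std::string childName;
    std::thread parent([&] {
      pthread_setname_np(pthread_self(), parentName);
      // The child is created after the parent was renamed.
      std::thread child([&] { childName = currentThreadName(); });
      child.join();
    });
    parent.join();
    return childName;
  }
}  // namespace sketch
```

On Linux the child reports the parent's name, which matches the observation that a TBB worker thread spawned from a named waiting thread shows up under the same name in gdb.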
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44901/40191
Pull request #44901 was updated. @smuzaffar, @Dr15Jones, @fwyzard, @cmsbuild, @makortel can you please check and sign again.
@cmsbuild, please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3cb6f/39266/summary.html
Comparison Summary
GPU Comparison Summary
I've compared some different approaches using the current HLT menu and recent data.
+heterogeneous
Notes from review discussion with @Dr15Jones
Comments beyond this PR
PR description:
This PR adds the `edm::async()` facility described in #29188. This PR also replaces the use of `cudaStreamAddCallback()` with `edm::async()` accompanied with `cudaEventSynchronize()`, and makes the CUDA/Alpaka events be created with the `cudaEventBlockingSync` flag.

Measurements that I showed in CHEP 2023 https://indico.jlab.org/event/459/contributions/11810/ suggested a possible 1 % throughput improvement at the HLT (of that time, many things have changed since) over `cudaStreamAddCallback()`. Earlier studies done with a prototype in cms-patatrack/pixeltrack-standalone#321 suggested that somehow the thread pool with `cudaEventSynchronize()` used less CPU than `cudaStreamAddCallback()`.

During the CHEP study I also tested polling with `cudaEventQuery()`, but the "waiting thread pool" approach was more performant.

Another benefit over `cudaStreamAddCallback()` is that that function "is slated for eventual deprecation and removal", and the "replacement" `cudaLaunchHostFunc()` does not call the callback function in case of an error in the associated CUDA stream.

Resolves #29188
Resolves cms-sw/framework-team#916
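To illustrate the "waiting thread" idea in general terms, here is a minimal plain C++ sketch. It is not the implementation in this PR and does not reproduce the `edm::async()` interface; the `sketch::WaitingThread` class and its `run()` method are hypothetical. A dedicated thread executes a blocking call and then invokes a continuation. In the setting of this PR the blocking call would correspond to `cudaEventSynchronize()` on an event created with `cudaEventBlockingSync`, and the continuation to resuming the framework task; both are replaced by plain callables here so the example has no CUDA dependency.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

namespace sketch {
  // Hypothetical single waiting thread: runs a blocking call off the caller's
  // thread, then invokes a continuation once the blocking call has returned.
  class WaitingThread {
  public:
    using Task = std::function<void()>;

    WaitingThread() : thread_([this] { loop(); }) {}

    ~WaitingThread() {
      {
        std::lock_guard<std::mutex> lock(mutex_);
        stop_ = true;
      }
      cond_.notify_one();
      thread_.join();  // the loop drains queued work before exiting
    }

    // Queues a blocking wait and the continuation to call after it.
    void run(Task blockingWait, Task continuation) {
      {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.emplace(std::move(blockingWait), std::move(continuation));
      }
      cond_.notify_one();
    }

  private:
    void loop() {
      for (;;) {
        std::pair<Task, Task> work;
        {
          std::unique_lock<std::mutex> lock(mutex_);
          cond_.wait(lock, [this] { return stop_ || !queue_.empty(); });
          if (queue_.empty()) {
            return;  // stop requested and nothing left to do
          }
          work = std::move(queue_.front());
          queue_.pop();
        }
        work.first();   // blocks this thread only, not the caller
        work.second();  // notify that the asynchronous work has completed
      }
    }

    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<std::pair<Task, Task>> queue_;
    bool stop_ = false;
    std::thread thread_;  // declared last so the other members exist before it starts
  };
}  // namespace sketch
```

A caller would construct one `sketch::WaitingThread` and call `run(wait, continuation)`. The calling thread returns immediately and stays free for other work while the waiting thread sleeps inside the blocking call.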
PR validation:
Unit tests in `FWCore/Concurrency`, `HeterogeneousCore/Alpaka{Core,Test}`, `HeterogeneousCore/CUDA{Utilities,Core,Test}` succeed.

The deployment on CUDA and Alpaka modules still needs performance testing.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Possibly to be backported to 14_0_X.