Alpaka implementation of Hcal Local Reconstruction (Mahi) #44910

kakwok · 2024-05-06T18:02:12Z

PR description:

This PR port the CUDA implementation of Hcal Local Reconstruction (Mahi) to using Alpaka.
Custom SoA data structure used in CUDA for HCAL condition data and rechits are replaced with PortableCollection.
The current Alpaka implementation aims at reproducing the results from the CUDA implementation, no algorithmic changes are made.

There are 4 main pieces involved in the migration:

digiConverter(hcalDigisProducerPortable): Convert CPU digis into SoA format
Produce HCAL condition data in SoA format (Multiple producers)
Mahi kernels (Mahi.dev.cc)
Convert rechits from SoA to legacy format (HcalRecHitSoAToLegacy)

More details on code design are presented in the recent HLT GPU development meetings:
https://indico.cern.ch/event/1350955/
https://indico.cern.ch/event/1350953/
https://indico.cern.ch/event/1350952/
https://indico.cern.ch/event/1230377/
https://indico.cern.ch/event/1230374/

PR validation:

Alpaka GPU and CPU rechit energies are validated with slightly modified workflow 12434.523 (ttbar simulation).
More validation comparisons are being worked on.

cmsbuild · 2024-05-06T18:02:42Z

cms-bot internal usage

cmsbuild · 2024-05-06T18:09:39Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40188

This PR adds an extra 248KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40188/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40188/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

cmsbuild · 2024-05-06T18:27:17Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40189

This PR adds an extra 248KB to repository

cmsbuild · 2024-05-06T18:27:45Z

A new Pull Request was created by @kakwok for master.

It involves the following packages:

CondFormats/DataRecord (db, alca)
CondFormats/HcalObjects (db, alca)
DQM/HcalTasks (dqm)
DataFormats/HcalDigi (simulation)
DataFormats/HcalRecHit (reconstruction)
EventFilter/HcalRawToDigi (reconstruction)
HeterogeneousCore/AlpakaInterface (heterogeneous)
RecoLocalCalo/Configuration (reconstruction)
RecoLocalCalo/HcalRecAlgos (reconstruction)
RecoLocalCalo/HcalRecProducers (reconstruction)

@syuvivida, @francescobrivio, @makortel, @rvenditti, @mandrenguyen, @fwyzard, @tjavaid, @mdhildreth, @saumyaphor4252, @perrotta, @consuegs, @jfernan2, @cmsbuild, @civanch, @nothingface0, @antoniovagnerini can you please review it and eventually sign? Thanks.
@argiro, @DryRun, @rsreds, @Martin-Grunewald, @PonIlya, @JanChyczynski, @rovere, @makortel, @missirol, @yuanchao, @tocheng, @bsunanda, @apsallid, @mmusich, @ReyerBand, @wang0jin, @sameasy, @thomreis, @mariadalfonso, @youyingli, @abdoulline, @rchatter, @seemasharmafnal this is something you requested to watch as well.
@sextonkennedy, @antoniovilela, @rappoccio you are the release manager for this.

cms-bot commands are listed here

fwyzard · 2024-05-06T21:38:54Z

type hcal

CondFormats/HcalObjects/interface/HcalMahiConditionsHost.h

CondFormats/HcalObjects/interface/HcalMahiConditionsSoA.h

CondFormats/HcalObjects/interface/HcalMahiPulseOffsetsHost.h

fwyzard · 2024-05-24T22:07:15Z

RecoLocalCalo/HcalRecProducers/plugins/alpaka/Mahi.dev.cc

+                if (sampleWithinWindow == 0) {
+                  shrMethod0EnergyAccum[lch] = 0;
+                  //TODO: check this conversion
+                  shrMethod0EnergySamplePair[lch] = int(std::numeric_limits<float>::min());


std::numeric_limits<float>::min() is 1.17549e-38f.
(int) std::numeric_limits<float>::min() is 0.

Here is the original line from CUDA version

cmssw/RecoLocalCalo/HcalRecProducers/src/MahiGPU.cu

Line 165 in 1234e95

shrMethod0EnergySamplePair[lch] = __float_as_uint(std::numeric_limits<float>::min());

so also this line should use __float_as_uint / edm::bit_cast. Good catch!

Thanks for catching it. I've implemented it in dceaf88

fwyzard · 2024-05-24T22:09:26Z

RecoLocalCalo/HcalRecProducers/plugins/alpaka/Mahi.dev.cc

+          }      // loop for channgel groups
+        }
+      };  //Kernel_prep1d_sameNumberOfSamples
+      class Kernel_prep_pulseMatrices_sameNumberOfSamples {


Suggested change

class Kernel_prep_pulseMatrices_sameNumberOfSamples {

class Kernel_prep_pulseMatrices_sameNumberOfSamples {

fwyzard · 2024-05-24T22:43:34Z

After going again through the Kernel_prep1d_sameNumberOfSamples kernel, I'm now worried that the problem may be due to a missing or incorrect synchronisation among the threads between filling and using the shared memory.

This is actually more important on CPU, where the elements are processed sequentially by a singe thread.

fwyzard · 2024-05-24T22:46:25Z

For example, using

alpaka::syncBlockThreads(acc);

inside the double loop

//Loop over all groups of channels
for (auto group : uniform_groups_y(acc, nchannels)) {
    //Loop over each channel, first compute soiSamples for all channels
    for (auto channel : uniform_group_elements_y(acc, group, nchannels)) {

may or may not work for the CUDA backend, but it definitely does not work for the serial CPU backend.

The correct way should be to split the loop into N parts, and put the N-1 synchronisations between each pair of loops.

makortel · 2024-05-25T01:09:32Z

After going again through the Kernel_prep1d_sameNumberOfSamples kernel, I'm now worried that the problem may be due to a missing or incorrect synchronisation among the threads between filling and using the shared memory.

There are indeed some problems of that kind (and syncBlockThreads() at incorrect loop level) that @kakwok is working on to fix those.

cmsbuild · 2024-05-27T03:54:14Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40357

This PR adds an extra 36KB to repository
Found files with invalid states:
- EventFilter/HcalRawToDigi/plugins/alpaka/HcalDigisProducerPortable.cc:
  - Added: 79beb3c
  - Modified: d82fbab, 7151058
  - Deleted: 6ba6f24
There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): Add run3 tracking low pu era #33532, add Era_Run3_2024, Era_Run3_2025, Tier0 reco scenario for 2024, and change default era in PyReleaseValidation #43761, Alpaka vs CUDA DQM compare modules for pixel tracks objects #43964, first release of TICLv5 #44785, Introducing SimOnGen and GenOnly wfs for 2024 and 2026D110 #45005

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40357/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40357/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

kakwok · 2024-05-27T03:55:46Z

I have added the float_as_uint fix and the attempt for fixing the issue with syncBlockThreads.
The updated code passes the racecheck of the compute-sanitizer
compute-sanitizer --target-processes all --require-cuda-init=no --tool=racecheck cmsRun step3_RAW2DIGI_RECO_VALIDATION_DQM.py

Unfortunately, the discrepancy between Alpaka GPU and legacy CPU remains.

But since I'll be on vacation this week, I would like to share the progress nonetheless.

I have not implemented the initialising of the shared memory.

cmsbuild · 2024-05-27T04:06:28Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40358

This PR adds an extra 36KB to repository
Found files with invalid states:
- EventFilter/HcalRawToDigi/plugins/alpaka/HcalDigisProducerPortable.cc:
  - Added: 79beb3c
  - Modified: d82fbab, 7151058
  - Deleted: 6ba6f24
There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): Add run3 tracking low pu era #33532, add Era_Run3_2024, Era_Run3_2025, Tier0 reco scenario for 2024, and change default era in PyReleaseValidation #43761, Alpaka vs CUDA DQM compare modules for pixel tracks objects #43964, first release of TICLv5 #44785, Introducing SimOnGen and GenOnly wfs for 2024 and 2026D110 #45005

cmsbuild · 2024-05-27T04:06:56Z

Pull request #44910 was updated. @mdhildreth, @cmsbuild, @saumyaphor4252, @makortel, @sunilUIET, @subirsarkar, @srimanob, @AdrianoDee, @nothingface0, @francescobrivio, @rvenditti, @syuvivida, @consuegs, @tjavaid, @antoniovagnerini, @perrotta, @jfernan2, @fwyzard, @miquork, @mandrenguyen, @civanch can you please check and sign again.

fwyzard · 2024-05-27T16:01:43Z

With the latest changes, I don't get any errors from valgrind either.

mmusich · 2024-05-27T16:51:06Z

in relation to #44910 (comment), I just re-tested the code (as of kakwok@9ac4309) with:

#!/bin/bash -ex

scram p CMSSW_14_0_7_MULTIARCHS
cd  CMSSW_14_0_7_MULTIARCHS/src
eval `scramv1 runtime -sh`

git cms-addpkg CondFormats/DataRecord
git cms-addpkg CondFormats/HcalObjects 
git cms-addpkg DQM/HcalTasks 
git cms-addpkg DataFormats/HcalDigi
git cms-addpkg DataFormats/HcalRecHit 
git cms-addpkg EventFilter/HcalRawToDigi 
git cms-addpkg HeterogeneousCore/AlpakaInterface 
git cms-addpkg RecoLocalCalo/Configuration 
git cms-addpkg RecoLocalCalo/HcalRecAlgos 
git cms-addpkg RecoLocalCalo/HcalRecProducers
git cms-addpkg Configuration/PyReleaseValidation
wget https://patch-diff.githubusercontent.com/raw/cms-sw/cmssw/pull/44910.diff
git apply 44910.diff
git cherry-pick 15ea9750359193d75e9e406d80d17d7c48b78788
git cms-checkdeps -a
scram b -j 20

hltGetConfiguration /dev/CMSSW_14_0_0/GRun \
		    --unprescale \
		    --output minimal \
		    --globaltag auto:phase1_2024_realistic \
		    --mc \
		    --max-events 100 \
		    --eras Run3 \
		    --l1-emulator uGT --l1 L1Menu_Collisions2024_v1_2_0_xml \
		    --input /store/mc/Run3Winter24Digi/TT_TuneCP5_13p6TeV_powheg-pythia8/GEN-SIM-RAW/133X_mcRun3_2024_realistic_v8-v2/80000/dc984f7f-2e54-48c4-8950-5daa848b6db9.root \
		    --customise HLTrigger/Configuration/customizeHLTforAlpaka.customizeHLTforAlpaka > hlt_mc.py

cmsRun hlt_mc.py &> hlt.log&

and I don't observe any residual crash.

mandrenguyen · 2024-05-28T09:50:43Z

please test

fwyzard · 2024-05-28T11:31:34Z

Unfortunately, the discrepancy between Alpaka GPU and legacy CPU remains.

Indeed, I still see large discrepancies comparing the HLT results running on GPU with the CUDA vs Alpaka version.

Here are some of the biggest discrepancies:

      Events    Accepted      Gained        Lost       Other  Trigger
       99252        3935        +214         -85       ~7381  HLT_HT300_Beamspot_v20
       99252        1507        +246        -223       ~3417  HLT_CaloMHT90_v12
       99252        1034         +68         -46       ~3772  HLT_CaloMET90_NotCleaned_v12
       99252        2550        +164         -78       ~6796  HLT_PFHT350_v28
       99252        1202         +87         -51       ~8171  HLT_QuadPFJet103_88_75_15_v14
       99252        3902         +50        -156      ~17260  HLT_DiPhoton10_CaloIdL_v8
       99252         394          +1         -18       ~1333  HLT_SingleEle8_v7

fwyzard · 2024-05-28T12:15:15Z

On the other hand, the alpaka GPU results themselves are fairly reproducible, to +/- 1 event out of 100k:

      Events    Accepted      Gained        Lost       Other  Trigger
       99252          41           -          -1           -  HLT_AK8DiPFJet250_250_SoftDropMass40_v3
       99252          32           -          -1           -  HLT_AK8DiPFJet250_250_SoftDropMass50_v3
       99252         115          +1           -           -  HLT_DoubleMu4_MuMuTrk_Displaced_v24
       99252          99          +1           -          ~2  HLT_PFMET120_PFMHT120_IDTight_v29
       99252          65          +1           -           -  HLT_PFMET130_PFMHT130_IDTight_v29
       99252          34          +1           -          ~1  HLT_PFMET120_PFMHT120_IDTight_PFHT60_v18
       99252          39          +1           -           -  HLT_PFMETNoMu120_PFMHTNoMu120_IDTight_PFHT60_v18
       99252         105          +1           -          ~1  HLT_PFMETNoMu120_PFMHTNoMu120_IDTight_v29
       99252          69          +1           -           -  HLT_PFMETNoMu130_PFMHTNoMu130_IDTight_v28
       99252         168          +1          -1           -  HLT_PFMETNoMu110_PFMHTNoMu110_IDTight_FilterHF_v9
       99252          86          +1           -          ~1  HLT_PFMETNoMu120_PFMHTNoMu120_IDTight_FilterHF_v9
       99252          55          +1           -           -  HLT_PFMETNoMu130_PFMHTNoMu130_IDTight_FilterHF_v9
       99252          12           -          -1           -  HLT_Ele14_eta2p5_IsoVVVL_Gsf_PFHT200_PNetBTag0p53_v2
       99252         883          +1           -           -  HLT_PFHT330PT30_QuadPFJet_75_60_45_40_v18
       99252          25          +1           -           -  HLT_PFHT450_SixPFJet36_PNetBTag0p35_v6
       99252        2636           -          -2           -  HLT_PFHT350_v28
       99252          84          +1          -1           -  HLT_PFHT250_QuadPFJet25_PNet1BTag0p20_PNet1Tauh0p50_v3
       99252          74          +1          -1           -  HLT_PFHT250_QuadPFJet30_PNet1BTag0p20_PNet1Tauh0p50_v3
       99252          68          +1          -1           -  HLT_PFHT280_QuadPFJet30_PNet1BTag0p20_PNet1Tauh0p50_v3
       99252         528          +1           -          ~2  HLT_AK8PFJet220_SoftDropMass40_v10
       99252         430          +1           -           -  HLT_AK8PFJet230_SoftDropMass40_v10
       99252         459           -          -1           -  HLT_VBF_DiPFJet125_45_Mjj1200_v3

Which is in line with the results using I get using the CUDA version of the HCAL reconstruction.

cmsbuild · 2024-05-28T16:11:04Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ddea1b/39567/summary.html
COMMIT: 9ac4309
CMSSW: CMSSW_14_1_X_2024-05-27-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44910/39567/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially removed 3 lines from the logs
Reco comparison results: 4 differences found in the comparisons
DQMHistoTests: Total files compared: 48
DQMHistoTests: Total histograms compared: 3338862
DQMHistoTests: Total failures: 6
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3338836
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
Checked 202 log files, 165 edm output root files, 48 DQM output files
TriggerResults: no differences found

kakwok and others added 3 commits May 2, 2024 22:22

MAHI-Alpaka first implementation

79beb3c

update atomicMaxPair

04703eb

clean up

7ff970c

cmsbuild added this to the CMSSW_14_1_X milestone May 6, 2024

cmsbuild added reconstruction-pending simulation-pending dqm-pending alca-pending db-pending pending-signatures tests-pending orp-pending code-checks-pending heterogeneous-pending labels May 6, 2024

cmsbuild added code-checks-rejected and removed code-checks-pending labels May 6, 2024

code format

e794820

cmsbuild added code-checks-pending and removed code-checks-rejected labels May 6, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels May 6, 2024

cmsbuild added the hcal label May 6, 2024

fwyzard reviewed May 6, 2024

View reviewed changes

CondFormats/HcalObjects/interface/HcalMahiConditionsHost.h Outdated Show resolved Hide resolved

fwyzard reviewed May 6, 2024

View reviewed changes

CondFormats/HcalObjects/interface/HcalMahiConditionsSoA.h Outdated Show resolved Hide resolved

fwyzard reviewed May 6, 2024

View reviewed changes

CondFormats/HcalObjects/interface/HcalMahiPulseOffsetsHost.h Outdated Show resolved Hide resolved

fwyzard reviewed May 24, 2024

View reviewed changes

kakwok added 2 commits May 27, 2024 05:03

fix syncBlockThreads

7af75cf

add float_as_uint

dceaf88

cmsbuild added code-checks-pending and removed code-checks-rejected labels May 27, 2024

cmsbuild added code-checks-rejected and removed code-checks-pending labels May 27, 2024

code format

9ac4309

cmsbuild added code-checks-pending and removed code-checks-rejected labels May 27, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels May 27, 2024

cmsbuild added tests-started and removed tests-pending labels May 28, 2024

cmsbuild added tests-approved and removed tests-started labels May 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alpaka implementation of Hcal Local Reconstruction (Mahi) #44910

Alpaka implementation of Hcal Local Reconstruction (Mahi) #44910

kakwok commented May 6, 2024

cmsbuild commented May 6, 2024 •

edited

cmsbuild commented May 6, 2024

cmsbuild commented May 6, 2024

cmsbuild commented May 6, 2024

fwyzard commented May 6, 2024

fwyzard May 24, 2024

makortel May 24, 2024

kakwok May 27, 2024

fwyzard May 24, 2024

fwyzard commented May 24, 2024

fwyzard commented May 24, 2024 •

edited

makortel commented May 25, 2024

cmsbuild commented May 27, 2024

kakwok commented May 27, 2024

cmsbuild commented May 27, 2024

cmsbuild commented May 27, 2024

fwyzard commented May 27, 2024

mmusich commented May 27, 2024

mandrenguyen commented May 28, 2024

fwyzard commented May 28, 2024

fwyzard commented May 28, 2024

cmsbuild commented May 28, 2024

	class Kernel_prep_pulseMatrices_sameNumberOfSamples {

	class Kernel_prep_pulseMatrices_sameNumberOfSamples {

Alpaka implementation of Hcal Local Reconstruction (Mahi) #44910

Are you sure you want to change the base?

Alpaka implementation of Hcal Local Reconstruction (Mahi) #44910

Conversation

kakwok commented May 6, 2024

PR description:

PR validation:

cmsbuild commented May 6, 2024 • edited

cmsbuild commented May 6, 2024

cmsbuild commented May 6, 2024

cmsbuild commented May 6, 2024

fwyzard commented May 6, 2024

fwyzard May 24, 2024

Choose a reason for hiding this comment

makortel May 24, 2024

Choose a reason for hiding this comment

kakwok May 27, 2024

Choose a reason for hiding this comment

fwyzard May 24, 2024

Choose a reason for hiding this comment

fwyzard commented May 24, 2024

fwyzard commented May 24, 2024 • edited

makortel commented May 25, 2024

cmsbuild commented May 27, 2024

kakwok commented May 27, 2024

cmsbuild commented May 27, 2024

cmsbuild commented May 27, 2024

fwyzard commented May 27, 2024

mmusich commented May 27, 2024

mandrenguyen commented May 28, 2024

fwyzard commented May 28, 2024

fwyzard commented May 28, 2024

cmsbuild commented May 28, 2024

Comparison Summary

cmsbuild commented May 6, 2024 •

edited

fwyzard commented May 24, 2024 •

edited