RelVals 180.1, 181.1: crash on non-default architectures #44863

Closed
iarspider opened this issue Apr 29, 2024 · 13 comments

@iarspider

iarspider commented Apr 29, 2024

RelVals 180.1 and 181.1, modified in #44671, are failing on non-default architectures (e.g. slc7_amd64_gcc12).

This is expected, since these RelVals use gridpacks, which are only available for the default architecture.
Should we add these two RelVals to the list of "known failures"?

@cmsbuild

cmsbuild commented Apr 29, 2024

cms-bot internal usage

@cmsbuild

A new Issue was created by @iarspider.

@rappoccio, @smuzaffar, @Dr15Jones, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@iarspider

assign core,generators,pdmv,upgrade

@cmsbuild

New categories assigned: core,generators,pdmv,upgrade

@Dr15Jones,@AdrianoDee,@alberto-sanchez,@bbilin,@GurpreetSinghChahal,@sunilUIET,@miquork,@makortel,@mkirsano,@menglu21,@SiewYan,@smuzaffar,@srimanob,@subirsarkar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel

Just to add the error message from the log here:

./starlight: error while loading shared libraries: libmvec.so.1: cannot open shared object file: No such file or directory
starlight error: exit code not 0
----- Begin Fatal Exception 28-Apr-2024 11:34:01 CEST-----------------------
An exception of category 'ExternalLHEProducer' occurred while
   [0] Processing global begin Run run: 1
   [1] Calling method for module ExternalLHEProducer/'externalLHEProducer'
Exception Message:
Child failed with exit code 1.
----- End Fatal Exception -------------------------------------------------
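
To triage many RelVal logs for this kind of failure, the loader message can be matched mechanically. The following is only a minimal sketch: the function name `find_missing_libraries` and the regular expression are illustrative assumptions, not part of cms-bot or CMSSW, and the pattern is based solely on the message format shown in the log above.

```python
import re

# Pattern for the dynamic loader's "missing library" message, modeled on the
# log lines quoted in this thread (assumption: the format is stable).
_MISSING_LIB = re.compile(
    r"^(?P<binary>\S+): error while loading shared libraries: "
    r"(?P<library>\S+): cannot open shared object file"
)


def find_missing_libraries(log_text):
    """Return (binary, library) pairs for every loader failure in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = _MISSING_LIB.match(line.strip())
        if match:
            hits.append((match.group("binary"), match.group("library")))
    return hits


# Sample taken from the log excerpt above.
sample = (
    "./starlight: error while loading shared libraries: libmvec.so.1: "
    "cannot open shared object file: No such file or directory\n"
    "starlight error: exit code not 0\n"
)
print(find_missing_libraries(sample))
```

Lines that do not match the loader message, such as the `exit code not 0` line, are simply skipped.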

@Dr15Jones

On el9 release areas it fails because it can't find a shared library:

/data/cmsbld/jenkins/workspace/ib-run-relvals/CMSSW_14_1_X_2024-04-26-2300/pyRelval/180.1_Starlight_DoubleDiffraction_5360_HI_2023/thread2/lheevent/macros/convert_SL2LHE: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory
convert_SL2LHE error: exit code not 0
----- Begin Fatal Exception 27-Apr-2024 03:17:01 CEST-----------------------
An exception of category 'ExternalLHEProducer' occurred while
   [0] Processing global begin Run run: 1
   [1] Calling method for module ExternalLHEProducer/'externalLHEProducer'
Exception Message:
Child failed with exit code 1.
----- End Fatal Exception -------------------------------------------------

It appears to me that we need to consider running gridpacks within containers in order to avoid OS incompatibilities.
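
Independently of whether containers are adopted, a pre-flight check could report unresolved shared libraries before a gridpack executable is launched. This is a rough sketch under assumptions: it relies on the standard Linux `ldd` tool being available on the host, the helper names `parse_unresolved` and `unresolved_libraries` are hypothetical, and nothing here is existing ExternalLHEProducer behavior. Since `ldd` may execute code from the inspected binary, it should only be used on trusted files.

```python
import subprocess


def parse_unresolved(ldd_output):
    """Return library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def unresolved_libraries(executable):
    """Run ldd on a trusted executable and list its unresolved libraries."""
    result = subprocess.run(
        ["ldd", executable], capture_output=True, text=True, check=False
    )
    return parse_unresolved(result.stdout)


# Canned ldd-style output, so the parser can be tried without running ldd.
example_output = (
    "\tlibssl.so.1.1 => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)\n"
)
print(parse_unresolved(example_output))
```

A non-empty result for an executable such as `convert_SL2LHE` would indicate the same OS mismatch seen in the log, but before the child process is started.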

@stahlleiton

This issue should now be solved by #44900.
Were there any further issues in the IBs after merging this PR and fixing the scram for slc6?
If not, then I think we can close this issue.

@makortel

I'm not seeing these problems anymore. Thanks!

@makortel

+core

@srimanob

+Upgrade

@AdrianoDee

+pdmv

@bbilin

bbilin commented May 17, 2024

+1

@cmsbuild

This issue is fully signed and ready to be closed.
