Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different results with each run #883

Open
nucleosynthesis opened this issue Dec 16, 2023 · 10 comments
Open

Different results with each run #883

nucleosynthesis opened this issue Dec 16, 2023 · 10 comments
Labels

Comments

@nucleosynthesis
Copy link
Contributor

As Pointed out by @adavidzh , when running HybridNew with the same (default) seed, the limit is different. The RooFit seed is set but not clear that ROOT gRandom is.

Possibly related to this line

TRandom3 *rnd = new TRandom3();

Does anyone know (maybe @guitargeek ?) If this coukd be the cause and how to resolve it?

@guitargeek
Copy link
Contributor

Hi, from the top of my head I have no idea. How can this be reproduced? Then I can help figuring out what's going on

@nucleosynthesis
Copy link
Contributor Author

Running the following will show the behaviour

combine template-analysis-datacard.txt -M HybridNew --LHCmode LHC-limits
--rMax 2.0 --clsAcc 0.01

The datacard can be found in

data/tutorials/CAT23001/template-analysis-datacard.txt

@nucleosynthesis
Copy link
Contributor Author

nucleosynthesis commented Dec 18, 2023

Some updates. It seems to be related to the "on the fly" workspace creation. If one first creates the binary workspace and uses that as input, then the repeated runs produces the same output eg.

With the text file directly:

combine template-analysis-datacard.txt -M HybridNew --LHCmode LHC-limits --rMax 2.0 --clsAcc 0.01
 -- Hybrid New --
Limit: r < 0.335698 +/- 0.0143468 @ 95% CL
Done in 0.15 min (cpu), 0.15 min (real)

combine template-analysis-datacard.txt -M HybridNew --LHCmode LHC-limits --rMax 2.0 --clsAcc 0.01
 -- Hybrid New --
Limit: r < 0.356193 +/- 0.0349485 @ 95% CL
Done in 0.18 min (cpu), 0.18 min (real)

first building the workspace:

text2workspace.py template-analysis-datacard.txt

combine template-analysis-datacard.root -M HybridNew --LHCmode LHC-limits --rMax 2.0 --clsAcc 0.01
 -- Hybrid New --
Limit: r < 0.346362 +/- 0.0134581 @ 95% CL
Done in 0.24 min (cpu), 0.25 min (real)

combine template-analysis-datacard.root -M HybridNew --LHCmode LHC-limits --rMax 2.0 --clsAcc 0.01 
 -- Hybrid New --
Limit: r < 0.346362 +/- 0.0134581 @ 95% CL
Done in 0.24 min (cpu), 0.24 min (real)

Could the binary file creation have something to do with it?

@nucleosynthesis
Copy link
Contributor Author

Digging a bit more, it seems like this might be intentional. In these lines

TString tmpDir = "", tmpFile = "", pwd(gSystem->pwd());
if (makeTempDir_) {
tmpDir = "roostats-XXXXXX"; tmpFile = "model";
mkdtemp(const_cast<char *>(tmpDir.Data()));
gSystem->cd(tmpDir.Data());
garbageCollect.path = tmpDir.Data(); // request that we delete this dir when done
} else if (!hlfFile.EndsWith(".hlf") && !hlfFile.EndsWith(".root")) {
char buff[99]; snprintf(buff, 98, "roostats-XXXXXX");
int fd = mkstemp(buff); close(fd);
tmpFile = buff;
unlink(tmpFile); // this is to be deleted, since we'll use tmpFile+".root"
}
, a temporary file is created to store the workspace if the input is the datacard. This allows one to run multiple commands in parallel without worrying about each one writing over eachother.

However, this c++ method cannot be seeded for reproducibility and since it seems RooFits random number generator is tied to TRandom (which in turn is tied to this one - is that right @guitargeek ?), we can't avoid non-reproducible results if one runs the same command over and over on the .txt file.

@adavidzh , I think we just need to make it clear to a user that if they run the commands that use toys, they will get different but consistent results each time unless they first convert to a binary file and use that as the input.

@adavidzh
Copy link
Member

a temporary file is created to store the workspace if the input is the datacard. This allows one to run multiple commands in parallel without worrying about each one writing over eachother.

I would argue that we should be able to create a unique - per job - deterministic identifier. I.e., we do not need a random thing, just a unique one, that can e.g., be made out of a hash of things like the datacard and command line options.

@nucleosynthesis
Copy link
Contributor Author

nucleosynthesis commented Dec 24, 2023 via email

@nucleosynthesis
Copy link
Contributor Author

Hint @guitargeek 😉

@adavidzh
Copy link
Member

I don't get it @nucleosynthesis: the code you reference (with mkdtemp and mkstemp) is combine's.

@nucleosynthesis
Copy link
Contributor Author

Right,

https://man7.org/linux/man-pages/man3/mkstemp.3.html

https://man7.org/linux/man-pages/man3/mkdtemp.3.html

Are what we use to guarantee to uniqueness but it seems that uses the same seed that RooFit will then use.

Using the same datacard and command would imply the same hash so not sure that would work with concurrent identical command lines (which we can't do currently)

Too much of this is my own speculation (based on some minimal testing) though, so perhaps there is simply a way to avoid the clash between the seed that RooFit uses and the one being triggered by the call to mkstemp.

@kcormi
Copy link
Collaborator

kcormi commented Apr 10, 2024

Just for completeness, I don't think this is limited to toy-based methods like HybridNew. I see the same effect (slightly different numerical values) when running the asymptotic Significance method from the datacards, but if I produce the workspace first and rerun the results are identical. Still, negligibly small differences, but it appears to be the same issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants