
Noise in Sandmark #198

Open · kayceesrk opened this issue Dec 3, 2020 · 4 comments

Following the discussion in ocaml/ocaml#9934, I set out to quantify the noise in Sandmark macrobenchmark runs. Before asking complex questions about loop alignments and microarchitectural optimisations as was done in ocaml/ocaml#10039, I wanted to measure the noise between multiple runs of the same code. It is important to note that currently, we only run a single iteration of each variant.

The benchmarking was done on the IITM "turing" machine, which has an Intel Xeon Gold 5120 CPU, with isolated cores, the CPU governor set to performance, hyper-threading disabled, turbo boost disabled, and interrupts and rcu_callbacks directed to non-isolated cores, but with ASLR on [1]. The results of two runs of the latest commit from https://github.com/stedolan/ocaml/tree/sweep-optimisation are below:

[image: per-benchmark comparison of the two runs]

The outlier is worrisome, but even setting it aside there are differences of up to 2% in both directions. Moving forward, we should consider the following:

  1. Arrive at a measure of statistical significance on a given machine: what is the minimum difference beyond which a result is statistically significant? This will vary based on the benchmark and the metric (running time, maxRSS).
  2. Run multiple iterations. Sandmark already has an ITER variable which runs the experiments multiple times. The notebooks need to be updated so that the mean (and standard deviation) are computed first, and the graphs updated to include error bars (a sketch follows after the footnote below). The downside is that benchmarking will take significantly longer. We should choose a representative set of macro benchmarks for quick studies and reserve the full macro benchmark run for final results. Can we run the sequential macro benchmarks in parallel on different isolated cores? What would be the impact of this on individual benchmark runs?

[1] https://github.com/ocaml-bench/ocaml_bench_scripts#notes-on-hardware-and-os-settings-for-linux-benchmarking
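
As a rough illustration of points 1 and 2, here is a minimal sketch (not the actual Sandmark notebook code) of collapsing multi-iteration results into mean and standard deviation, plotting them with error bars, and applying a crude per-machine significance threshold. The DataFrame columns `name` and `time_secs` are assumptions for the illustration and may not match Sandmark's actual result schema.

```python
# Minimal sketch: aggregate ITER runs per benchmark and plot error bars.
# Column names `name` and `time_secs` are assumed, not Sandmark's actual schema.
import pandas as pd
import matplotlib.pyplot as plt

def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse multiple iterations of each benchmark into mean/std/count."""
    return (df.groupby("name")["time_secs"]
              .agg(mean="mean", std="std", runs="count")
              .reset_index())

def plot_with_error_bars(summary: pd.DataFrame) -> None:
    """Bar chart of mean running times with one-standard-deviation error bars."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(summary["name"], summary["mean"], yerr=summary["std"], capsize=3)
    ax.set_xlabel("benchmark")
    ax.set_ylabel("running time (s)")
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    plt.show()

def is_significant(baseline_mean, baseline_std, candidate_mean, k=2.0):
    """Crude per-machine significance check: a difference only counts if it
    exceeds k standard deviations of the baseline's own observed noise."""
    return abs(candidate_mean - baseline_mean) > k * baseline_std
```

A representative subset for quick studies would then just be a filtered list of benchmark names fed through the same functions.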

kayceesrk (Contributor, Author) commented Dec 4, 2020

Now with ASLR turned off:

[image: per-benchmark comparison of two runs with ASLR turned off]

The noise is still around 2%.

gasche commented Jan 5, 2021

It looks like most of the benchmarks are not actually very noisy (the observed noise is well below 1%), while a smaller group of benchmarks is noisier. This suggests that you could track, per benchmark, how many iterations to run and what the expected noise level is, running more iterations for the unstable benchmarks and fewer for the stable ones to keep the total running time in check. In particular, most of the long-running benchmarks are not noisy in this test, so maybe they could be run a single time; knucleotide is the only noisy benchmark that runs for >= 10s.

Regarding error bars: instead of error bars specific to a run (which requires several iterations in that run), you could store for each benchmark its typical "noise range" (the largest difference observed in past noise-detecting runs) and display it as error bars on all future runs. This gives good visual feedback when looking at benchmark graphs without requiring several iterations every time.
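
A minimal sketch of how this could be wired up, assuming a small JSON file that tracks each benchmark's worst observed relative spread; the file, its format, and the function names are hypothetical, not something Sandmark has today:

```python
# Hypothetical sketch: persist each benchmark's worst observed relative spread
# from multi-iteration calibration runs, then reuse it as error bars on
# ordinary single-iteration runs.
import json
import matplotlib.pyplot as plt

def update_noise_ranges(path, run_times):
    """run_times: dict mapping benchmark name -> list of times from one
    multi-iteration calibration run. Keeps the largest relative spread seen."""
    try:
        with open(path) as f:
            noise = json.load(f)
    except FileNotFoundError:
        noise = {}
    for name, times in run_times.items():
        spread = (max(times) - min(times)) / min(times)
        noise[name] = max(noise.get(name, 0.0), spread)
    with open(path, "w") as f:
        json.dump(noise, f, indent=2)
    return noise

def plot_single_run(results, noise):
    """results: dict mapping benchmark name -> time from a single iteration.
    Error bars come from the stored noise ranges, not from this run."""
    names = sorted(results)
    times = [results[n] for n in names]
    errs = [results[n] * noise.get(n, 0.0) for n in names]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(names, times, yerr=errs, capsize=3)
    ax.set_ylabel("running time (s)")
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    plt.show()
```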

shakthimaan (Contributor)

Noise has been reported for the soli benchmark for 5.1.0+trunk.
Reference: ocaml/ocaml#11102 (comment)

kayceesrk (Contributor, Author) commented Aug 29, 2022

As mentioned in the linked comment, soli's running time is too small. Either we make it run longer or we remove the macro benchmark tag. It was a mistake to have tagged it as a macro benchmark in the first place.

See #348. I believe I've fixed a number of these to run longer.
