Inaccurate results reported for small methods #1802

tannergooding · 2021-09-09T17:47:09Z

Unless doing instruction-level profiling, the highest-precision timer on a modern computer has a granularity of about 25-32 cycles (which, under ideal circumstances, is about 6ns on a 5GHz processor at best and about 32ns on a 1GHz processor).
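
For reference, the cycle-to-time arithmetic behind those figures can be sketched as follows. This is only an illustration of the conversion, assuming a fixed clock frequency with no turbo or throttling; the helper name cycles_to_ns is made up for this example.

```python
def cycles_to_ns(cycles: float, ghz: float) -> float:
    """Convert a cycle count to nanoseconds at a fixed clock frequency.

    One cycle at f GHz lasts 1/f nanoseconds, so the time is cycles / f.
    """
    return cycles / ghz


# The 25-32 cycle range quoted above, at the two clock speeds mentioned.
for ghz in (5.0, 1.0):
    low = cycles_to_ns(25, ghz)
    high = cycles_to_ns(32, ghz)
    print(f"{ghz} GHz: {low} ns to {high} ns")
```

At 5GHz the range works out to 5-6.4ns, and at 1GHz it is 25-32ns, which matches the rough numbers given above.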

Due to platform-specific differences, the best resolution reported by the high-precision timer APIs exposed by the OS is about 100ns. Additionally, it is well documented that, because of the overhead between calls and other factors in the OS or hardware, the effective latency of such a call can be much worse, closer to 300ns when a CPU-level timer such as RDTSC is not available: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps#resolution-precision-accuracy-and-stability.
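
A rough way to see this granularity on a given machine is to read the OS high-resolution timer back to back and record the smallest non-zero step. The sketch below uses Python's time.perf_counter_ns purely as an illustration; the function name smallest_timer_step is hypothetical, and the value it returns includes interpreter and call overhead, so it is an upper bound on the timer's step rather than a hardware figure.

```python
import time


def smallest_timer_step(samples: int = 100_000) -> int:
    """Return the smallest non-zero gap (in ns) seen between consecutive timer reads."""
    smallest = None
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        delta = now - prev
        # Identical consecutive readings mean the timer has not ticked yet; skip them.
        if delta > 0 and (smallest is None or delta < smallest):
            smallest = delta
        prev = now
    if smallest is None:
        raise RuntimeError("timer never advanced during sampling")
    return smallest


print(f"smallest observed step: {smallest_timer_step()} ns")
```

Any method whose true cost is well below the printed step cannot be timed by a single pair of timer reads, which is why a harness has to run many invocations per measurement and subtract overhead.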

While Benchmark.NET does try to account for small methods and while it also tries to account for noise due to call overhead and the like, there are many cases where the numbers it reports are of questionable accuracy.

One such example is the following:

In particular, if we look at the first entry, GetShortName_opt is reporting a time of 0.2082 ns. Even in an "ideal" scenario where the JIT is able to fully optimize away the comparison against a constant value and reduce the body to simply xor rax, rax, this is still reporting that the method takes approximately 1 cycle on a 5GHz CPU.
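
The "approximately 1 cycle" figure follows directly from the reported time. A minimal sketch of the conversion, assuming a constant 5GHz clock (the helper name ns_to_cycles is made up for this example):

```python
def ns_to_cycles(ns: float, ghz: float) -> float:
    """Convert a duration in nanoseconds to clock cycles at a fixed frequency in GHz."""
    return ns * ghz


reported_ns = 0.2082  # the GetShortName_opt time quoted above
cycles = ns_to_cycles(reported_ns, 5.0)
print(f"{reported_ns} ns at 5 GHz is about {cycles:.3f} cycles")
```

On a slower clock the same reported time would correspond to less than one cycle, for example about 0.62 cycles at 3GHz, which makes the number even harder to believe for a non-inlined method call.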

It also shouldn't be able to optimize it like this. AFAIR, Benchmark.NET should be passing the value in and preventing the actual benchmark body from being inlined to avoid such issues.

It would be beneficial, IMO, if Benchmark.NET was more proactive about labeling potentially problematic results and had guidance on how to optimally write a test in a way that will provide accurate results.

I would view a problematic result, at the very least, as anything taking less than 10ns. Most of these methods should be testing more than a single instruction and are running on 2-4GHz computers. So in an "ideal" environment, 10ns represents no more than 20 instructions and likely no memory accesses. Very few instructions take 0 cycles. Several take 1 cycle and can be pipelined so that up to 4 are in simultaneous dispatch, but it's rare to actually have this. Many take 2-3 cycles, and if you have any kind of memory access they will take about 3-11 cycles in the fastest scenario (potentially longer for uncached results, among other things).
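
As a sketch of what such labeling could look like, the snippet below flags any result under a 10ns threshold and shows the cycle budget that threshold implies. It is not an existing Benchmark.NET feature or API; flag_suspicious and cycle_budget are hypothetical names, and every entry in the results table other than GetShortName_opt is a made-up placeholder.

```python
def cycle_budget(ns: float, ghz: float) -> float:
    """Number of clock cycles available in `ns` nanoseconds at `ghz` GHz."""
    return ns * ghz


def flag_suspicious(results_ns: dict, threshold_ns: float = 10.0) -> list:
    """Return the names of benchmarks whose mean time is below the threshold."""
    return [name for name, mean_ns in results_ns.items() if mean_ns < threshold_ns]


results = {
    "GetShortName_opt": 0.2082,   # value quoted earlier in this report
    "HypotheticalParse": 48.3,    # made-up placeholder, not a real measurement
    "HypotheticalLookup": 7.5,    # made-up placeholder, not a real measurement
}

for name in flag_suspicious(results):
    print(f"warning: {name} is under 10 ns and may be dominated by timer noise")

# 10 ns is a budget of 20 cycles at 2 GHz and 40 cycles at 4 GHz.
print(cycle_budget(10.0, 2.0), cycle_budget(10.0, 4.0))
```

The threshold is a parameter here because the right cutoff depends on the machine's clock speed and timer behavior; 10ns is only the starting point suggested above.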

timcassell · 2023-08-16T03:28:40Z

BDN currently under-reports times by 1-2 clock cycles (#1133). Once that is fixed (#2334), I don't think there is any need for more warnings beyond the 0 measurement that we already do.

AndreyAkinshin self-assigned this Sep 10, 2021

tannergooding mentioned this issue Jul 29, 2022

Adding benchmarks for the new single/double math APIs dotnet/performance#2540

Merged

tannergooding mentioned this issue Jul 25, 2023

Regressions in System.MathBenchmarks.Double dotnet/runtime#85985

Closed

timcassell linked a pull request Mar 6, 2024 that will close this issue

Call benchmark method directly #2334

Open

timcassell mentioned this issue May 24, 2024

How can I obtain the addition result of two int variables without using a loop for testing? #2576

Closed
