Inaccurate results reported for small methods #1802

Open
tannergooding opened this issue Sep 9, 2021 · 1 comment · May be fixed by #2334

@tannergooding
Member

Unless you are doing instruction-level profiling, the highest-precision timer on a modern computer has a resolution of about 25-32 cycles (under ideal circumstances, roughly 5-6ns on a 5GHz processor at best and 25-32ns on a 1GHz processor).
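
The cycle-to-time arithmetic above can be checked with a short sketch. It assumes a fixed clock frequency (no turbo or frequency scaling), and cycles_to_ns is a hypothetical helper, not part of any library.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at a fixed clock rate.

    At f GHz the CPU completes f cycles per nanosecond, so the
    duration is cycles / f. Assumes the clock rate does not change.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


# Timer resolution of roughly 25-32 cycles, as described above.
for ghz in (5.0, 1.0):
    lo, hi = cycles_to_ns(25, ghz), cycles_to_ns(32, ghz)
    print(f"{ghz:.0f} GHz: {lo:.1f}-{hi:.1f} ns")
```
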

Due to platform-specific differences, the resolution reported by the OS's high-precision timer APIs can be as coarse as about 100ns. Additionally, it is well documented that, because of the latency between calls and other OS or hardware factors, a single timer call can take much longer, closer to 300ns when a CPU-level timer such as RDTSC is not available: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps#resolution-precision-accuracy-and-stability.
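
One rough way to see this floor on a given machine is to compare the resolution the OS reports with the smallest step actually observed between back-to-back reads. The sketch below uses Python's time.perf_counter_ns as a stand-in for the native high-resolution timer (the implementation field of time.get_clock_info shows which OS API backs it). observed_tick_ns is a hypothetical helper, and Python's own call overhead inflates the observed step compared with native code.

```python
import time


def observed_tick_ns(samples: int = 1000) -> int:
    """Smallest non-zero step seen between consecutive reads of
    time.perf_counter_ns().

    This folds together the timer's tick size and the cost of the call
    itself, which is the floor below which a single measurement cannot
    resolve anything.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    smallest = None
    for _ in range(samples):
        start = time.perf_counter_ns()
        now = start
        # Spin until the clock visibly advances.
        while now == start:
            now = time.perf_counter_ns()
        delta = now - start
        if smallest is None or delta < smallest:
            smallest = delta
    return smallest


info = time.get_clock_info("perf_counter")
print(f"reported resolution: {info.resolution * 1e9:.1f} ns ({info.implementation})")
print(f"smallest observed step: {observed_tick_ns()} ns")
```
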

While Benchmark.NET does try to account for small methods, and for noise from call overhead and the like, there are many cases where the numbers it reports are of questionable accuracy.
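
A common way harnesses account for call overhead is to time a batch of calls, time an identical batch that calls an empty function, and subtract. The sketch below illustrates that general idea only; it is not Benchmark.NET's actual algorithm, and per_op_ns is a hypothetical helper. It shows why tiny methods are fragile: the estimate is a difference of two noisy medians, so a nearly free method can come out near zero or even negative.

```python
import statistics
import time
from typing import Callable


def per_op_ns(fn: Callable[[], object], ops: int = 100_000, rounds: int = 15) -> float:
    """Estimate per-call cost of fn as (workload - empty baseline) / ops.

    Batching amortizes timer resolution over `ops` calls; subtracting
    the empty-function baseline removes loop and call overhead. When fn
    costs less than the run-to-run noise, the result is unreliable.
    """
    def empty() -> None:
        return None

    def batch(f: Callable[[], object]) -> int:
        start = time.perf_counter_ns()
        for _ in range(ops):
            f()
        return time.perf_counter_ns() - start

    workload = statistics.median(batch(fn) for _ in range(rounds))
    overhead = statistics.median(batch(empty) for _ in range(rounds))
    return (workload - overhead) / ops


print(f"{per_op_ns(lambda: 1 == 1):.3f} ns/op")
```

Running it a few times typically shows the small result drifting between runs, which is the kind of instability the issue describes.
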

One such example is the following:
[Image: Benchmark.NET results table]

In particular, the first entry, GetShortName_opt, reports a time of 0.2082 ns. Even in an "ideal" scenario where the JIT fully optimizes the comparison against a constant value down to simply xor rax, rax, this result still says it takes approximately 1 cycle on a 5GHz CPU.

  • It also shouldn't be able to optimize it like this. AFAIR, Benchmark.NET should be passing the value in and preventing the actual benchmark body from being inlined to avoid such issues.
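
To put the 0.2082 ns figure in perspective, the sketch below converts it to cycles across a range of clock speeds, again assuming a fixed clock rate. ns_to_cycles is a hypothetical helper.

```python
def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Convert a duration in nanoseconds to CPU cycles at clock_ghz."""
    return ns * clock_ghz


reported_ns = 0.2082  # value reported for GetShortName_opt
for ghz in (1, 2, 3, 4, 5):
    print(f"{ghz} GHz: {ns_to_cycles(reported_ns, ghz):.2f} cycles")
```

At 5GHz this comes to about 1.04 cycles, and below one cycle at any slower clock, for a call that the harness is supposed to keep from being inlined.
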

It would be beneficial, IMO, if Benchmark.NET were more proactive about labeling potentially problematic results and provided guidance on how to write a benchmark that produces accurate results.

  • I would consider a result problematic, at the very least, if it takes less than 10ns. Most of these methods should be testing more than a single instruction and are running on 2-4GHz computers, so even in an "ideal" environment, 10ns is only 20-40 cycles, which represents on the order of 20 instructions and likely no memory accesses. Very few instructions take 0 cycles. Several take 1 cycle, and up to 4 of those can be dispatched simultaneously, but it's rare to actually achieve this. Many take 2-3 cycles, and any kind of memory access takes about 3-11 cycles in the fastest scenario (potentially longer for uncached data, among other things).
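
The labeling proposed above could look roughly like the sketch below: flag every result under 10ns and show how many cycles it corresponds to. This does not reflect Benchmark.NET's actual warning logic or API; flag_suspicious is a hypothetical helper, and apart from the GetShortName_opt value quoted above, the benchmark names and timings in the sample are made up for illustration.

```python
SUSPICIOUS_NS = 10.0  # threshold proposed in this issue


def flag_suspicious(
    results_ns: dict[str, float],
    clock_ghz: float,
    threshold_ns: float = SUSPICIOUS_NS,
) -> list[tuple[str, float, float]]:
    """Return (name, ns, cycles) for each result under threshold_ns,
    fastest first, so they can be labeled as possibly unreliable."""
    flagged = [
        (name, ns, ns * clock_ghz)
        for name, ns in results_ns.items()
        if ns < threshold_ns
    ]
    return sorted(flagged, key=lambda row: row[1])


# Hypothetical results; only GetShortName_opt comes from the issue.
sample = {"ParseLong": 42.7, "GetShortName_opt": 0.2082, "Lookup": 7.9}
for name, ns, cycles in flag_suspicious(sample, clock_ghz=3.0):
    print(f"{name}: {ns} ns (~{cycles:.1f} cycles at 3 GHz) may be unreliable")
```
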
@timcassell
Collaborator

BDN currently under-reports times by 1-2 clock cycles (#1133). Once that is fixed (#2334), I don't think there is any need for more warnings beyond the zero-measurement warning we already emit.
