
API for Batch Input Creation #86

Open
N9199 opened this issue Feb 8, 2023 · 7 comments
Labels
enhancement (New feature or request), research-needed

Comments

@N9199

N9199 commented Feb 8, 2023

I'm looking for a way to create input for each batch. As an example, if I wanted to benchmark something like sorting algorithms, I'd need a new unsorted vector for each batch, either by copying one that was already created or by shuffling the one that was just used. In both cases the setup isn't negligible, so I'm wondering whether nanobench has an API that helps with that? I was thinking of something like Google Benchmark's PauseTiming and ResumeTiming, or Rust Criterion's iter_batched and iter_batched_ref.

@martinus
Owner

martinus commented Feb 8, 2023

There is no API for that, because measurement quality could be very low if the timer has to be restarted often. Instead, please do two measurements: first measure the runtime of just the setup (e.g. creating & shuffling), then benchmark everything together, and subtract the first from the second.
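
For illustration, a minimal sketch of that two-measurement approach using the public Bench::run API (generate_shuffled_vector and my_amazing_sorter are placeholder helpers, not part of nanobench):

#include <nanobench.h>

ankerl::nanobench::Bench bench;

// Measurement 1: setup only (create & shuffle).
bench.run("setup only", [] {
    auto v = generate_shuffled_vector();
    ankerl::nanobench::doNotOptimizeAway(v);
});

// Measurement 2: setup plus the actual work.
bench.run("setup + sort", [] {
    auto v = generate_shuffled_vector();
    my_amazing_sorter(v);
    ankerl::nanobench::doNotOptimizeAway(v);
});

// The cost of the sort alone is roughly (setup + sort) minus (setup only).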

@martinus martinus closed this as completed Feb 8, 2023
@helmesjo

helmesjo commented Feb 10, 2023

@martinus

Couldn't this be solved fairly easily by doing all the setup first, for however many iterations are needed? From the docs it seems like nanobench calculates up front how many iterations it will do. Then it would just be a matter of picking the next "batch" from a contiguous list:

std::vector<int> generate_shuffled_vector();

ankerl::nanobench::BenchBatch<std::vector<int>>()
  .generateBatch([]() -> std::vector<int> {
    // returned batch can just be stored
    // in whatever structure you see fit (e.g. in this
    // case perhaps `std::vector<std::vector<int>>`)
    return generate_shuffled_vector();
  })
  .run("amazing sorter", [](std::vector<int>& batch) {
    // `batch` is just the next entry, and next, and next...
    my_amazing_sorter(batch);
  });

As long as whatever the generator produces is either copyable or movable, all should be fine...

Obviously there is some overhead here, but it should be fairly stable (depending on what your "batch" is).
Also, I guess most scenarios requiring this are non-trivial, so I bet the batches[next++] lookup won't make the results totally useless... But I'm just guessing here. It just seems that being able to benchmark something as simple as a sorting algorithm would be expected.

Also, I want to get away from google-bench, but I need this. 🙃
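
For what it's worth, a rough approximation of this idea is already possible with the existing Bench::run by pre-generating the batches and indexing into them from the benchmarked lambda (BenchBatch above is only a proposal; generate_shuffled_vector and my_amazing_sorter are placeholders), at the cost of keeping the index arithmetic inside the measurement:

#include <nanobench.h>
#include <vector>

// Pre-generate a pool of shuffled inputs. If the benchmark runs more
// iterations than there are batches, later iterations see already-sorted data.
std::vector<std::vector<int>> batches;
for (int i = 0; i < 10000; ++i) {
    batches.push_back(generate_shuffled_vector());
}

size_t next = 0;
ankerl::nanobench::Bench().run("amazing sorter", [&] {
    my_amazing_sorter(batches[next]);
    next = (next + 1) % batches.size(); // wrap-around + indexing stays inside the timing
});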

@N9199
Author

N9199 commented Feb 10, 2023

To add to what @helmesjo said, I looked into how Criterion implements it and at the current implementation of run, and it seems they do essentially the same thing, except Criterion adds a preparation step where it constructs the inputs. Now maybe I'm missing something, but couldn't the same be done in nanobench, with the input construction happening before pc.beginMeasure()?

@martinus
Owner

I've reopened the issue and linked all the similar issues I previously closed to this one. I guess public interest is too large for me to brush it away...

What I absolutely do not want is starting/stopping the timer in the tight measurement loop (which seems to be what Criterion is doing). That would make it impossible to benchmark anything that's fast, especially because I also need to start/stop the Linux performance counters. I need to contemplate more on how best to do this, but I currently don't have much free time on my hands for it.

@jonas-schulze
Contributor

I would love to have this feature as well. Maybe it would be easiest to have a different entry point for benchmarks that require setup (and teardown!) to be excluded from the measurements, say FancyBench()::run(), as -- I assume -- they are not what you consider "fast". 😉

@haydenflinner

I've landed here because I think I need this feature to benchmark 'something fast': Protobuf encode+decode. Basically, I have an encode+decode step which benchmarks out at 100ns. If I break this apart into two separate functions, one bench for encode and one for decode, I get 30ns for encode and 1ns for decode. Naively these results don't make sense: I'd expect X+Y=Z, i.e. 30 + something = 100. I suspect either something more is being optimized out (hard to tell because I'm no good at disassembly 😃), or this is an artifact of running the two things in separate loops so that the branching/caching behave better. I just don't expect such a huge true difference, because the message I'm encoding/decoding is less than a cache line in size.

But so far the only example I've seen of carrying state larger than a word across runs while ensuring things won't be optimized out is the google-benchmark approach of redoing the setup within the hot loop and starting and stopping a timer around the part you're interested in. I understand this adds some overhead, but I'm not that concerned about overhead of that size, especially because when my code is embedded in another process I will be far less certain exactly what the cost of each line of code is, due to different cache conditions/inlinability compared to the microbenchmark.
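
For reference, the google-benchmark pattern being referred to looks roughly like this (a sketch, with generate_shuffled_vector and my_amazing_sorter standing in for the real setup and workload); it is exactly the per-iteration timer start/stop that martinus wants to avoid:

#include <benchmark/benchmark.h>

static void BM_SortWithSetup(benchmark::State& state) {
    for (auto _ : state) {
        state.PauseTiming();
        auto v = generate_shuffled_vector(); // setup, excluded from the timing
        state.ResumeTiming();
        my_amazing_sorter(v);
        benchmark::DoNotOptimize(v);
    }
}
BENCHMARK(BM_SortWithSetup);
BENCHMARK_MAIN();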

@RT2Code

RT2Code commented Oct 19, 2023

I understand why you don't want to pause the timer between epoch iterations, and I agree with you on that, but at the very least we really need fixtures between epochs.

It's probably not ideal, but here's an example of a possible implementation: RT2Code@c79a081

And my use case:

Signal<void(int)> signal;

bench.epochIterations(1000).run(
    "Connect",
    [&] {
        signal.connect(slot);
    },
    [] {},  // Setup function (doing nothing here)
    [&] {   // Teardown function
        signal.reset();
    });

Connect is a function with side effects, since it appends the slot to a vector, so using a fixed iteration count is better here in order to get consistent results between runs. The teardown function serves to reset the signal between epochs. Without it, subsequent epochs would cause the vector to keep growing, leading to severe variation in the timing measurements.
