
Benchmark .nth[_back](n) with inputs n #916

Conversation

Philippe-Cholet
Member

I introduced specialization benchmarks in #786.
However, running the Combinations::nth benchmarks in #914 did not give meaningful feedback: they showed it was globally faster, but not how much faster for each n.
So here I change the benchmarks to report information based on the value of n we feed nth with.
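
For reference, the shape of the change is roughly this (a minimal Criterion sketch of the idea, not the actual bench code; the group name, pool size, and input values are made up):

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use itertools::Itertools;

fn bench_nth(c: &mut Criterion) {
    // One benchmark per input value of `n`, so Criterion reports
    // e.g. `combinations2/nth/4` separately instead of one global number.
    let mut group = c.benchmark_group("combinations2/nth");
    for n in [0usize, 1, 2, 4, 8] {
        group.bench_with_input(BenchmarkId::from_parameter(n), &n, |b, &n| {
            b.iter(|| {
                let mut it = (0..16).combinations(2);
                while let Some(_x) = it.nth(n) {}
            });
        });
    }
    group.finish();
}

criterion_group!(benches, bench_nth);
criterion_main!(benches);
```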

@Philippe-Cholet Philippe-Cholet added this to the next milestone Apr 16, 2024

codecov bot commented Apr 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.43%. Comparing base (6814180) to head (d8f2e47).
Report is 51 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #916      +/-   ##
==========================================
+ Coverage   94.38%   94.43%   +0.05%     
==========================================
  Files          48       48              
  Lines        6665     6869     +204     
==========================================
+ Hits         6291     6487     +196     
- Misses        374      382       +8     


@Philippe-Cholet
Member Author

@kinto-b
Each Combinations::nth(n) call used to produce n+1 heap-allocated vectors but now produces only 1, so allocations change by 1/(n+1) - 1 (computed in % below).

                       before                           after                            changes
Same allocations for:
combinations1/nth/0    [1.2090 ms 1.2121 ms 1.2157 ms]  [1.1560 ms 1.1599 ms 1.1646 ms]  [-5.1082% -4.5026% -3.7919%]
combinations2/nth/0    [1.1405 ms 1.1447 ms 1.1496 ms]  [1.1473 ms 1.1494 ms 1.1519 ms]  [+0.5059% +0.9210% +1.3303%]
combinations3/nth/0    [1.1819 ms 1.1868 ms 1.1919 ms]  [1.1895 ms 1.1907 ms 1.1919 ms]  [-2.0142% -1.0343% -0.1986%]
combinations4/nth/0    [1.3242 ms 1.3395 ms 1.3561 ms]  [1.2893 ms 1.2914 ms 1.2945 ms]  [-5.7538% -4.5556% -3.4696%]
-50% allocations for:
combinations1/nth/1    [1.2196 ms 1.2239 ms 1.2284 ms]  [639.53 µs 643.26 µs 647.97 µs]  [-49.143% -48.354% -47.710%]
combinations2/nth/1    [1.1377 ms 1.1428 ms 1.1480 ms]  [600.34 µs 601.93 µs 603.75 µs]  [-47.541% -47.277% -47.008%]
combinations3/nth/1    [1.1770 ms 1.1817 ms 1.1876 ms]  [641.31 µs 642.99 µs 644.99 µs]  [-47.351% -46.549% -45.859%]
combinations4/nth/1    [1.2842 ms 1.2854 ms 1.2869 ms]  [720.70 µs 723.56 µs 726.84 µs]  [-43.637% -43.370% -43.065%]
-67% allocations for:
combinations1/nth/2    [1.2684 ms 1.2740 ms 1.2792 ms]  [463.04 µs 463.50 µs 463.98 µs]  [-63.556% -63.375% -63.200%]
combinations2/nth/2    [1.2179 ms 1.2229 ms 1.2279 ms]  [422.52 µs 423.28 µs 424.16 µs]  [-67.348% -66.251% -65.402%]
combinations3/nth/2    [1.2612 ms 1.2660 ms 1.2712 ms]  [450.01 µs 450.41 µs 450.84 µs]  [-64.412% -64.252% -64.098%]
combinations4/nth/2    [1.3132 ms 1.3188 ms 1.3237 ms]  [499.48 µs 500.65 µs 502.05 µs]  [-62.071% -61.902% -61.722%]
-80% allocations for:
combinations1/nth/4    [1.2246 ms 1.2296 ms 1.2363 ms]  [317.72 µs 318.69 µs 319.68 µs]  [-75.067% -74.630% -74.296%]
combinations2/nth/4    [1.1881 ms 1.1911 ms 1.1947 ms]  [276.03 µs 276.49 µs 277.07 µs]  [-76.927% -76.629% -76.256%]
combinations3/nth/4    [1.2288 ms 1.2343 ms 1.2416 ms]  [306.15 µs 306.67 µs 307.22 µs]  [-75.649% -75.488% -75.343%]
combinations4/nth/4    [1.2963 ms 1.3016 ms 1.3073 ms]  [335.58 µs 337.01 µs 339.08 µs]  [-74.351% -73.952% -73.533%]
-89% allocations for:
combinations1/nth/8    [1.2747 ms 1.2808 ms 1.2862 ms]  [209.95 µs 210.26 µs 210.60 µs]  [-83.432% -83.218% -82.982%]
combinations2/nth/8    [1.1510 ms 1.1534 ms 1.1559 ms]  [191.80 µs 192.31 µs 192.92 µs]  [-83.572% -83.443% -83.343%]
combinations3/nth/8    [1.2398 ms 1.2568 ms 1.2758 ms]  [204.40 µs 204.66 µs 204.96 µs]  [-83.773% -83.538% -83.274%]
combinations4/nth/8    [1.2997 ms 1.3029 ms 1.3062 ms]  [224.81 µs 225.83 µs 227.24 µs]  [-82.748% -82.660% -82.569%]

Benchmark changes closely follow the allocation changes.
Nothing surprising here (heap allocations are the bottleneck), but it's definitely a better description than the global -65% the previous benchmark gave us.
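
A quick sanity check of the 1/(n+1) - 1 formula against the group headers above (plain Rust, illustrative only):

```rust
fn main() {
    // Stepping with nth(n) keeps 1 of every n+1 items, so the relative
    // change in allocations is 1/(n+1) - 1.
    for n in [0u32, 1, 2, 4, 8] {
        let change = 1.0 / f64::from(n + 1) - 1.0;
        println!("nth({n}): {:.0}% allocations", change * 100.0);
    }
    // Prints 0%, -50%, -67%, -80%, -89%, matching the groups above.
}
```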

Member

@phimuemue phimuemue left a comment


Thanks.

Could we argue similarly for start? (I.e. should we report each value of start separately?)

@Philippe-Cholet
Member Author

Philippe-Cholet commented Apr 16, 2024

@phimuemue I'm not sure about start.

Building (not running) those benchmarks was already slow (~2 minutes) and is probably a bit slower now.
So slow 😁 that I usually comment out the iterators I don't want to benchmark before running the command; then the build is fast.
If we move start from the benchmark names to the inputs (where n is now), we end up with even more benchmarks (slower to build and to run).
I'm not sure considering start is meaningful at all, inside or outside. It felt like a good idea at the time, but is it? We have some iterators, like combinations, that lazily load things at the start, but is it worth considering those starts?

@phimuemue
Member

phimuemue commented Apr 16, 2024

Building (not running) those benchmarks was already slow (~2 minutes) and is probably a bit slower now.

Some time ago I tried changing bench_specializations so that it accepts only one benchmark. On my machine, this cut compile times by 40%.

When I only want to benchmark a few iterators, I don't use iproduct, and I end up with an annoying, recurring warning that this simple fix solves.
@Philippe-Cholet
Member Author

I found a macro trick to select a subset of the benchmarks (by prefixing names with a lifetime, a dirty trick) and reduce build times.
But it complicates the macro so much that I'd rather occasionally and temporarily comment out parts of the declaration than maintain such a trick in the future.

@phimuemue What do you think about those starts? Should I delete them or do we merge this as is?

@phimuemue
Member

I would leave it in. I think the reason is that iterators may behave differently after they've been nexted several times (e.g. if they are similar to Chain).

We can remove it later once we are sure that it's not needed.
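
For intuition, a made-up example using std's Chain (not from this PR): how far the iterator has already been advanced changes what nth has to do.

```rust
fn main() {
    let mut it = (0..3).chain(10..20);
    // While the first half is live, nth may have to walk across both halves.
    assert_eq!(it.nth(4), Some(11));
    // The first half is now exhausted, so nth delegates directly to the
    // second half, which may have a fast path of its own (as Range does).
    assert_eq!(it.nth(2), Some(14));
}
```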

@Philippe-Cholet Philippe-Cholet added this pull request to the merge queue Apr 16, 2024
Merged via the queue into rust-itertools:master with commit ed695af Apr 16, 2024
13 checks passed
@Philippe-Cholet Philippe-Cholet deleted the benchmark-nth-with-inputs branch April 16, 2024 17:49
@kinto-b
Contributor

kinto-b commented Apr 16, 2024

@Philippe-Cholet Ah thanks for sharing that. I am a bit confused by the fact that the 'after' times get smaller and smaller for larger and larger n. I'd have expected .nth(8) to be slower than .nth(1) no matter what. Or have I misunderstood what the times represent?

@Philippe-Cholet
Member Author

@kinto-b The benchmark roughly measures while let Some(_x) = it.nth(n) {} (plus some start shenanigans you can ignore). With larger n, fewer vectors are created by nth(n) over the entire iteration: roughly iterator_length / (n + 1).
Said differently, .nth(0) creates every vector, .nth(1) creates half of them, and .nth(8) creates 1/9 of them.
Fewer vectors created leads to smaller times.
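
To make that concrete, a small illustrative snippet (made-up sizes, not the actual benchmark):

```rust
use itertools::Itertools;

fn main() {
    let n = 2;
    let mut it = (0..10).combinations(3); // 120 combinations in total
    let mut produced = 0;
    while let Some(_x) = it.nth(n) {
        produced += 1;
    }
    // Each nth(n) call skips n combinations and yields one, so only about
    // iterator_length / (n + 1) vectors are ever built: 120 / 3 = 40.
    assert_eq!(produced, 40);
}
```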
