Data for different CPU architectures #2

losfair · 2021-02-12T03:59:29Z

It seems that the results vary a lot on different CPU architectures.

Testing on an Ubuntu VM (kernel version 5.4.0-65-generic) running on Apple M1, with the thread-brigade and async-brigade tests:

$ /bin/time ../target/release/async-brigade 
500 tasks, 10000 iterations:
mean 572.666µs per iteration, stddev 10.912µs (1145.000ns per task per iter)
2.56user 3.26system 0:05.83elapsed 99%CPU (0avgtext+0avgdata 3964maxresident)k
0inputs+0outputs (0major+399minor)pagefaults 0swaps

$ /bin/time ../target/release/thread-brigade 
500 tasks, 10000 iterations:
mean 7.104ms per iteration, stddev 226.822µs (14.208µs per task per iter)
7.09user 78.75system 1:11.91elapsed 119%CPU (0avgtext+0avgdata 8340maxresident)k
0inputs+0outputs (0major+1523minor)pagefaults 0swaps

So here it's roughly a 90% speedup (about 92% less time per iteration, or async about 12× faster), not a 30% one.
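For reference, here is the arithmetic behind that figure, using the two means reported above (7.104ms for thread-brigade, 572.666µs for async-brigade). This is a small sketch; the helper name `compare` is hypothetical.

```rust
/// Compares a slower and a faster mean time (same units).
/// Returns (ratio, percent_reduction):
///   ratio: how many times faster `fast_us` is than `slow_us`;
///   percent_reduction: how much less time `fast_us` takes, as a percentage of `slow_us`.
fn compare(slow_us: f64, fast_us: f64) -> (f64, f64) {
    let ratio = slow_us / fast_us;
    let reduction = (1.0 - fast_us / slow_us) * 100.0;
    (ratio, reduction)
}

fn main() {
    // Mean time per iteration from the Ubuntu VM run on M1, in microseconds.
    let thread_us = 7104.0; // thread-brigade: 7.104ms
    let async_us = 572.666; // async-brigade
    let (ratio, reduction) = compare(thread_us, async_us);
    println!(
        "async is {:.1}x faster ({:.1}% less time per iteration)",
        ratio, reduction
    );
}
```

With these inputs the ratio comes out near 12.4 and the reduction near 91.9%, which is where "roughly 90%" comes from.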

Pinning to a single CPU core brings the threaded version much closer to async, though:

$ taskset --cpu-list 1 /bin/time ../target/release/thread-brigade 
500 tasks, 10000 iterations:
mean 660.847µs per iteration, stddev 13.810µs (1321.000ns per task per iter)
0.49user 6.28system 0:06.83elapsed 99%CPU (0avgtext+0avgdata 6100maxresident)k
0inputs+0outputs (0major+1544minor)pagefaults 0swaps


jimblandy · 2021-02-12T06:53:59Z

Do the benchmarks not run directly on macOS?

I amended the README to say that I don't really understand why pinning to a single core speeds up thread-brigade. I mean, sure, I can guess that cross-core traffic is too slow or whatever, but that's not the same as actually knowing what is specifically happening.
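To make the pinning question concrete, here is a minimal sketch of a thread "bucket brigade": a chain of OS threads, each blocking on a read from its upstream link and then writing the byte downstream. It is not the repository's code. The `UnixStream::pair` links stand in for whatever I/O channel the real benchmark uses, and `run_thread_brigade` is a hypothetical name. Each hop ends with a write that must wake the next blocked thread. When consecutive threads sit on different cores, each hop may involve a cross-core wakeup, which is one plausible (unverified) reason pinning helps.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Hypothetical sketch of a thread brigade (not the benchmark's actual code).
/// Returns how many bytes made it all the way down the chain.
fn run_thread_brigade(tasks: usize, iterations: usize) -> usize {
    // `first_tx` feeds the head of the chain; `rx` is the read end of the current link.
    let (mut first_tx, mut rx) = UnixStream::pair().expect("socketpair");
    let mut handles = Vec::with_capacity(tasks);

    for _ in 0..tasks {
        // New link: this stage writes into `tx`, the next stage reads `next_rx`.
        let (tx, next_rx) = UnixStream::pair().expect("socketpair");
        let mut upstream = rx;
        let mut downstream = tx;
        handles.push(thread::spawn(move || {
            let mut buf = [0u8; 1];
            // Relay one byte per iteration: block on read, then wake the next stage.
            for _ in 0..iterations {
                upstream.read_exact(&mut buf).expect("read");
                downstream.write_all(&buf).expect("write");
            }
        }));
        rx = next_rx;
    }

    // The main thread injects a byte and waits for it to emerge at the tail.
    let mut tail = rx;
    let mut buf = [0u8; 1];
    let mut delivered = 0;
    for _ in 0..iterations {
        first_tx.write_all(&[42]).expect("write");
        tail.read_exact(&mut buf).expect("read");
        delivered += 1;
    }
    for h in handles {
        h.join().expect("stage panicked");
    }
    delivered
}

fn main() {
    let n = run_thread_brigade(500, 1000);
    println!("{} bytes passed through the brigade", n);
}
```

Running such a sketch under `taskset --cpu-list 1` versus unpinned would be one way to poke at the cross-core hypothesis, with the caveat that it only approximates the real benchmark.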

losfair · 2021-02-12T06:59:50Z

I ran the tests in a Linux VM to keep the environment consistent with the one described in the README.

Running natively on macOS:

% time ../target/release/async-brigade
500 tasks, 10000 iterations:
mean 677.307µs per iteration, stddev 10.876µs (1354.000ns per task per iter)
../target/release/async-brigade  3.20s user 3.69s system 99% cpu 6.894 total

% time ../target/release/thread-brigade
500 tasks, 10000 iterations:
mean 688.988µs per iteration, stddev 79.514µs (1377.000ns per task per iter)
../target/release/thread-brigade  0.89s user 6.17s system 100% cpu 7.015 total

Looks like differences in scheduler policy between Linux and macOS account for the gap.

jimblandy · 2021-02-12T16:09:06Z

Wow, they're about the same.

And, if I may continue to impose, how about one-thread-brigade, to see how much time is due to the I/O alone?
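One way to read that request: drive the same chain of I/O links from a single thread, so every read and write still happens but no thread ever has to be woken. Below is a minimal sketch under that assumption. `run_one_thread_brigade` is a hypothetical name, and `UnixStream` pairs stand in for the benchmark's actual I/O, so this is not the repository's one-thread-brigade.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::time::{Duration, Instant};

/// Hypothetical single-threaded brigade: one thread performs every hop itself,
/// so the measured time is the I/O cost without any thread-to-thread wakeups.
/// Returns the mean time per iteration.
fn run_one_thread_brigade(tasks: usize, iterations: usize) -> Duration {
    if iterations == 0 {
        return Duration::ZERO;
    }
    // One connected socket pair per hop in the chain.
    let mut links: Vec<(UnixStream, UnixStream)> = (0..tasks)
        .map(|_| UnixStream::pair().expect("socketpair"))
        .collect();

    let mut buf = [0u8; 1];
    let start = Instant::now();
    for _ in 0..iterations {
        for (tx, rx) in links.iter_mut() {
            // Write into one end of the link, then read it back from the other end.
            // The read never blocks, because the byte is already waiting.
            tx.write_all(&buf).expect("write");
            rx.read_exact(&mut buf).expect("read");
        }
    }
    start.elapsed() / iterations as u32
}

fn main() {
    let mean = run_one_thread_brigade(500, 10_000);
    println!("mean {:?} per iteration", mean);
}
```

Subtracting this I/O-only time from the multi-task brigade times gives a rough idea of how much is spent on switching between tasks.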

losfair · 2021-02-13T07:21:27Z

@jimblandy

> how about one-thread-brigade, to see how much time is due to the I/O alone?

macOS (M1):

% time ../target/release/one-thread-brigade
10000 iterations, 500 tasks, mean 259.929µs per iteration, stddev 29.523µs (519.000ns per task per iter)
../target/release/one-thread-brigade  0.69s user 1.92s system 99% cpu 2.617 total

Ubuntu (VM on M1):

$ /bin/time ../target/release/one-thread-brigade 
10000 iterations, 500 tasks, mean 217.284µs per iteration, stddev 30.133µs (434.000ns per task per iter)
0.48user 1.70system 0:02.19elapsed 99%CPU (0avgtext+0avgdata 1664maxresident)k
0inputs+0outputs (0major+91minor)pagefaults 0swaps
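Treating one-thread-brigade as the pure I/O cost, subtracting it from the async-brigade mean gives a rough estimate of what task switching itself costs. This assumes the two costs simply add, which is only an approximation. The numbers below are the means reported in this thread; the helper name `switch_overhead_ns` is hypothetical.

```rust
/// Estimated non-I/O overhead per task per iteration, in nanoseconds,
/// under the simplifying assumption: brigade time = I/O time + switching time.
fn switch_overhead_ns(brigade_us: f64, io_only_us: f64, tasks: f64) -> f64 {
    (brigade_us - io_only_us) / tasks * 1000.0
}

fn main() {
    let tasks = 500.0;
    // (platform, async-brigade mean µs, one-thread-brigade mean µs), from this thread.
    let runs = [
        ("macOS (M1)", 677.307, 259.929),
        ("Ubuntu VM (M1)", 572.666, 217.284),
    ];
    for (label, brigade, io) in runs {
        println!(
            "{}: ~{:.0} ns per task per iter beyond I/O",
            label,
            switch_overhead_ns(brigade, io, tasks)
        );
    }
}
```

This works out to roughly 835ns per task per iteration natively and roughly 711ns in the VM.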

ehiggs · 2021-02-15T22:31:32Z

> Running natively on macOS:
>
> % time ../target/release/async-brigade
> mean 677.307µs per iteration
>
> Testing on an Ubuntu VM (kernel version 5.4.0-65-generic) running on Apple M1 with the thread-brigade and async-brigade tests:
>
> $ /bin/time ../target/release/async-brigade
> mean 572.666µs per iteration
>
> ...
>
> macOS (M1):
>
> % time ../target/release/one-thread-brigade
> ...mean 259.929µs per iteration
>
> Ubuntu (VM on M1)
>
> $ /bin/time ../target/release/one-thread-brigade
> mean 217.284µs per

Ubuntu in a VM is outperforming native macOS? Does seem like a weird result.
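To put numbers on the comparison: using the means quoted above, the VM's lead over native macOS is of a similar size in both benchmarks, about 15% less time for async-brigade and about 16% less for one-thread-brigade. A small computation follows; the helper name `vm_advantage_pct` is hypothetical.

```rust
/// Percentage by which `vm_us` is lower than `native_us`.
fn vm_advantage_pct(native_us: f64, vm_us: f64) -> f64 {
    (1.0 - vm_us / native_us) * 100.0
}

fn main() {
    // Mean per-iteration times in µs quoted above: (benchmark, native macOS, Ubuntu VM).
    let pairs = [
        ("async-brigade", 677.307, 572.666),
        ("one-thread-brigade", 259.929, 217.284),
    ];
    for (name, native, vm) in pairs {
        println!("{}: VM takes {:.1}% less time", name, vm_advantage_pct(native, vm));
    }
}
```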
