Data for different CPU architectures #2

Open
losfair opened this issue Feb 12, 2021 · 5 comments

Comments

@losfair

losfair commented Feb 12, 2021

It seems that the results vary a lot on different CPU architectures.

Testing on an Ubuntu VM (kernel version 5.4.0-65-generic) running on Apple M1 with the thread-brigade and async-brigade tests:

$ /bin/time ../target/release/async-brigade 
500 tasks, 10000 iterations:
mean 572.666µs per iteration, stddev 10.912µs (1145.000ns per task per iter)
2.56user 3.26system 0:05.83elapsed 99%CPU (0avgtext+0avgdata 3964maxresident)k
0inputs+0outputs (0major+399minor)pagefaults 0swaps
$ /bin/time ../target/release/thread-brigade 
500 tasks, 10000 iterations:
mean 7.104ms per iteration, stddev 226.822µs (14.208µs per task per iter)
7.09user 78.75system 1:11.91elapsed 119%CPU (0avgtext+0avgdata 8340maxresident)k
0inputs+0outputs (0major+1523minor)pagefaults 0swaps

So it's a 90% speedup, not a 30% one.
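For anyone checking the arithmetic, here is a minimal sketch that derives both ways of stating the gap from the two means printed above. The function names are made up for illustration and are not part of the benchmark code.

```rust
/// How many times faster the quicker run is (both times in the same unit).
fn speedup_factor(slow: f64, fast: f64) -> f64 {
    slow / fast
}

/// Share of the slower run's time that the quicker run saves, in percent.
fn time_saved_percent(slow: f64, fast: f64) -> f64 {
    (1.0 - fast / slow) * 100.0
}

fn main() {
    // Means per iteration from the Ubuntu VM runs above, in microseconds.
    let thread_us = 7104.0; // thread-brigade: 7.104ms
    let async_us = 572.666; // async-brigade: 572.666µs

    println!(
        "async is {:.1}x faster, i.e. {:.1}% less time per iteration",
        speedup_factor(thread_us, async_us),
        time_saved_percent(thread_us, async_us)
    );
}
```

With these inputs the factor is a bit over 12x and the time saved is roughly 92%, which is where the "90%" figure comes from.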

Pinning to a single CPU core brings the threaded version closer to async though:

$ taskset --cpu-list 1 /bin/time ../target/release/thread-brigade 
500 tasks, 10000 iterations:
mean 660.847µs per iteration, stddev 13.810µs (1321.000ns per task per iter)
0.49user 6.28system 0:06.83elapsed 99%CPU (0avgtext+0avgdata 6100maxresident)k
0inputs+0outputs (0major+1544minor)pagefaults 0swaps
@jimblandy
Owner

Do the benchmarks not run directly on macOS?

I amended the README to say that I don't really understand why pinning to a single core speeds up thread-brigade. I mean, sure, I can guess that cross-core traffic is too slow or whatever, but that's not the same as actually knowing what is specifically happening.
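One way to poke at the cross-core guess without the full benchmark is a toy brigade. The sketch below is not the repository's code: it assumes the benchmark is essentially a chain of tasks that each forward one byte to the next, and it swaps in `std::sync::mpsc` channels as a stand-in for whatever I/O primitive the real programs use. `run_brigade` is a hypothetical name.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// Pass one byte down a chain of `tasks` threads, `iterations` times,
/// and return the mean wall-clock time per iteration.
/// `iterations` must be non-zero.
fn run_brigade(tasks: usize, iterations: u32) -> Duration {
    let (head_tx, mut prev_rx): (Sender<u8>, Receiver<u8>) = channel();
    let mut handles = Vec::with_capacity(tasks);

    for _ in 0..tasks {
        let (tx, rx) = channel();
        // This thread reads from the previous link and writes to the next.
        let upstream = std::mem::replace(&mut prev_rx, rx);
        handles.push(thread::spawn(move || {
            // Forward bytes until the upstream sender is dropped.
            for byte in upstream {
                if tx.send(byte).is_err() {
                    break;
                }
            }
        }));
    }
    let tail_rx = prev_rx;

    let start = Instant::now();
    for _ in 0..iterations {
        head_tx.send(1).unwrap();
        assert_eq!(tail_rx.recv().unwrap(), 1);
    }
    let elapsed = start.elapsed();

    // Dropping the head sender lets every thread in the chain exit in turn.
    drop(head_tx);
    for handle in handles {
        handle.join().unwrap();
    }
    elapsed / iterations
}

fn main() {
    let mean = run_brigade(500, 1_000);
    println!("mean {:?} per iteration", mean);
}
```

Running the resulting binary with and without `taskset --cpu-list 1` on Linux would show whether a plain hand-off chain reproduces the pinning effect. Since channels are not pipes, any numbers from this are only suggestive.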

@losfair
Author

losfair commented Feb 12, 2021

I ran the tests in a Linux VM to keep the environment consistent with the one described in the README.

Running natively on macOS:

% time ../target/release/async-brigade
500 tasks, 10000 iterations:
mean 677.307µs per iteration, stddev 10.876µs (1354.000ns per task per iter)
../target/release/async-brigade  3.20s user 3.69s system 99% cpu 6.894 total
% time ../target/release/thread-brigade
500 tasks, 10000 iterations:
mean 688.988µs per iteration, stddev 79.514µs (1377.000ns per task per iter)
../target/release/thread-brigade  0.89s user 6.17s system 100% cpu 7.015 total

It looks like scheduler policy differences between Linux and macOS account for the gap.

@jimblandy
Owner

Wow, they're about the same.

And, if I may continue to impose, how about one-thread-brigade, to see how much time is due to the I/O alone?

@losfair
Author

losfair commented Feb 13, 2021

@jimblandy

how about one-thread-brigade, to see how much time is due to the I/O alone?

macOS (M1):

% time ../target/release/one-thread-brigade
10000 iterations, 500 tasks, mean 259.929µs per iteration, stddev 29.523µs (519.000ns per task per iter)
../target/release/one-thread-brigade  0.69s user 1.92s system 99% cpu 2.617 total

Ubuntu (VM on M1)

$ /bin/time ../target/release/one-thread-brigade 
10000 iterations, 500 tasks, mean 217.284µs per iteration, stddev 30.133µs (434.000ns per task per iter)
0.48user 1.70system 0:02.19elapsed 99%CPU (0avgtext+0avgdata 1664maxresident)k
0inputs+0outputs (0major+91minor)pagefaults 0swaps
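As a rough worked example of "how much time is due to the I/O alone": if one-thread-brigade is taken as an I/O-only baseline (an assumption based on the question above, not something verified against the code), subtracting its per-task figure from the other runs gives an estimate of the switching overhead per task. The helper name is made up for illustration.

```rust
/// Per-task time left after removing the I/O-only baseline, in nanoseconds.
fn overhead_ns(total_ns: u64, io_only_ns: u64) -> u64 {
    total_ns.saturating_sub(io_only_ns)
}

fn main() {
    // (label, ns per task per iteration, one-thread-brigade baseline in ns),
    // all copied from the outputs posted in this thread.
    let rows = [
        ("macOS async-brigade", 1354, 519),
        ("macOS thread-brigade", 1377, 519),
        ("Ubuntu VM async-brigade", 1145, 434),
        ("Ubuntu VM thread-brigade", 14208, 434),
        ("Ubuntu VM thread-brigade, pinned", 1321, 434),
    ];

    for (label, total, io_only) in rows {
        println!(
            "{}: ~{} ns per task beyond I/O",
            label,
            overhead_ns(total, io_only)
        );
    }
}
```

By this estimate every configuration except the unpinned Linux thread-brigade lands within a few hundred nanoseconds of the others, so the outlier is the unpinned threaded run rather than threads as such.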

@ehiggs

ehiggs commented Feb 15, 2021

Running natively on macOS:

% time ../target/release/async-brigade
mean 677.307µs per iteration

Testing on an Ubuntu VM (kernel version 5.4.0-65-generic) running on Apple M1 with the thread-brigade and async-brigade tests:

$ /bin/time ../target/release/async-brigade 
mean 572.666µs per iteration

...

macOS (M1):

% time ../target/release/one-thread-brigade
...mean 259.929µs per iteration

Ubuntu (VM on M1)

$ /bin/time ../target/release/one-thread-brigade 
mean 217.284µs per iteration

Ubuntu in a VM is outperforming native macOS? That does seem like a weird result.
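A quick sanity check on whether that gap could just be noise: the sketch below compares two reported means using their reported standard deviations. It assumes the printed stddev is the per-iteration sample standard deviation over the 10000 iterations and that iterations are roughly independent. Neither assumption is confirmed here, and `z_score` is a name made up for illustration.

```rust
/// Difference between two means, in units of its standard error.
/// `n1` and `n2` are the number of samples behind each mean.
fn z_score(mean1: f64, sd1: f64, n1: f64, mean2: f64, sd2: f64, n2: f64) -> f64 {
    let standard_error = (sd1 * sd1 / n1 + sd2 * sd2 / n2).sqrt();
    (mean1 - mean2) / standard_error
}

fn main() {
    let n = 10_000.0; // iterations per run, as printed by the benchmarks

    // one-thread-brigade: native macOS vs Ubuntu VM (µs per iteration)
    let one_thread = z_score(259.929, 29.523, n, 217.284, 30.133, n);
    // async-brigade: native macOS vs Ubuntu VM (µs per iteration)
    let async_brigade = z_score(677.307, 10.876, n, 572.666, 10.912, n);

    println!("one-thread-brigade z = {:.0}", one_thread);
    println!("async-brigade z = {:.0}", async_brigade);
}
```

Under those assumptions both scores come out far above the usual threshold of 2 to 3, so iteration-to-iteration noise alone would not explain the gap. This says nothing about why the VM is faster, and it cannot rule out systematic differences between the two runs.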


3 participants