CsrParAssembler::assemble_pattern significantly slower than CsrAssembler::assemble_pattern on a single thread #58

Open
Andlon opened this issue Feb 24, 2023 · 1 comment

Comments

@Andlon
Member

Andlon commented Feb 24, 2023

Benchmarks showed ~30-70% overhead for the parallel variant when running with RAYON_NUM_THREADS=1. The discrepancy seems to be primarily related to rayon: in a preliminary investigation, replacing e.g. into_par_iter with into_iter removed most of the overhead. Further overhead could be removed by using atomic locks (though this requires more thought to handle the multi-threaded case efficiently).
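To make the "atomic locks" idea concrete, here is a minimal sketch using only the standard library. It is not the library's actual implementation: `SpinLockedRow`, `assemble_pattern_rows`, and the input format (each element lists the row/column indices it couples) are hypothetical names and assumptions chosen for illustration. Each row's column set is guarded by an `AtomicBool` flag, so an uncontended acquire is a single compare-exchange.

```rust
use std::cell::UnsafeCell;
use std::collections::BTreeSet;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical per-row lock: an atomic flag guarding the row's column indices.
pub struct SpinLockedRow {
    locked: AtomicBool,
    cols: UnsafeCell<BTreeSet<usize>>,
}

// SAFETY: `cols` is only ever accessed while `locked` is held (see `with_lock`).
unsafe impl Sync for SpinLockedRow {}

impl SpinLockedRow {
    pub fn new() -> Self {
        Self {
            locked: AtomicBool::new(false),
            cols: UnsafeCell::new(BTreeSet::new()),
        }
    }

    /// Runs `f` with exclusive access to the row's column set.
    /// Note: if `f` panics, the lock stays held (acceptable for a sketch only).
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut BTreeSet<usize>) -> R) -> R {
        // Spin until the flag flips from `false` to `true`.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // SAFETY: the flag is held, so no other thread can access `cols`.
        let result = f(unsafe { &mut *self.cols.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

/// Builds the sorted column indices of every row, assuming each element
/// couples all of its indices with each other (an assumption of this sketch).
pub fn assemble_pattern_rows(
    num_rows: usize,
    elements: &[Vec<usize>],
    num_threads: usize,
) -> Vec<Vec<usize>> {
    let rows: Vec<SpinLockedRow> = (0..num_rows).map(|_| SpinLockedRow::new()).collect();
    let num_threads = num_threads.max(1);
    let chunk_size = ((elements.len() + num_threads - 1) / num_threads).max(1);

    thread::scope(|s| {
        for chunk in elements.chunks(chunk_size) {
            let rows = &rows;
            s.spawn(move || {
                for element in chunk {
                    for &i in element {
                        rows[i].with_lock(|cols| cols.extend(element.iter().copied()));
                    }
                }
            });
        }
    });

    // All threads have finished, so the locks are no longer needed.
    rows.into_iter()
        .map(|row| row.cols.into_inner().into_iter().collect())
        .collect()
}
```

With one thread the flag is never contended, which is the case the benchmark targets. Under real contention a pure spin loop can waste cycles, which is presumably the part that "requires more thought".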

@Andlon
Member Author

Andlon commented Feb 24, 2023

Update: Chucking all the code into a rayon::scope(|_| { ... }) closure seems to remove a significant part of the overhead (but not all). This suggests that the switch between the main thread and the rayon worker thread that runs the iterator might be part of the culprit, perhaps because the cache of the rayon thread is "cold" compared to staying on the main thread all the way.
