Profile-Guided Optimization (PGO) benchmark results #374

Open
zamazan4ik opened this issue Feb 9, 2024 · 4 comments
Labels
book (Related to the book), good first issue (Good for newcomers), nice to have

Comments

@zamazan4ik

Hi!

Yesterday I read a post about Logos (I hadn't heard of the library before). Since the post describes its performance as "Ridiculously fast", I decided to try optimizing the library with PGO (as I have already done for many other applications - all the results are available here). I ran some tests and want to share the results.

Test environment

  • Fedora 39
  • Linux kernel 6.7.3
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.76
  • Logos version: the latest master branch at the time of testing, commit f6de1d7e1c35eb1453e076495904d35875b1db80
  • Disabled Turbo boost (for more stable results across benchmark runs)

Benchmark

The built-in benchmarks are invoked with cargo bench --workspace --all-features. The PGO instrumentation phase on the benchmarks is done with cargo pgo bench -- --workspace --all-features. The PGO optimization phase is done with cargo pgo optimize bench -- --workspace --all-features.

All PGO steps are done with the cargo-pgo tool.
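
For anyone who wants to reproduce this, the three phases above can be sketched as a small script. This is only a sketch under assumptions: it expects to be run from a Logos checkout with cargo-pgo already installed, and the `run` helper, the `pgo_bench_phases` function, and the `DRY_RUN` switch are names I made up for illustration. By default it only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch of the benchmark phases described above (baseline, instrument, optimize).
# Assumptions: run from a Logos checkout; cargo-pgo is installed.
set -euo pipefail

# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo a command, and run it unless in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

pgo_bench_phases() {
  # 1. Baseline: regular release benchmarks without PGO.
  run cargo bench --workspace --all-features
  # 2. Instrumentation: run the benchmarks with an instrumented build
  #    so that profiles are collected.
  run cargo pgo bench -- --workspace --all-features
  # 3. Optimization: rebuild with the collected profiles and benchmark again.
  run cargo pgo optimize bench -- --workspace --all-features
}

pgo_bench_phases
```

Comparing the numbers from step 1 and step 3 gives the non-PGO vs. PGO difference.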

Results

I got the following results:

At least in the benchmarks provided by the project, I see measurable performance improvements. I don't know how representative these benchmarks are of real-life performance - I simply trust the project maintainers on this.

Possible further steps

I can suggest the following things to consider:

  • Perform more PGO benchmarks in other scenarios. If they show improvements, add a note to the documentation about the possible gains in Logos performance with PGO (I guess somewhere in the README file will be enough).

I will be happy to answer all your questions about PGO.

@jeertmans
Collaborator

Hey @zamazan4ik, thank you for your message and comprehensive analysis!

I am new to PGO, but I guess this only optimizes binaries, not library code?
How does it provide any meaningful information to improve the code?

I am asking because Logos is a library, and PGO will likely be applied by library users, not by us.

@zamazan4ik
Author

> I am new to PGO, but I guess this only optimizes binaries, not library code?

Actually no - PGO works the same way for binaries and library code. You can easily apply PGO when building a library (static or dynamic, it doesn't matter), even if you build the library separately from a binary. E.g. check the pydantic-core library and the corresponding PR: pydantic/pydantic-core#741

> How does it provide any meaningful information to improve the code?

PGO usually allows the compiler to make much smarter inlining decisions. So in theory you can compare two Logos builds (without PGO and with PGO), try to figure out why the PGO-optimized version is faster, and then use these insights to optimize the library code. In that case, you get the performance boost without needing to integrate PGO into the build pipeline.

However, this approach can be quite difficult in practice (because a lot of code needs to be analyzed). Since Logos is a library and you don't ship any prebuilt binaries, I can suggest at least adding a note somewhere in the documentation about using PGO to improve Logos performance. That way Logos users will be aware of an additional way to speed up their Logos-based applications.

@jeertmans
Collaborator

Ok I got it, thanks! Generating PGO binaries seems a bit convoluted, but a tutorial might be interesting, especially if you notice improvements on examples like the JSON parser :-)

Actually, your link to pydantic-core's PGO process interested me a lot, but for another project ^^'

@jeertmans added the "good first issue" (Good for newcomers) and "book" (Related to the book) labels on Feb 13, 2024
@jeertmans
Collaborator

Labelling this as a good first issue, for the handbook.

As discussed above, it would be nice to conduct a small analysis of PGO on the JSON parser example, compare performance, and document that in the book.

PGO optimization is quite well documented here: https://doc.rust-lang.org/rustc/profile-guided-optimization.html.
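
For whoever picks this up, the manual workflow from that page looks roughly like the sketch below, using plain cargo and llvm-profdata instead of cargo-pgo. The binary path `./target/release/my-lexer` and the input file `sample.json` are hypothetical placeholders, not actual Logos artifacts, and the `run` helper and `DRY_RUN` switch are made up for illustration. It assumes an `llvm-profdata` compatible with the LLVM version used by rustc is on the PATH. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the manual PGO workflow from the rustc documentation.
# Assumptions: BIN and INPUT are hypothetical placeholders for a
# Logos-based binary and a representative input file.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                    # 1 = print only, 0 = execute
PGO_DIR="${PGO_DIR:-/tmp/pgo-data}"        # where raw profiles are written
BIN="${BIN:-./target/release/my-lexer}"    # hypothetical binary
INPUT="${INPUT:-sample.json}"              # hypothetical workload

# Hypothetical helper: echo a command, and run it unless in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

manual_pgo_steps() {
  # 1. Build an instrumented binary that writes profiles to PGO_DIR.
  run env RUSTFLAGS="-Cprofile-generate=$PGO_DIR" cargo build --release
  # 2. Run it on a representative workload to collect .profraw files.
  run "$BIN" "$INPUT"
  # 3. Merge the raw profiles into a single .profdata file.
  run llvm-profdata merge -o "$PGO_DIR/merged.profdata" "$PGO_DIR"
  # 4. Rebuild, letting the compiler use the merged profile.
  run env RUSTFLAGS="-Cprofile-use=$PGO_DIR/merged.profdata" cargo build --release
}

manual_pgo_steps
```

The rustc documentation additionally passes an explicit --target to cargo build; I left that out here to keep the binary path short.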
