
[EXP] perf(decode): approx. 100x speed improvement w/ various optimizations #310

Open · wants to merge 16 commits into main

Conversation


@sfyll sfyll commented Jan 26, 2024

Motivation

Faced with millions of transactions, Heimdall wasn't nearly fast enough: it decoded calldata at roughly 10,000 txs per minute, capped at an input length of 15,000 bytes (I know gas or gas-normalized metrics would be more accurate, sorry!).

As such, after profiling both the CPU and I/O components, I picked the low-hanging fruit to make Heimdall more than just a single-run tool.

Please note, I added "Mock" to the title because, while this works well, it would need some cleanup to be accepted as a proper PR. Consider this a heads-up.

Solution

The solution consisted of three main components:

  • Pass around a real cache of function signatures and their respective ResolvedFunction arrays. What the codebase currently calls a cache would be better described as a speculative cache: it implicitly assumes that, once loaded, the OS keeps the data in memory. It is good practice not to rely on that OS behavior, so we load the whole key:value mapping once and let it be passed around threads.
  • Lazily initialize a single HTTP client that can be passed around threads, since the previous implementation spun up a client per request. That resulted in significant CPU cycles being spent redoing the cryptographic setup of the TLS protocol (notably the TLS handshake).
  • Add an option to skip expensive compute such as similarity checks via normalized_damerau_levenshtein. I'd argue that more formal verification is required to understand how much of an improvement these bring versus the 4-5x added latency once the above two changes are implemented.
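The first two points could be sketched roughly as below. This is a minimal sketch, not the PR's actual code: `ResolvedFunction` is a stand-in struct, `Client` is a placeholder for a real HTTP client type, and `lookup_or_resolve` fakes the network resolution step.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock, RwLock};

// Stand-in for the codebase's ResolvedFunction; fields are illustrative.
#[derive(Clone, Debug, PartialEq)]
pub struct ResolvedFunction {
    pub signature: String,
}

// The whole selector -> resolutions mapping, loaded once and shared across
// threads via Arc instead of relying on the OS to keep a file cached.
pub type SignatureCache = Arc<RwLock<HashMap<String, Vec<ResolvedFunction>>>>;

pub fn lookup_or_resolve(cache: &SignatureCache, selector: &str) -> Vec<ResolvedFunction> {
    // Fast path: shared read lock, no I/O.
    if let Some(hit) = cache.read().unwrap().get(selector) {
        return hit.clone();
    }
    // Slow path: resolve (a network call in the real code), then memoize.
    let resolved = vec![ResolvedFunction {
        signature: format!("unknown_{selector}()"),
    }];
    cache.write().unwrap().insert(selector.to_string(), resolved.clone());
    resolved
}

// Placeholder for an HTTP client that is expensive to construct (TLS setup).
pub struct Client;

// One process-wide client, built on first use and reused by every thread,
// instead of one client (and one TLS handshake) per request.
static HTTP_CLIENT: OnceLock<Client> = OnceLock::new();

pub fn http_client() -> &'static Client {
    HTTP_CLIENT.get_or_init(|| Client)
}
```

Cloning a `SignatureCache` clones only the `Arc`, so handing it to each worker thread is cheap; all threads see the same underlying map.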

Obviously, you'll notice that the cache update could be done better, e.g. flushed to the /cache directory at the end of the process. I'm happy with these limitations, as my goal is met here. Ultimately, these elements justify making this PR a "mock" one, subject to relatively mild but sensible final polishing work.
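That end-of-process flush could look something like the following sketch. It persists the in-memory map as a simple tab-separated file using only the standard library; the actual on-disk format and layout of heimdall's cache directory are assumptions here, not what the codebase does.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

// Write the selector -> signatures map to disk so the next run can
// warm-start from it instead of re-resolving everything.
pub fn flush_cache(map: &HashMap<String, Vec<String>>, path: &Path) -> io::Result<()> {
    let mut out = String::new();
    for (selector, sigs) in map {
        out.push_str(selector);
        for sig in sigs {
            out.push('\t');
            out.push_str(sig);
        }
        out.push('\n');
    }
    fs::write(path, out)
}

// Inverse of flush_cache: rebuild the map from the tab-separated file.
pub fn load_cache(path: &Path) -> io::Result<HashMap<String, Vec<String>>> {
    let mut map = HashMap::new();
    for line in fs::read_to_string(path)?.lines() {
        let mut parts = line.split('\t');
        if let Some(selector) = parts.next() {
            map.insert(selector.to_string(), parts.map(str::to_string).collect());
        }
    }
    Ok(map)
}
```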

I've saved some flamegraphs and other CPU/I/O profiling done along the way (even though I'd argue most of the performance was found via flamegraphs; profiling on macOS requires more upfront work than on Linux...). If you want me to pass these around, feel free to ask!

@Jon-Becker
Owner

Jon-Becker commented Jan 26, 2024

Hey! Thank you for opening! I'll take a look today :)

If you wouldn't mind sharing profiling results & flamegraphs, you can comment them here or send them to jonathan@jbecker.dev!

@Jon-Becker Jon-Becker changed the title feat(performance improvement): Mock Update with approx. 100x speed improvement [EXP] perf(decode): approx. 100x speed improvement w/ various optimizations Jan 26, 2024
@sfyll
Author

sfyll commented Jan 27, 2024

Hey Jon!

I'll enclose a flamegraph I made before making changes to Heimdall, and one from the latest iteration (with damerau toggled off). Please note my code uses both parallelism and concurrency, resolving transaction inputs in batches of 10,000. You can easily generate your own flamegraph using https://github.com/flamegraph-rs/flamegraph, or profile the app using https://github.com/cmyr/cargo-instruments if on macOS (I can't share mine as they contain sensitive information).
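The batching scheme could be sketched along these lines: a hedged, simplified version where `decode_one` stands in for the real calldata-decoding step and results are collected behind a shared `Mutex` (names and batch plumbing are illustrative, not the PR's code).

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical decode step; the real code resolves calldata via the shared cache.
fn decode_one(tx: &str) -> String {
    format!("decoded:{tx}")
}

// Split the input into fixed-size batches and decode each batch on its own
// thread, collecting results into a shared vector. Output order across
// batches is not guaranteed.
pub fn decode_in_batches(txs: Vec<String>, batch_size: usize) -> Vec<String> {
    let results = Arc::new(Mutex::new(Vec::with_capacity(txs.len())));
    let mut handles = Vec::new();
    for batch in txs.chunks(batch_size) {
        let batch: Vec<String> = batch.to_vec();
        let results = Arc::clone(&results);
        handles.push(thread::spawn(move || {
            let decoded: Vec<String> = batch.iter().map(|t| decode_one(t)).collect();
            results.lock().unwrap().extend(decoded);
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // All worker clones are dropped after join, so unwrapping the Arc is safe.
    Arc::try_unwrap(results).unwrap().into_inner().unwrap()
}
```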

[attached flamegraphs: baseline, and with cache + shared HTTP client]

@Jon-Becker
Owner

I've removed the normalized_damerau_levenshtein checks from the decode module on nightly; with the previous fixes & improvements they became obsolete and unnecessary!

I'll take a look at implementing more optimizations from this PR shortly <3

@sfyll
Author

sfyll commented Feb 11, 2024

Any questions, just shoot.

Thanks a lot for creating heimdall-rs in any case! 🫡

@github-staff github-staff deleted a comment from carlosfgti May 11, 2024