
Improve statement cache perf by using rustc-hash instead of the std hasher. #234

Open · wants to merge 2 commits into master
Conversation

LouisGariepy

This is just a POC PR to show how the implementation could be done.

This only affects PostgreSQL, I couldn't find statement caches for the other drivers.

AFAIK, this change is non-breaking and doesn't have any API repercussions. FxHashMap is just a convenience type alias; it uses HashMap under the hood, only with a different hasher.

I'm not familiar enough with the internals of the crate to know whether the current benchmarks take statement cache hashing into account, so it's hard to gauge the real-world gains. If you think proper benchmarking is necessary and could provide guidance, I could improve the PR accordingly.

I have benchmarked the hashing algorithms head-to-head, and rustc-hash performs 20% to 50% better than the std algorithm on plausible SQL queries; FxHash is particularly faster on smaller inputs. rustc-hash is well known to be faster, but it doesn't have the DoS resistance of the std hasher. This shouldn't be a problem, since the application usually controls which queries are hashed.

Why

Improve statement cache hashing performance without changing the API.

Alternatives

Keep using the std hasher.

Future work

The hashing algorithm is a small change with a small impact, but there are other ways to speed up the statement caches.

I noticed that there's a lot of RwLock and Mutex going on in the internals of the statement caches implementation. I reckon we could make significant gains using some sort of lock-free/concurrent datastructures. Most async projects are multithreaded (e.g. tokio distributes tasks on many cores if possible), so this should provide significant performance gains. I'd like to explore this in a future PR.

@bikeshedder
Owner

> The hashing algorithm is a small change with a small impact, but there are other ways to speed up the statement caches.

Originally I wanted to create a CachedStatement struct which creates the hash only once (a u64) and is const-compatible, e.g. `const SELECT_FOO: CachedStatement = CachedStatement::new("SELECT * FROM foo");`

I'm not sure if it's actually fine to store that u64 internally and call hasher.write_u64(self.hash) inside the Hash implementation, though. As the hash is only calculated once per statement, it feels wasteful to create yet another hasher and hash that u64 just for the sake of it. AFAIK that's the reason some IntHashMap implementations exist in the first place.

With that structure it would even be fine to ditch the need for calling prepare explicitly and just make it run on first use, e.g.

```rust
const SELECT_FOO: CachedStatement = CachedStatement::new("SELECT * FROM foo");

async fn do_stuff(client: &Client) {
    client.query(SELECT_FOO, &[]).await.unwrap();
    // ...
}
```

After some experimentation with that idea I ditched it however and went with the simpler, yet slightly less performant, way of adding _cached methods and using the default hashing algorithm.

There is lots of potential for bikeshedding here. ❤️ 🤣

> I noticed that there's a lot of RwLock and Mutex going on in the internals of the statement caches implementation.

The only reason the statement cache lives inside a RwLock is that it must be possible to manage the StatementCaches via the Manager::statement_caches member. That's the only way to clear the entire cache or remove individual statements from it for all connections of a given pool.


btw. the fastest way of working with prepared statements would be to collect all SQL that's ever going to be prepared in one large array and use indexes into a Vec<Option<Statement>>. If Some(statement) is found, it's prepared and ready to go; if it's None, it needs to be prepared first.

That however would result in a VERY unergonomic API unless making heavy use of macros, parsing some local SQL files and generating the code necessary.

It's a very tempting idea but would also add an incredible amount of complexity to this crate.

@bikeshedder bikeshedder force-pushed the master branch 2 times, most recently from 2f4d3ba to b1cf396 Compare March 31, 2024 22:51