Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune messages after merge #1989

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

loriopatrick
Copy link

@loriopatrick loriopatrick commented May 7, 2024

Updates isPrunable to be getPruneAction which lets caller know if
message should be pruned (original function) or would cause a prune
action (added functionality). When applicable we prune for the given
FID after the message is merged.

Prune job is still running in the background but can be removed
in the future once it's confirmed there's no stragglers.

Benchmarking with production data, this change seems to add a 5%
overhead on average. The FID lock added to Store::prune_messages
is required for performance reasons. Without the lock, the overhead
is run dependent and takes 100%-500% longer.


Motivation

Currently hubs are seeing regular latency spikes due the periodic pruning job. This update causes the hub to immediately prune messages for a given FID after the message is merged.

Change Summary

Updates isPrunable to be getPruneAction which lets caller know if message should be pruned (original function) or would cause a prune action (added functionality). When applicable we prune for the given FID after the message is merged.

Merge Checklist

Choose all relevant options below by adding an x now or at any time before submitting for review

Additional Context

Adds a FID based lock to Store::prune_messages. This might be an issue for the prune job which are left running. If a prune job is taking a long time it may block merges. I suspect this won't be too different from what's happening today during prunes as the threadpool is limited to 4 threads, so a slow prune would already be blocking things.


PR-Codex overview

This PR focuses on enhancing message merge operations in the store.rs file and adding a PruneAction enum in storeEventHandler.ts.

Detailed summary

  • Improved atomic merge operations with locks for performance
  • Added PruneAction enum to determine pruning actions
  • Updated functions for better message pruning and handling

The following files were skipped due to too many changes: apps/hubble/src/storage/stores/reactionStore.test.ts, apps/hubble/src/storage/stores/rustStoreBase.ts, apps/hubble/src/storage/stores/castStore.test.ts

✨ Ask PR-Codex anything about this PR by commenting with /codex {your question}

Copy link

changeset-bot bot commented May 7, 2024

🦋 Changeset detected

Latest commit: f5d1705

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@farcaster/hubble Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link

vercel bot commented May 7, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
hub-monorepo ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 14, 2024 10:11pm

@@ -803,7 +803,7 @@ impl Store {
ts_hash: &[u8; TS_HASH_LENGTH],
message: &Message,
) -> Result<Vec<u8>, HubError> {
// If the store supports compact state messages, we don't merge messages that don't exist in the compact state
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment was wrong, or am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should say: "we don't merge messages older than the compact state message and don't exist within it". Compact state message is like a checkpoint, you cannot go back and add messages that weren't present at the time of compaction.

Copy link

codecov bot commented May 7, 2024

Codecov Report

Attention: Patch coverage is 90.47619% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 48.98%. Comparing base (1880070) to head (4c7f1dc).
Report is 153 commits behind head on main.

❗ Current head 4c7f1dc differs from pull request most recent head f5d1705. Consider uploading reports for the commit f5d1705 to get more accurate results

Files Patch % Lines
apps/hubble/src/storage/stores/rustStoreBase.ts 86.66% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1989       +/-   ##
===========================================
- Coverage   74.05%   48.98%   -25.08%     
===========================================
  Files          99      118       +19     
  Lines        9425    15653     +6228     
  Branches     2097     4885     +2788     
===========================================
+ Hits         6980     7667      +687     
- Misses       2327     7985     +5658     
+ Partials      118        1      -117     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@loriopatrick loriopatrick changed the title [WIP] Prune messages after merge Prune messages after merge May 7, 2024
@@ -117,19 +123,26 @@ export abstract class RustStoreBase<TAdd extends Message, TRemove extends Messag
}
}

for (const [fid, _] of fidToPrune) {
const pruneResult = await this.pruneMessages(fid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach works, but I'm curious if it slows down the merge code path.

I suspect this is somewhat inefficient, but I don't know for sure.

e.g. current flow is:

  • check if is prunable in js (gets the storage count and earliest ts hash)
  • merge in rust
  • call the prune logic from js, which again gets the storage count etc and passes it to rust which iterates the db to find the earliest hash and prunes it.

But isPrunable already knows the storage units, message count and the earliestTsHash. One approach could be the pass in the earliestTsHash if it would cause a prune, and have the store prune the earliest message within the same transaction as the merge. Then we don't have to go back and forth between js and rust and we save some db calls.

But if you think perf impact is not that bad, then this is definitely simpler code wise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update my benchmarking branch to see how much overhead the pruneMessage call adds to try and get some real numbers.

Reading getMessageCount, it seems like the value is cached so the second load shouldn't be too bad.

One approach could be the pass in the earliestTsHash if it would cause a prune, and have the store prune the earliest message within the same transaction as the merge.

I like the idea of that. When you add a single message it can optionally remove the earliest stored item in the same operation. Is the earliestTsHash for a particular FID & store cached somewhere on the javascript side? If not, we'd probably still need to hit the db again. It would also change mergeMessages a bit as you'd need to do a remove for each merge rather than for each unique FID at the end (probably doesn't matter that much).


Here's an alternative design that would probably help with perf. We keep the prune job but mergeMessages simply helps the job track which FIDs on which stores need pruning. The job runs more frequently and only targets what needs to be updated. That would avoid the extra logic on the hot path and make the prune job run much faster. Thoughts? Might just be worth making that change as the design is still fairly simple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, isPrunable fetches the earliest tsHash from cache. I think a design where we can perform the prune at the same time as the merge so we don't even need the job would be better.

If that is too complicated, I prefer the current approach of doing the prune within the merge call but outside the tx. I don't think we should do the third of just having the job run more frequently. I think we should eventually get rid of the job entirely.

The first approach above is more effort, so maybe let's decide based on the benchmark results?

@loriopatrick loriopatrick changed the title Prune messages after merge [WIP] Prune messages after merge May 8, 2024
@@ -615,14 +636,19 @@ impl Store {
Ok(())
}

pub fn merge(&self, message: &Message) -> Result<Vec<u8>, HubError> {
// Grab a merge lock. The typescript code does this by individual fid, but we don't have a
pub fn merge(&self, message: &Message, max_mesages_for_fid: u32) -> Result<Vec<u8>, HubError> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New prune logic is going here.

todo: move argument max_mesages_for_fid to FidInfoCache

Updates isPrunable to be getPruneAction which lets caller know if
message should be pruned (original function) or would cause a prune
action (added functionality). When applicable we prune for the given
FID after the message is merged.

Prune job is still running in the background but can be removed
in the future once it's confirmed there's no stragglers.

Benchmarking with production data, this change seems to add a 5%
overhead on average. The FID lock added to Store::prune_messages
is required for performance reasons. Without the lock, the overhead
is run dependent and takes 100%-500% longer.
@loriopatrick loriopatrick changed the title [WIP] Prune messages after merge Prune messages after merge May 13, 2024
Copy link
Contributor

@sanjayprabhu sanjayprabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I would just update the tests to cover both the mergeMessages and mergeMessage codepath. Don't have to extensive, just one of the tests using the mergeMessages should be good.

for (const result of mergeResults.values()) {
expect(result.isOk()).toBeTruthy();
}
test("earlier remove messages gets pruned", async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems duplicated? One of these should test the mergeMessages (bundles) code path

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. A lot of the tests are similar across stores, I must have copied / migrated the test over incorrectly. Will fix 👍

apps/hubble/src/storage/stores/castStore.test.ts Outdated Show resolved Hide resolved
apps/hubble/src/addon/src/store/store.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
javascript Pull requests that update Javascript code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants