
[VIP-2] Removing Per Record Offset Metadata From Venice-Server Storage With Heartbeats #513

Open
wants to merge 4 commits into
base: main

ZacAttack
Contributor

[VIP-2] Removing Per Record Offset Metadata From Venice-Server Storage With Heartbeats

This VIP explores a strategy for removing the offset metadata stored per record in Venice by utilizing replica heartbeats.

>
>HEARTBEAT: {3, 199, 4500}
>
>Record 1: {<3, 200, <4500}
Contributor

For the backfill information like <3, does that mean we can always guarantee that between HEARTBEAT and Record 1 there is no update from other regions, and that only region #2, at offset 200, mutated that key?

Contributor Author

I'll include a case here for the two-update scenario, as it's an important one. We have two choices: we can either rely on the heartbeat plus previous events to build a running high watermark from which we can apply this backfill, or we can compact key updates that occur between two heartbeats.

My intuition is that compacting key updates is the way to go, because if we apply an ever-growing high watermark to the offsets of individual records, we narrow the common window between two colos. I've got a spec half done which models this; I'll post back here once I've determined conclusively whether this intuition is right. It's a good callout, though: with two updates to the same key within the heartbeat interval, backfilling with the less-than bound up to the last heartbeat is an incorrect generalization and will lead to false positives in some simple cases.
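
For illustration, the heartbeat-only backfill from the quoted example can be sketched as below. This is a minimal sketch and not code from the PR: the `backfill` and `render` helpers, the tuple representation of a bound, and the zero-based region index are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation (not from the PR):
# ("=", n) is an exact offset, ("<", n) is an exclusive upper bound.
Bound = Tuple[str, int]


def backfill(heartbeat: List[int], region: int, offset: int) -> List[Bound]:
    """Rebuild a per-region offset vector for one record.

    The writing region keeps its exact offset; every other region is
    filled in with a less-than bound taken from the last heartbeat.
    """
    vector: List[Bound] = []
    for idx, hb_offset in enumerate(heartbeat):
        if idx == region:
            vector.append(("=", offset))
        else:
            vector.append(("<", hb_offset))
    return vector


def render(vector: List[Bound]) -> str:
    """Format a vector in the notation used in the quoted example."""
    parts = [str(n) if op == "=" else f"{op}{n}" for op, n in vector]
    return "{" + ", ".join(parts) + "}"


heartbeat = [3, 199, 4500]

# Record 1 is written by the second region (index 1) at offset 200.
record_1 = backfill(heartbeat, region=1, offset=200)
print(render(record_1))  # {<3, 200, <4500}

# A second update inside the same heartbeat interval gets the same
# bounds for the other regions; only its own offset differs.
record_2 = backfill(heartbeat, region=1, offset=201)
print(render(record_2))  # {<3, 201, <4500}
```

In this sketch, any two records consumed within one heartbeat interval receive identical bounds for the other regions, which is the two-update case called out above as one that heartbeat-only backfill does not handle correctly.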


That said, it's not actually a requirement to be able to do this on every single event we consume. It's possible to meet the first two requirements at a coarser granularity of updates.

### Heartbeat Algorithm
Contributor

I'm not sure my understanding is correct and just want to confirm: if we have 3 regions, can the Heartbeat algorithm reduce RMD space from storing three regions' offsets to storing one region's offset?

Contributor Author

You are correct, but we'll actually store none of the regions' offset metadata in RocksDB. We'll only persist it to the PubSubBroker as envelope metadata, none of which has to go into RocksDB.

layout: default
title: [VIP-2] Removing Per Record Offset Metadata From Venice-Server Storage With Heartbeats
parent: Community Guides
permalink: /docs/proposals/VIP_TEMPLATE.md
Contributor

For generating a reachable URL:

Suggested change
- permalink: /docs/proposals/VIP_TEMPLATE.md
+ permalink: /docs/proposals/vip-2

Contributor

Please also update the Proposals table.

3 participants