Lagging follower commit after message drops #138

pav-kv · 2024-01-25T22:31:37Z

The test added in #137 demonstrates that a short network blip / message drop can lead to delaying the follower to learn that entries in its log are committed up to HeartbeatInterval + 3/2 * RTT. This is suboptimal, and can be reduced to HeartbeatInterval + 1/2 * RTT.

The reason why it takes an extra roundtrip is that the leader cuts the commit index sent to the follower at Progress.Match, i.e. it never sends a commit index exceeding the known match size of the follower's log. This is to ensure that the follower does not crash at receiving a commit index greater than its log size.

However, the follower can safely process out-of-bounds commit indices by simply (with a caveat: see below) cutting them at the log size on its end. In fact, the follower already does so for the commit indices it receives via MsgApp messages. When sending MsgApp, the leader does not cut the commit index, correspondingly.

Fix

Fixing this is a matter of:

changing one line at the leader send side, and
at the follower receive side, ensuring that the follower's log is a prefix of the leader's.

(2) is true after the follower received at least one successful MsgApp from this leader. From that point, the follower's log is guaranteed to be a prefix of the leader's log, and it is safe to assume that any index <= Commit is committed.

Issue

However, simply doing (1) is unsafe in mixed-version clusters (during a rolling upgrade). If there are followers running old code, they will crash upon seeing a high commit index. To fix the problem completely, there must be a safe migration. Only after all the nodes are running the new code, it is safe to do (1).

Action Items

Merge the check (2) for the follower-side: raft: advance commit index safely #139
Make sure the clusters are running the new code. For example, wait a couple of releases; note in release notes that upgrades should never jump +2 versions.
Merge the part (1) and close this issue: raft: send up-to-date commit index in heartbeats #140

Alternative Fix

The same delay reduction can be achieved if the leader sends an empty MsgApp after the HeartbeatInterval. We already have periodic empty MsgApp sends in a throttled state, although they are currently hooked in to MsgHeartbeatResp messages (to ensure MsgApp flow is conditional to some connectivity with the follower), which in this case will result in the same +1 RTT delay.

The text was updated successfully, but these errors were encountered:

erikgrinaker · 2024-01-30T10:10:02Z

Isn't this check in place to ensure that followers' logs don't regress? Do we have any other safeguards for that? We don't check that during MsgApp because the follower's log hasn't yet reached the commit index, but once a follower's log has reached the commit index (as recorded via Match) it should not regress because it may be a necessary part of the quorum establishing that commit index.

pav-kv · 2024-01-30T10:59:36Z

Isn't this check in place to ensure that followers' logs don't regress?

Yes. Currently, we rely on the fact that the follower's log is in sync with the leader's up to Match, so we can commit up to that. If we send a Commit index > Match, there is a chance that by the time the message arrives the follower either doesn't have this index, or still has some entries from previous terms that are inconsistent with the leader's log.

My proposal is: teach the follower to understand that it matches the leader's log. Then this safety check can be done at the follower end on receiving Commit index, rather than on the leader when sending it. The advantage of this: the follower knowns that it matches the leader 1/2 RTT earlier than the leader, so there is a potential to make follower Commit index progress faster by 1 RTT in some cases.

pav-kv added enhancement New feature or request good first issue Good for newcomers labels Jan 25, 2024

This was referenced Jan 25, 2024

testdata: add a lagging commit test #137

Merged

Plan to release raft 3.6.0 #89

Open

raft: advance commit index safely #139

Draft

pav-kv removed the good first issue Good for newcomers label Jan 26, 2024

pav-kv linked a pull request Jan 26, 2024 that will close this issue

raft: send up-to-date commit index in heartbeats #140

Draft

pav-kv self-assigned this Jan 26, 2024

pav-kv mentioned this issue Feb 13, 2024

raft: consolidate all append message sending #134

Open

This was referenced Apr 10, 2024

raft: make follower commit index advancement safe cockroachdb/cockroach#122100

Open

raft: introduce persistent leader term in raft log cockroachdb/cockroach#122446

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lagging follower commit after message drops #138

Lagging follower commit after message drops #138

pav-kv commented Jan 25, 2024 •

edited

erikgrinaker commented Jan 30, 2024 •

edited

pav-kv commented Jan 30, 2024 •

edited

Lagging follower commit after message drops #138

Lagging follower commit after message drops #138

Comments

pav-kv commented Jan 25, 2024 • edited

Fix

Issue

Action Items

Alternative Fix

erikgrinaker commented Jan 30, 2024 • edited

pav-kv commented Jan 30, 2024 • edited

pav-kv commented Jan 25, 2024 •

edited

erikgrinaker commented Jan 30, 2024 •

edited

pav-kv commented Jan 30, 2024 •

edited