
Question about LeaseRead #166

Open
boringhello opened this issue Feb 15, 2024 · 9 comments
Labels
question Further information is requested

Comments


boringhello commented Feb 15, 2024

In the Raft paper, an optimization for Lease Read using clocks plus heartbeats is mentioned. When the leader sends a heartbeat, it first records a timestamp called "start". If a majority of the nodes respond to the heartbeat, the leader's lease is assumed to be valid until the time point "start + election timeout / clock drift bound".
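In code form, the paper's rule is roughly this (my own sketch, assuming import "time"; the names are illustrative, not from this repo):

// Illustrative sketch of the dissertation's lease rule. The leader records
// `start` before broadcasting heartbeats; once a majority acknowledges them,
// lease reads are safe until the expiry below. clockDriftBound > 1, e.g. 1.1
// if clocks are assumed to drift apart by at most 10%.
func leaseExpiry(start time.Time, electionTimeout time.Duration, clockDriftBound float64) time.Time {
    return start.Add(time.Duration(float64(electionTimeout) / clockDriftBound))
}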
But when I read the etcd LeaseRead code, it seems that LeaseRead depends on checkQuorum. If the leader receives heartbeat responses from a majority within the first half of an election_timeout round, MsgCheckQuorum will not step the leader down in that round; only in the next round does the leader step down to follower, because a majority of the nodes is no longer recentActive. In that case the lease seems to last 1.5 election_timeouts. This confused me.
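Concretely, the tick-side logic I'm reading looks roughly like this (a simplified paraphrase of raft.go, not the exact source):

// The leader only re-evaluates its quorum once per electionTimeout ticks.
func (r *raft) tickHeartbeat() {
    r.heartbeatElapsed++
    r.electionElapsed++
    if r.electionElapsed >= r.electionTimeout {
        r.electionElapsed = 0
        if r.checkQuorum {
            // Self-addressed message, handled by the leader's step function.
            r.Step(pb.Message{From: r.id, Type: pb.MsgCheckQuorum})
        }
    }
    // ... heartbeat broadcast elided ...
}

// On MsgCheckQuorum, the leader steps down only if a quorum was NOT marked
// RecentActive since the previous check, and the RecentActive flags (except
// the leader's own) are then reset for the next round. So a follower marked
// active just before a partition keeps the leader, and hence its lease,
// alive through one more full electionTimeout round.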


boringhello commented Feb 16, 2024

func TestLeaseRead(t *testing.T) {
    n1 := newTestRaft(1, 10, 1, newTestMemoryStorage(withPeers(1, 2, 3)))
    n2 := newTestRaft(2, 10, 1, newTestMemoryStorage(withPeers(1, 2, 3)))
    n3 := newTestRaft(3, 10, 1, newTestMemoryStorage(withPeers(1, 2, 3)))
    n1.checkQuorum = true
    n2.checkQuorum = true
    n3.checkQuorum = true
    n1.readOnly.option = ReadOnlyLeaseBased
    n2.readOnly.option = ReadOnlyLeaseBased
    n3.readOnly.option = ReadOnlyLeaseBased
    nt := newNetwork(n1, n2, n3)

    // Elect n1 as leader.
    nt.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
    if n1.state != StateLeader {
        t.Fatalf("node 1 state: %s, want %s", n1.state, StateLeader)
    }
    if n2.state != StateFollower {
        t.Fatalf("node 2 state: %s, want %s", n2.state, StateFollower)
    }
    if n3.state != StateFollower {
        t.Fatalf("node 3 state: %s, want %s", n3.state, StateFollower)
    }

    // Replicate one entry (marking n2 and n3 RecentActive), then partition
    // the leader from both followers.
    nt.send(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("foo")}}})
    nt.cut(1, 2)
    nt.cut(1, 3)

    // A lease-based read is served locally, without contacting the quorum.
    n1.Step(pb.Message{From: 1, Type: pb.MsgReadIndex, Entries: []pb.Entry{{Data: []byte("123")}}})
    msgs := n1.readStates
    if len(msgs) != 1 {
        t.Errorf("len(msgs) = %d, want 1", len(msgs))
    }

    // Tick the partitioned leader past one election timeout (10 ticks). The
    // first MsgCheckQuorum passes because n2 and n3 were RecentActive before
    // the partition, so n1 keeps its lease until the second check at tick 20.
    for i := 0; i < 17; i++ {
        n1.tick()
    }
    // Tick the followers past their election timeout so they are no longer
    // inside the leader lease and may campaign or grant votes.
    for i := 0; i < n2.electionTimeout; i++ {
        n2.tick()
    }
    for i := 0; i < n3.electionTimeout; i++ {
        n3.tick()
    }

    // n2 campaigns and can win with n3's vote while n1 still holds its lease...
    nt.send(pb.Message{From: 2, To: 2, Type: pb.MsgHup})

    // ...yet n1 still serves another lease-based read.
    n1.Step(pb.Message{From: 1, Type: pb.MsgReadIndex, Entries: []pb.Entry{{Data: []byte("123")}}})
    msgs = n1.readStates
    if len(msgs) != 2 {
        t.Errorf("len(msgs) = %d, want 2", len(msgs))
    }
    if n2.state == StateLeader {
        t.Errorf("n2 became leader while n1 still held the lease")
    }
    if n3.state == StateLeader {
        t.Errorf("n3 became leader while n1 still held the lease")
    }
}

pav-kv added the question (Further information is requested) label on Feb 16, 2024
boringhello commented

The test code provided above is a reproduction of the issue I described: the observed result is that n2 became the leader while n1 still held the lease, allowing for a leaseRead operation. It is also possible that there is an issue with the test I wrote, as I am not familiar with this project. Can someone help me understand this question?


MrDXY commented Mar 11, 2024

Hey @boringhello, I found your question very interesting. I'm new to this topic, but I'll share my ideas since I'm also interested in it.
I believe the main issue is having two leaders serving at the same time. If we enable the leader lease and a network partition happens, it will take up to two election timeout durations for the leader to realize it has lost the quorum and step down to follower in the current etcd-raft.
[Timeline diagram: time1 → time2 → time3, with one ElectionTimeout between each pair of points.]

Let's look at it from n1's perspective:

  • At time1, n1, n2, and n3 are working fine, with n1 as the leader. However, just as n1 receives HeartBeatResponses from n2 and n3, a network partition occurs.
    • It takes the first ElectionTimeout to reach time2.
  • At time2, n1 marks n2 and n3 as inactive followers and sends them heartbeats, waiting for HeartBeatResponses to re-mark them as active.
    • It takes the second ElectionTimeout to reach time3.
  • At time3, n1 checks n2 and n3 and finds that they are still inactive. At this point, n1 realizes that it has lost the quorum.

Between time2 and time3, there is a window in which two different leaders can each serve lease reads. Even if you implement a new Lease Read algorithm that includes a clock drift bound and allows clients to keep track of the latest index corresponding to the results they have seen, the problem will still persist. This approach only reduces the occurrence of the problem; it doesn't fix it.

I agree that taking two ElectionTimeouts doesn't make much sense, and there is room for improvement. I'm not sure if my idea is right, but a possible solution is to track each follower's HeartbeatClock in the ProgressTracker. The HeartbeatClock increases with the leader's clock, and when we receive the follower's HeartBeatResponse, we reset that HeartbeatClock to 0. When a HeartbeatClock reaches the ElectionTimeout, we can mark the corresponding RecentActive as false. Before responding to a LeaseRead request, the leader needs to check QuorumActive first.
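A rough sketch of that idea (hypothetical code; none of these types or functions exist in the repo, and the names are mine):

// Each follower's progress gets a per-follower clock that advances with the
// leader's ticks and resets on every heartbeat response.
type Progress struct {
    RecentActive   bool
    HeartbeatClock int // leader ticks since this follower's last HeartBeatResponse
}

// Advance every follower's clock on each leader tick; a follower that has
// been silent for a full electionTimeout is no longer RecentActive.
func tickFollowerClocks(prs map[uint64]*Progress, self uint64, electionTimeout int) {
    for id, pr := range prs {
        if id == self {
            continue // the leader is always active from its own perspective
        }
        pr.HeartbeatClock++
        if pr.HeartbeatClock >= electionTimeout {
            pr.RecentActive = false
        }
    }
}

// Reset the clock whenever a HeartBeatResponse arrives.
func onHeartbeatResp(pr *Progress) {
    pr.RecentActive = true
    pr.HeartbeatClock = 0
}

// Before serving a lease-based read, the leader would first check that a
// quorum is still RecentActive, and otherwise reject or fall back to ReadIndex.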


boringhello commented Mar 13, 2024

Thanks for your idea. But it seems that if the response is delayed in the network, it may still be wrong. I am not sure why etcd raft is designed like this. Maybe this is a tradeoff.

boringhello commented

@pav-kv, can you have a look at this? It seems that the LeaseRead implementation has some issues.


pav-kv commented Mar 13, 2024

Last time I looked at leases in this repo, they did not appear correct/usable, or to provide an at-most-one-leaseholder guarantee. I don't have time to look at this in detail at the moment.

For leases to work properly, there needs to be some "global" logical time notion shared by all the nodes. There must be a guarantee that any logical timestamp is owned by at most one lease. Ticks are a local heuristic, and ticks are not synchronized across nodes, so it's hard to tell if these heuristics give any guarantees.
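To make the "at most one lease per timestamp" requirement concrete, here is an illustrative sketch (not code from this repo or from CRDB):

// Each lease covers a half-open interval [Start, Exp) of a shared logical clock.
type Lease struct {
    Holder     uint64
    Start, Exp uint64 // timestamps on the shared logical clock
}

// canGrant reports whether `next` can be granted after `prev` without any
// timestamp ever being owned by two leases. maxDrift is the assumed bound on
// clock disagreement between nodes; local raft ticks provide no such bound,
// which is why tick-based leases are hard to make safe.
func canGrant(prev, next Lease, maxDrift uint64) bool {
    return next.Start >= prev.Exp+maxDrift
}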

boringhello commented

Are you saying that LeaseRead has been abandoned in this project?


pav-kv commented Mar 13, 2024

I'm not aware of who uses it. In CRDB, we have our own leases on top of raft, based on hybrid logical clocks.

boringhello commented Mar 13, 2024

boringhello commented Mar 13, 2024

So why not remove the LeaseRead feature if it is not useful/correct?

craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Apr 9, 2024
120613: raft: remove unused read-only requests r=nvanbenschoten a=pav-kv

This PR removes the read-only requests from `pkg/raft`. We don't use them in CRDB, and the leases implementation is known to be incorrect (e.g. see etcd-io/raft#166 and etcd-io/etcd#10082).

Epic: none
Release note: none
