Data structure discrepancy on deferred attestation #7805

hcnam · 2023-12-08T07:52:29Z

Description

We’ve checked several implementations, but they adopt different data structures for deferring future attestations: Lighthouse and Nimbus are using queue, and Prysm and Teku using map to save early received attestations (Prysm uses non-deterministic map, but Teku uses deterministic map structure).
Due to the above discrepancy, if a slashable pair of attestation (like double voting two different block that shares the same parent block) enter onAttestation, Prysm makes a non-deterministic fork choice result. See scenario below.

Steps to Reproduce

Scenario
Assumption: No attestation except Attestation X and Y, or B and C has same vote. attester_slashing was not received yet.
Steps:

 N      N+1     N+2  
[A]  –  [B]  –  [ ] 
     \  [ ]  –  [C] 
 
Slot N   : Block A
Slot N+1 : Block B (parent: A)
Slot N+2 : Block C (parent also A)
           Receive Attestation_0 from validator X of slot N+3 (attesting block B)
           Receive Attestation_1 from validator X of slot N+3 (attesting block C)
Slot N+3 : Check head

Expect: head is B due to other use queue for deferred attestation (later ignored)
Result: Teku deterministically makes the same decisions while Prysm nondeterministically makes different decisions (Note that the results from Teku and Prysm may be different from Lighthouse and Nimbus).

References:
Lighthouse: queued_attestations
Teku: deferredAttestations, futureAttestations
Prysm: processAttestations, AttCaches
Nimbus: queuedAttestations

The text was updated successfully, but these errors were encountered:

rolfyone · 2023-12-10T23:16:01Z

I'm not sure how to progress this one, Is the assertion here that we're making an incorrect decision? Any future / deferred attestations should be not processed and how they're queue'd shouldn't matter?
Deferred attestations for example are typically deferred because we don't have the block it's referring to iirc, and in that case they can't count towards fork choice...
I'm not sure if I'm missing something, so I've put my line of thinking here in the hope that maybe there's a clarification on what we want out of this issue before I categorise how to proceed...

hcnam · 2023-12-11T05:08:32Z

Before starting, I'm not saying that Teku is wrong. Teku is doing well under the consensus specification.
It's just that the current specification doesn't specify how to defer fork choice when handling with future attestations that vote on a known block (attestation for unknown block will ignored on assertion), which seems to have led to discrepancy between different clients in implementing it.
Therefore, this question relates to a mechanism to defer attestations from the future that target already known blocks. As I remember, in consensus-spec, validate_on_attestation asks to delay consideration for future attestations until their slot is in the past, but not mentioning how.

For example, in the above scenario, validator X is making double-voting (yes, this is slashable) attestation and releasing it before the proper slot. Therefore, both double-voting attestations are considered attestations from the future.
Now, the implementations that are using queue (like Lighthouse or Nimbus) will store those future attestations on queue, and try to handle it when the proper slot has arrived with the arrival order. That is, if the arrival order of attestation is [att_0, att_1] then, the client will process att_0 properly and ignore att_1 because it is duplicated.

However, map -based implementations (like Teku or Prysm) will make different decisions (because, as you know, the map is based on key-value unlike the queue, which doesn't guarantee the order). Therefore, it is not 100% deterministic whether the processing order will be [att_0, att_1] or [att_1, att_0] unlike queue-based implementations.

This is a minor and non-urgent discrepancy. Because, as you know, in the real world this may not be a big problem due to there being more than 900K validators and a lot of slashers are watching it. Also, the order of arrival of the attestations will be unpredictable (except when packet orders are manipulated by a larger-scale network attacker).
But, I want to suggest that the possibility exists that such a minor discrepancy like this may lead to real problems in the future.

michaelsproul · 2023-12-12T04:47:41Z

@hcnam This is an interesting idea! I think your issue probably belongs on the consensus-specs, because as you said the queueing order isn't defined by the specification.

My intuition is that it's kind of OK not to specify the queueing order because attestation arrival is unpredictable anyway. The P2P gossip conditions don't mandate propagation of early attestations, so an attacker could quite easily queue their different attestations on different nodes even without the different queue structures (e.g. gossip just att1 to some nodes, and gossip just att2 to others). When the attestation slot(s) arrive and they are processed, some clients will have fork choice output based off att1 and others will have output based off att2. This is called "view splitting", and in our case should eventually be reconciled by a block that includes one of the attestations, causing the other side to detect the slashing (and broadcast it).

There are some proposals floating around for making fork choice more resilient to these type of split views, e.g. https://ethresear.ch/t/view-merge-as-a-replacement-for-proposer-boost/13739. You might like to get involved and contribute to that effort? IMO it would be great to prove some resilience properties about the combination of the P2P conditions & the fork choice spec.

michaelsproul · 2023-12-12T05:09:35Z

To add some nuance to the above, there are a few complicating factors to the gossip-based view split:

Re-queueing future attestations is optional, and some implementations (including Lighthouse) choose to just drop them: https://github.com/sigp/lighthouse/blob/441fc1691b69f9edc4bbdc6665f3efab16265c9b/beacon_node/network/src/network_beacon_processor/gossip_methods.rs#L1603-L1628.
If we (Lighthouse) were to requeue future attestations, as we do for attestations to unknown blocks, then we would likely delay marking them as IGNOREd on gossipsub until they were re-processed. If they were processed successfully we would ACCEPT them, which would result in them being gossiped to our peers. i.e. the network wouldn't need to wait for the attestation to be included in a block before reconciling the split view.

rolfyone · 2023-12-18T22:00:41Z

If we're moving this to consensus-specs shall I close this issue here? Not much we can really do to progress this directly...

hcnam · 2023-12-19T05:07:40Z

I've moved this issue to consensus-specs/issues/3498

hcnam mentioned this issue Dec 19, 2023

Consider adding more details for the attestation deferring mechanism ethereum/consensus-specs#3498

Open

hcnam closed this as completed Dec 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data structure discrepancy on deferred attestation #7805

Data structure discrepancy on deferred attestation #7805

hcnam commented Dec 8, 2023

rolfyone commented Dec 10, 2023

hcnam commented Dec 11, 2023

michaelsproul commented Dec 12, 2023 •

edited

michaelsproul commented Dec 12, 2023 •

edited

rolfyone commented Dec 18, 2023

hcnam commented Dec 19, 2023

Data structure discrepancy on deferred attestation #7805

Data structure discrepancy on deferred attestation #7805

Comments

hcnam commented Dec 8, 2023

Description

Steps to Reproduce

rolfyone commented Dec 10, 2023

hcnam commented Dec 11, 2023

michaelsproul commented Dec 12, 2023 • edited

michaelsproul commented Dec 12, 2023 • edited

rolfyone commented Dec 18, 2023

hcnam commented Dec 19, 2023

michaelsproul commented Dec 12, 2023 •

edited

michaelsproul commented Dec 12, 2023 •

edited