Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding more details for the attestation deferring mechanism #3498

Open
hcnam opened this issue Sep 11, 2023 · 1 comment
Open

Consider adding more details for the attestation deferring mechanism #3498

hcnam opened this issue Sep 11, 2023 · 1 comment

Comments

@hcnam
Copy link

hcnam commented Sep 11, 2023

In consensus-spec, validate_on_attestation asks to delay consideration for future attestations until their slot is in the past. However, it seems that the current spec test on fork choice does not handle it, because the spec does not define the deferring mechanism itself.

# Attestations can only affect the fork choice of subsequent slots.
# Delay consideration in the fork choice until their slot is in the past.
    assert get_current_slot(store) >= attestation.data.slot + 1

In short, the current consensus-spec does not define the queueing order. Therefore, data structure discrepancies on deferred attestation have appeared; [Lighthouse, Nimbus, Lodestar] adopt queue, and [Prysm and Teku] use map for handling the future attestation.

This is a minor and non-urgent discrepancy. Because, as you know, in the real world this may not be a big problem due to there being about 900K validators and a lot of slashers are watching it. Also, the order of arrival of the attestations will be unpredictable (except when packet orders are manipulated by a larger-scale network attacker).
But, I want to suggest that the possibility exists that such a minor discrepancy like this may lead to real problems in the future.

However, there are problems to make this test impossible in current spec test implementations.

  • Stateless onAttestation on Prysm
  • Insufficient future attestation handling in Teku spec test
  • Different data structure for deferred attestation

Stateless onAttestation on Prysm

First problem is that onAttestation from Prysm is stateless. That is, Prysm handles future attestation scheduling outside of the onAttestation function. This is acknowledged by the Prysm developers team in the sense that “As a stateless function (onAttestation), this does not hold nor delay attestation based on the spec descriptions.” Therefore, current testing tools on Prysm can not handle future attestations; They just reject it on current fork choice spec test tool. See scenario A below.

Scenario A
Assumption: No attestation except Attestation X and Y, or B and C has same vote.
Steps:

 N      N+1     N+2
[A]  –  [B]  –  [ ] 
     \  [ ]  –  [C] 
 
Slot N   : Block A
Slot N+1 : Block B (parent: A)
Slot N+2 : Block C (parent also A)
           Receive Attestation_0 from validator X of slot N+3 (attesting block B)
Slot N+3 : Check head

Expect: head is B due to votes
Result: Prysm decide with tie-break rule

Insufficient future attestation handling in Teku spec test

The issue stems from incomplete future attestation handling in Teku spec test. In Teku, it has an attestation scheduling so the future slot attestations can be deferred until their slot in the past. However, Teku's spec test environment does not fully support future attestations, i.e., it cannot defer attestations more than one slot. Teku assigns one of the two states for incoming future slot attestation. The first state is DEFER_FOR_FORK_CHOICE, indicating that the attestation should be handled in the next slot. The other state is SAVED_FOR_FUTURE, which is used to save attestations that should be handled after several slots in the future. Teku's fork-choice testing environment supports DEFER_FOR_FORK_CHOICE state attestations but not SAVED_FOR_FUTURE state attestations. This results in test cases with attestations attesting the future after the one slot failing in Teku tester. See scenario B below.

Scenario B
Assumption: No attestation except Attestation X and Y, or B and C has same vote.
Steps:

 N      N+1     N+2     N+3  
[A]  –  [B]  –  [ ]  –  [ ] 
     \  [ ]  –  [C]  –  [ ] 
 
Slot N   : Block A
Slot N+1 : Block B (parent: A)
Slot N+2 : Block C (parent also A)
           Receive Attestation_0 from validator X of slot N+4 (attesting block B)
Slot N+4 : Check head

Expect: head is B due to Teku can handle deferred attestation (later ignored)
Result: Teku decide with tie-break rule

Different data structure for deferred attestation

This problem is data structure discrepancy between multiple clients. We’ve checked the several implementations, but they adopt different data structure for deferring future attestations: Lighthouse and Nimbus are using queue, and Prysm and Teku are using map to save early received attestations.
Moreover, implementation of map data structure also has discrepancy; Prysm uses non-deterministic map, but Teku uses deterministic map structure.
Due to the above discrepancy, if a slashable pair of attestation (like double voting two different block that shares the same parent block) enter onAttestation, Prysm makes a non-deterministic fork choice result. See scenario C below.

Scenario C
Assumption: No attestation except Attestation X and Y, or B and C has same vote. attester_slashing was not received yet.
Steps:

 N      N+1     N+2  
[A]  –  [B]  –  [ ] 
     \  [ ]  –  [C] 
 
Slot N   : Block A
Slot N+1 : Block B (parent: A)
Slot N+2 : Block C (parent also A)
           Receive Attestation_0 from validator X of slot N+3 (attesting block B)
           Receive Attestation_1 from validator X of slot N+3 (attesting block C)
Slot N+3 : Check head

Expect: head is B due to other use queue for deferred attestation (later ignored)
Result: Teku deterministically makes the same decisions while Prysm nondeterministically makes different decisions (Note that the results from Teku and Prysm may be different from Lighthouse and Nimbus).

References:
Lighthouse: queued_attestations
Teku: deferredAttestations, futureAttestations
Prysm: processAttestations, AttCaches
Nimbus: queuedAttestations
Lodestar: queuedAttestations

And related discussions:
Teku#7805 - Data structure discrepancy on deferred attestation
Teku#7804 - fork choice spec test seems not fully handling future attestation
Prysm#12884 - Time moves during fork choice spec test

@potuz
Copy link
Contributor

potuz commented Sep 11, 2023

This is again the same situation as prysmaticlabs/prysm#12884 and if this were raised in our repo, I would be inclined to close for the same reason. Leaking implementation details is one of the worse problems of an executable spec, which leads to a fake client diversity if Prysm (resp. Lighthouse) have Go (resp. Rust) interpretations of the same Python architecture. The current code in prysm deals with attestations pools to retry attestation validation and it's easily tested in unit tests.

However, in this particular scenario, we could handle test vectors where the attestations arrive with a sligh time skew, but most probably would require moving the granularity of the testvectors from seconds to milliseconds and it would force us to move from a localized forkchoice codepath testing to a more abrangent sync package and gossip paths testing. I would be against such a move: forkchoice spectests have been among the most useful tests the spec repo has given us, catching many consensus splitting bugs in our codebase, I feel like changing the scope of these tests would also render them less reliable.

@hcnam hcnam changed the title Consider adding attestation deferring test vectors in the fork-choice spec test Consider adding more details for the attestation deferring mechanism Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@potuz @hcnam and others