
Consider Blob Sidecar slashing conditions #3527

Open
djrtwo opened this issue Oct 20, 2023 · 11 comments
Labels
general:enhancement New feature or request

Comments

@djrtwo
Contributor

djrtwo commented Oct 20, 2023

4844 is being shipped in Deneb without slashing conditions on the p2p BlobSidecar message.

Peer-scoring cannot be performed in many cases (a node might, and usually will, receive the sidecar before the block from that peer), but by and large, the effect of sending conflicting and invalid sidecars is that the proposer increases their risk of being orphaned. This, in most cases, is a distinct disincentive to spam the network. The primary reason one might leverage additional/invalid messages here is either to create and exploit temporary view splits OR to compound such messages with additional bugs to wreak havoc (e.g. if there is a caching issue that creates an irreparable chain split).

Given these messages are not a part of the consensus, and due to the impact of adding slashing conditions, it was decided not to incorporate any such slashable offenses in Deneb, but the conversation around whether they should be added later continues to bubble up.

There are two types of potential offenses/slashings:

  1. Creating duplicate side-cars for a given slot/index. This is the most obvious anti-spam measure and would lead to easy rules to follow and easy conditions to validate on-chain
  2. Creating invalid side-cars -- e.g. sidecars for indices beyond the number of blobs in the block. This is a much more complex type of condition to encode in your anti-slashing DBs as well as to verify on chain.

Given the complexity and risks, (2) does not seem a worthwhile path to go down.

I am personally open to discussing (1), but by default I am resistant to building this into VC anti-slashing DBs and into the core consensus, simply due to the complexity and, as @dankrad noted, the additional risks it adds to running a validator, which would further disincentivize less professional players from validating.
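For concreteness, here is a minimal sketch of what the duplicate-sidecar check in (1) might look like, in spec-style Python. The `BlobSidecarSummary` container, its fields, and the helper name are illustrative assumptions, not actual spec definitions:

```python
# Hypothetical sketch of condition (1): two distinct sidecars for the same
# slot/index signed by the same proposer. Field names are illustrative only.
from dataclasses import dataclass


@dataclass
class BlobSidecarSummary:
    proposer_index: int
    slot: int
    index: int
    sidecar_root: bytes  # hash tree root of the signed sidecar message


def is_slashable_blob_sidecar_pair(a: BlobSidecarSummary, b: BlobSidecarSummary) -> bool:
    """True if the two sidecars target the same (slot, index) for the same proposer
    but are not the same message."""
    return (
        a.proposer_index == b.proposer_index
        and a.slot == b.slot
        and a.index == b.index
        and a.sidecar_root != b.sidecar_root
    )
```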

@potuz
Contributor

potuz commented Oct 20, 2023

I am in Dankrad's camp in that adding slashing conditions is typically a heavy burden and should not be done lightly.

One thing I'd like to explicitly mention in this topic is that this issue forces clients to actively remove equivocating (or invalid) blobs quickly from their caches or temporary storage. Otherwise it may become exploitable as a way of getting free DA. I suspect all clients will implement this pattern anyway (Prysm currently does not), but if this becomes a hassle I'd be more open to considering at least (2) a slashable offense.

@dankrad
Contributor

dankrad commented Oct 20, 2023

One thing I'd like to explicitly mention in this topic is that this issue forces clients to actively remove equivocating (or invalid) blobs quickly from their caches or temporary storage. Otherwise it may become exploitable as a way of getting free DA.

You do not get free DA from this. In order to get data availability, you need to prove in consensus that all nodes had to get the data to accept the block. In this case, the conflicting blob had no influence on the block getting accepted, so there is no free DA.

@hwwhww hwwhww added the general:enhancement New feature or request label Oct 21, 2023
@ppopth
Member

ppopth commented Oct 23, 2023

Creating duplicate side-cars for a given slot/index. This is the most obvious anti-spam measure and would lead to easy rules to follow and easy conditions to validate on-chain

This sounds impossible to me.

From what I see in the fields of BlobSidecar, you cannot create two valid side-cars for a given slot/index, because every field in the side-car is already bound by the block (for example, the blob field is bound by the kzg commitment inserted in the block).

The only case that you can create two different valid side-cars is that the proposer proposes two blocks, in which case, the proposer will be slashed anyway.

@potuz
Contributor

potuz commented Oct 23, 2023

This sounds impossible to me.

The point is that the blob is validated independently of the block: from the point of view of gossip validation the blobs are perfectly valid, and only when you get the block do you realize that they are invalid.

@djrtwo
Contributor Author

djrtwo commented Oct 23, 2023

This sounds impossible to me.

Sorry if my phrasing is unclear. I was saying: "A slashing condition for the following: creating duplicate side-cars (thus one has to be invalid wrt a single block)"

So the point is that, as a slashing condition, you just need two different side-cars for the same slot/index signed by the same proposer; it doesn't matter which one is invalid wrt a given chain.

@realbigsean
Contributor

My initial thought is that it's not worth adding a slashing condition. It might be better to outline RPC caching logic in the spec, since it has to differ from our gossip caching, which currently (and I think correctly) ignores duplicates completely. The fact that gossip and RPC caching have to be different tripped us up previously in Lighthouse. Something like this (rough code sketch after the list):

  1. If you've seen the block
    • serve only blobs with matching KZG commitments in blobs-by-root responses
  2. If you haven't seen the block
    • serve the first valid blobs you see in blobs-by-root responses
    • when the block arrives, evict any blobs without matching KZG commitments, then request blobs by root, retrying until you get the blobs you know to be correct; cache and serve those correct blobs
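A rough sketch of that caching logic, assuming a simple in-memory cache; `BlobCache`, its method names, and the retry hook are hypothetical, not Lighthouse or spec APIs:

```python
from typing import Dict, List, Optional, Tuple


class BlobCache:
    def __init__(self) -> None:
        # (block_root, index) -> sidecar-like object exposing a `kzg_commitment` field
        self.blobs: Dict[Tuple[bytes, int], object] = {}
        # block_root -> KZG commitments listed in the block body
        self.seen_blocks: Dict[bytes, List[bytes]] = {}

    def on_gossip_blob(self, block_root: bytes, index: int, sidecar) -> None:
        commitments = self.seen_blocks.get(block_root)
        if commitments is None:
            # Block not seen yet (case 2): keep the first blob seen for this (root, index).
            self.blobs.setdefault((block_root, index), sidecar)
        elif index < len(commitments) and sidecar.kzg_commitment == commitments[index]:
            # Block already seen (case 1): only keep blobs matching its commitments.
            self.blobs[(block_root, index)] = sidecar

    def on_block(self, block_root: bytes, kzg_commitments: List[bytes]) -> List[int]:
        """Evict mismatching blobs; return indices the caller should re-request by root."""
        self.seen_blocks[block_root] = kzg_commitments
        missing: List[int] = []
        for index, commitment in enumerate(kzg_commitments):
            cached = self.blobs.get((block_root, index))
            if cached is not None and cached.kzg_commitment != commitment:
                del self.blobs[(block_root, index)]  # equivocating/spurious blob: evict
                cached = None
            if cached is None:
                missing.append(index)  # retry blobs-by-root until the correct blob arrives
        return missing

    def serve_blobs_by_root(self, block_root: bytes, index: int) -> Optional[object]:
        # Serve only what we have cached (commitment-matched once the block is known).
        return self.blobs.get((block_root, index))
```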

Messing up this caching can split consensus (if it keeps you from importing a block) but IMO it's still simpler to get right than adding new slashing conditions. And like @ppopth is alluding to, the "right" blob is determinable so it's fundamentally different from duplicate blocks.

I will say though, this logic assumes that a block accepted as canonical had its blobs propagated on gossip widely enough that they can be found via req/resp relatively quickly. But proposer boost actually weakens this: you just need to propagate the valid blobs to the next proposer plus a much smaller portion of the network. So in the worst case the majority of nodes will need the blobs via req/resp and won't have them.

@mkalinin
Collaborator

So in the worst case the majority nodes will need the blobs in req/resp and won't have it.

Won't nodes request the block and the correct blobs upon receiving an attestation voting for the block or for a child block built atop it? So the inconsistency will likely be resolved in a sub-slot amount of time.

@realbigsean
Contributor

Won't nodes request the block and the correct blobs upon receiving an attestation voting for the block or for a child block built atop it? So the inconsistency will likely be resolved in a sub-slot amount of time.

Yes, nodes would request from attesters as they see them. This just usually ends up in a missed attestation for the node, and if only a minority of nodes have the correct blobs, they're responsible for an outsized amount of uploading happening at once.

@potuz
Contributor

potuz commented Oct 27, 2023

they're responsible for an outsized amount of uploading happening at once

I think this is a real problem. We lose an invariant that we have today: currently, a valid block received by any validator is properly synced and committed to the DB. With the current gossip conditions we lose this property: a perfectly valid block (albeit from a malicious or buggy proposer) may be rejected by honest validators. The problem here is that if the next proposer actually received the right blobs and did sync the block on time, every validator that had rejected this block will trigger an RPC request.

  • This is an entirely new P2P phenomenon for which, as far as I know, no data has been gathered. And it is fairly complicated to gather such data in a devnet environment.
  • This also makes the situation for the next proposer quite precarious: their block, no matter how timely, will take much longer to be synced by the part of the network that received the bad blobs first. This makes it likely to lose proposer boost and therefore be subject to a depth-3 reorg if the following proposer is malicious.

The fact that a proposer can cause these splits without penalization (and slashing doesn't seem to completely solve this issue) makes me think that binding blobs to blocks should be taken seriously. It was mostly bandwidth concerns that led to unbundling them; however, I think these are serious forkchoice issues that we can immediately solve by tying them together. At the same time, client code complexity (something we hadn't really gauged when we decided to unbundle them) clearly becomes simpler, with far fewer edge cases and caches needed to keep track of equivocating or spurious blobs. It does look like a large refactor, but mostly toward a much simpler, less bug-prone codebase.

@arnetheduck
Contributor

arnetheduck commented Oct 27, 2023

The fact that a proposer can cause these splits

Original discussion about the possibility to split the network: #3244 (comment)

A core point to remember here is that we're talking about colluding dishonest successive proposers, ie those that have private connections between each other - otherwise, with a correctly implemented cache, the correct blobs will eventually reach all participants on gossipsub.

they're responsible for an outsized amount of uploading happening at once

not quite - you cannot forward an attestation for a block that you haven't fully observed and validated, so either you don't forward the attestation or you forward it to 8 peers at most (on average) - this means that the rpc access pattern will broadly follow attestation propagation patterns. If clients "hold" the attestation in a quarantine/short queue and only rebroadcast it once they've observed the valid blobs, we get an even network usage balance which generally uses less bandwidth for spreading the correct blobs than gossipsub (because no duplicates will be sent). Of course, this requires said quarantine to be implemented in clients, which today is not required (it's recommended though, to deal with block vs attestation race conditions).
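A minimal sketch of that quarantine idea, assuming a simple in-memory structure; `AttestationQuarantine` and its method names are hypothetical, not an existing client API:

```python
# Hold attestations for blocks whose blobs we haven't verified yet, and
# rebroadcast them only once the blobs are observed. Illustrative only.
from collections import defaultdict
from typing import Callable, DefaultDict, List


class AttestationQuarantine:
    def __init__(self, rebroadcast: Callable[[object], None]) -> None:
        self.rebroadcast = rebroadcast
        # block_root -> attestations waiting for that block's blobs
        self.pending: DefaultDict[bytes, List[object]] = defaultdict(list)

    def on_attestation(self, block_root: bytes, attestation, blobs_verified: bool) -> None:
        if blobs_verified:
            self.rebroadcast(attestation)  # block fully observed: forward immediately
        else:
            self.pending[block_root].append(attestation)  # hold, don't forward yet

    def on_blobs_verified(self, block_root: bytes) -> None:
        # Blobs (and block) now verified: release everything held for this root.
        for attestation in self.pending.pop(block_root, []):
            self.rebroadcast(attestation)
```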

cache

caches are indeed the trickiest part here - in particular, we don't specify well in the spec what to do with IGNORE'd messages once we determine their status (ACCEPT or REJECT) - the correct thing to do at that point (for any message, not just blobs) is to clear IGNORE from the libp2p cache and rebroadcast the message (to peers that haven't observed the correct version yet), but this needs to be written down and ideally tested.

Taking a step back, IGNORE is used for any condition whose adherence to the protocol cannot be determined at that time - for example an attestation whose block we haven't seen, a block whose parent we haven't seen or indeed a blob whose block we haven't seen.

All these situations eventually resolve themselves, and an efficient client must keep track of that resolution in order not to spam the network, but also in order for the network to efficiently reach a steady valid state - for example, if you haven't observed a block and later observe it and determine it's invalid, you need to store this "invalid" status in a semi-permanent store (ie at least in memory until shutdown, with some high upper bound on the number of messages you keep track of) because the libp2p message cache is not sufficient to protect against the same message spreading again on the network, especially during non-finality (or indeed against the client shooting itself in the bandwidth foot and re-requesting it as it chases some weird block history).

Thus, the important point to remember here is that the majority of clients operating on the network already need to be correct and... "minimally efficient" - ie bug-free in critical places and with well-implemented caches - but this is not a new assumption: an inefficient implementation of the spec (hello, quadratic scans of the validator set in the state transition), if it were in the majority, could still take down the network and render it unstable, slashing conditions or no (similar to how a slow EL block can throw off the network as clients take time to validate / execute it).

My original proposal for splitting blobs included the following condition:

-- If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow the valid blob to propagate. The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple.

It was deemed unnecessary to specify because of the second half being difficult to analyze, but the first half still remains necessary for efficient blob propagation (ie it's part of the "minimally efficient" implicit stuff that unfortunately is not specified but still needs to happen).

I'm happy to re-add it (ie If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow the valid blob to propagate.) as a spec condition if it helps implementers.
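To make that condition concrete, here is a minimal sketch of a "seen" cache that is cleared on a later REJECT while keeping a semi-permanent record of known-invalid messages; the class and method names are hypothetical, not a client or libp2p API:

```python
# Illustrative only: clear the gossip "seen" entry when full verification of a
# previously IGNORE'd/forwarded sidecar fails, so the valid one can still propagate.
from enum import Enum
from typing import Set, Tuple


class GossipResult(Enum):
    ACCEPT = "accept"
    IGNORE = "ignore"
    REJECT = "reject"


class SeenSidecarCache:
    def __init__(self) -> None:
        # (proposer_index, slot, index) tuples already accepted/forwarded on gossip
        self.seen: Set[Tuple[int, int, int]] = set()
        # semi-permanent record of individual messages later found invalid
        self.known_invalid: Set[bytes] = set()

    def on_gossip(self, key: Tuple[int, int, int], sidecar_root: bytes) -> GossipResult:
        if sidecar_root in self.known_invalid:
            return GossipResult.REJECT
        if key in self.seen:
            return GossipResult.IGNORE
        self.seen.add(key)
        return GossipResult.ACCEPT

    def on_full_verification_failed(self, key: Tuple[int, int, int], sidecar_root: bytes) -> None:
        # Clear the "seen" entry so the valid sidecar for this key can still
        # propagate, while remembering that this particular message is invalid.
        self.seen.discard(key)
        self.known_invalid.add(sidecar_root)
```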

@arnetheduck
Contributor

Relevant for this discussion also: #3257
