Number | Category | Status | Author | Created |
---|---|---|---|---|
0044 |
Standards Track |
Proposal |
Boyu Yang <yangby@cryptape.com> |
2022-08-18 |
This RFC describes a light client protocol on CKB which allows clients to verify that a blockchain is valid with limited resources.
Downloading and verifying all blocks is taking hours and requiring gigabytes of bandwidth and storage. Hence, clients with limited resources cannot verify transactions independently without trusting full nodes. Even some light clients only download all block headers, the storage and bandwidth requirements of those clients still increase linearly with the chain length. We propose a more efficient FlyClient-based light client protocol. It uses a sampling protocol tailored for NC-Max difficulty adjustment algorithm. It requires downloading only a logarithmic number of block headers while storing only a single block header between executions.
A Merkle Mountain Range (MMR) is a binary hash tree that allows for efficient appends of new leaves without changing the value of existing nodes.
In MMR, we use the insertion order to reference leaves and nodes. We insert a new leaf to MMR as following steps:
- Insert leaf or node to next position.
- If the new inserted leaf or node has a left sibling, we merge the left and right nodes to produce a new parent node, then go back to step 1 to insert the node.
For example, we insert a leaf to the example MMR which has 11 leaves as following steps:
- Insert leaf to next position:
19
. - Then check the left sibling
18
and calculate parent node:merge(mmr[18], mmr[19])
. - Insert parent node to position
20
. - Since the node
20
also has a left sibling17
, calculate parent node:merge(mmr[17], mmr[20])
. - Insert new node to next position
21
. - Since the node
21
have no left sibling, complete the insertion.
# An MMR with 11 leaves:
14
/ \
6 13
/ \ / \
2 5 9 12 17
/ \ / \ / \ / \ / \
0 1 3 4 7 8 10 11 15 16 18
# After insertion of a new leaf:
14
/ \
6 13 21
/ \ / \ / \
2 5 9 12 17 20
/ \ / \ / \ / \ / \ / \
0 1 3 4 7 8 10 11 15 16 18 19
An MMR is constructed by one or more sub merkle trees (or mountains). Each sub merkle tree's root is a peak in MMR, we calculate the MMR root by bagging these peaks from left to right.
For example:
- In the above 11 leaf MMR we have 3 peaks:
14, 17, 18
, we bag these peaks from left to right to get the root:merge(merge(mmr[14], mmr[17]), mmr[18])
. - In the above 12 leaf MMR we have 2 peaks:
14, 21
, we bag these peaks from left to right to get the root:merge(mmr[14], mmr[21])
.
The merkle proof is an array of hashes constructed with the following parts:
- A merkle proof from the leaf's sibling to the peak that contains the leaf.
- A hash that bags all right-hand side peaks, skip this part if no right-hand peaks.
- Hashes of all left-hand peaks from right to left, skip this part if no left-hand peaks.
We can reconstruct a merkle root from leaves and the merkle proof for those leaves.
The client uses
In this probability,
Concretely, let
The sampling domain is restricted from
Let
Define
We treat all blocks as the leaf nodes of an MMR, and its block number is the index of leaves in that MMR.
Each MMR node is defined as follows:
-
children_hash
- For a leaf node, it's its header hash.
- For a non-leaf node, it's the hash of the serialized data that concatenate its two children nodes' hashes. A node's hash is the hash of its serialized data.
-
total_difficulty
- For a leaf node, it's the difficulty it took to mine the current block.
- For a non-leaf node, it's the sum of
total_difficulty
in its child nodes.
-
start_*
andend_*
- For a leaf node, both of them are the data of the current block.
- For a non-leaf node:
start_*
is thestart_*
of the leftmost node.end_*
is theend_*
of the rightmost node.
- There are 4 pairs of data:
*_number
means a block number.*_epoch
means an epoch number.*_timestamp
means a block timestamp.*_compact_target
means a block compact target.
An MMR node will represent in Molecule schema as follows:
struct HeaderDigest {
children_hash: Byte32,
total_difficulty: Uint256,
start_number: Uint64,
end_number: Uint64,
start_epoch: Uint64,
end_epoch: Uint64,
start_timestamp: Uint64,
end_timestamp: Uint64,
start_compact_target: Uint32,
end_compact_target: Uint32,
}
A chain root for a block is the merkle root of all blocks on the chain until that block (include itself).
After the epoch which MMR starts to be enabled in, the first 32 bytes of the block extension should be the hash of its parent chain root.
A verifiable header is a header with the fields which are used to do verification for its extra hash [1].
It contains a normal header, its uncles' hash, its block extension and the chain root for its parent block.
All protocol messages will represent in Molecule schema as follows:
union LightClientMessage {
// A client asks the server for the last state of the chain.
GetLastState,
SendLastState,
// A client asks the server for the proof of the last state which the
// client known.
GetLastStateProof,
SendLastStateProof,
// A client asks the server for the proof of some blocks.
GetBlocksProof,
SendBlocksProof,
// A client asks the server for the proof of some transactions.
GetTransactionsProof,
SendTransactionsProof,
}
table GetLastState {
// Whether the server is requested to push the state automatically.
subscribe: Bool,
}
table SendLastState {
// The verifiable header for the tip block in the server.
last_header: VerifiableHeader,
}
table GetLastStateProof {
// The last block hash known by the client.
// It could be different with the tip hash in the server.
last_hash: Byte32,
// The hash of the last proved block.
start_hash: Byte32,
// The block number of the last proved block.
start_number: Uint64,
// How many continuous blocks before the tip block should be included at
// least, if possible?
last_n_blocks: Uint64,
// All blocks, whose total difficulty is not less than this difficulty
// boundary, should be included in the proof.
difficulty_boundary: Uint256,
// The sampled difficulties.
difficulties: Uint256Vec,
}
table SendLastStateProof {
// If the block whose hash is sent from the client is on the chain, then
// returns its verifiable header; otherwise, returns the verifiable
// header for the tip block in the server.
last_header: VerifiableHeader,
// The MMR proof for the chain root whose hash is in the last header.
// Be empty if the block hash sent from the client isn't on the chain.
proof: HeaderDigestVec,
// Verifiable headers for all sampled blocks.
headers: VerifiableHeaderVec,
}
table GetBlocksProof {
// Refer to `GetLastStateProof.last_hash`.
last_hash: Byte32,
// Block hashes for the blocks which require verifying.
block_hashes: Byte32Vec,
}
table SendBlocksProof {
// Refer to `SendLastStateProof.last_header`.
last_header: VerifiableHeader,
// Refer to `SendLastStateProof.proof`.
proof: HeaderDigestVec,
// Block headers for the blocks which require verifying.
headers: HeaderVec,
// Block hashes for the blocks which were not found.
missing_block_hashes: Byte32Vec,
}
table GetTransactionsProof {
// Refer to `GetLastStateProof.last_hash`.
last_hash: Byte32,
// Transaction hashes for the transactions which require verifying.
tx_hashes: Byte32Vec,
}
table SendTransactionsProof {
// Refer to `SendLastStateProof.last_header`.
last_header: VerifiableHeader,
// Refer to `SendLastStateProof.proof`.
proof: HeaderDigestVec,
// A collection of filtered blocks, which include all requested
// transactions, and be verified in the proof.
filtered_blocks: FilteredBlockVec,
// Transaction hashes for the blocks which were not found.
missing_tx_hashes: Byte32Vec,
}
We can divide the typical client-server interaction into 3 phases.
-
At start, the client sends
GetLastState
messages to all discovered servers. -
All servers will reply
SendLastState
which includes their verifiable tip header and the total difficulty of the chain. -
The client will send
GetLastStateProof
to all servers which have replied their last state. TheGetLastStateProof
messages are different for different chain states, they are generated base on the sampling strategy which the light client used.The sampling strategy is not a part of CKB. Each light client could choose their own sampling strategy.
In CKB light client implementation, a sampling strategy based on the FlyClient sampling protocol under variable difficulty is used.
-
The servers which received
GetLastStateProof
should replySendLastStateProof
to the client.If the last block that the client known is in the server's current chain, then the server should return the verifiable header of the last block that the client known, the proof for the sampled blocks and their verifiable headers.
Otherwise, the server should only return its current verifiable tip header.
-
Then the client will choose the best chain from all
SendLastStateProof
messages.
Notice: Some variables in this section are defined in the "Background" section.
The server performs the following steps to decide which block should be included in the proof:
-
Construct a collection of blocks which is denoted as
reorg_n_blocks
.Check if the start block, which is sent from the client, is an ancestor block of the tip block.
-
If true, then left
reorg_n_blocks
to be empty. -
Otherwise:
-
Find the block which is an ancestor block of the tip block and its number is the same as the start block.
-
Add the last
$L$ blocks before the block, which is found in the previous step, intoreorg_n_blocks
.
-
-
-
Construct a collection of blocks which is denoted as
last_n_blocks
.-
Find the first block whose total difficulty is not less than
$D_{\mathrm{boundary}}$ . -
If the number of blocks between the block, which is found in the previous step, and the tip block (excluded), is greater than
$L$ , thenlast_n_blocks
includes all these blocks; otherwise,last_n_blocks
includes the last$L$ blocks before the tip block.
-
-
Construct a collection of blocks which is denoted as
sampled_blocks
.-
Let
$D_{\mathrm{boundary}}^{'}$ denote the total difficulty of the first block inlast_n_blocks
. -
For each difficulty in the sampled difficulties, try to find the first block whose total difficulty is not less than it; if the total difficulty of this block is less than
$D_{\mathrm{boundary}}^{'}$ and it's not a duplicate block, then add it intosampled_blocks
.
The sampled difficulties should be sorted.
-
-
The proof should include all blocks in
reorg_n_blocks
,last_n_blocks
, andsampled_blocks
.And, the set of block headers should be sorted by theirs block numbers.
Notice: Some variables in this section are defined in the "Background" section.
In the official implementation of CKB light client, the sampling strategy is based on the FlyClient sampling protocol under variable difficulty.
At first, the client should have a tip block header which requires verification, and its parent chain root whose hash stored in that block, and the total difficulty until that block (included that block).
Moreover, the client chooses the latest verified block as the start block. If there are no verified blocks, the client should choose the genesis block as the start block.
Then the client performs the following steps to samples blocks:
- Let region
$R_{\mathrm{full}}$ denote the region between the start block and the tip block (excluded). - Calculate
$n$ , which is the number of blocks in the region$R_{\mathrm{full}}$ . - Let
$c = 0.5$ and$L = 100$ , then calculate$k$ with the formula$k = \log_{c}{(\frac{L}{n})}$ . - Compare
$n$ with$L$ :- If
$n \le L$ , skip the sampling. - Otherwise:
- Let
$\lambda = 50$ , then calculate$m$ with the formula$m = \lceil \frac{\lambda}{\log_{\frac{1}{2}} {(1 - \frac{1}{k})}} \rceil$ . - Let
$D_{\mathrm{start}}$ denote the total difficulty of the start block and$D_{\mathrm{end}}$ denote the total difficulty of the tip block. - Calculate the difficulty boundary
$D_{\mathrm{boundary}}$ , which is the start boundary of last$\delta$ fraction region, with formula$D_{\mathrm{boundary}} = D_{\mathrm{start}} + (1 - \delta)(D_{\mathrm{end}} - D_{\mathrm{start}})$ . - Calculate
$\delta = c ^ {k}$ . - Use the PDF
$g(x)$ to random$m - L$ difficulties, which are satisfied$d_{i} \in \left[ D_{\mathrm{start}}, D_{\mathrm{boundary}} \right), i \in \left[ 0, m-L-1 \right]$ as the samples.
- Let
- If
After sampling, the client sends a request to the server, namely, a CKB full node, with following data:
- The blocks hash of the tip block that client knows.
- The blocks hash of the start block.
- The blocks number of the start block.
- The number
$L$ . - The difficulty boundary
$D_{\mathrm{boundary}}$ . - The difficulty samples. If there are no difficulties sampled in the previous step, leave them to be empty.
At last, the server replies the proof of the sampled blocks.
Accordingly, the client can treat that tip block as a valid tip block after checking the returned proof.
After phase 1, the client gets a trusted chain based on a proved tip block.
Then the client can use this phase to check blocks whether they are on that trusted chain or not.
-
At the start, the client sends the
GetBlocksProof
message, which contains the last block hash it knows, and hashes of the blocks which require verifying, to the server which has the best chain. -
Then, the server should reply a
SendBlocksProof
message.If the last block that the client known is in the server's current chain, then the server should return
- the verifiable header of the last block that the client knows,
- a proof for blocks which are from the provided block hashes and in the current chain,
- the headers of those proved blocks,
- hashes of blocks which are from the provided block hashes but not in the current chain.
Otherwise, the server should only return its current verifiable tip header.
After phase 1, the client gets a trusted chain based on a proved tip block.
Then the client can use this phase to check transactions whether they are on that trusted chain or not.
-
At the start, the client sends the
GetTransactionsProof
message, which contains the last block hash it knows, and hashes of the transactions which require verifying, to the server which has the best chain. -
Then, the server should reply a
SendTransactionsProof
message.If the last block that the client known is in the server's current chain, then the server should return
- the verifiable header of the last block that the client knows,
- a proof for blocks which contain valid transactions from the provided transaction hashes,
- the proofs that proving those valid transactions are in those blocks,
- hashes of transactions which are from the provided transaction hashes but not in the current chain.
Otherwise, the server should only return its current verifiable tip header.
To avoid attacks by malicious clients, there are few limitations in server side.
- In
GetLastStateProof
messages, the sum of the size of difficulties and 2 times of thelast_n_blocks
should NOT greater than 1000. - In
GetBlocksProof
messages, the size of block hashes should NOT greater than 1000. - In
GetTransactionsProof
messages, the size of transactions hashes should NOT greater than 1000.
There are also few limitations in messages to improvement performance.
- In
GetLastStateProof
messages, the difficulties should be sorted. - In
SendLastStateProof
messages, the headers should be sorted.
Since CKB doesn't record the hashes of chain roots into headers at the start, so it requires a soft fork to extend the relevant consensus rules.
Hence, this feature should be deployed concurrently with RFC-0043 CKB softfork activation.
In fact, it uses a modified version of RFC-0043 CKB softfork activation, which uses the cellbase witness field instead of the block header version for signaling.
The parameters[2] to activate this feature are:
Parameters | For CKB Testnet | For CKB Mainnet |
---|---|---|
name |
LightClient | LightClient |
bit |
1 | 1 |
start_epoch |
5346 (approx. 2022-10-31 15:00:00 UTC) | 8282 (approx. 2023-09-01 00:00:00 UTC) |
timeout_epoch |
5616 (approx. 2022-12-14 15:00:00 UTC) | 8552 (approx. 2023-10-15 00:00:00 UTC) |
period |
42 | 42 |
threshold |
75% | 80% |
min_activation_epoch |
5676 (approx. 2022-12-24 15:00:00 UTC) | 8648 (approx. 2023-11-01 00:00:00 UTC) |
For CKB Testnet, the deployment did activate at epoch 5711.
Some parameters for CKB mainnet are to be determined.
And we also have to create all MMR nodes for blocks before that epoch to make sure that the indexes of the MMR leaves are equal to the block numbers.
If users have data which generated by the previous version of CKB, they have to do data migration before enable this feature. And once the migration started, the data will no longer be compatible with all older versions of CKB.