
pinning: Notification block pinning limit reached for warp-sync #4389

Open
lexnv opened this issue May 6, 2024 · 6 comments
Comments

@lexnv (Contributor)

lexnv commented May 6, 2024

The following warning is continuously printed by a warp-syncing kusama node (running the litep2p backend) at roughly 20-second intervals.

However, the warning did not reproduce on a full-syncing node.

This was discovered during a long-running node triage (around 2 weeks of running the litep2p backend).

It seems that during warp-sync the pinning limit is reached, after which the cache always holds around 1k pinned blocks. When a new block is discovered, the least recently used block is unpinned, causing a warning.
It might be the case that references to pinned blocks are kept around for too long (maybe because we have too many blocks).

2024-05-06 13:06:12.494  WARN tokio-runtime-worker db::notification_pinning: Notification block pinning limit reached. Unpinning block with hash = 0x354f531933d7a20b92e652ad0ec2034a74f645923f13ce2651056b2a951a7692

The warning is coming from:

if *references > 0 {
    log::warn!(
        target: LOG_TARGET,
        "Notification block pinning limit reached. Unpinning block with hash = {key:?}"
    );

    if let Some(backend) = self.backend.upgrade() {
        (0..*references).for_each(|_| backend.unpin_block(*key));
    }
}

cc @paritytech/sdk-node @paritytech/networking

@skunert (Contributor)

skunert commented May 7, 2024

I will warp sync a new kusama node on a dev machine and see if I can find anything fishy.

@skunert (Contributor)

skunert commented May 13, 2024

I was able to reproduce this. As a quick recap, we do not prune blocks that have FinalityNotifications or ImportNotifications floating around. Only once the UnpinHandle is dropped can the block be pruned in the backend.
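
To illustrate that lifecycle, here is a minimal model of the pin-on-notification idea; PinGuard, Notification, and can_prune are illustrative stand-ins for this sketch, not the actual sc-client-api types:

use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};

// Illustrative model only: a notification owns a guard, and the backend may
// prune a block only after every guard for that block has been dropped.
struct PinGuard {
    pins: Arc<AtomicUsize>,
}

impl Drop for PinGuard {
    fn drop(&mut self) {
        // Dropping the guard releases one pin reference.
        self.pins.fetch_sub(1, Ordering::SeqCst);
    }
}

struct Notification {
    block_hash: [u8; 32],
    _pin: PinGuard, // a notification kept "floating around" keeps the block pinned
}

fn can_prune(pins: &AtomicUsize) -> bool {
    // Pruning becomes possible only once all guards (and thus all
    // notifications referring to the block) are gone.
    pins.load(Ordering::SeqCst) == 0
}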

It looks like during initialization the BEEFY worker expects a given header ancestry to be available and is basically waiting for those headers here:

info!(
    target: LOG_TARGET,
    "🥩 Parent of header number {} not found. \
     BEEFY gadget waiting for header sync to finish ...",
    current.number()
);
tokio::time::sleep(delay).await;

In my test example it took 6 hours for this loop to continue. But in the meantime we have a finality stream that is not getting polled here:

let mut finality_notifications = client.finality_notification_stream().fuse();

So the notifications pile up there and the pinning cache limit is reached.

I am not too familiar with the beefy logic, but maybe we could drain the stream while waiting for the headers or, if we need to act on every notification, we could map them to a different type that does not include the handle.
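
For illustration, a rough sketch of that second option, using simplified stand-in types rather than the real client API (FinalityNotification, UnpinHandle, and QueuedFinality below are assumptions made for the example):

use futures::{Stream, StreamExt};

// Simplified stand-ins, not the real sc-client-api types.
struct UnpinHandle; // releases the backend pin when dropped
struct FinalityNotification {
    hash: [u8; 32],
    number: u64,
    _unpin: UnpinHandle, // keeps the block pinned while the notification is alive
}

// A handle-free copy of the data the worker would act on later.
struct QueuedFinality {
    hash: [u8; 32],
    number: u64,
}

// Mapping consumes each notification (dropping its UnpinHandle) as soon as it
// arrives, so pins are released even while the worker waits for header sync.
fn strip_handles(
    stream: impl Stream<Item = FinalityNotification>,
) -> impl Stream<Item = QueuedFinality> {
    stream.map(|n| QueuedFinality { hash: n.hash, number: n.number })
}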

@acatangiu What do you think?

@bkchr (Member)

bkchr commented May 13, 2024

As I already proposed in #3945 (comment), the beefy networking should not be registered before beefy is active.

@bkchr (Member)

bkchr commented May 13, 2024

(Basically this entire code should not do anything until BEEFY is ready.)

@acatangiu (Contributor)

I believe this is a different case that would still be a problem even if BEEFY registers networking only after being active.

In the case of warp sync, only mandatory headers get synced, and the chain's "normal operation" starts while the rest of the headers sync in the background.

In either case, the BEEFY worker needs to register for finality notifications because it is exclusively driven by GRANDPA finality. Simplified initialization steps (a rough sketch follows at the end of this comment):

  1. It checks, for each finalized block, whether the BEEFY protocol was initialized on chain (beefy_genesis set),
  2. Block N is finalized where BEEFY was initialized, i.e. beefy_genesis <= N:
    1. if beefy_genesis == N it's simple: just get the initial BEEFY validators from the header,
    2. if beefy_genesis < N, we need to get the initial validators from the beefy_genesis header - BUT when warp-syncing, this header is not yet available (it comes in later). So without #1118 (BEEFY: add BEEFY-specific warp proofs so we can warp sync when using BEEFY), the workaround was to "just wait" for the sync to complete and then resume processing the GRANDPA finality stream (see this comment for the explanation). We can't continue processing without the initial validator set, and we also can't drop finality notifications without processing them.

Therefore, this is not helped by #3945 (comment).
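
For illustration, a rough sketch of the decision from the steps above; Action, on_finalized, and the use of block numbers in place of headers are all assumptions made for this example, not the actual worker code:

// Illustrative only: what happens once block `finalized` is final, given the
// on-chain BEEFY genesis (None while BEEFY is not initialized yet).
enum Action {
    /// BEEFY not initialized on chain yet (or initialized at a block we have
    /// not finalized); keep checking finalized blocks.
    NotActiveYet,
    /// beefy_genesis == finalized: read the initial validator set from this header.
    ReadValidatorsFromHeader(u64),
    /// beefy_genesis < finalized: the genesis header may not be synced yet
    /// (warp sync), so wait for header sync before resuming the finality stream.
    WaitForHeaderSync { beefy_genesis: u64 },
}

fn on_finalized(finalized: u64, beefy_genesis: Option<u64>) -> Action {
    match beefy_genesis {
        Some(g) if g == finalized => Action::ReadValidatorsFromHeader(finalized),
        Some(g) if g < finalized => Action::WaitForHeaderSync { beefy_genesis: g },
        _ => Action::NotActiveYet,
    }
}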

@acatangiu (Contributor)

I am not too familiar with the beefy logic, but maybe we could drain the stream while waiting for the headers or, if we need to act on every notification, we could map them to a different type that does not include the handle.

Yes, either drain the notifications and map/enqueue just the relevant data (or the full header) to some BEEFY worker queue - once the sync finishes, the worker needs to process the queue and then switch back to processing notifications once it has caught up.
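
A hedged sketch of that queue-then-catch-up flow, again with simplified stand-in types (QueuedFinality and the sync_finished future are assumptions for the example, not the real client API):

use futures::{future::FusedFuture, stream::FusedStream, StreamExt};

// Handle-free finality data, as in the mapping sketch above.
struct QueuedFinality {
    hash: [u8; 32],
    number: u64,
}

/// While header sync is still running, drain finality data into a backlog so
/// nothing keeps blocks pinned; return the backlog once sync completes.
async fn drain_until_synced<S, F>(
    notifications: &mut S,
    mut sync_finished: F,
) -> Vec<QueuedFinality>
where
    S: FusedStream<Item = QueuedFinality> + Unpin,
    F: FusedFuture<Output = ()> + Unpin,
{
    let mut backlog = Vec::new();
    loop {
        futures::select! {
            n = notifications.next() => match n {
                Some(n) => backlog.push(n),
                None => break, // notification stream ended
            },
            _ = sync_finished => break, // headers available: stop buffering
        }
    }
    backlog
}

The caller would then process the returned backlog oldest-first and go back to polling the live notification stream.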
