4.0.0 rc1 changi: node suddenly invalidates old block on fresh sync #2598

Open
kuegi opened this issue Oct 18, 2023 · 11 comments

@kuegi
Contributor

kuegi commented Oct 18, 2023

Summary

The node suddenly starts to roll back 5000 blocks and then stops at a tx it now considers invalid. So far it has happened 3 times: once during normal operation, and then every time I tried a fresh sync.
The node syncs up to a different block height each time (normal run: 1654025, fresh sync 1: 1654683, fresh sync 2: 1651171),
then shows these messages (the heights and file numbers differ between the cases):

2023-10-17T10:46:45Z UpdateTip: new best=8762ea92a3db35658fbd1ad2042260a597e98c0dc68b62675ff87fd8272d3d4a height=1654025 version=0x20000000 log2_work=79.144044 tx=6133277 date='2023-10-17T10:44:04Z' progress=1.000000 cache=0.2MiB(1247txo)
2023-10-17T10:46:56Z Pre-allocating up to position 0x4000000 in blk00017.dat
2023-10-17T10:47:12Z Pre-allocating up to position 0x6000000 in blk00017.dat
2023-10-17T10:47:24Z Pre-allocating up to position 0x8000000 in blk00017.dat
2023-10-17T10:47:37Z Leaving block file 17: CBlockFileInfo(blocks=2818, size=133987676, heights=1645507...1654025, time=2023-10-04...2023-10-17)
2023-10-17T10:47:37Z Pre-allocating up to position 0x2000000 in blk00018.dat
2023-10-17T10:47:50Z Pre-allocating up to position 0x4000000 in blk00018.dat
2023-10-17T10:48:06Z Pre-allocating up to position 0x6000000 in blk00018.dat
2023-10-17T10:48:30Z Pre-allocating up to position 0x8000000 in blk00018.dat
2023-10-17T10:48:45Z Leaving block file 18: CBlockFileInfo(blocks=3577, size=134077892, heights=1647115...1650691, time=2023-10-05...2023-10-07)
2023-10-17T10:48:45Z Pre-allocating up to position 0x2000000 in blk00019.dat

Then it rolls back to 1645506, always showing this message:

2023-10-17T22:35:19Z [evm_try_push_tx_in_template] failed, reason : [execute_tx] nonce check failed. Account nonce 3276, signed_tx nonce 3396
2023-10-17T22:35:19Z ERROR: ConnectBlock: ApplyCustomTx on 34d9e6bf73bbe9c429e920acff45938c0bbc4bef5de0392be1f371a1ad7fadfd failed with EvmTx: evm tx failed to queue [execute_tx] nonce check failed. Account nonce 3276, signed_tx nonce 3396


2023-10-17T10:50:20Z InvalidChainFound: invalid block=76bcc0329a449c50a23b542e4ee49c0217ea3537968a009121e371839ac72540  height=1645507  log2_work=79.142479  date=2023-10-04T20:36:28Z
2023-10-17T10:50:20Z InvalidChainFound:  current best=b77d0265607a11c93765f086848f6d3c5b05790d4b3145a2ced649f251c5d43b  height=1645506  log2_work=79.142479  date=2023-10-04T20:33:21Z
2023-10-17T10:50:20Z ERROR: ConnectBlock 76bcc0329a449c50a23b542e4ee49c0217ea3537968a009121e371839ac72540 failed, bad-custom-tx (code 68)
2023-10-17T10:50:20Z InvalidChainFound: invalid block=63175318b584de30a77b98f3a4205c2ea45937301ab68959917b1d6e7c6dc734  height=1645538  log2_work=79.142484  date=2023-10-04T20:52:38Z
2023-10-17T10:50:20Z InvalidChainFound:  current best=b77d0265607a11c93765f086848f6d3c5b05790d4b3145a2ced649f251c5d43b  height=1645506  log2_work=79.142479  date=2023-10-04T20:33:21Z
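
To make it easier to see which blocks the node rejected, the InvalidChainFound lines above can be pulled out of debug.log with a small script. This is only a sketch: the regex is written against the line format shown in this report, and the names `InvalidBlock`, `find_invalid_blocks` and `SAMPLE_LOG` are made up for illustration.

```python
import re
from typing import NamedTuple


class InvalidBlock(NamedTuple):
    block_hash: str
    height: int
    date: str


# Matches only the "invalid block=" variant of InvalidChainFound,
# not the "current best=" line that follows it in the log.
INVALID_RE = re.compile(
    r"InvalidChainFound: invalid block=(?P<hash>[0-9a-f]{64})\s+"
    r"height=(?P<height>\d+)\s+log2_work=\S+\s+date=(?P<date>\S+)"
)

# Two lines copied from the log excerpt in this issue.
SAMPLE_LOG = [
    "2023-10-17T10:50:20Z InvalidChainFound: invalid block=76bcc0329a449c50a23b542e4ee49c0217ea3537968a009121e371839ac72540  height=1645507  log2_work=79.142479  date=2023-10-04T20:36:28Z",
    "2023-10-17T10:50:20Z InvalidChainFound:  current best=b77d0265607a11c93765f086848f6d3c5b05790d4b3145a2ced649f251c5d43b  height=1645506  log2_work=79.142479  date=2023-10-04T20:33:21Z",
    "2023-10-17T10:50:20Z InvalidChainFound: invalid block=63175318b584de30a77b98f3a4205c2ea45937301ab68959917b1d6e7c6dc734  height=1645538  log2_work=79.142484  date=2023-10-04T20:52:38Z",
]


def find_invalid_blocks(lines):
    """Return each rejected block once, sorted by height."""
    seen = {}
    for line in lines:
        m = INVALID_RE.search(line)
        if m and m["hash"] not in seen:
            seen[m["hash"]] = InvalidBlock(m["hash"], int(m["height"]), m["date"])
    return sorted(seen.values(), key=lambda b: b.height)


for block in find_invalid_blocks(SAMPLE_LOG):
    print(block.height, block.block_hash, block.date)
```

For a real log, pass an open file instead of `SAMPLE_LOG`, e.g. `find_invalid_blocks(open("debug.log"))`.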

Steps to Reproduce

  • Clean the folders, let changi sync
  • 100% reproducible, always leading to the error

Environment

  • Node Version: 4.0.0rc1 (v4.0.0.0-HEAD-ed73fc141f-dirty (release build))
  • Block height on bug if applicable: 1654683 down to 1645506
  • TX or TX type on bug if applicable: 34d9e6bf73bbe9c429e920acff45938c0bbc4bef5de0392be1f371a1ad7fadfd
  • OS with version: macOS Sonoma 14.0

I am now syncing to an earlier block and will take a local snapshot to make this easier to reproduce.

@kuegi kuegi added the bug label Oct 18, 2023
@kuegi
Contributor Author

kuegi commented Oct 18, 2023

@kuegi Do you think it's possible for me to get the problematic EVM tx hash from you?

@sieniven I am currently syncing again. Will update as soon as the sync has finished.

@kuegi
Contributor Author

kuegi commented Oct 18, 2023

@sieniven it seems that the tx really is invalid. I am currently syncing and I get hash d60b62ee74a38e155b638df2073ea0d6e2844a13a5d467a57d65f335874307cd for block 1645507.

The invalid tx seems to be in block 76bcc0329a449c50a23b542e4ee49c0217ea3537968a009121e371839ac72540 (also height 1645507), which I believe is on the beta14 chain. So I don't have the invalid block on my disk and can't vmap the txs.

So it seems that the "is invalid" error is correct; the question is why the node is reverting...
Maybe it receives the beta14 chain and for some reason gets confused into thinking it is valid?

Will update when I have new info.

@kuegi
Contributor Author

kuegi commented Oct 18, 2023

Checked the logs. This is the block from beta14 that became invalid in beta15.
I have now explicitly invalidated it ( 76bcc0329a449c50a23b542e4ee49c0217ea3537968a009121e371839ac72540 ) on my side during sync (the block is not in the chain anyway). Let's see if this resolves it.

@prasannavl
Member

Attaching one of my fresh sync logs: debug.log

And pulling just the heights: heights.txt

Ref:

cat debug.log  | grep "UpdateTip" | cut -d" " -f5 | cut -d"=" -f2 > heights.txt
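
The same height extraction can also be done in Python, with an added check for any drop in tip height, which is what a rollback would look like in this list. This is a rough sketch that assumes the UpdateTip line format shown in the report above; `extract_heights`, `find_rollbacks` and `rollbacks_in_file` are illustrative names, not part of the node tooling.

```python
import re

# Matches e.g. "... UpdateTip: new best=8762ea... height=1654025 version=..."
HEIGHT_RE = re.compile(r"UpdateTip: new best=[0-9a-f]+ height=(\d+)")


def extract_heights(lines):
    """Collect the tip height from every UpdateTip line, in log order."""
    heights = []
    for line in lines:
        m = HEIGHT_RE.search(line)
        if m:
            heights.append(int(m.group(1)))
    return heights


def find_rollbacks(heights):
    """Return (previous, current) pairs wherever the tip height drops."""
    return [(prev, cur) for prev, cur in zip(heights, heights[1:]) if cur < prev]


def rollbacks_in_file(path):
    """Convenience wrapper for a debug.log on disk."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_rollbacks(extract_heights(fh))
```

An empty result from `rollbacks_in_file("debug.log")` would mean the tip height never decreased during that sync.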

@prasannavl
Member

  • Don't see any evidence of a rollback. It's also aligned with the seed nodes that were freshly synced.

@kuegi any chance you have specific peers added that are steering your node onto a different chain, perhaps?

@kuegi
Contributor Author

kuegi commented Oct 18, 2023

@prasannavl yes, your sync looks fine. I have no added peers, but since the node has been running through all the betas, it has a long peer list. And during sync I often ran clearbanned to get more peers (because some had updated since my node banned them).
So it's likely that I got the chain from some of them.

I am still syncing, but if the explicit invalidation solves it, then I think the issue is that the node cannot tell from the headers that the chain is invalid, and is therefore likely to try to jump over to it, only to realize that it was actually an invalid fork.
And this big rollback also seems to mess up the EVM state.

Will report when sync is finished.

@prasannavl
Member

Thanks @kuegi. That seems like a plausible theory. The team is also hunting down another issue related to invalid blocks and indexes. Could be related.

And this big rollback also seems to mess up the EVM state

When you have more data on this, please do add it here. Thanks.

@kuegi
Contributor Author

kuegi commented Oct 18, 2023

FYI: with the explicit invalidation, the sync now worked fine. I will try to do some testing regarding the "mess up of state" on a big rollback.

@kuegi
Contributor Author

kuegi commented Oct 18, 2023

@prasannavl reproducible:

  • clean sync
  • ./defi-cli -changi invalidateblock d60b62ee74a38e155b638df2073ea0d6e2844a13a5d467a57d65f335874307cd
    -> node reverts back more than 5k blocks
  • ./defi-cli -changi reconsiderblock d60b62ee74a38e155b638df2073ea0d6e2844a13a5d467a57d65f335874307cd
    -> leads to error:
    2023-10-18T11:47:53Z ERROR: ConnectBlock: ApplyCustomTx on cd062ac205078cc15810f3e95afba30a66ae7c3635ffab43451c6ddca82e0514 failed with TransferDomainTx: Error bridging DFI: [execute_tx] nonce check failed. Account nonce 499, signed_tx nonce 601

looks like the rollback is messing up the nonce data in the state
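
To compare the nonce mismatches across the failures reported in this thread, the two numbers can be extracted from the error text. A minimal sketch, assuming the "Account nonce N, signed_tx nonce M" wording seen in the logs above; `nonce_gap` is a made-up helper name.

```python
import re

NONCE_RE = re.compile(r"Account nonce (\d+), signed_tx nonce (\d+)")


def nonce_gap(line):
    """Return (account_nonce, signed_tx_nonce, gap) or None if the line has no nonce error."""
    m = NONCE_RE.search(line)
    if m is None:
        return None
    account, signed = int(m.group(1)), int(m.group(2))
    return account, signed, signed - account


# Error texts copied from the two failures quoted in this issue.
errors = [
    "EvmTx: evm tx failed to queue [execute_tx] nonce check failed. Account nonce 3276, signed_tx nonce 3396",
    "TransferDomainTx: Error bridging DFI: [execute_tx] nonce check failed. Account nonce 499, signed_tx nonce 601",
]
for err in errors:
    print(nonce_gap(err))
```

In both cases the account nonce in the state is behind the nonce the tx was signed with (by 120 and 102 respectively), which fits the idea that the state after the rollback is older than the block being connected, though it does not prove it.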

@kuegi
Contributor Author

kuegi commented Oct 18, 2023

I tried to reproduce it with other filled blocks, but failed. So it's not a general issue with filled blocks or lots of TDs (TransferDomain txs).

@kuegi
Contributor Author

kuegi commented Oct 18, 2023

@prasannavl @sieniven: I didn't manage to reproduce it with new blocks, but I found the block where it "breaks" right now:
1649697 11bd2e3096138d1b842f038ce0cc796434f14fa2a2ea33941a6a4cea567ccfe8
If you invalidate this one and then try to reconsider it, it fails with ERROR: ConnectBlock: Incorrect EVM block hash in coinbase output
Every block below that also fails (on invalidate and reconsider), some with the block hash error, some with an invalid nonce.
Blocks above this (so 1649698 and above) are fine.
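
If the observation above holds (every block up to some height fails on invalidate and reconsider, everything above it is fine), the boundary can be located with a binary search instead of probing block by block. The sketch below only shows the search; `replay_fails` is a hypothetical callback that would have to run the invalidate/reconsider pair for a given height and report whether it errored, ideally against a copy of a local snapshot since each probe changes node state.

```python
def find_break_height(low, high, replay_fails):
    """Return the highest height in [low, high] where replay_fails(height) is True.

    Assumes replay_fails is True for all heights up to one boundary and
    False above it. Returns None if even `low` replays fine.
    """
    if not replay_fails(low):
        return None
    # Invariant: replay_fails(low) is True; heights above `high` are fine.
    while low < high:
        mid = (low + high + 1) // 2  # upper middle, so the range always shrinks
        if replay_fails(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Stand-in for a real probe, using the boundary reported in this comment.
probes = []


def fake_replay_fails(height):
    probes.append(height)
    return height <= 1649697


print(find_break_height(1645507, 1654025, fake_replay_fails), len(probes))
```

With the range from this issue (1645507 to 1654025) this needs on the order of a dozen probes rather than thousands.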
