Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(indexer): Imorove error handling, add IndexerError, allow to ignore missing local delayed receipt (for mirroring, forknet) #11315

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

khorolets
Copy link
Member

Resolves #11313 (see ignore_missing_local_delayed_receipt in the Breaking Changes section)

  • Propagate IndexerError from Indexer::start method to allow better error handling in the client code
  • Add the way to explicitly create tokio::sync::mpsc::channel to allow more control over the indexer job with new method Indexer::streamer_channel()
  • Add async method Indexer::start_streamer(sender: Sender<StreamerMessage>) to allow more control over the indexer job

You can use the new Indexer::start_streamer method to start the indexer job with a custom tokio::sync::mpsc::channel sender. This method is async and returns a Result<(), IndexerError>.

use tokio::sync::mpsc;

#[actix::main]
async fn main() {
    let indexer_config = near_indexer::IndexerConfig {
        home_dir,
        sync_mode: near_indexer::SyncModeEnum::FromInterruption,
        await_for_node_synced: near_indexer::AwaitForNodeSyncedEnum::WaitForFullSync,
        validate_genesis: true,
        ignore_missing_local_delayed_receipt: false, // <-- new flag
    };
    let indexer = near_indexer::Indexer::new(indexer_config).expect("Failed to create indexer instance");
    // Explicitly create a tokio::sync::mpsc::channel
    let (sender, receiver) = indexer::streamer_channel();
    let indexer_result = indexer.start_streamer(sender).await;
    match indexer_result {
        Ok(_) => {
            println!("Indexer job finished successfully");
        }
        Err(e) => {
            // Handle the error, e.g. retry the job
            eprintln!("Indexer job failed: {:?}", e);
        }
    }
}

Breaking changes

  • Add ignore_missing_local_delayed_receipt boolean flag to the IndexerConfig struct to allow ignoring missing local delayed receipts. This flag should be false by default, so if you don't want to ignore missing local delayed receipts, you don't need to change anything. If you want to ignore missing local delayed receipts, you should set this flag to true in the IndexerConfig struct.

Even though this is a breaking change, the change to the existing code base should be minimal. You should just add the ignore_missing_local_delayed_receipt flag to the IndexerConfig struct and set it to false if you don't want to ignore missing local delayed receipts (the behaviour you have right now, it will panic if the local delayed receipt is missing) or set it to true if you want to ignore missing local delayed receipts. Warning! in this case the indexer will skip the entire block since it can't create a consistent StreamerMessage without a local delayed receipt.

@khorolets khorolets requested a review from a team as a code owner May 15, 2024 09:47
@khorolets khorolets force-pushed the refactor/indexer-propagate-error-to-the-top branch from e07c06f to 17bccc8 Compare May 15, 2024 10:20
Copy link

codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 149 lines in your changes missing coverage. Please review.

Project coverage is 71.33%. Comparing base (9a72275) to head (f873e5e).

Files Patch % Lines
chain/indexer/src/streamer/mod.rs 0.00% 59 Missing ⚠️
chain/indexer/src/lib.rs 0.00% 54 Missing ⚠️
chain/indexer/src/streamer/fetchers.rs 0.00% 19 Missing ⚠️
tools/indexer/example/src/main.rs 0.00% 14 Missing ⚠️
chain/indexer/src/streamer/errors.rs 0.00% 1 Missing ⚠️
chain/indexer/src/streamer/utils.rs 0.00% 1 Missing ⚠️
tools/mirror/src/lib.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11315      +/-   ##
==========================================
- Coverage   71.37%   71.33%   -0.05%     
==========================================
  Files         790      790              
  Lines      160067   160161      +94     
  Branches   160067   160161      +94     
==========================================
- Hits       114253   114246       -7     
- Misses      40836    40938     +102     
+ Partials     4978     4977       -1     
Flag Coverage Δ
backward-compatibility 0.23% <0.00%> (-0.01%) ⬇️
db-migration 0.23% <0.00%> (-0.01%) ⬇️
genesis-check 1.35% <0.00%> (-0.01%) ⬇️
integration-tests 37.65% <0.00%> (-0.01%) ⬇️
linux 68.76% <0.00%> (-0.06%) ⬇️
linux-nightly 70.86% <0.00%> (-0.06%) ⬇️
macos 52.39% <0.00%> (-0.05%) ⬇️
pytests 1.57% <0.00%> (-0.01%) ⬇️
sanity-checks 1.37% <0.00%> (-0.01%) ⬇️
unittests 66.03% <0.00%> (-0.05%) ⬇️
upgradability 0.28% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@khorolets khorolets force-pushed the refactor/indexer-propagate-error-to-the-top branch from 17bccc8 to 20088b9 Compare May 15, 2024 10:54
Copy link
Contributor

@marcelo-gonzalez marcelo-gonzalez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think this will definitely be a good first step. It looks roughly okay to me, so I'll approve it for now, and tomorrow will actually check that it fixes the crash on a localnet (seems like it must though).

But I wonder if we can try and do something more direct with these delayed receipts. Because even with this fix, if we prepare genesis records from a time on mainnet when there were lots of delayed receipts, then the mirror traffic generator we will spend lots of CPU time in the indexer looking back 1000 blocks when it finds one of these genesis delayed receipts. I would guess this might actually add up to quite a lot.

Maybe it would be possible to have the indexer make an initial pass where when it runs for the first time it looks for all the delayed receipts (should just be in the state, like it's accessed here) , and then remembers all those receipt IDs, so that when we see one of them in the outcomes, we don't have to go back 1000 blocks to try and find it. Then for subsequent receipts, just remember all the receipts we see, and forget them when you see them in the outcomes for a block. The ones we still have stored are the ones that werent executed

Idk how hard that would be or whether it would be worth the time, but in the meantime I wonder if it's worth adding another config option to the indexer that says how far back to look? For the mirror process for example, I think we can live with just 5 or 10 or something, since it's not critical if we miss something

@khorolets
Copy link
Member Author

Maybe it would be possible to have the indexer make an initial pass where when it runs for the first time it looks for all the delayed receipts (should just be in the state, like it's accessed here) , and then remembers all those receipt IDs, so that when we see one of them in the outcomes, we don't have to go back 1000 blocks to try and find it.

I second this.

And since the in-memory cache for local delayed receipt is already implemented on the indexer side, it'd be great to reuse it by just providing all such receipts into that cache when starting from the genesis.

Anyway, I see this feature irrelevant for the common indexer use case to be included as a built-in implementation. I'd rather consider implementing a way to start an indexer with some initial cache data. It would act like a tool for developers allowing to implement persistent cache and start the instance with initial data like in the case of mirror traffic generator.

In the end I see it like a way to optionally pass the data to the cache, and then it's up to another tool to provide that data. What do you think @marcelo-gonzalez ?

@khorolets khorolets enabled auto-merge May 27, 2024 12:10
@khorolets khorolets added this pull request to the merge queue May 27, 2024
@khorolets khorolets removed this pull request from the merge queue due to a manual request May 27, 2024
@khorolets khorolets marked this pull request as draft May 28, 2024 13:53
@khorolets khorolets force-pushed the refactor/indexer-propagate-error-to-the-top branch from aa06ab0 to f676cab Compare May 28, 2024 13:56
@khorolets
Copy link
Member Author

OK, so I have revisited this PR after some time.

I've noticed I didn't get rid of one panic!, sorry for that. I've fixed it.

I got test data for Statelessnet where I caught this error with a not-found receipt and decided it was a great chance to make a proper handling of such cases.

I am far from an expert in that are of nearcore though. I followed the process of building the database from a genesis file and figured there is DelayedReceipt type of StateRecord, and I followed that up to the DBCol::State key TrieKey::DelayedReceiptIndices. I had to quickly figure how you guys work with that and crafted new methods in the Store (I guess). I made all the necessary stuff expecting to see that Local Delayed Receipt in the DB, but when I tested it I encountered the None after getting the DelayedReceiptIndices.

@VanBarbascu @marcelo-gonzalez Can you please have a look? Maybe I used it in the wrong way.

I am going to leave comments right in the code for you guys. Thanks!

@@ -1364,6 +1377,32 @@ impl ChainStoreAccess for ChainStore {
.map_err(|e| e.into())
}

fn get_delayed_receipt(&self, receipt_id: &CryptoHash) -> Result<Option<Arc<Receipt>>, Error> {
let delayed_receipts_indices: Option<TrieQueueIndices> =
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This place returns None when I test.

I wonder whether it is the wrong use of the store or whether the problem is that the Receipt I am looking for is not just Delayed but Local Delayed. Do such receipts even end up in the Genesis records file during the state dump? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @Longarithm for visibility. Where I need to be taught

… ignore missing local delayed receipt (for mirroring, forknet)
…net/mocknet), improve Indexer to try to find it through ViewClient before going 1000 blocks behind
@khorolets khorolets force-pushed the refactor/indexer-propagate-error-to-the-top branch from 283abc9 to f873e5e Compare June 9, 2024 08:05
@khorolets khorolets marked this pull request as ready for review June 9, 2024 08:07
@khorolets
Copy link
Member Author

After some checks and experiments, I have reverted the attempt to look for DelayedReceipts in the database. The approach I was using was wrong and doing that correctly doesn't look better than what we currently have.

So in the end, I've adjusted the logic around the flag ignore_missing_local_delayed_receipt to skip an ExecutionOutcome only instead of the entire block.

Anyway, this flag is not recommended for production. Forknet, localnet, testnet only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor(indexer): Fix traffic mirror panic on missing blocks
2 participants