[Draft] Add fetch block manager to sync blocks in a separate thread #13262

Closed
wants to merge 12 commits into from

Conversation

vusirikala
Contributor

Description

Add fetch block manager to sync blocks in a separate thread
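The description gives only the title, and no code is shown on this page. The sketch below illustrates the general pattern the title names: a manager that fetches blocks on its own thread and hands results back over channels, using only the Rust standard library. `FetchBlockManager`, `FetchRequest`, `Block`, `BlockId`, and the injected `fetch` closure are hypothetical names chosen for illustration. They are not the PR's actual types or API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical block identifier (the PR's real types are not shown here).
type BlockId = u64;

/// Hypothetical block payload.
#[derive(Clone, Debug, PartialEq)]
struct Block {
    id: BlockId,
    payload: Vec<u8>,
}

/// A request to the fetch thread: which block to fetch and where to reply.
struct FetchRequest {
    id: BlockId,
    reply: Sender<Option<Block>>,
}

/// Hypothetical manager owning a dedicated fetch thread, so callers never
/// perform the (possibly slow) fetch on their own thread.
struct FetchBlockManager {
    requests: Option<Sender<FetchRequest>>,
    worker: Option<JoinHandle<()>>,
}

impl FetchBlockManager {
    /// Spawns the worker thread; `fetch` stands in for a real network fetch.
    fn new<F>(fetch: F) -> Self
    where
        F: Fn(BlockId) -> Option<Block> + Send + 'static,
    {
        let (tx, rx): (Sender<FetchRequest>, Receiver<FetchRequest>) = mpsc::channel();
        let worker = thread::spawn(move || {
            // Cache fetched blocks so repeated requests are served locally.
            let mut cache: HashMap<BlockId, Block> = HashMap::new();
            // The loop ends once every Sender for `rx` has been dropped.
            for req in rx {
                let cached = cache.get(&req.id).cloned();
                let block = match cached {
                    Some(b) => Some(b),
                    None => {
                        let fetched = fetch(req.id);
                        if let Some(b) = &fetched {
                            cache.insert(req.id, b.clone());
                        }
                        fetched
                    }
                };
                // Ignore send errors: the requester may have stopped waiting.
                let _ = req.reply.send(block);
            }
        });
        Self { requests: Some(tx), worker: Some(worker) }
    }

    /// Queues a fetch and returns a receiver the caller can block or poll on.
    fn request(&self, id: BlockId) -> Receiver<Option<Block>> {
        let (reply_tx, reply_rx) = mpsc::channel();
        if let Some(tx) = &self.requests {
            let _ = tx.send(FetchRequest { id, reply: reply_tx });
        }
        reply_rx
    }
}

impl Drop for FetchBlockManager {
    fn drop(&mut self) {
        // Dropping the sender closes the channel so the worker loop exits.
        drop(self.requests.take());
        if let Some(handle) = self.worker.take() {
            let _ = handle.join();
        }
    }
}
```

A caller would call `request(id)` and then `recv()` (blocking) or `try_recv()` (polling) on the returned receiver. The per-request reply channel keeps responses for different callers separate. Since the node already runs on tokio (see the backtraces later on this page), a real implementation might well use async channels instead of a plain OS thread. That is an assumption, not something this page confirms.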

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

Key Areas to Review

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation


trunk-io bot commented May 13, 2024

⏱️ 21h 55m total CI duration on this PR
| Job | Cumulative Duration | Recent Runs |
| --- | --- | --- |
| indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose | 6h 8m | 🟥🟩🟩🟩🟩 |
| forge-e2e-test / forge | 4h 58m | 🟥🟥🟥🟩🟩 |
| rust-targeted-unit-tests | 1h 46m | 🟥🟥🟥🟥🟥 (+1 more) |
| rust-smoke-tests | 1h 44m | 🟥🟥🟥🟥🟥 |
| forge-compat-test / forge | 1h 22m | 🟥🟩🟥🟥🟥 |
| rust-images / rust-all | 1h 8m | 🟩🟩🟩🟩🟩 |
| rust-move-tests | 1h 5m | 🟩🟩🟩🟩🟩 (+2 more) |
| cli-e2e-tests / run-cli-tests | 46m | 🟥🟥🟥🟥🟥 |
| rust-lints | 45m | 🟩🟩🟩🟩🟥 (+1 more) |
| rust-build-cached-packages | 25m | 🟩🟩🟩🟩🟩 |
| run-tests-main-branch | 23m | 🟩🟩🟩🟩🟩 |
| check | 21m | 🟩🟩🟩🟩🟩 |
| test-target-determinator | 16m | 🟥🟥🟥🟥🟥 (+1 more) |
| execution-performance / test-target-determinator | 16m | 🟥🟥🟥🟥🟥 |
| check-dynamic-deps | 11m | 🟩🟩🟩🟩🟩 (+2 more) |
| general-lints | 9m | 🟩🟩🟩🟩🟩 (+1 more) |
| node-api-compatibility-tests / node-api-compatibility-tests | 4m | 🟩🟩🟩🟩🟩 |
| semgrep/ci | 3m | 🟩🟩🟩🟩🟩 (+2 more) |
| file_change_determinator | 1m | 🟩🟩🟩🟩🟩 (+2 more) |
| file_change_determinator | 1m | 🟩🟩🟩🟩🟩 (+2 more) |
| file_change_determinator | 55s | 🟩🟩🟩🟩🟩 (+1 more) |
| permission-check | 22s | 🟩🟩🟩🟩🟩 (+2 more) |
| determine-docker-build-metadata | 20s | 🟩🟩🟩🟩🟩 (+1 more) |
| permission-check | 17s | 🟩🟩🟩🟩🟩 (+2 more) |
| permission-check | 17s | 🟩🟩🟩🟩🟩 (+2 more) |
| permission-check | 16s | 🟩🟩🟩🟩🟩 (+2 more) |
| permission-check | 13s | 🟩🟩🟩🟩🟩 (+1 more) |

🚨 4 jobs on the last run were significantly faster/slower than expected

| Job | Duration | vs 7d avg | Delta |
| --- | --- | --- | --- |
| check-dynamic-deps | 4m | 1m | +200% |
| rust-build-cached-packages | 8m | 5m | +74% |
| cli-e2e-tests / run-cli-tests | 10m | 7m | +48% |
| forge-compat-test / forge | 18m | 14m | +30% |


@vusirikala requested review from sitalkedia and removed request for sasha8 May 13, 2024 06:57
@vusirikala added the CICD:build-images and CICD:run-e2e-tests labels May 13, 2024

This comment has been minimized.

@vusirikala marked this pull request as draft May 15, 2024 23:53

Contributor

✅ Forge suite realistic_env_max_load success on 25ff0defeabf53dc5f3ba8cd48cdc95dc5a03fac

two traffics test: inner traffic : committed: 7809 txn/s, latency: 5012 ms, (p50: 4800 ms, p90: 5700 ms, p99: 10500 ms), latency samples: 3381540
two traffics test : committed: 100 txn/s, latency: 1921 ms, (p50: 1900 ms, p90: 2200 ms, p99: 3900 ms), latency samples: 1760
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.209, avg: 0.202", "QsPosToProposal: max: 0.276, avg: 0.253", "ConsensusProposalToOrdered: max: 0.461, avg: 0.410", "ConsensusOrderedToCommit: max: 0.362, avg: 0.350", "ConsensusProposalToCommit: max: 0.772, avg: 0.760"]
Max round gap was 1 [limit 4] at version 1700075. Max no progress secs was 4.614867 [limit 15] at version 1700075.
Test Ok
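In the latency breakdown above, the average of the end-to-end consensus stage equals the sum of the averages of its two sub-stages. This is consistent with the stages being sequential. The log does not state the unit; seconds are assumed here.

```math
\overline{t}_{\text{ProposalToCommit}} \approx \overline{t}_{\text{ProposalToOrdered}} + \overline{t}_{\text{OrderedToCommit}} = 0.410 + 0.350 = 0.760
```

The maxima do not add the same way (0.461 + 0.362 = 0.823, versus a reported maximum of 0.772), which is expected if the per-stage maxima come from different samples.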

Contributor

❌ Forge suite compat failure on 01b24e7e3548382dd25440b39a0438a993387f12 ==> 25ff0defeabf53dc5f3ba8cd48cdc95dc5a03fac

Compatibility test results for 01b24e7e3548382dd25440b39a0438a993387f12 ==> 25ff0defeabf53dc5f3ba8cd48cdc95dc5a03fac (PR)
1. Check liveness of validators at old version: 01b24e7e3548382dd25440b39a0438a993387f12
compatibility::simple-validator-upgrade::liveness-check : committed: 5575 txn/s, latency: 5901 ms, (p50: 5100 ms, p90: 8400 ms, p99: 17300 ms), latency samples: 195140
2. Upgrading first Validator to new version: 25ff0defeabf53dc5f3ba8cd48cdc95dc5a03fac
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1498 txn/s, latency: 17310 ms, (p50: 19500 ms, p90: 28000 ms, p99: 29600 ms), latency samples: 88400
3. Upgrading rest of first batch to new version: 25ff0defeabf53dc5f3ba8cd48cdc95dc5a03fac
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1513 txn/s, latency: 17337 ms, (p50: 18600 ms, p90: 30800 ms, p99: 31200 ms), latency samples: 92300
4. upgrading second batch to new version: 25ff0defeabf53dc5f3ba8cd48cdc95dc5a03fac
Test Failed: Tried executing 10 txns, request counters: "success 0, failed submit [0], failed wait [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10], by client: [(0, 0, 90): http://aptos-node-3-validator.forge-compat-pr-13262.svc:8080/v1/]\n[(0, 0, 90): http://aptos-node-2-validator.forge-compat-pr-13262.svc:8080/v1/]"

Caused by:
    Unknown error Ledger on endpoint (http://aptos-node-3-validator.forge-compat-pr-13262.svc:8080/v1/) is more than 60s behind current time, timing out waiting for the transaction. Warning, transaction (b13d747f) might still succeed.

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:565:25
   1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/result.rs:1963:27
   2: aptos_transaction_emitter_lib::emitter::transaction_executor::RestApiReliableTransactionSubmitter::submit_check_and_retry::{{closure}}
             at ./crates/transaction-emitter-lib/src/emitter/transaction_executor.rs:127:28
   3: <futures_util::future::maybe_done::MaybeDone<Fut> as core::future::future::Future>::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/future/maybe_done.rs:95:38
   4: <futures_util::future::join_all::JoinAll<F> as core::future::future::Future>::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/future/join_all.rs:143:24
   5: <aptos_transaction_emitter_lib::emitter::transaction_executor::RestApiReliableTransactionSubmitter as aptos_transaction_generator_lib::ReliableTransactionSubmitter>::execute_transactions_with_counter::{{closure}}
             at ./crates/transaction-emitter-lib/src/emitter/transaction_executor.rs:309:10
   6: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/future/future.rs:125:9
   7: aptos_transaction_emitter_lib::emitter::account_minter::AccountMinter::create_and_fund_seed_accounts::{{closure}}
             at ./crates/transaction-emitter-lib/src/emitter/account_minter.rs:434:18
   8: aptos_transaction_emitter_lib::emitter::account_minter::AccountMinter::create_and_fund_accounts::{{closure}}
             at ./crates/transaction-emitter-lib/src/emitter/account_minter.rs:326:14
   9: aptos_transaction_emitter_lib::emitter::create_accounts::{{closure}}
             at ./crates/transaction-emitter-lib/src/emitter/mod.rs:1195:14
  10: aptos_transaction_emitter_lib::emitter::TxnEmitter::start_job::{{closure}}
             at ./crates/transaction-emitter-lib/src/emitter/mod.rs:717:10
  11: aptos_transaction_emitter_lib::emitter::TxnEmitter::emit_txn_for_impl::{{closure}}
             at ./crates/transaction-emitter-lib/src/emitter/mod.rs:827:14
  12: aptos_transaction_emitter_lib::emitter::TxnEmitter::emit_txn_for::{{closure}}
             at ./crates/transaction-emitter-lib/src/emitter/mod.rs:859:14
  13: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:63
  14: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
  15: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
  16: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:31
  17: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/blocking.rs:66:9
  18: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:87:13
  19: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
  20: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:86:9
  21: tokio::runtime::runtime::Runtime::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/runtime.rs:350:50
  22: aptos_testcases::generate_traffic
             at ./testsuite/testcases/src/lib.rs:105:17
  23: <aptos_testcases::compatibility_test::SimpleValidatorUpgrade as aptos_forge::interface::network::NetworkTest>::run
             at ./testsuite/testcases/src/compatibility_test.rs:114:24
  24: aptos_forge::runner::Forge<F>::run::{{closure}}
             at ./testsuite/forge/src/runner.rs:598:42
  25: aptos_forge::runner::run_test
             at ./testsuite/forge/src/runner.rs:666:11
  26: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:598:30
  27: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:427:11
  28: forge::main
             at ./testsuite/forge-cli/src/main.rs:353:21
  29: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
  30: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:154:18
  31: std::rt::lang_start::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:167:18
  32: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:284:13
  33: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  34: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  35: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  36: std::rt::lang_start_internal::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:48
  37: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  38: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  39: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  40: std::rt::lang_start_internal
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:20
  41: std::rt::lang_start
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:166:17
  42: __libc_start_main
  43: _start


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-compat-pr-13262-1715951775-01b24e7e3548382dd25440b39a0438","timestamp":"2024-05-17T13:32:29.429353Z","message":"Deleting namespace forge-compat-pr-13262: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-compat-pr-13262-1715951775-01b24e7e3548382dd25440b39a0438","timestamp":"2024-05-17T13:32:29.429403Z","message":"aptos-node resources for Forge removed in namespace: forge-compat-pr-13262"}
Failed to run tests:
Tests Failed

failures:
    compatibility::simple-validator-upgrade

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Error: Tests Failed

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:618:13
   2: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:427:11
   3: forge::main
             at ./testsuite/forge-cli/src/main.rs:353:21
   4: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:154:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:167:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
   9: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  10: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:48
  12: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  13: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  14: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  15: std::rt::lang_start_internal
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:20
  16: std::rt::lang_start
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:166:17
  17: __libc_start_main
  18: _start
Debugging output:
NAME                       READY   STATUS    RESTARTS   AGE
aptos-node-0-validator-0   1/1     Running   0          7m49s
aptos-node-1-validator-0   1/1     Running   0          10m
aptos-node-2-validator-0   1/1     Running   0          5m31s
aptos-node-3-validator-0   1/1     Running   0          4m44s

@vusirikala closed this May 30, 2024
Labels
CICD:build-images (when this label is present, GitHub Actions will start build+push rust images from the PR)
CICD:run-e2e-tests (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant