Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort upon panic #693

Merged
merged 1 commit into from Feb 14, 2023
Merged

Abort upon panic #693

merged 1 commit into from Feb 14, 2023

Conversation

pavel-kokolemin
Copy link
Contributor

@pavel-kokolemin pavel-kokolemin commented Feb 11, 2023

Right now tokio runtime catches panics from spawned tasks and just prints something like this (I added the panic! statement for test):

thread 'tokio-runtime-worker' panicked at 'explicit panic', /path/to/p2p/src/net/default_backend/peer.rs:204:9

And the node will continue working after that (same for release and debug builds).
Aborting the process should be a safer choice. If the process is aborted, the core dump should be enough to debug the problem.
This change will affect all binaries in the workspace (node, wallet, dns server).

Copy link
Contributor

@stanislav-tkach stanislav-tkach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this makes sense unless we want to catch panics somewhere.

Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated
@@ -90,9 +90,13 @@ static_assertions = "1.1"
thiserror = "1.0"
tokio = "1.0"

[profile.dev]
panic = "abort"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment here explaining why this is being added. I'm sure in a couple of months we'll forget.

Copy link
Collaborator

@TheQuantumPhysicist TheQuantumPhysicist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the comment and feel free to merge.

@iljakuklic
Copy link
Contributor

How does this interact with proptest shrinking? If a proptest fails, it tries to reproduce the failure with smaller inputs to find a small input that still produces the issue. I guess if panicking fails, it will block shrinking from happening. We also lose backtraces if I am not mistaken. Can you check? Could this be done with a tokio settings, or maybe wrapper over tokio::spawn, rather than making it a global change?

@pavel-kokolemin pavel-kokolemin marked this pull request as draft February 13, 2023 09:25
Abort processes upon panic is safer than the default "unwind" stack strategy.
For example, it prevents tokio from catching panics from spawned tasks.
@pavel-kokolemin
Copy link
Contributor Author

pavel-kokolemin commented Feb 14, 2023

How does this interact with proptest shrinking? If a proptest fails, it tries to reproduce the failure with smaller inputs to find a small input that still produces the issue. I guess if panicking fails, it will block shrinking from happening.

Proptest works fine because it's used from the test profile. I just checked it.

We also lose backtraces if I am not mistaken. Can you check?

Works fine (I tested the release build):

thread 'tokio-runtime-worker' panicked at 'explicit panic', mintlayer-core/p2p/src/net/default_backend/peer.rs:204:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:111:5
   3: p2p::net::default_backend::backend::Backend<T>::create_peer::{{closure}}
   4: tokio::runtime::task::core::Core<T,S>::poll
   5: tokio::runtime::task::harness::Harness<T,S>::poll
   6: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   7: tokio::runtime::scheduler::multi_thread::worker::Context::run
   8: tokio::runtime::scheduler::multi_thread::worker::run
   9: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
  10: tokio::runtime::task::core::Core<T,S>::poll
  11: tokio::runtime::task::harness::Harness<T,S>::poll
  12: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Could this be done with a tokio settings, or maybe wrapper over tokio::spawn, rather than making it a global change?

Tokio runtime does not have such setting yet: tokio-rs/tokio#4516
A wrapper for tokio::spawn or aborting the process with std::panic::set_hook should work, but I don't think it's worth the effort.

Copy link
Contributor

@iljakuklic iljakuklic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for researching the interaction with proptest

@pavel-kokolemin pavel-kokolemin merged commit 9a2bf85 into master Feb 14, 2023
@pavel-kokolemin pavel-kokolemin deleted the panic_abort branch February 14, 2023 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants