
4.0.9 crashes after "[ain_evm::services] Starting tokio waiter" on x86_64-apple-darwin #2907

Open · luzianscherrer opened this issue May 1, 2024 · 6 comments

Summary

Last entries in debug.log before the crash:

2024-05-01T19:40:06Z [ain_grpc] Init rs services
2024-05-01T19:40:06Z [ain_evm::services] Starting tokio waiter

Steps to Reproduce

  • Run defichain-4.0.9-x86_64-apple-darwin from the latest snapshot

Environment

  • Node Version: 4.0.9
  • OS with version: Darwin 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64 x86_64 (macOS 12.4 21F79)
luzianscherrer (Author) commented May 1, 2024

When running without the -daemon flag, I get the following on stdout:

2024-05-01T21:14:53Z [ain_evm::services] Starting tokio waiter
thread '<unnamed>' panicked at 'Error initializating handlers: RocksDBError(Error { message: "IO error: While open a file for random read: /Users/.../defichain/data/evm/indexes/002540.sst: Too many open files" })', ain-evm/src/services.rs:69:46
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
libc++abi: terminating with uncaught foreign exception
Abort trap: 6

And with RUST_BACKTRACE=1:

2024-05-01T21:14:53Z [ain_evm::services] Starting tokio waiter
thread '<unnamed>' panicked at 'Error initializating handlers: RocksDBError(Error { message: "IO error: While open a file for random read: /Users/.../defichain/data/evm/indexes/002540.sst: Too many open files" })', ain-evm/src/services.rs:69:46
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: ain_evm::services::Services::new
   4: <ain_evm::services::SERVICES as core::ops::deref::Deref>::deref
   5: _cxxbridge1$ain_rs_init_core_services
   6: __ZZ11AppInitMainR14InitInterfacesENK3$_8clEv
   7: __Z11AppInitMainR14InitInterfaces
   8: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
libc++abi: terminating with uncaught foreign exception
Abort trap: 6

And the full stack backtrace from RUST_BACKTRACE=full:

   0:        0x101cd99d2 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h9b8c7d4986eea143
   1:        0x101b2fd6b - core::fmt::write::h745c6d87d2702197
   2:        0x101cb2c2e - std::io::Write::write_fmt::h9bc75e1a1578f329
   3:        0x101cdde6a - std::sys_common::backtrace::print::hb5816525719dec2d
   4:        0x101cdda65 - std::panicking::default_hook::{{closure}}::hfc88b82ae9ab0222
   5:        0x101cdea48 - std::panicking::rust_panic_with_hook::hc7199b95a24a631d
   6:        0x101cde544 - std::panicking::begin_panic_handler::{{closure}}::h29c0dd87214757f4
   7:        0x101cde4a9 - std::sys_common::backtrace::__rust_end_short_backtrace::hd299ff4177db45a7
   8:        0x101cde492 - _rust_begin_unwind
   9:        0x1024b8d43 - core::panicking::panic_fmt::h8b25e6b7bc9d8aa4
  10:        0x1024b91d5 - core::result::unwrap_failed::h3575be054108b8be
  11:        0x101977942 - ain_evm::services::Services::new::h746261e301da3d6e
  12:        0x1019815c4 - <ain_evm::services::SERVICES as core::ops::deref::Deref>::deref::h39d5cc5a5c2ea470
  13:        0x101881535 - _cxxbridge1$ain_rs_init_core_services
  14:        0x1011da361 - __ZZ11AppInitMainR14InitInterfacesENK3$_8clEv
  15:        0x1011d4b96 - __Z11AppInitMainR14InitInterfaces
  16:        0x10117ede1 - _main
libc++abi: terminating with uncaught foreign exception
Abort trap: 6
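
The panic itself shows RocksDB failing to open an SST file once the process runs out of file descriptors. RocksDB can bound how many table files it keeps open at once through its max_open_files option; the following is a minimal sketch of that knob, assuming the rust-rocksdb crate (whether and where ain-evm exposes this setting is not confirmed here):

    // Minimal sketch: cap RocksDB's descriptor usage via max_open_files.
    // The path handling and the value 256 are illustrative, not taken
    // from ain-evm.
    use rocksdb::{Options, DB};

    fn open_with_fd_cap(path: &str) -> Result<DB, rocksdb::Error> {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        // The default (-1) lets RocksDB hold one descriptor per .sst
        // file indefinitely; a positive value makes it recycle handles
        // through its table cache instead.
        opts.set_max_open_files(256);
        DB::open(&opts, path)
    }

With a positive cap, RocksDB trades some read latency (re-opening evicted files) for a hard bound on descriptor usage.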

Regarding open files:

$ sysctl kern.maxfiles
kern.maxfiles: 122880
$ sysctl kern.maxfilesperproc
kern.maxfilesperproc: 61440
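
For context, kern.maxfiles and kern.maxfilesperproc are kernel-wide ceilings; the limit a process normally hits first is its own RLIMIT_NOFILE soft limit, which for processes launched from a macOS shell often defaults to as little as 256 (visible via ulimit -n). A minimal sketch for reading it from inside the process, assuming the libc crate:

    // Sketch: read the per-process descriptor limits, which apply long
    // before the sysctl ceilings quoted above come into play.
    fn nofile_limits() -> std::io::Result<(u64, u64)> {
        let mut rl = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
        // Safety: rl is a valid, writable rlimit struct.
        if unsafe { libc::getrlimit(libc::RLIMIT_NOFILE, &mut rl) } != 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok((rl.rlim_cur as u64, rl.rlim_max as u64))
    }

If the soft limit reported here is in the low hundreds, the failure on 002540.sst above is consistent with the node simply holding more SST files than descriptors.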

prasannavl (Member) commented May 8, 2024

Thanks for this report; working on a fix.
Temporary workaround: snapshots below height 3943543 should work as expected.

prasannavl (Member) commented May 8, 2024

Edit: I conflated two issues; this one seems to be related to open-file limits on darwin.

Your kernel max count looks OK, but I'm not sure whether it's being used up by other applications. Could you try increasing it?
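
One concrete way to try increasing it from inside the process would be raising the soft RLIMIT_NOFILE toward the hard limit at startup. A hedged sketch, again assuming the libc crate and not implying defid does or doesn't already do this:

    // Sketch: raise the soft fd limit at startup. On macOS, setrlimit
    // rejects values above OPEN_MAX even when the hard limit reports
    // "unlimited", so clamp to OPEN_MAX (10240 in <sys/syslimits.h>).
    fn raise_nofile_limit() -> std::io::Result<()> {
        const OPEN_MAX: libc::rlim_t = 10240;
        let mut rl = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
        unsafe {
            if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rl) != 0 {
                return Err(std::io::Error::last_os_error());
            }
            rl.rlim_cur = std::cmp::min(rl.rlim_max, OPEN_MAX);
            if libc::setrlimit(libc::RLIMIT_NOFILE, &rl) != 0 {
                return Err(std::io::Error::last_os_error());
            }
        }
        Ok(())
    }

The same effect is available from the launching shell with ulimit -n before starting the node.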

luzianscherrer (Author) commented

> Your kernel max count looks OK, but I'm not sure whether it's being used up by other applications. Could you try increasing it?

There is nothing else running on this machine; its only purpose is running the MN. As a workaround I've switched to the Linux version in Docker (same machine), which is working fine.

prasannavl (Member) commented

Interesting, thanks. I'll leave this up to someone with darwin expertise to look at.

prasannavl (Member) commented

Unlikely that the node is using up 61440+ open files, unless some resource isn't released properly in a way that occurs only on darwin.
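
One way to test the not-released-properly hypothesis would be to sample the node's descriptor count while it runs, e.g. with lsof -p <pid> from a shell, or from inside the process: on macOS every open descriptor of the calling process appears as an entry under /dev/fd. A minimal sketch:

    // Sketch: approximate this process's open-descriptor count. /dev/fd
    // is per-process on macOS (use /proc/self/fd on Linux); reading the
    // directory itself briefly consumes one extra descriptor.
    fn open_fd_count() -> std::io::Result<usize> {
        Ok(std::fs::read_dir("/dev/fd")?.count())
    }

If the count climbs without bound during sync, a leak is the likelier explanation; if it plateaus near the number of SST files, the per-process limit or RocksDB's max_open_files discussed above is the thing to tune.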
