Rare server crash (parallel inter-task dependencies + other conditions) #7

Open
gavento opened this issue Mar 17, 2018 · 1 comment
Labels
bug Something isn't working server

Comments

Contributor

gavento commented Mar 17, 2018

The Rain server panics when a task becomes ready; the failing assertion is at src/server/scheduler.rs:148:17. The relevant part of the log seems to be the following:

...
DEBUG 2018-03-17T15:31:49Z: librain::server::scheduler: Scheduler: New ready task (1,23092)
... [many New ready task info lines, various IDs]
DEBUG 2018-03-17T15:31:49Z: librain::server::scheduler: Scheduler: New ready task (1,23092)
thread 'main' panicked at 'assertion failed: r', src/server/scheduler.rs:148:17
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
DEBUG 2018-03-17T15:31:49Z: tokio_reactor: loop process - 1 events, 0.000s
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at libstd/sys_common/backtrace.rs:59
             at libstd/panicking.rs:207
   3: std::panicking::default_hook
             at libstd/panicking.rs:223
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:402
   5: std::panicking::begin_panic
   6: librain::server::scheduler::ReactiveScheduler::schedule
   7: librain::server::state::State::run_scheduler
   8: librain::server::state::<impl librain::common::wrapped::WrappedRcRefCell<librain::server::state::State>>::turn
   9: rain::main
  10: std::rt::lang_start::{{closure}}
  11: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:306
  12: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:102
  13: std::rt::lang_start_internal
             at libstd/panicking.rs:285
             at libstd/panic.rs:361
             at libstd/rt.rs:58
  14: main
  15: __libc_start_main
  16: _start
DEBUG 2018-03-17T15:31:49Z: tokio_reactor: loop process - 1 events, 0.000s
DEBUG 2018-03-17T15:31:49Z: tokio_reactor::background: shutting background reactor down NOW
...

However, a small test with multiple identical inputs passes, even with subsequent submits. The benchmark only fails with >500 tasks per layer. See the attached benchmark. It was run as python3 scalebench.py net -l 256 -w 1024 -s 0; the error happens around layer 10.
The debug checks with RAIN_DEBUG_MODE=1 do not find any consistency problems.
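
To illustrate what the log is consistent with (the same task ID reported as "New ready task" twice, followed by a failed assertion), here is a minimal toy model in Python. It is not Rain's scheduler code: `ToyScheduler`, `notify_consumers` and `notify_consumers_dedup` are hypothetical names, and the assumption that the assertion guards an insert into a set of ready tasks is a guess based on the message 'assertion failed: r'. Since the small test passes, this is not a reproduction, only a sketch of how a double notification could trip such a check.

```python
class ToyScheduler:
    """Hypothetical stand-in for a scheduler's set of ready tasks."""

    def __init__(self):
        self.ready = set()

    def new_ready_task(self, task_id):
        # Assumed pattern: the insert must report that the id was new.
        r = task_id not in self.ready
        self.ready.add(task_id)
        if not r:
            raise AssertionError("task %r reported ready twice" % (task_id,))


def notify_consumers(scheduler, waiting, finished_object):
    """Naive notification: one callback per input edge, duplicates included.

    In this toy, every waiting task depends on a single distinct object,
    which may be listed more than once in its inputs.
    """
    for task_id, inputs in waiting.items():
        for obj in inputs:
            if obj == finished_object:
                scheduler.new_ready_task(task_id)


def notify_consumers_dedup(scheduler, waiting, finished_object):
    """Variant that notifies each consumer task at most once per object."""
    for task_id, inputs in waiting.items():
        if finished_object in set(inputs):
            scheduler.new_ready_task(task_id)


if __name__ == "__main__":
    # A task that lists the same object twice as its input.
    waiting = {(1, 23092): ["obj", "obj"]}

    try:
        notify_consumers(ToyScheduler(), waiting, "obj")
    except AssertionError as err:
        print("naive notification failed:", err)

    scheduler = ToyScheduler()
    notify_consumers_dedup(scheduler, waiting, "obj")
    print("deduplicated notification ready set:", scheduler.ready)
```

In the toy model the duplicate input alone is enough to fail, which is not what is observed here; the real trigger evidently needs additional conditions under load.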

scalebench.py.txt

@gavento gavento changed the title Server crash on one task depending on another task twice while under load Server crash when one task depends on another task twice Mar 17, 2018
@spirali spirali added the bug Something isn't working label Mar 17, 2018
Contributor Author

gavento commented Apr 12, 2018

This seems difficult to reproduce: the duplicate dependency by itself is not a problem, it only acts as the trigger under heavy load, and even then @spirali could not reproduce it.

@gavento gavento added the server label Apr 12, 2018
@gavento gavento mentioned this issue Jul 2, 2018
15 tasks
@gavento gavento changed the title Server crash when one task depends on another task twice Rare server crash (parallel inter-task dependencies + other conditions) Jul 2, 2018