Disrupted bootstrap can break host ID -> IP mappings #18676
I see this error also in dtest, for example:
It'd be good to have the backtrace decoded and in a comment here (for searching later). @margdoc

It seems like a recent regression; let's get to the bottom of it and fix whatever needs fixing, or consider reverting the patch that caused/exposed it.
The root cause of the failure -- the scenario leading to a missing mapping -- is not a recent regression. But the fact that it now causes a crash is.

@margdoc please also post the findings of what happens before cfd03fe. I remember there were no failures or crashes, but there were lots of warnings in the logs, and through analysis we concluded that in that broken state with the missing mapping it is possible to perform writes which should be failing.
Also @margdoc please write down an explanation of how we end up with an entry in …
On one hand, the scenario presented looks hard to hit -- we need a crash at a very specific moment. On the other hand, because of the recently introduced abort, hitting it once may put the node in a crash loop.

But even if the node enters a crash loop, it should be possible to recover it using maintenance mode -- by writing the mapping into system.peers manually. All in all, if we consider just the scenario presented in the first post, I don't think it's a blocker.

But there's also the dtest that @bhalevy referenced: I suspect it's a slightly different scenario that hits the same symptom. We need to understand if it's the same thing (losing the IP mapping due to a crash) or something else. For now I'm marking the issue as a blocker. I'll try to take a look at Benny's test and see if it's a different issue. If it is something different, then I'll probably open a new GitHub issue, mark that one as a blocker, and mark this one as non-blocker.
We now also see a similar symptom in e.g. https://jenkins.scylladb.com/job/scylla-master/job/tablets/job/gating-dtest-release-with-tablets/67/artifact/logs-full.release.003/1716082048650_replace_address_test.py%3A%3ATestReplaceAddress%3A%3Atest_replace_node_diff_ip_take_write%5Buse_endpoint-rbo_enabled%5D/node3.log
Decoded:
Please see https://github.com/bhalevy/scylla/commits/storage_proxy-host_id-wip/ for a sketch of preventing this issue by using host_id where possible on the relevant paths. Cc @kbr-scylla, @gleb-cloudius WDYT?
I did not look too deeply into it yet, but when using host ids at higher levels, things like …
Right. This is exactly the direction in my branch.
When the second node is bootstrapping, the first node runs the `node_state::bootstrapping` case of `topology_coordinator::handle_topology_transition`:

```cpp
case node_state::bootstrapping: {
    assert(!node.rs->ring);
    auto num_tokens = std::get<join_param>(node.req_param.value()).num_tokens;
    auto tokens_string = std::get<join_param>(node.req_param.value()).tokens_string;
    // A node has just been accepted and does not have tokens assigned yet
    // Need to assign random tokens to the node
    auto tmptr = get_token_metadata_ptr();
    std::unordered_set<token> bootstrap_tokens;
    try {
        bootstrap_tokens = dht::boot_strapper::get_bootstrap_tokens(tmptr, tokens_string, num_tokens, dht::check_token_endpoint::yes);
    } catch (...) {
        _rollback = fmt::format("Failed to assign tokens: {}", std::current_exception());
    }
    auto [gen_uuid, guard, mutation] = co_await prepare_and_broadcast_cdc_generation_data(
            tmptr, take_guard(std::move(node)), bootstrapping_info{bootstrap_tokens, *node.rs});
    topology_mutation_builder builder(guard.write_timestamp());
    // Write the new CDC generation data through raft.
    builder.set_transition_state(topology::transition_state::commit_cdc_generation)
           .set_new_cdc_generation_data_uuid(gen_uuid)
           .with_node(node.id)
           .set("tokens", bootstrap_tokens);
    auto reason = ::format(
            "bootstrap: insert tokens and CDC generation data (UUID: {})", gen_uuid);
    co_await update_topology_state(std::move(guard), {std::move(mutation), builder.build()}, reason);
}
```

But the first node crashes before it saves the IP address of the second node (error injection "crash-before-bootstrapping-node-added"). When the first node restarts, it has only the bootstrap tokens and the host ID of the second node:

```cpp
auto process_transition_node = [&](raft::server_id id, const replica_state& rs) -> future<> {
    locator::host_id host_id{id.uuid()};
    auto ip = am.find(id);
    rtlogger.trace("loading topology: raft id={} ip={} node state={} dc={} rack={} tokens state={} tokens={}",
            id, ip, rs.state, rs.datacenter, rs.rack, _topology_state_machine._topology.tstate,
            seastar::value_of([&] () -> sstring {
                return rs.ring ? ::format("{}", rs.ring->tokens) : sstring("null");
            }));
    switch (rs.state) {
    case node_state::bootstrapping:
        if (rs.ring.has_value()) {
            if (ip && !is_me(*ip)) {
                utils::get_local_injector().inject("crash-before-bootstrapping-node-added", [] {
                    slogger.error("crash-before-bootstrapping-node-added hit, killing the node");
                    _exit(1);
                });
                // Save ip -> id mapping in peers table because we need it on restart, but do not save tokens until owned
                sys_ks_futures.push_back(_sys_ks.local().update_peer_info(*ip, host_id, {}));
            }
}
```
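The broken state this leaves behind can be pictured with a small self-contained sketch (hypothetical stand-in types, not Scylla's; the point is only the asymmetry between what raft persisted and what system.peers holds):

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <unordered_map>

// Stand-in for the coordinator's persisted view of the bootstrapping node.
struct replica_state_sketch {
    std::optional<std::set<long>> ring_tokens; // tokens, persisted through raft
};

int main() {
    // After the injected crash, the raft topology still holds node2's tokens...
    replica_state_sketch node2{std::set<long>{42, 4242}};

    // ...but system.peers was never written, so the host id -> IP mapping is gone.
    std::unordered_map<std::string /*host id*/, std::string /*ip*/> peers;

    assert(node2.ring_tokens.has_value());     // survived the crash (raft state)
    assert(!peers.contains("node2-host-id")); // did not (local system.peers)
}
```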
Before cfd03fe, this test passes (but it shouldn't): when the first node restarts and enters the `commit_cdc_generation` transition state, writes that should fail still succeed.

Why should these writes fail?
A write fails if the number of failed responses plus the number of required successful responses exceeds the total number of replicas. What are these values in our scenario?
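Before looking at the actual values, here is a minimal sketch of the failure condition itself (hypothetical names; the real logic lives in the write response handler):

```cpp
#include <cstddef>

// block_for: successful responses required by the consistency level
//            (plus pending endpoints, as shown below).
// total:     total number of replica targets for the write.
// failed:    failure responses received so far.
//
// Once even an all-success outcome from the remaining replicas could not
// reach block_for, the write must be reported as failed.
bool write_must_fail(std::size_t failed, std::size_t block_for, std::size_t total) {
    return failed + block_for > total;
}
```

The actual values come from the handler constructors: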
```cpp
abstract_write_response_handler(shared_ptr<storage_proxy> p,
        locator::effective_replication_map_ptr erm,
        db::consistency_level cl, db::write_type type,
        std::unique_ptr<mutation_holder> mh, inet_address_vector_replica_set targets, tracing::trace_state_ptr trace_state,
        storage_proxy::write_stats& stats, service_permit permit, db::per_partition_rate_limit::info rate_limit_info, size_t pending_endpoints = 0,
        inet_address_vector_topology_change dead_endpoints = {}, is_cancellable cancellable = is_cancellable::no)
    : _id(p->get_next_response_id()), _proxy(std::move(p))
    , _effective_replication_map_ptr(std::move(erm))
    , _trace_state(trace_state), _cl(cl), _type(type), _mutation_holder(std::move(mh)), _targets(std::move(targets)),
      _dead_endpoints(std::move(dead_endpoints)), _stats(stats), _expire_timer([this] { timeout_cb(); }), _permit(std::move(permit)),
      _rate_limit_info(rate_limit_info) {
    // original comment from cassandra:
    // during bootstrap, include pending endpoints in the count
    // or we may fail the consistency level guarantees (see #833, #8058)
    _total_block_for = db::block_for(*_effective_replication_map_ptr, _cl) + pending_endpoints;
    ++_stats.writes;
    if (cancellable) {
        register_cancellable();
    }
}

datacenter_write_response_handler(shared_ptr<storage_proxy> p,
        locator::effective_replication_map_ptr ermp,
        db::consistency_level cl, db::write_type type,
        std::unique_ptr<mutation_holder> mh, inet_address_vector_replica_set targets,
        const inet_address_vector_topology_change& pending_endpoints, inet_address_vector_topology_change dead_endpoints, tracing::trace_state_ptr tr_state,
        storage_proxy::write_stats& stats, service_permit permit, db::per_partition_rate_limit::info rate_limit_info) :
    abstract_write_response_handler(p, ermp, cl, type, std::move(mh), // can't move ermp, it's used below
            std::move(targets), std::move(tr_state), stats, std::move(permit), rate_limit_info,
            ermp->get_topology().count_local_endpoints(pending_endpoints), std::move(dead_endpoints)) {
    _total_endpoints = _effective_replication_map_ptr->get_topology().count_local_endpoints(_targets);
}
```

`count_local_endpoints` looks up each endpoint's location, implemented as:

```cpp
const endpoint_dc_rack& topology::get_location(const inet_address& ep) const {
    if (auto node = find_node(ep)) {
        return node->dc_rack();
    }
    // We should do the following check after lookup in nodes.
    // In tests, there may be no config for local node, so fall back to get_location()
    // only if no mapping is found. Otherwise, get_location() will return empty location
    // from config or random node, neither of which is correct.
    if (ep == _cfg.this_endpoint) {
        return get_location();
    }
    // FIXME -- this shouldn't happen. After topology is stable and is
    // correctly populated with endpoints, this should be replaced with
    // on_internal_error()
    tlogger.warn("Requested location for node {} not in topology. backtrace {}", ep, lazy_backtrace());
    return endpoint_dc_rack::default_location;
}
```

The pending endpoint without an IP mapping gets `endpoint_dc_rack::default_location`, so the datacenter filter discards this entry, and the number of pending endpoints passed to the handler is smaller than it should be: `_total_block_for` ends up too low, and the write succeeds even though it should fail.
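To make the effect concrete, here is a hedged worked example; the concrete numbers (RF=1, a local consistency level requiring one response, one bootstrapping node) are assumptions chosen to match a two-node test, not values taken from the code:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    // Case 1: the pending endpoint's IP is known, so it is counted.
    std::size_t block_for = 1 + 1; // CL requirement + 1 pending endpoint
    std::size_t total     = 2;     // natural replica + pending replica
    std::size_t failed    = 1;     // the bootstrapping node is unreachable
    std::printf("mapping present: fails = %d\n",
                failed + block_for > total); // 1 + 2 > 2 -> the write fails

    // Case 2: the mapping is missing; get_location() returns
    // default_location and the pending endpoint is filtered out.
    block_for = 1 + 0; // pending count dropped to 0
    total     = 1;     // only the natural replica remains a target
    failed    = 0;
    std::printf("mapping missing: fails = %d\n",
                failed + block_for > total); // 0 + 1 > 1 -> the write succeeds
}
```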
FWIW, we ran a bisect job to try to pinpoint when this started:
Edit: I'm not sure that the good commit is indeed good.
@bhalevy I know exactly why it happens in the dtest, and it is a different, tablets-related issue. Please open it. The problem is that the tablets code adds a node in the left state to the topology, and it brings havoc to the universe.
Well, I do not know exactly what happens, since I did not investigate why a left node is returned as a replica, but I do know it is a different issue from this one.
🤦 It is returned because the table is governed by tablets now and the left node is still a replica there. This is why we are adding it to the topology in the first place.
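A hedged sketch of why that matters (hypothetical types; real tablet metadata is more involved): tablet replica sets reference nodes by host id, so a node in the left state can still appear as a replica until its tablets are migrated away, and it therefore still needs a topology entry for location/IP lookups.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

using host_id = std::string; // stand-in for locator::host_id

int main() {
    std::set<host_id> left_nodes = {"nodeC"}; // nodes that already left

    // A tablet's replica set, recorded by host id; migration away from
    // nodeC has not happened yet, so it is still listed.
    std::vector<host_id> tablet_replicas = {"nodeA", "nodeC"};

    // Any replica still in the left set needs a topology entry, otherwise
    // location lookups for it fail -- the havoc described above.
    bool left_replica = std::any_of(tablet_replicas.begin(), tablet_replicas.end(),
            [&](const host_id& h) { return left_nodes.contains(h); });
    assert(left_replica);
}
```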
Moving the release blocker label to #18843 |
If there is no mapping from host id to IP while a node is in the bootstrap state, there is no point in adding it to the pending endpoints, since the write handler will not be able to map it back to a host id anyway. If the transition state requires double writes, though, we still want to fail. In case the state is write_both_read_old, we fail the barrier, which will cause the topology operation to roll back; in case of write_both_read_new, we assert, but this should not happen, since the mapping is persisted by that point (or we already failed in the write_both_read_old state).

Fixes: scylladb#18676
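A minimal sketch of the fix's decision logic as described in the commit message above (names, types, and structure are assumptions, not the actual patch):

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins for the real types.
enum class transition_state { commit_cdc_generation, write_both_read_old, write_both_read_new };
struct ip_address { unsigned v; };

// Decide what to do with a bootstrapping node whose host id has no IP mapping.
// Returns true if the node should be counted as a pending endpoint.
bool count_as_pending(std::optional<ip_address> ip, transition_state st) {
    if (ip) {
        return true; // mapping known: count it as usual
    }
    switch (st) {
    case transition_state::write_both_read_old:
        // Double writes are required but the replica cannot be addressed:
        // fail the barrier so the topology operation rolls back.
        throw std::runtime_error("no host id -> IP mapping; failing barrier");
    case transition_state::write_both_read_new:
        // The mapping must have been persisted before reaching this state
        // (or we already failed in write_both_read_old).
        assert(false && "mapping missing in write_both_read_new");
        return false;
    default:
        // No double writes yet: skip the node; the write handler could not
        // map it back to a host id anyway.
        return false;
    }
}

int main() {
    // In commit_cdc_generation (no double writes yet) the node is skipped.
    assert(!count_as_pending(std::nullopt, transition_state::commit_cdc_generation));
}
```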
I'm not sure what the status of the issue here is, but it should be moved to 6.1, and perhaps backported to 6.0.x.
Updated milestone to 6.0.1 |
When running test_ip_mappings on margdoc@a7c4afa, the first node is aborted after the restart.

The scenario of this test is as follows:

- The first node crashes before saving the second node's IP in the system.peers table (error injection "crash-before-bootstrapping-node-added"). Bootstrap tokens of the second node are already saved in `topology_coordinator::handle_topology_transition`.
- The first node restarts while in the `commit cdc generation` transition state.

Logs: scylla-1.log, scylla-2.log, topology_experimental_raft.test_ip_mappings.1.log
Decoded error backtrace: