High lowres clock drift > 1s in debug mode #2238

Open

nvartolomei opened this issue May 11, 2024 · 4 comments

@nvartolomei (Contributor)

In redpanda we have a test which began failing very often in debug mode since the latest seastar upgrade (the CPU is also busy with other processes).

The test reads the current time from the lowres clock and sets a deadline of +1 second for establishing a TCP connection. Because of high lowres clock drift, we hit the 1s deadline even though the work itself only takes milliseconds.
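
For context, a minimal sketch of the pattern the test relies on, assuming a seastar-style connect with a lowres-clock deadline (the helper below is illustrative, not the actual redpanda test code):

#include <seastar/core/future.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/with_timeout.hh>
#include <seastar/net/api.hh>
#include <chrono>

seastar::future<seastar::connected_socket>
connect_with_deadline(seastar::socket_address addr) {
    // lowres_clock::now() is a cached value that is only refreshed when the
    // reactor breaks out of its task loop; in debug mode it can be stale by
    // more than a second.
    auto deadline = seastar::lowres_clock::now() + std::chrono::seconds(1);
    // Once the cached clock catches up, it can jump straight past the deadline,
    // so the timeout fires even though the connect itself took milliseconds.
    return seastar::with_timeout(deadline, seastar::connect(addr));
}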

I believe this change in behavior is caused by the reactor change in #2007.

We preempt only once every 64 calls to scheduler_need_preempt (every 64 tasks?) under the argument that "latency is not important in debug mode", but that seems to be too much in some cases that use the lowres clock, since the clock is only updated when we break out of the reactor loop.

While we are guilty of having some ~100ms reactor stalls in this test code, I don't believe that should be an excuse to let the lowres clock drift by 1s+. If we were in a seastar loop with each iteration taking 25ms (the stall warning threshold), 64 calls would already add up to 1.6s.

I see a few possible options:

  • Switch to std::chrono clocks (see the sketch after this list);
  • Change user code to have very lax expectations of the lowres clock everywhere, even though it is advertised to have ~task-quota granularity;
  • Change how the lowres clock is updated in debug mode;
  • Change the approach introduced in "reactor: poll less frequently in debug mode" (#2007) to break out of reactor task queue execution if either of the following conditions is true:
    • 64 (or any other N) calls to scheduler_need_preempt have been made;
    • need_preempt() == true as computed in release mode (aka np.head != np.tail), so that the task quota is respected even in debug mode;
  • ???
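
For illustration, a minimal sketch of option 1, assuming the test's deadline handling looks roughly like the helper above: the deadline comes from std::chrono::steady_clock instead of the lowres clock (names are again illustrative, not the actual test code):

#include <seastar/core/future.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/with_timeout.hh>
#include <seastar/net/api.hh>
#include <chrono>

seastar::future<seastar::connected_socket>
connect_with_steady_deadline(seastar::socket_address addr) {
    // std::chrono::steady_clock reads the OS clock on every call, so the
    // deadline really is "one second from now" no matter how long the reactor
    // has been running tasks without polling.
    auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(1);
    // seastar::with_timeout deduces the clock from the time_point, arming a
    // timer<std::chrono::steady_clock> rather than a lowres timer.
    return seastar::with_timeout(deadline, seastar::connect(addr));
}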

Appreciate any thoughts or ideas on the above 🙏

@nvartolomei (Contributor, Author)

Maybe something like

diff --git a/include/seastar/core/preempt.hh b/include/seastar/core/preempt.hh
index 537be5b5..21882aa9 100644
--- a/include/seastar/core/preempt.hh
+++ b/include/seastar/core/preempt.hh
@@ -49,11 +49,9 @@ inline const preemption_monitor*& get_need_preempt_var() {

 void set_need_preempt_var(const preemption_monitor* pm);

-}
-
-SEASTAR_MODULE_EXPORT
-inline bool need_preempt() noexcept {
-#ifndef SEASTAR_DEBUG
+inline
+bool
+monitor_need_preempt() {
     // prevent compiler from eliminating loads in a loop
     std::atomic_signal_fence(std::memory_order_seq_cst);
     auto np = internal::get_need_preempt_var();
@@ -64,6 +62,14 @@ inline bool need_preempt() noexcept {
     // Possible optimization: read head and tail in a single 64-bit load,
     // and find a funky way to compare the two 32-bit halves.
     return __builtin_expect(head != tail, false);
+}
+
+}
+
+SEASTAR_MODULE_EXPORT
+inline bool need_preempt() noexcept {
+#ifndef SEASTAR_DEBUG
+    return internal::monitor_need_preempt();
 #else
     return true;
 #endif
@@ -71,8 +77,6 @@ inline bool need_preempt() noexcept {

 namespace internal {

-
-
 // Same as need_preempt(), but for the scheduler's use. Outside debug
 // mode they have the same meaning - the task quota expired and we need
 // to check for I/O.
@@ -88,7 +92,7 @@ scheduler_need_preempt() {
     // mode, run some random-but-bounded number of tasks instead. Latency
     // will be high if those tasks are slow, but this is debug mode anyway.
     static thread_local unsigned counter = 0;
-    return ++counter % 64 == 0;
+    return ++counter % 64 == 0 || monitor_need_preempt();
 #endif
 }

The return ++counter % 64 == 0 || monitor_need_preempt(); change introduces some additional non-determinism in debug mode, which shouldn't hurt.

nvartolomei added a commit to nvartolomei/redpanda that referenced this issue May 11, 2024
Adding a segment takes ~1ms. Adding multiple of them in a single batch
can stall the reactor for a while. This, together with a recent seastar
change[^1], caused some timeouts to fire very frequently[^2].

Fix the test by breaking the batch passed to add_segments into smaller
batches so that we execute finer-grained tasks and yield to the
scheduler more often.

[^1]: scylladb/seastar#2238
[^2]: redpanda-data#13275
@avikivity (Member)

Reasonable. Does debug mode default to 10ms task quotas? Or am I misremembering?

Also your code should rely on need_preempt() rather than doing its own batching.
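
A rough sketch of that suggestion, assuming the batching happens inside a seastar::thread; the segment type and add_one_segment() helper are placeholders standing in for the redpanda code:

#include <seastar/core/future.hh>
#include <seastar/core/thread.hh>
#include <vector>

struct segment {};                                // placeholder type
inline void add_one_segment(const segment&) {}    // ~1ms of work in reality

seastar::future<> add_segments(std::vector<segment> segs) {
    return seastar::async([segs = std::move(segs)] {
        for (const auto& s : segs) {
            add_one_segment(s);
            // Yield back to the scheduler whenever need_preempt() says the
            // task quota is up, instead of hard-coding a batch size. In debug
            // builds need_preempt() is always true, so this yields after
            // every segment.
            seastar::thread::maybe_yield();
        }
    });
}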

nvartolomei added a commit to nvartolomei/redpanda that referenced this issue May 13, 2024
Adding a segment takes ~1ms. Adding multiple segments in a single batch
can stall the reactor for a while. This, together with a recent seastar
change[^1], caused some timeouts to fire very frequently[^2].

Fix the problematic test by splitting the batch passed to add_segments
so that we execute finer-grained tasks and yield more often to the
scheduler.

[^1]: scylladb/seastar#2238
[^2]: redpanda-data#13275
@nvartolomei (Contributor, Author)

> Reasonable. Does debug mode default to 10ms task quotas? Or am I misremembering?

At a quick glance, I don't see any code in seastar that would change the task quota in debug mode.

I see only these 2 references:

  1. https://github.com/redpanda-data/seastar/blob/d1ce6f06f86ca9c2db9fa5ee4b90b39a552f14c8/src/core/reactor.cc#L3982
  2. https://github.com/redpanda-data/seastar/blob/aa1c8b30faf4ffe0e62e45de87060d6dee9574bc/include/seastar/core/app-template.hh#L61

--

Shall I submit a PR with the patch from the comment above, or do we have better suggestions?

@avikivity (Member)

> > Reasonable. Does debug mode default to 10ms task quotas? Or am I misremembering?
>
> At a quick glance, I don't see any code in seastar that would change the task quota in debug mode.
>
> I see only these 2 references:
>
>   1. https://github.com/redpanda-data/seastar/blob/d1ce6f06f86ca9c2db9fa5ee4b90b39a552f14c8/src/core/reactor.cc#L3982
>   2. https://github.com/redpanda-data/seastar/blob/aa1c8b30faf4ffe0e62e45de87060d6dee9574bc/include/seastar/core/app-template.hh#L61

--

Ah, we have this in ScyllaDB:

main.cc-To start the scylla server proper, simply invoke as: scylla server (or just scylla).
main.cc-)";
main.cc:    app_cfg.default_task_quota = 500us;
main.cc-#ifdef DEBUG
main.cc-    // Increase the task quota to improve work:poll ratio in slow debug mode.
main.cc:    app_cfg.default_task_quota = 5ms;
main.cc-#endif
main.cc-    app_cfg.auto_handle_sigint_sigterm = false;
main.cc-    app_cfg.max_networking_aio_io_control_blocks = 50000;
main.cc-    // We need to have the entire app config to run the app, but we need to
main.cc-    // run the app to read the config file with UDF specific options so that
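
For completeness, a standalone sketch of the same override via seastar's app_template::config, assuming it still exposes default_task_quota as in the app-template.hh reference above; the quota values simply mirror the ScyllaDB snippet, and whether a larger debug-mode quota suits a given application is something to verify:

#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <chrono>

int main(int argc, char** argv) {
    using namespace std::chrono_literals;
    seastar::app_template::config cfg;
    cfg.default_task_quota = 500us;
#ifdef SEASTAR_DEBUG
    // Increase the task quota to improve the work:poll ratio in slow debug
    // builds, mirroring the ScyllaDB snippet above.
    cfg.default_task_quota = 5ms;
#endif
    seastar::app_template app(std::move(cfg));
    return app.run(argc, argv, [] {
        return seastar::make_ready_future<>();
    });
}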

> Shall I submit a PR with the patch from the comment above, or do we have better suggestions?

I don't have a better suggestion.
