-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Schedule tasks by timestamp #18508
Schedule tasks by timestamp #18508
Conversation
If the scheduling service is not yet up-and-running and it is used to schedule a task, the task is simply ignored and nothing is scheduled. While that could make sense in some cases, it could also hide a bug. It therefore makes sense to log such cases at warn level.
The implementation of the TimerSubscription is specific to delayed timers, where the scheduling in ActorTimerQueue determines the actual timestamp to use. I want to add a new TimerSubscription implementation that isn't based on delays but simply uses a static timestamp as deadline. Initially, I looked into replacing the delay code of the TimerSubscription, but this isn't possible as it also allows recurring timers using the same delay for each trigger. Therefore, I chose to extract an interface, allowing me to more re-implement the class differently without affecting existing behavior.
The deadline field of this class is actually a delay rather than a timestamp. To clarify that the deadline can be calculated from it, it makes sense to rename it.
The ActorTimerQueue was specific to DelayedTimerSubscription because when scheduling, it would consider deadline to be a delay. However, if we want to add other TimerSubscription implementations that are not delay based, we'll need the TimerSubscription to be able to determine the exact deadline itself. In order to do so DelayedTimerSubscription needs the actor clock to determine the deadline. It's okay to pass the clock as a parameter, even though other implementation may not need it. In the end, we're just asking for the moment it should be run.
This adds a new feature to the ActorControl: scheduling a timer task at a specific timestamp. Note that due to the way the ActorTimerQueue is implemented (DeadlineTimerWheel), there is no guarantee that the task is run AT the timestamp.
It's no longer a rule, but I didn't want to clutter the previous commit with this rename.
I want to add more tests, and keep the class a bit organized
Inline the scheduleTimer methods. It only deduplicated a single line. I'd rather remove the indirection.
Mostly just a copy of the DelayedTimerTests, but adjusted for running at a timestamp.
Adds the ability to the stream-platform's ProcessingScheduleService to run tasks at a timestamp. The execution is not guaranteed to happen exactly at, but atleast not before the timestamp. Inline with other task scheduling abilities of the ProcessingScheduleService, this adds a runAtAsync method as well. It functions similarly to the other async task scheduling capabilities.
This is to unflake tests that expect to trigger a timer through the due date checker after traveling through time. This was unstable because the due date checker was scheduled with a delay, where this delay is used to determine an actual deadline at a later point in time (during scheduling of the TimerSubscription by the actor). The unstability arose because the task can be scheduled async which means that the deadline was calculated async as well, so it could happen that the deadline was determined after the test had already travelled through time. This problem is resolved by using the new runAt method, which allows scheduling a task at a specific timestamp. When the scheduling is postponed until after the time traveling, the task will trigger immediately if the timestamp is now in the past.
I haven't reviewed the PR yet sorry @korthout just to get it right and playing a bit devils advocate here.
|
All good @Zelldon. I'm also interested in @oleschoenburg's perspective when they return. Let me try to answer your questions in the meantime:
This is meant to resolve a testing issue. However, the change does affect production behavior. Since the actor scheduler was previously unable to schedule tasks at a specific timestamp, a small delay was always added on top of the requested delay when scheduling occurred async (i.e., the time between requesting the scheduling and executing this request). Note that async scheduling of processing tasks is the default behavior. While this delay is typically tiny or unnoticeable, it could lead to additional delays in triggering timer events. Using the timestamp of a timer event is more accurate than calculating a delay from it and then postponing when that delay is applied to determine a timestamp again. Using a timestamp leads to more accurate timer event triggering.
See above answer.
Yes. But this means the test also takes more time to run when it encounters the time travel race condition. In the same way, the tests can be made to pass by reducing the duration of the timer events in the test cases. This has the same downside. An additional downside for both cases is that we have to change our test cases specifically for this implementation limitation. I'd rather ensure we have tests for short and long-living timers as they both are reasonable use cases.
Absolutely, if the processing scheduling tasks are not scheduled asynchronously, this problem doesn't exist. However, on production we do schedule processing tasks asynchronously, so we may want to also make that available in our tests regardless. I am in favor of decoupling the engine tests from the rest of the platform and immediately taking a JUnit5 approach. That does not stop us from adding this functionality as well. I see value in this pull request because it removes an entire class of flaky tests while also making timer event triggers more accurate on production.
See above answer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thanks for tackling this 🥇 Before a do a full review, I'd like to suggest three things:
🔧 I understand that you split TimerSubscription
into Delayed
and Stamped
because recurring timers need to reschedule with the same delay again. I feel like that's a lot of code duplication and would suggest a slightly different approach where a single TimerSubscription
treats deadline
as a fixed point in time or a delay based on isRecurring
.
🔧 Even without the refactoring suggested above, I'd like to always use the Stamped
variant except for recurring timers. The old ActorControl#schedule
method can take care of converting a delay into a fixed timestamp.
🔧 Remove the old runDelayed
methods from ProcessingScheduleService
. Let's try and keep that interface small.
Thanks @oleschoenburg 👍 I think all of these are valuable but also a lot of extra effort. I chose not to change the behavior of parts I don't know. Therefore, I added duplication between the two types of timer subscriptions. If you think this can all be unduplicated again, then I'd hope that someone a bit closer to the scheduler can do this. I'd be willing to contribute to removing the My hope is that we can unflake these tests soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, then let's merge it as is and maybe tackle my suggestions as follow-up 👍
Thanks @oleschoenburg 🙇 As this resolves flakiness, should we backport it? 🤔 |
I'd say we can at least try it out. |
This name is more fitting as we don't guarantee that the scheduled task is ran at the provided timestamp. The only guarantee we provide is that it is ran at some point at or after the timestamp.
8666086
to
176af26
Compare
This reverts commit 176af26. It's not necessary to be so explicit, as schedulers can never really achieve running at a specific timestamp.
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin stable/8.2
git worktree add -d .worktree/backport-18508-to-stable/8.2 origin/stable/8.2
cd .worktree/backport-18508-to-stable/8.2
git switch --create backport-18508-to-stable/8.2
git cherry-pick -x baea47045d49041773604d5be3f814fbe00a7488 5d7066c73e1b47d583ba5cac741b95fc74d4a523 1bbd61ed54a4ed2d7fc1151cbcd26661b3d3acdf bf81f0bb7ce1fd785cb2441a8a41510ea71cc6cd 05a3b3fc763947b0dcadc7dae74aa2de40d64ae3 b88304339a41b06884a1d34a3377f1600bfc1ac6 96a818bc79fb11f74a2ee6f93ca5f3de1792f893 eefa7c7092b432fca96de7f6414c0c9c3abb6641 c1c47a6bfc3ab7dd2c6c3edc5e82c318285c8ae7 e0b8c00083ef98372b06dbcb1d2093fb15bb2bc5 a31d0e42fb8bac4594584347ea657133143cbe1e 41ac7ff200fe435b34e2794bb7601b197dd250c4 5a0e14339770a63780ed6ff61dd5e4d78b4984a6 176af26a00e91d690f30c25de00cbca67dd27a8a 893875c30427416cdbce3f82e208176306257da8 |
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin stable/8.3
git worktree add -d .worktree/backport-18508-to-stable/8.3 origin/stable/8.3
cd .worktree/backport-18508-to-stable/8.3
git switch --create backport-18508-to-stable/8.3
git cherry-pick -x baea47045d49041773604d5be3f814fbe00a7488 5d7066c73e1b47d583ba5cac741b95fc74d4a523 1bbd61ed54a4ed2d7fc1151cbcd26661b3d3acdf bf81f0bb7ce1fd785cb2441a8a41510ea71cc6cd 05a3b3fc763947b0dcadc7dae74aa2de40d64ae3 b88304339a41b06884a1d34a3377f1600bfc1ac6 96a818bc79fb11f74a2ee6f93ca5f3de1792f893 eefa7c7092b432fca96de7f6414c0c9c3abb6641 c1c47a6bfc3ab7dd2c6c3edc5e82c318285c8ae7 e0b8c00083ef98372b06dbcb1d2093fb15bb2bc5 a31d0e42fb8bac4594584347ea657133143cbe1e 41ac7ff200fe435b34e2794bb7601b197dd250c4 5a0e14339770a63780ed6ff61dd5e4d78b4984a6 176af26a00e91d690f30c25de00cbca67dd27a8a 893875c30427416cdbce3f82e208176306257da8 |
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin stable/8.4
git worktree add -d .worktree/backport-18508-to-stable/8.4 origin/stable/8.4
cd .worktree/backport-18508-to-stable/8.4
git switch --create backport-18508-to-stable/8.4
git cherry-pick -x baea47045d49041773604d5be3f814fbe00a7488 5d7066c73e1b47d583ba5cac741b95fc74d4a523 1bbd61ed54a4ed2d7fc1151cbcd26661b3d3acdf bf81f0bb7ce1fd785cb2441a8a41510ea71cc6cd 05a3b3fc763947b0dcadc7dae74aa2de40d64ae3 b88304339a41b06884a1d34a3377f1600bfc1ac6 96a818bc79fb11f74a2ee6f93ca5f3de1792f893 eefa7c7092b432fca96de7f6414c0c9c3abb6641 c1c47a6bfc3ab7dd2c6c3edc5e82c318285c8ae7 e0b8c00083ef98372b06dbcb1d2093fb15bb2bc5 a31d0e42fb8bac4594584347ea657133143cbe1e 41ac7ff200fe435b34e2794bb7601b197dd250c4 5a0e14339770a63780ed6ff61dd5e4d78b4984a6 176af26a00e91d690f30c25de00cbca67dd27a8a 893875c30427416cdbce3f82e208176306257da8 |
Successfully created backport PR for |
I'll retry the backport once #18860 is merged, to reduce the manual effort. |
/backport |
Created backport PR for
Please cherry-pick the changes locally and resolve any conflicts. git fetch origin backport-18508-to-stable/8.2
git worktree add --checkout .worktree/backport-18508-to-stable/8.2 origin/stable/8.2
cd .worktree/backport-18508-to-stable/8.2
git reset --hard HEAD^
git cherry-pick -x 5d7066c73e1b47d583ba5cac741b95fc74d4a523 1bbd61ed54a4ed2d7fc1151cbcd26661b3d3acdf bf81f0bb7ce1fd785cb2441a8a41510ea71cc6cd 05a3b3fc763947b0dcadc7dae74aa2de40d64ae3 b88304339a41b06884a1d34a3377f1600bfc1ac6 96a818bc79fb11f74a2ee6f93ca5f3de1792f893 eefa7c7092b432fca96de7f6414c0c9c3abb6641 c1c47a6bfc3ab7dd2c6c3edc5e82c318285c8ae7 e0b8c00083ef98372b06dbcb1d2093fb15bb2bc5 a31d0e42fb8bac4594584347ea657133143cbe1e 41ac7ff200fe435b34e2794bb7601b197dd250c4 5a0e14339770a63780ed6ff61dd5e4d78b4984a6 176af26a00e91d690f30c25de00cbca67dd27a8a 893875c30427416cdbce3f82e208176306257da8
git push --force-with-lease |
Created backport PR for
Please cherry-pick the changes locally and resolve any conflicts. git fetch origin backport-18508-to-stable/8.3
git worktree add --checkout .worktree/backport-18508-to-stable/8.3 origin/stable/8.3
cd .worktree/backport-18508-to-stable/8.3
git reset --hard HEAD^
git cherry-pick -x 5d7066c73e1b47d583ba5cac741b95fc74d4a523 1bbd61ed54a4ed2d7fc1151cbcd26661b3d3acdf bf81f0bb7ce1fd785cb2441a8a41510ea71cc6cd 05a3b3fc763947b0dcadc7dae74aa2de40d64ae3 b88304339a41b06884a1d34a3377f1600bfc1ac6 96a818bc79fb11f74a2ee6f93ca5f3de1792f893 eefa7c7092b432fca96de7f6414c0c9c3abb6641 c1c47a6bfc3ab7dd2c6c3edc5e82c318285c8ae7 e0b8c00083ef98372b06dbcb1d2093fb15bb2bc5 a31d0e42fb8bac4594584347ea657133143cbe1e 41ac7ff200fe435b34e2794bb7601b197dd250c4 5a0e14339770a63780ed6ff61dd5e4d78b4984a6 176af26a00e91d690f30c25de00cbca67dd27a8a 893875c30427416cdbce3f82e208176306257da8
git push --force-with-lease |
Created backport PR for
Please cherry-pick the changes locally and resolve any conflicts. git fetch origin backport-18508-to-stable/8.4
git worktree add --checkout .worktree/backport-18508-to-stable/8.4 origin/stable/8.4
cd .worktree/backport-18508-to-stable/8.4
git reset --hard HEAD^
git cherry-pick -x 5d7066c73e1b47d583ba5cac741b95fc74d4a523 1bbd61ed54a4ed2d7fc1151cbcd26661b3d3acdf bf81f0bb7ce1fd785cb2441a8a41510ea71cc6cd 05a3b3fc763947b0dcadc7dae74aa2de40d64ae3 b88304339a41b06884a1d34a3377f1600bfc1ac6 96a818bc79fb11f74a2ee6f93ca5f3de1792f893 eefa7c7092b432fca96de7f6414c0c9c3abb6641 c1c47a6bfc3ab7dd2c6c3edc5e82c318285c8ae7 e0b8c00083ef98372b06dbcb1d2093fb15bb2bc5 a31d0e42fb8bac4594584347ea657133143cbe1e 41ac7ff200fe435b34e2794bb7601b197dd250c4 5a0e14339770a63780ed6ff61dd5e4d78b4984a6 176af26a00e91d690f30c25de00cbca67dd27a8a 893875c30427416cdbce3f82e208176306257da8
git push --force-with-lease |
Git push to origin failed for stable/8.5 with exitcode 1 |
Description
Unflake tests that rely on time travel to trigger timers.
This was unstable because the due date checker was scheduled with a delay, which is used to determine an actual deadline later during the actor's scheduling of the TimerSubscription. The instability arose because the task can be scheduled async, which means that the deadline was calculated async as well, so it could happen that the deadline was determined after the test had already traveled through time.
This problem is resolved by introducing a new
runAt
method, which allows scheduling a task at a specific timestamp. When the async scheduling is postponed until after the time traveling, the task will trigger immediately if the timestamp is now in the past.This changes the following modules:
schedule
introduces a newrunAt
method that can schedule Timer tasks (TimerSubscription) by timestampstream-platform
introduces a newrunAt
method for the engine to schedule tasks at a timestamp that can produce processing resultsengine
uses the newrunAt
method to (re)schedule the DueDateCheckerRelated issues
closes #17254
closes #10169
closes #18462