Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UV_RUN_ONCE regression after commit 6600954 #4260

Open
bnoordhuis opened this issue Dec 19, 2023 · 18 comments
Open

UV_RUN_ONCE regression after commit 6600954 #4260

bnoordhuis opened this issue Dec 19, 2023 · 18 comments
Labels

Comments

@bnoordhuis
Copy link
Member

Refs: 6600954

Here is how my code works in a nutshell:

  • There are 2 unref'd handles that check if there are promises to run (1 prepare, 1 check)
  • If there are any promises to run they start an idle handle, which is ref'd
  • Timers run on a uv_timer_t

After this change, timers run last, and thus my check handle hasn't run as the last thing, which is what I used to keep the loop alive by starting the idle handle. This means the loop could exit early, if new promises were created on a timer, since check handles no longer run afterwards.

Originally posted by @saghul in #3686 (comment)

@bnoordhuis bnoordhuis added the bug label Dec 19, 2023
@saghul
Copy link
Member

saghul commented Dec 19, 2023

Thanks for posting, I wasn't sure.

IMHO any change to the loop diagram is a no-no since it shows a change in behavior, which many applications have grown to expect.

@vtjnash
Copy link
Member

vtjnash commented Dec 19, 2023

I think originally this code was implemented on top of ev_check in libev, with the intent that check should run before IO callbacks and timers should run after IO. This was so that check handlers could correctly consume events from the poll handle before the IO callbacks (which might feed changes to those). The libuv version of this diagram has always been slightly incomplete and inaccurate however, since it ignores and combines several IO phases. It appeared that check was not intended for keeping the loop alive (in particular, close callbacks have always run after it, so any attempt to use it to keep the loop alive would have already been incorrect in the presence of close callbacks).

There also appears to be some discussion about how the library is intended to be used by the Perl Coro for
http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#THREADS_COROUTINES_CONTINUATIONS_QUE

And specifically also covers how to implement your use (calling it "Abusing an ev_check watcher for its side-effect"):
http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#Abusing_an_code_ev_check_code_watche

@saghul
Copy link
Member

saghul commented Dec 19, 2023

with the intent that check should run before IO callbacks and timers should run after IO.

Not really. It was defined to run after the polling phase.

Yes, there is no naming baggage, which was to some extent inherited from libev to be sure, but it is what it is. An interesting example is uv_idle_t, which has nothing to do with being idle. but since we promised to do v1.x forever, it is what it is.

The libuv version of this diagram has always been slightly incomplete and inaccurate however, since it ignores and combines several IO phases.

The main building blocks were correct, however.

t appeared that check was not intended for keeping the loop alive (in particular, close callbacks have always run after it, so any attempt to use it to keep the loop alive would have already been incorrect in the presence of close callbacks).

No, check was never meant to keep the loop alive. Check was defined to run last, which gave you a chance to do something to keep the loop alive. That something can be to start an idle handle for example.

While the PR didn't break the API or ABI it did break userland applications, which I think we shouldn't do. I don't think this was intentional, we can fix it, bit we first need to understand how we wanrt to do that.

@trevnorris
Copy link
Member

While the PR didn't break the API or ABI it did break userland applications, which I think we shouldn't do. I don't think this was intentional, we can fix it, bit we first need to understand how we wanrt to do that.

It wasn't intentional and I agree we shouldn't be breaking userland applications. We've always told users that they shouldn't write code that depends on event loop internals, but I guess that's not realistic in all cases.

After this change, timers run last, and thus my check handle hasn't run as the last thing, which is what I used to keep the loop alive by starting the idle handle. This means the loop could exit early

TBH I'm not following this. check handles still run, which should still create the idle handle, so the return from uv__loop_alive() should still be > 0, causing the loop to iterate one more time. Running timers shouldn't affect the idle handles and cause it to return rearly.

Unless somehow stop_flag is being set, but I doubt that's the case.

@saghul
Copy link
Member

saghul commented Dec 19, 2023

We've always told users that they shouldn't write code that depends on event loop internals, but I guess that's not realistic in all cases.

Arguably this is observable (and documented, in the diagram) behavior. IIRC we used to say that to Node land, not to those using libuv directly.

TBH I'm not following this. check handles still run, which should still create the idle handle, so the return from uv__loop_alive() should still be > 0, causing the loop to iterate one more time. Running timers shouldn't affect the idle handles and cause it to return rearly.

Let me try to rephrase:

With the previous behavior:

  • A timer is fired, which puts a promise in the queue
  • Check handle runs, checks the promise queue is not empty and starts an idle handle
  • Loop doesn't exit because there is an active ref'd handle

With the current one:

  • Check handle runs, there are no promises in the queue, does nothing
  • Timers run, put a promise in the queue
  • Loop exits because there is no active ref'd handle.

@trevnorris
Copy link
Member

IIRC we used to say that to Node land, not to those using libuv directly.

Makes sense, and thanks for the example. That clears things up.

When I made the change I thought the event loop diagram was only informative on how things are run, and didn't realize it was a guarantee on execution order (couldn't find any other documentation that check should always run last).

Hypothetically it would make the most sense for timers to run after the poll phase, but that causes timer_from_check test to fail. So might just be best to revert the change.

@bnoordhuis
Copy link
Member Author

Is the consensus here to revert?

@vtjnash
Copy link
Member

vtjnash commented Dec 22, 2023

Or is it to move check to be last again?

@vtjnash
Copy link
Member

vtjnash commented Dec 22, 2023

I am confused though, since actually 'close' runs last in nodejs, so using check for the stated use case sounds like it would always have been wrong

@saghul
Copy link
Member

saghul commented Dec 22, 2023

Please take a look at #4260 (comment) I think I did a better job explaining the situation there.

Close callbacks don't have a relevance here.

@vtjnash
Copy link
Member

vtjnash commented Dec 22, 2023

I may be missing something from the context, but why are promises that were started by close events (or one of the other run-once modes that nodejs may use at times) not relevant to correct uses of the framework? Shouldn't the code need to immediately ref the idle callback as soon as the promise is created, or otherwise will risk skipping the check and idle callbacks entirely in a variety of existing scenarios?

@trevnorris
Copy link
Member

trevnorris commented Dec 22, 2023

I personally think that it would be better to move running check handles and closing handles after timers handles, but unfortunately timer_from_check fails because it expects timers to run after check.

I find this slightly confusing since if the old diagram was literal then the test should have failed already. The test asserts that the check handle runs before the timer handle. Which is a contradiction to what @saghul said was expected. So there's a discrepancy in what was documented and what is expected.

Here's the diff on linux to make sure I'm clear about what I mean:

diff --git a/src/unix/core.c b/src/unix/core.c
index 25c5181f..0e923ae8 100644
--- a/src/unix/core.c
+++ b/src/unix/core.c
@@ -460,5 +460,2 @@ int uv_run(uv_loop_t* loop, uv_run_mode mode) {
 
-    uv__run_check(loop);
-    uv__run_closing_handles(loop);
-
     uv__update_time(loop);
@@ -466,2 +463,5 @@ int uv_run(uv_loop_t* loop, uv_run_mode mode) {
 
+    uv__run_check(loop);
+    uv__run_closing_handles(loop);
+
     r = uv__loop_alive(loop);

@saghul
Copy link
Member

saghul commented Dec 22, 2023

I personally think that it would be better to move running check handles and closing handles after timers handles, but unfortunately timer_from_check fails because it expects timers to run after check.

Not quite. It expects the check cb to be called before the timer fires because it's not due. Check here for example, in an older release:

uv__io_poll(loop, timeout);

We poll for the given timeout, then we run check handles, and since the loop was ran with UV_RUN_DEFAULT, we go back to the top and run the timers.

@saghul
Copy link
Member

saghul commented Dec 22, 2023

I may be missing something from the context, but why are promises that were started by close events (or one of the other run-once modes that nodejs may use at times) not relevant to correct uses of the framework? Shouldn't the code need to immediately ref the idle callback as soon as the promise is created, or otherwise will risk skipping the check and idle callbacks entirely in a variety of existing scenarios?

Note I'm nor using Node here. It's a different app, but it so happens to also be a JavaScript runtime :-)

In my case, close handles don't run any JS code, they just cleanup, so it's not possible for them to enqueue any work / promises.

At this point I do have a workaround that is not ugly and does the job, but I raised the problem because it's very subtle and I think it could be hard to find in a large app for someone not intiamtely familiar with libuv's internals.

I wrote my code based on my understanding of the guarantees libuv provides, which I documented in that diagram 9 years ago.

For the past 9 years that has held true, until now.

@trevnorris
Copy link
Member

It expects the check cb to be called before the timer fires because it's not due.

The test showcases a hypothetical race condition. Take the following change:

diff --git a/test/test-timer-from-check.c b/test/test-timer-from-check.c
index e5f5cb2f..e50c3115 100644
--- a/test/test-timer-from-check.c
+++ b/test/test-timer-from-check.c
@@ -69,2 +69,3 @@ TEST_IMPL(timer_from_check) {
   ASSERT_OK(uv_timer_start(&timer_handle, timer_cb, 50, 0));
+  uv_sleep(100);
   ASSERT_OK(uv_run(uv_default_loop(), UV_RUN_DEFAULT));

This snippet causes the test to fail both before and after the change. The process being artificially hung up and preventing timers from running before reaching the timeout is unlikely to happen, but it's more about the mental modal of how the event loop is supposed to work. It's a demonstration of why I thought timer execution placement within the loop was only informative and not a guarantee. Because timer execution is unreliable by nature.

This might be more of an argument to support why I'd prefer the change I suggested above, and to change the test, but seems that might open another can of worms.

@bnoordhuis
Copy link
Member Author

Did we reach a conclusion here?

@saghul
Copy link
Member

saghul commented Jan 8, 2024

I'm somewhere in the neighbourhood of -0.5 on keeping the existing behavior. Maybe not many would run into problems, it's hard to tell.

I wonder if internally separating uv__run into some uv__run(void) and uv__run_once(int mode) could help solve the initial issue. There would be a bit of duplication, but it may make things simpler?

I could maybe take crack at that this week or the next.

@trevnorris
Copy link
Member

IMO we should use the change from #4260 (comment), which would fix the problem @saghul is having, and we change the test timer_from_check. Because I believe that test is technically a race condition anyways. Also, it does make sense to make sure the closing handles are the last things to run in the for loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants