fixes a possible race condition in AutoRestartTrick #1002

Open
wants to merge 2 commits into
base: master

@ivg ivg commented Aug 10, 2023

Just a long shot for a failure observed on #998. My hypothesis is that when we stop ProcessWatcher before restarting the process manually, we don't yield to it and kill the process immediately. Then, when the ProcessWatcher thread wakes up, two conditions are ready at once, popen_obj and stopped_event; see the corresponding code:

        while True:
            if self.popen_obj.poll() is not None:
                break
            if self.stopped_event.wait(timeout=0.1):
                return

So even though stopped_event is already set, we check popen_obj first, break out of the loop, and trigger the process restart.
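To make the ordering problem concrete, here is a minimal, single-threaded sketch of the hypothesis. It is not the actual watchdog code: `FakePopen` and `watch` are hypothetical stand-ins that only copy the shape of the loop quoted above, and the return code is arbitrary.

```python
import threading


class FakePopen:
    """Hypothetical stand-in for subprocess.Popen; only poll() is modelled."""

    def __init__(self, returncode=None):
        self.returncode = returncode

    def poll(self):
        # Like Popen.poll(): None while running, the return code once exited.
        return self.returncode


def watch(popen_obj, stopped_event, callback):
    """Simplified copy of the loop quoted above (an assumption, not the real class)."""
    while True:
        if popen_obj.poll() is not None:
            break  # process exit is checked first ...
        if stopped_event.wait(timeout=0.1):
            return  # ... so this branch is never reached if both are ready
    callback()


calls = []
stopped = threading.Event()
stopped.set()                      # the watcher was asked to stop ...
proc = FakePopen(returncode=-15)   # ... and the process was killed right after
watch(proc, stopped, lambda: calls.append("restart"))
print(calls)  # ['restart']: the callback fires although the watcher was stopped
```

If this matches what happens in #998, the callback runs because the exit check wins the tie, not because the stop request was lost.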

We can also make the ProcessWatcher logic more robust, by checking if we are stopped before calling the termination callback, e.g.,

        try:
            if not self.stopped_event.is_set():
                self.process_termination_callback()
        except Exception:
            logger.exception("Error calling process termination callback")

I am not 100% sure about that, as I don't really know what semantics other users expect from ProcessWatcher. But at least the AutoRestarter expects these semantics, i.e., a watcher shall not send any events after it has been stopped.
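A rough sketch of how that guard would behave, again using hypothetical stand-ins (`FakePopen`, `watch_guarded`) rather than the real ProcessWatcher class:

```python
import logging
import threading

logger = logging.getLogger(__name__)


class FakePopen:
    """Hypothetical stand-in for subprocess.Popen; only poll() is modelled."""

    def __init__(self, returncode=None):
        self.returncode = returncode

    def poll(self):
        return self.returncode


def watch_guarded(popen_obj, stopped_event, callback):
    """Same loop as in the description, plus the proposed stopped check."""
    while True:
        if popen_obj.poll() is not None:
            break
        if stopped_event.wait(timeout=0.1):
            return
    try:
        # Proposed guard: stay silent if we were stopped in the meantime.
        if not stopped_event.is_set():
            callback()
    except Exception:
        logger.exception("Error calling process termination callback")


# Case 1: stopped and exited at the same time -> no callback.
calls_stopped = []
stopped = threading.Event()
stopped.set()
watch_guarded(FakePopen(returncode=-15), stopped, lambda: calls_stopped.append("restart"))

# Case 2: process exited on its own, watcher still active -> callback fires.
calls_active = []
watch_guarded(FakePopen(returncode=1), threading.Event(), lambda: calls_active.append("restart"))

print(calls_stopped, calls_active)  # [] ['restart']
```

Note that the check and the call are not atomic, so a stop request arriving between `is_set()` and `callback()` can still slip through; the guard narrows the window rather than closing it.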

i.e., don't send events if stopped

ivg commented Aug 10, 2023

Okay, the first option didn't work, so I tried the second one and the tests passed, at least on Linux. I'm not sure what's going on with macOS, but it looks like a known issue; correct me if I am wrong.
