Keep the monitor exits from stopping when the watcher gets error #8195

Merged
merged 1 commit into from
May 20, 2024

Conversation

bitoku
Contributor

@bitoku bitoku commented May 17, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, when the watcher in the "exit monitor" gets an error, the whole "exit monitor" stops and is never restarted.
In other words, after a watcher error, cri-o keeps running but can no longer watch container exits.

Watcher errors (that is, errors from fsnotify) are meant to be recoverable, so fsnotify itself does not stop when it hits one (see the usage of sendError in the code linked below).

https://github.com/fsnotify/fsnotify/blob/main/backend_inotify.go#L410-L573
https://github.com/fsnotify/fsnotify/blob/main/backend_windows.go#L483-L647

This PR keeps the monitor from stopping when the watcher gets an error, so it continues monitoring.
It also fixes the close of a closed channel when cri-o shuts down (#8031).
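The behaviour described above can be illustrated with a minimal sketch. This is not the actual cri-o code: `fakeWatcher`, `stats`, `monitor`, and `runDemo` are hypothetical names, and `fakeWatcher` only mimics the `Events` and `Errors` channels that `fsnotify.Watcher` exposes so that the example runs with the standard library alone. The sketch assumes two things: an error read from `Errors` is logged and the loop keeps going, and the `done` channel is closed in exactly one place, which is one way to avoid a "close of closed channel" panic on shutdown.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fakeWatcher is a hypothetical stand-in for fsnotify.Watcher.
// It only mirrors the Events and Errors channels.
type fakeWatcher struct {
	Events chan string
	Errors chan error
}

// stats records what the monitor loop has seen (illustration only).
type stats struct {
	handled int
	errs    int
}

// monitor consumes events until ctx is cancelled or a watcher channel
// is closed. A watcher error is logged and counted, but does not end
// the loop. done is closed only by the deferred call below.
func monitor(ctx context.Context, w *fakeWatcher, st *stats, done chan struct{}) {
	defer close(done) // the single owner of close(done)
	for {
		select {
		case ev, ok := <-w.Events:
			if !ok {
				return // the watcher was closed
			}
			st.handled++
			fmt.Println("exit event:", ev)
		case err, ok := <-w.Errors:
			if !ok {
				return // the watcher was closed
			}
			st.errs++
			fmt.Println("watch error (continuing):", err)
		case <-ctx.Done():
			return
		}
	}
}

// runDemo sends an event, an error, and another event, then shuts down.
func runDemo() stats {
	w := &fakeWatcher{Events: make(chan string), Errors: make(chan error)}
	st := &stats{}
	done := make(chan struct{})
	go monitor(context.Background(), w, st, done)

	w.Events <- "container-a"
	w.Errors <- errors.New("simulated watcher error")
	w.Events <- "container-b" // still delivered after the error
	close(w.Events)           // simulates the watcher shutting down
	<-done                    // wait for the loop to exit
	return *st
}

func main() {
	st := runDemo()
	fmt.Printf("handled=%d errors=%d\n", st.handled, st.errs)
}
```

In this sketch the second event is still handled after the simulated error, which is the property the PR aims for. The real change lives in `monitorExits` in `server/server.go` and may differ in detail.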

Which issue(s) this PR fixes:

Fixes #8031

Special notes for your reviewer:

Even with this PR merged, there is a slight possibility that some events are missed when the watcher gets an error, although the monitor will be able to handle many more events than with the current behaviour.

Does this PR introduce a user-facing change?

Fix a bug where cri-o stops watching container exits after it gets an fsnotify error

@bitoku bitoku requested a review from mrunalp as a code owner May 17, 2024 01:14
@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. labels May 17, 2024
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 17, 2024
Contributor

openshift-ci bot commented May 17, 2024

Hi @bitoku. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov bot commented May 17, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 49.55%. Comparing base (d499a1e) to head (7fec7bf).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8195      +/-   ##
==========================================
+ Coverage   49.54%   49.55%   +0.01%     
==========================================
  Files         153      153              
  Lines       16961    16955       -6     
==========================================
- Hits         8403     8402       -1     
+ Misses       7511     7505       -6     
- Partials     1047     1048       +1     

@kwilczynski
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 17, 2024
@bitoku
Contributor Author

bitoku commented May 17, 2024

/retest

server/server.go Outdated
@@ -781,11 +781,6 @@ func (s *Server) monitorExits(ctx context.Context, watcher *fsnotify.Watcher, do
		go s.handleExit(ctx, event)
	case err := <-watcher.Errors:
		log.Debugf(ctx, "Watch error: %v", err)
Member

can we make this Errorf so it is easier to see there's an issue?

@haircommander
Member

one note, otherwise LGTM. good idea here @bitoku

Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
@kwilczynski
Member

/approve
/lgtm

@kwilczynski
Member

/retest

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 20, 2024
@haircommander
Member

/approve
/lgtm

Contributor

openshift-ci bot commented May 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bitoku, haircommander, kwilczynski

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2024
@bitoku
Contributor Author

bitoku commented May 20, 2024

/test ci-cgroupv2-integration

@openshift-merge-bot openshift-merge-bot bot merged commit 5ab50a9 into cri-o:main May 20, 2024
70 of 71 checks passed
@kwilczynski
Member

/cherry-pick release-1.30

@kwilczynski
Member

/cherry-pick release-1.29

@openshift-cherrypick-robot

@kwilczynski: new pull request created: #8209

In response to this:

/cherry-pick release-1.30

@openshift-cherrypick-robot

@kwilczynski: new pull request created: #8210

In response to this:

/cherry-pick release-1.29

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Development

Successfully merging this pull request may close these issues.

panic when crio.service is stopped
4 participants