
Node should not be Ready when the klog flush daemon in kubelet is blocked in fsync #124016

Closed
divanodestiny opened this issue Mar 21, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@divanodestiny

What happened?

  1. Some underlying problem causes a subset of fsync syscalls to block indefinitely.
  2. The klog flush daemon acquires the lock (`l.mu`), calls `flushAll`, and blocks in the fsync syscall while still holding the lock:

    // lockAndFlushAll is like flushAll but locks l.mu first.
    func (l *loggingT) lockAndFlushAll() {
        l.mu.Lock()
        l.flushAll()
        l.mu.Unlock()
    }

  3. Other goroutines in kubelet then block whenever they write a log line, because `output` must acquire the same `l.mu`:

    // output writes the data to the log files and releases the buffer.
    func (l *loggingT) output(s severity.Severity, logger *logWriter, buf *buffer.Buffer, depth int, file string, line int, alsoToStderr bool) {
        var isLocked = true
        l.mu.Lock()
        defer func() {
            if isLocked {
                // Unlock before returning in case that it wasn't done already.
                l.mu.Unlock()
            }
        }()
        // ... (rest of output elided)
    }

  4. Kubelet still renews the node lease normally, because the lease-renewal goroutine does not write any log lines and therefore never waits on `l.mu`.

  5. The node controller therefore still considers this node Ready.

What did you expect to happen?

The node should become NotReady.
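One possible way to get that behavior, sketched under assumptions: make lease renewal depend on a probe that checks whether a log write completes within a deadline, so a stuck logger stops the heartbeats and the node controller marks the node NotReady once its grace period expires. `healthChecker`, `logHealthy`, `maybeRenewLease`, the 50 ms timeout, and the `write`/`renew` callbacks are all hypothetical; this is not how kubelet currently renews its lease.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// healthChecker probes whether logging still makes progress (hypothetical).
// A probe whose write hangs cannot be cancelled, so at most one is in flight.
type healthChecker struct {
	mu       sync.Mutex
	inFlight bool
}

// logHealthy reports whether write() returns within timeout.
func (h *healthChecker) logHealthy(write func(), timeout time.Duration) bool {
	h.mu.Lock()
	if h.inFlight {
		h.mu.Unlock()
		return false // an earlier probe is still stuck
	}
	h.inFlight = true
	h.mu.Unlock()

	done := make(chan struct{})
	go func() {
		write() // e.g. emit a probe log line through the real logger
		h.mu.Lock()
		h.inFlight = false
		h.mu.Unlock()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// maybeRenewLease calls renew only when logging is healthy, so a node whose
// logger is wedged stops heartbeating.
func maybeRenewLease(h *healthChecker, write, renew func()) bool {
	if !h.logHealthy(write, 50*time.Millisecond) {
		return false
	}
	renew()
	return true
}

func main() {
	h := &healthChecker{}
	renew := func() { fmt.Println("lease renewed") }

	fmt.Println("healthy logger:", maybeRenewLease(h, func() {}, renew))

	hang := make(chan struct{}) // never closed: a write stuck behind fsync
	fmt.Println("stuck logger:", maybeRenewLease(h, func() { <-hang }, renew))
	fmt.Println("next heartbeat:", maybeRenewLease(h, func() {}, renew))
}
```

Allowing at most one probe in flight matters because a write stuck behind the lock can never be cancelled; without that guard, each heartbeat would leak another blocked goroutine.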

How can we reproduce it (as minimally and precisely as possible)?

Possibly by using ptrace to make fsync syscalls block?

Anything else we need to know?

No response

Kubernetes version

1.26.0


Cloud provider

none

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@divanodestiny divanodestiny added the kind/bug Categorizes issue or PR as related to a bug. label Mar 21, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 21, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 21, 2024
@T-Lakshmi

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 21, 2024
@bart0sh bart0sh added this to Triage in SIG Node CI/Test Board Mar 24, 2024
@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs Mar 27, 2024
@SergeyKanzhelev SergeyKanzhelev removed this from Triage in SIG Node CI/Test Board Apr 3, 2024
@yuzhiquan
Member

yuzhiquan commented Apr 9, 2024

I think this is related to this issue; maybe we can wait for a solution on the klog side.
Feel free to reopen this.
/close

@k8s-ci-robot
Contributor

@yuzhiquan: Closing this issue.

In response to this:

I think this is related to this issue; maybe we can wait for a solution on the klog side.

/close


SIG Node Bugs automation moved this from Triage to Done Apr 9, 2024

4 participants