panic when crio.service is stopped #8031

Closed
Jeansen opened this issue Apr 17, 2024 · 6 comments · Fixed by #8195
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Jeansen

Jeansen commented Apr 17, 2024

What happened?

Stopping crio.service shows a panic in its logs.

What did you expect to happen?

A clean shutdown.

How can we reproduce it (as minimally and precisely as possible)?

systemctl stop crio.service

Anything else we need to know?

Apr 17 15:59:50 worker3-k8s.lan systemd[1]: Stopping crio.service - Container Runtime Interface for OCI (CRI-O)...
Apr 17 15:59:50 worker3-k8s.lan crio[714]: panic: close of closed channel
Apr 17 15:59:50 worker3-k8s.lan crio[714]: goroutine 1 [running]:
Apr 17 15:59:50 worker3-k8s.lan crio[714]: panic({0x1e23060?, 0x243afc0?})
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/panic.go:1017 +0x3ac fp=0xc000a73570 sp=0xc000a734c0 pc=0x44086c
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.closechan(0xc00070eb40)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/chan.go:365 +0x3e5 fp=0xc000a735c8 sp=0xc000a73570 pc=0x40ce45
Apr 17 15:59:50 worker3-k8s.lan crio[714]: github.com/cri-o/cri-o/server.(*Server).Shutdown(0xc000209000, {0xc00053b6bc?, 0xc0004e00f0?})
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         github.com/cri-o/cri-o/server/server.go:363 +0x1a5 fp=0xc000a73610 sp=0xc000a735c8 pc=0x19c3825
Apr 17 15:59:50 worker3-k8s.lan crio[714]: main.main.func2(0xc00022d9c0)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         github.com/cri-o/cri-o/cmd/crio/main.go:421 +0x1856 fp=0xc000a73b78 sp=0xc000a73610 pc=0x19ec356
Apr 17 15:59:50 worker3-k8s.lan crio[714]: github.com/urfave/cli/v2.(*Command).Run(0xc0001a22c0, 0xc00022d9c0, {0xc00012c060, 0x1, 0x1})
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         github.com/urfave/cli/v2@v2.26.0/command.go:277 +0x9d8 fp=0xc000a73e08 sp=0xc000a73b78 pc=0x838e98
Apr 17 15:59:50 worker3-k8s.lan crio[714]: github.com/urfave/cli/v2.(*App).RunContext(0xc00046e000, {0x245d240?, 0x3519300}, {0xc00012c060, 0x1, 0x1})
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         github.com/urfave/cli/v2@v2.26.0/app.go:335 +0x5db fp=0xc000a73e70 sp=0xc000a73e08 pc=0x83515b
Apr 17 15:59:50 worker3-k8s.lan crio[714]: github.com/urfave/cli/v2.(*App).Run(...)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         github.com/urfave/cli/v2@v2.26.0/app.go:309
Apr 17 15:59:50 worker3-k8s.lan crio[714]: main.main()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         github.com/cri-o/cri-o/cmd/crio/main.go:458 +0x65b fp=0xc000a73f40 sp=0xc000a73e70 pc=0x19eaa7b
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.main()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:267 +0x2bb fp=0xc000a73fe0 sp=0xc000a73f40 pc=0x4440fb
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.goexit()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/asm_amd64.s:1650 +0x1 fp=0xc000a73fe8 sp=0xc000a73fe0 pc=0x479041
Apr 17 15:59:50 worker3-k8s.lan crio[714]: goroutine 2 [force gc (idle)]:
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:398 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x44456e
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.goparkunlock(...)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:404
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.forcegchelper()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:322 +0xb3 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x4443d3
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.goexit()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/asm_amd64.s:1650 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x479041
Apr 17 15:59:50 worker3-k8s.lan crio[714]: created by runtime.init.6 in goroutine 1
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:310 +0x1a
Apr 17 15:59:50 worker3-k8s.lan crio[714]: goroutine 3 [GC sweep wait]:
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.gopark(0x34df701?, 0x0?, 0x0?, 0x0?, 0x0?)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:398 +0xce fp=0xc000085778 sp=0xc000085758 pc=0x44456e
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.goparkunlock(...)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:404
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.bgsweep(0x0?)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/mgcsweep.go:321 +0xdf fp=0xc0000857c8 sp=0xc000085778 pc=0x42ca7f
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.gcenable.func1()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/mgc.go:200 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x421965
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.goexit()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/asm_amd64.s:1650 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x479041
Apr 17 15:59:50 worker3-k8s.lan crio[714]: created by runtime.gcenable in goroutine 1
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/mgc.go:200 +0x66
Apr 17 15:59:50 worker3-k8s.lan crio[714]: goroutine 4 [GC scavenge wait]:
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.gopark(0x34e0d60?, 0x2a02360e?, 0x0?, 0x0?, 0x0?)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:398 +0xce fp=0xc000085f70 sp=0xc000085f50 pc=0x44456e
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.goparkunlock(...)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:404
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.(*scavengerState).park(0x34e0d60)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa0 sp=0xc000085f70 pc=0x42a0a9
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.bgscavenge(0x0?)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa0 pc=0x42a679
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.gcenable.func2()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/mgc.go:201 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x421905
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.goexit()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/asm_amd64.s:1650 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x479041
Apr 17 15:59:50 worker3-k8s.lan crio[714]: created by runtime.gcenable in goroutine 1
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/mgc.go:201 +0xa5
Apr 17 15:59:50 worker3-k8s.lan crio[714]: goroutine 18 [finalizer wait]:
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.gopark(0x0?, 0x2206510?, 0x0?, 0x20?, 0x2000000020?)
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/proc.go:398 +0xce fp=0xc000084628 sp=0xc000084608 pc=0x44456e
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.runfinq()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/mfinal.go:193 +0x107 fp=0xc0000847e0 sp=0xc000084628 pc=0x420967
Apr 17 15:59:50 worker3-k8s.lan crio[714]: runtime.goexit()
Apr 17 15:59:50 worker3-k8s.lan crio[714]:         runtime/asm_amd64.s:1650 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x479041

CRI-O and Kubernetes version

$ crio --version
crio version 1.29.2
Version:        1.29.2
GitCommit:      d317b5dc918bbfbc78481072a0d93e572aa8d0e8
GitCommitDate:  2024-02-22T19:23:38Z
GitTreeState:   clean
BuildDate:      1970-01-01T00:00:00Z
GoVersion:      go1.21.1
Compiler:       gc
Platform:       linux/amd64
Linkmode:       static
BuildTags:
  static
  netgo
  osusergo
  exclude_graphdriver_btrfs
  exclude_graphdriver_devicemapper
  seccomp
  apparmor
  selinux
LDFlags:          unknown
SeccompEnabled:   true
AppArmorEnabled:  false
$ kubectl version --output=json
{
  "clientVersion": {
    "major": "1",
    "minor": "29",
    "gitVersion": "v1.29.2",
    "gitCommit": "4b8e819355d791d96b7e9d9efe4cbafae2311c88",
    "gitTreeState": "clean",
    "buildDate": "2024-02-14T10:40:49Z",
    "goVersion": "go1.21.7",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "kustomizeVersion": "v5.0.4-0.20230601165947-6ce0bf390ce3",
  "serverVersion": {
    "major": "1",
    "minor": "29",
    "gitVersion": "v1.29.3",
    "gitCommit": "6813625b7cd706db5bc7388921be03071e1a492d",
    "gitTreeState": "clean",
    "buildDate": "2024-03-14T23:58:36Z",
    "goVersion": "go1.21.8",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ uname -a
Linux worker3-k8s.lan 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

Running on QEMU/KVM virtual hosts.
Jeansen added the kind/bug label Apr 17, 2024
@kwilczynski
Member

/assign kwilczynski

@kwilczynski
Member

@Jeansen, thank you for the report! I am sorry that things aren't working up to scratch.

Question: Do you also see the same issue when you run CRI-O manually and try to shut it down? Also, does it happen every time you stop CRI-O, or is this intermittent in nature? Any change in the configuration file worthy of note?

@Jeansen
Author

Jeansen commented Apr 17, 2024

@kwilczynski Thanks for the quick reply. It is not a showstopper. I can start the service just fine and everything works. But anyway, this pokes me in the eye.

Yes, I can reproduce this every time I stop the service.

@Jeansen
Author

Jeansen commented Apr 17, 2024

Here are my active config settings:

[crio.runtime]
enable_pod_events = true
default_runtime = "crun"
[crio.runtime.runtimes.crun]
runtime_path = "/usr/bin/crio-crun"
runtime_type = ""
runtime_root = ""
runtime_config_path = ""
monitor_path = "/usr/bin/crio-conmon"
monitor_cgroup = "system.slice"
monitor_exec_cgroup = ""
allowed_annotations = [
        "io.containers.trace-syscall",
]
privileged_without_host_devices = false
[crio.runtime.runtimes.runc]
runtime_path = ""
runtime_type = "oci"
runtime_root = "/run/runc"
runtime_config_path = ""
monitor_path = ""
monitor_cgroup = "system.slice"
monitor_exec_cgroup = ""
privileged_without_host_devices = false
[crio.image]
signature_policy = "/etc/crio/policy.json"
[crio.metrics]
enable_metrics = true

That is everything that is set explicitly, either in the default config or in the drop-in folder.

@roman-kiselenko
Member

Looks like this happens here:

cri-o/server/server.go

Lines 361 to 364 in d317b5d

if s.config.EnablePodEvents {
	// closing a non-nil channel only if the evented pleg is enabled
	close(s.ContainerEventsChan)
}

I can find only one other place where this channel is also closed:

cri-o/server/server.go

Lines 789 to 791 in d317b5d

if s.config.EnablePodEvents {
	close(s.ContainerEventsChan)
}

Logs might help, especially the Watch error entry. 🤔
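
To illustrate the failure mode outside of CRI-O: closing an already-closed Go channel always panics, so if the path at lines 789 to 791 closes ContainerEventsChan first, the second close in Shutdown blows up exactly as the stack trace shows. Below is a minimal standalone sketch of the panic and of one common guard (sync.Once). The type and field names (eventsServer, containerEvents) are made up for the example, and this is not necessarily how #8195 fixes it.

package main

import (
	"fmt"
	"sync"
)

// eventsServer is a stand-in for the real Server type; only the channel
// and the close guard matter for this illustration.
type eventsServer struct {
	containerEvents chan struct{}
	closeOnce       sync.Once
}

// closeEvents may be reached from several shutdown paths; sync.Once
// ensures close() runs at most once, so any later call is a no-op.
func (s *eventsServer) closeEvents() {
	s.closeOnce.Do(func() {
		close(s.containerEvents)
	})
}

func main() {
	s := &eventsServer{containerEvents: make(chan struct{})}
	s.closeEvents() // e.g. the error-handling path
	s.closeEvents() // e.g. Shutdown; safe with the guard
	fmt.Println("guarded double close: no panic")

	// Without a guard, this reproduces what the journal shows:
	ch := make(chan struct{})
	close(ch)
	close(ch) // panic: close of closed channel
}

A boolean flag under a mutex would work equally well; sync.Once is just the most compact way to express "close at most once".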

@bitoku
Contributor

bitoku commented May 15, 2024

/assign
