kill command fails with read cgroup.procs: operation not supported #3821

Closed
amurzeau opened this issue Apr 8, 2023 · 7 comments

amurzeau commented Apr 8, 2023

Description

Hi,

While testing buildkit inside a Docker container, the tests use runc.
When trying to kill a runc container, runc errors out with an error like this:
read /sys/fs/cgroup/buildkit/mxv4shz9kwdm0p5u49mw971ft/cgroup.procs: operation not supported
and then returns exit code 1.
The command line is this one:
runc --root /run/containerd/runc/buildkit --log /tmp/bktest_containerd1141985211/state/io.containerd.runtime.v2.task/buildkit/mxv4shz9kwdm0p5u49mw971ft/log.json --log-format json kill --all mxv4shz9kwdm0p5u49mw971ft 9

Steps to reproduce the issue

  1. Run the buildkit tests on Debian Unstable with rootful Docker (from the docker.io package), using the dev-env target of the Dockerfile at the root of the buildkit git repository.

Describe the results you received and expected

Several tests using containerd fail with this error:

time="2023-04-08T18:18:07Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = process \"sh -c cat /dev/urandom | head -c 100 | sha256sum > /randomfile\" did not complete successfully: failed to delete task rkpae5w5k41u61w4vqnuoh6gy: unknown error after kill: runc did not terminate successfully: exit status 1: read /sys/fs/cgroup/buildkit/rkpae5w5k41u61w4vqnuoh6gy/cgroup.procs: operation not supported\n: unknown"
process "sh -c cat /dev/urandom | head -c 100 | sha256sum > /randomfile" did not complete successfully: failed to delete task rkpae5w5k41u61w4vqnuoh6gy: unknown error after kill: runc did not terminate successfully: exit status 1: read /sys/fs/cgroup/buildkit/rkpae5w5k41u61w4vqnuoh6gy/cgroup.procs: operation not supported
: unknown

What version of runc are you using?

runc version v1.1.5
spec: 1.0.2-dev
go: go1.20.3
libseccomp: 2.5.4

Host OS information

Host:

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Container (running the dev-env target of the Dockerfile from the buildkit git repository):

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.17.3
PRETTY_NAME="Alpine Linux v3.17"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

Host kernel information

Linux DOC-PC3 6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19) x86_64 GNU/Linux

Author

amurzeau commented Apr 8, 2023

I found that the issue is that the cgroup is in threaded mode, and in that case, reading cgroup.procs returns ENOTSUP.

With the following patch applied to runc, the tests pass again and runc no longer fails:

diff --git a/libcontainer/cgroups/utils.go b/libcontainer/cgroups/utils.go
index b32af4ee..70080efd 100644
--- a/libcontainer/cgroups/utils.go
+++ b/libcontainer/cgroups/utils.go
@@ -19,6 +19,7 @@ import (
 
 const (
        CgroupProcesses   = "cgroup.procs"
+       CgroupThreads     = "cgroup.threads"
        unifiedMountpoint = "/sys/fs/cgroup"
        hybridMountpoint  = "/sys/fs/cgroup/unified"
 )
@@ -137,14 +138,16 @@ func GetAllSubsystems() ([]string, error) {
 }
 
 func readProcsFile(dir string) ([]int, error) {
-       f, err := OpenFile(dir, CgroupProcesses, os.O_RDONLY)
+       contents, err := ReadFile(dir, CgroupProcesses)
+       if errors.Is(err, unix.ENOTSUP) {
+               contents, err = ReadFile(dir, CgroupThreads)
+       }
        if err != nil {
                return nil, err
        }
-       defer f.Close()
 
        var (
-               s   = bufio.NewScanner(f)
+               s   = bufio.NewScanner(strings.NewReader(contents))
                out = []int{}
        )

Here are the cgroup types (these commands were run inside buildkit's dev-env container):

# cat /sys/fs/cgroup/buildkit/mxv4shz9kwdm0p5u49mw971ft/cgroup.type
threaded
# cat /sys/fs/cgroup/buildkit/cgroup.type
threaded
# cat /sys/fs/cgroup/cgroup.type
domain threaded

amurzeau changed the title from "read cgroup.procs: operation not supported" to "kill command fails with read cgroup.procs: operation not supported" on Apr 8, 2023

Bacto commented Jan 9, 2024

Hi,
I have the same issue (with runc 1.1.10).
Having this patch applied to the next version would be awesome!

@kolyshkin
Contributor

@Bacto we've changed this part of runc a lot in the main branch. Can you try to repro this using runc compiled from the main branch?


Bacto commented Jan 17, 2024

Hi @kolyshkin,

I tried with the main branch and got the same issue:

# runc -v
runc version 1.1.0+dev
commit: 0c5a735
spec: 1.1.0+dev
go: go1.21.6
libseccomp: 2.5.5

@kolyshkin
Contributor

> Here is the type of the cgroups (these commands were run inside the buildkit's dev-env container):
>
> # cat /sys/fs/cgroup/buildkit/mxv4shz9kwdm0p5u49mw971ft/cgroup.type
> threaded
> # cat /sys/fs/cgroup/buildkit/cgroup.type
> threaded
> # cat /sys/fs/cgroup/cgroup.type
> domain threaded

So the problem here is the threaded cgroup type. In this case, the processes actually belong to the ancestor cgroup that has the "domain threaded" type (i.e. the top cgroup in this case). It would be incorrect to send SIGKILL to specific threads in this group. So, basically, runc kill does the right thing here by returning an error.

This is some kind of a misconfiguration, possibly caused by buildkit.

@kolyshkin
Contributor

Created a Debian 12 VM, checked out buildkit, and ran its test suite inside a container (make test). Was not able to reproduce.

I think there was something wrong originally when starting a container.

Would still like to get to the bottom of it, so any suggestions of how to reproduce it (ideally a vagrant file or something like this) are welcome.

Author

amurzeau commented Jun 3, 2024

The issue is fixed in the main branch.
I tried version 1.1.5 again and reproduced the issue, but I cannot reproduce it with the main branch of runc.

I tried to find the first fixed version: I can still reproduce the issue with 1.1.12, but no longer with 1.2.0-rc.1.

So I'm closing this issue.

For reference, I'm using go test -v -run ^TestIntegration/TestDiffSingleLayer.*$ github.com/moby/buildkit/client -count=1 to run affected tests in buildkit with the tested runc in /usr/bin/runc.

amurzeau closed this as completed on Jun 3, 2024