Triggering OOM leaves cgroups in bad state #1092

Open
maleadt opened this issue Dec 8, 2022 · 4 comments

Comments

@maleadt
Contributor

maleadt commented Dec 8, 2022

Experimenting with memory limits:

    "linux": {
        "resources": {
            "memory": {
                "limit": 1048576
            }
        },
❯ ./crun --systemd-cgroup run test
KILLED

❯ ./crun --systemd-cgroup run test
2022-12-08T12:52:33.057298Z: sd-bus call: Unit crun-test.scope was already loaded or has a fragment file.: File exists

Deleting the container doesn't work:

❯ ./crun --systemd-cgroup run test
2022-12-08T12:54:46.316591Z: sd-bus call: Unit crun-test.scope was already loaded or has a fragment file.: File exists

Full config:

{
    "ociVersion": "1.0.1",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "root": {
        "path": "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f",
        "readonly": true
    },
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "none",
            "source": "/sys",
            "options": [
                "rbind",
                "ro",
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "process": {
        "terminal": true,
        "cwd": "/root",
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "args": [
            "/bin/bash", "-l"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL"
            ],
            "ambient": [
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "noNewPrivileges": true
    },
    "user": {
        "uid": 0,
        "gid": 0
    },
    "hostname": "test",
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ],
            "memory": {
                "limit": 1048576
            }
        },
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "user"
            },
            {
                "type": "cgroup"
            }
        ],
        "uidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "gidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "devices": null
    }
}
@giuseppe
Member

giuseppe commented Dec 8, 2022

I think that the low memory limit causes crun itself to fail and not the container payload.

@maleadt
Contributor Author

maleadt commented Dec 8, 2022

> I think that the low memory limit causes crun itself to fail and not the container payload.

Right, that's what I thought too. Is that avoidable? Or should crun deal with the remnants of a previous run when starting a new container?

@giuseppe
Member

giuseppe commented Dec 8, 2022

Weird, I am not able to reproduce this locally: if I specify your limit, crun works fine. If I set it lower, I get:

2022-12-08T21:26:31.701143Z: OOM: the memory limit could be too low: read from the init process

Could you please show the output of cat /proc/self/cgroup and check which processes are in the crun-test.scope cgroup?

Is there any useful information in systemctl --user status crun-test.scope?
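
A minimal sketch of how the requested cgroup information could be gathered, assuming cgroup v2 mounted at /sys/fs/cgroup. The helper names are made up for illustration and are not part of crun or systemd.

```shell
# Sketch only: helper names and the default cgroup root are assumptions.

# Print the cgroup v2 path of the calling process.
# $1: file to parse (defaults to /proc/self/cgroup).
own_cgroup_path() {
    # cgroup v2 entries have the form "0::<path>".
    sed -n 's/^0:://p' "${1:-/proc/self/cgroup}"
}

# Print every directory named after the scope below the cgroup root.
# $1: scope name, $2: cgroup root (defaults to /sys/fs/cgroup).
find_scope_cgroups() {
    find "${2:-/sys/fs/cgroup}" -type d -name "$1" 2>/dev/null
}

# Print the PIDs attached to one cgroup directory (nothing if it is empty).
# $1: cgroup directory.
list_cgroup_procs() {
    cat "$1/cgroup.procs"
}

# Possible usage:
#   own_cgroup_path
#   find_scope_cgroups crun-test.scope | while read -r dir; do
#       echo "$dir:"; list_cgroup_procs "$dir"
#   done
```

An empty result from find_scope_cgroups would suggest that no cgroup directory with that name exists, even if systemd still knows the unit.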

@maleadt
Contributor Author

maleadt commented Dec 9, 2022

I had to lower the memory limit for this to reproduce today:

❯ ./crun --systemd-cgroup run oom_test2

❯ ./crun --systemd-cgroup run oom_test2
2022-12-09T08:36:59.464090Z: the memory limit could be too low: sd-bus call: Unit crun-oom_test2.scope was already loaded or has a fragment file.: File exists

Interestingly, the error is slightly different now: it includes "the memory limit could be too low". The requested info:

❯ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-327.scope

❯ systemctl --user status crun-oom_test2.scope
× crun-oom_test2.scope - libcrun container
     Loaded: loaded (/run/user/1000/systemd/transient/crun-oom_test2.scope; transient)
  Transient: yes
     Active: failed (Result: oom-kill) since Fri 2022-12-09 09:36:57 CET; 28s ago
   Duration: 16ms
        CPU: 15ms

Dec 09 09:36:57 taurus systemd[964]: Started libcrun container.
Dec 09 09:36:57 taurus systemd[964]: crun-oom_test2.scope: A process of this unit has been killed by the OOM killer.
Dec 09 09:36:57 taurus systemd[964]: crun-oom_test2.scope: Failed with result 'oom-kill'.

Also interestingly, I can't find crun-oom_test2.scope anywhere in /sys/fs/cgroup. I can find a crun-test.scope (with no processes attached to it) from when I tried this yesterday, so it seems there are two different error cases here: one where the container gets killed and a created cgroup lingers, and one where the container dies with "the memory limit could be too low" and no cgroup is created but some systemd state still lingers.
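
For the second case, the lingering state looks like the failed transient scope reported by systemctl status. A possible manual workaround, sketched below, is to reset the failed state before rerunning. Both is-failed and reset-failed are standard systemctl verbs, but whether this clears the "already loaded" error here is an assumption, not something confirmed in this thread. The function name and the SYSTEMCTL variable are hypothetical.

```shell
# Sketch only: reset_failed_scope and SYSTEMCTL are hypothetical names.
# SYSTEMCTL lets the systemctl command be swapped out, e.g. for a dry run.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Clear the failed state of a user-level scope, if it is in that state.
# $1: unit name, e.g. crun-oom_test2.scope
reset_failed_scope() {
    # is-failed exits 0 only when the unit is in the "failed" state.
    if "$SYSTEMCTL" --user is-failed --quiet "$1"; then
        "$SYSTEMCTL" --user reset-failed "$1"
    fi
}

# Possible usage:
#   reset_failed_scope crun-oom_test2.scope
#   ./crun --systemd-cgroup run oom_test2
```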


If I raise the memory limit back to 1048576, I need to do something more intensive in the container, say, sh -c "find /". That does again result in an OOM kill, but not of the container process, and as such the created cgroups seem to get cleaned up fine. I guess this is the expected scenario.
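
While a cgroup directory still exists (for example the lingering crun-test.scope), its cgroup v2 memory.events file records how many OOM kills happened in it. A small sketch for reading that counter follows; the function name is made up for illustration.

```shell
# Sketch only: oom_kill_count is a hypothetical helper name.

# Print the oom_kill counter from a cgroup v2 memory.events file.
# $1: cgroup directory. Prints 0 when the key is not present.
oom_kill_count() {
    awk '$1 == "oom_kill" { n = $2 } END { print n + 0 }' "$1/memory.events"
}

# Possible usage, with the directory found under /sys/fs/cgroup:
#   oom_kill_count /sys/fs/cgroup/<path to>/crun-test.scope
```

Once the cgroup has been removed, as in the scenario described above, the file is gone too, so this only helps for cgroups that are still present.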


With bash -c "echo 'Hello, World!'" (i.e. not using a login shell) I need to lower the memory limit further, but it does seem to reproduce consistently here with the following config:

{
    "ociVersion": "1.0.1",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "root": {
        "path": "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f",
        "readonly": true
    },
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "none",
            "source": "/sys",
            "options": [
                "rbind",
                "ro",
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "process": {
        "terminal": true,
        "cwd": "/root",
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "args": [
            "/bin/bash", "-c", "echo 'Hello, World!'"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL"
            ],
            "ambient": [
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "noNewPrivileges": true
    },
    "user": {
        "uid": 0,
        "gid": 0
    },
    "hostname": "test",
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ],
            "memory": {
                "limit": 248576
            }
        },
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "user"
            },
            {
                "type": "cgroup"
            }
        ],
        "uidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "gidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "devices": null
    }
}
