-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runners: revert "runners: clean up temp files before exiting the runner" #1788
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Thanks for doing such thorough research and providing detailed explanation ❤️
This reverts commit bc04bfc. The `remove_tmpfiles()` helper is nice but it is also problematic because it creates extra output after the command was run and created output. E.g. a test failure on centos stream9 [0] ``` r = root.run(["stat", "--format=%a", "/var/tmp"], monitor) assert r.returncode == 0 > assert r.stdout.strip().split("\n")[-1] == "1777" E AssertionError: assert '/usr/lib/tmp... such process' == '1777' E E - 1777 E + /usr/lib/tmpfiles.d/rpcbind.conf:2: Failed to resolve user 'rpc': No such process ``` Here the output from "stat" is not the last output because the rempve_tmpfiles runs `systemd-tmpfiles --clean --remove` which produces some noisy output after stat was run. This was found by @thozza (thanks!) and discussed in osbuild PR#1785. There are various ways to fix this, the one is to use the `--graceful` option of systemd-tmpfiles. However that only got added in systemd v256 and centos-stream9 has v252 so that is sadly not an option. Plus even when avaialble it will produce some informational output like ``` All rules containing unresolvable specifiers will be skipped. ``` Another way would be to sent the output from systemd-tmpfiles cleanup to /dev/null. Not really great as we will not know about real problems or warnings that we should care about. None of the option above is good. So I started looking at the tmpfiles.d rules and the cleanup and why we are doing it. It was added relatively recently in osbuild#1458 and after some medidiation not having it seems to do no harm (details below). The tl;dr is that the buildroot is created inside bubblewrap and the dirs that `--clean` and `--remove` touch are already tmpdirs created just for the buildroot so the cleanup in the runner is redundant (and because the cleanup is now run for each buidlroot.run() command there *might* be unintended conequences but the current rules seem to not have any). In detail, the tmpfiles_cleanup() does two things: 1. `--clean` It will remove files that are older then the given age in tmpfiles.d. The tmpfiles in centos9 give me the following ages: ``` $ systemd-tmpfiles --cat-config|grep -E '[0-9]+d$' d /var/lib/systemd/pstore 0755 root root 14d d /var/lib/systemd/coredump 0755 root root 3d q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d D! /tmp/.X11-unix 1777 root root 10d D! /tmp/.ICE-unix 1777 root root 10d D! /tmp/.XIM-unix 1777 root root 10d D! /tmp/.font-unix 1777 root root 10d ``` Given that we run our commands inside a bubblewrap environment and give it a fresh /run, /tmp, /var [1] there really should be no long lived things and even if there are they are cleaned up from the buildroot itself 2. `--remove` It will remove files marked for removal in tmpdfiles.d. Running it on a centos9 env it yields for me: ``` $ systemd-tmpfiles --cat-config|grep -E '^[rRD]' R /var/tmp/dnf*/locks/* r /var/cache/dnf/download_lock.pid r /var/cache/dnf/metadata_lock.pid r /var/lib/dnf/rpmdb_lock.pid r /var/log/log_lock.pid r! /forcefsck r! /fastboot r! /forcequotacheck D! /var/lib/containers/storage/tmp 0700 root root D! /run/podman 0700 root root D! /var/lib/cni/networks R! /var/tmp/container_images* D /run/rpcbind 0700 rpc rpc - - D /run/sudo/ts 0700 root root R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* r! /var/lib/systemd/coredump/.#* D! /tmp/.X11-unix 1777 root root 10d D! /tmp/.ICE-unix 1777 root root 10d D! /tmp/.XIM-unix 1777 root root 10d D! /tmp/.font-unix 1777 root root 10d r! /tmp/.X[0-9]*-lock ``` which is also covered by the bwrap cleanup. [0] https://artifacts.dev.testing-farm.io/2d07b8f3-5f52-4e61-b1fa-5328a0ff1058/#artifacts-/plans/unit-tests [1] https://github.com/osbuild/osbuild/blob/main/osbuild/buildroot.py#L218
829a330
to
858dedb
Compare
I clicked the |
Thanks! It seems to have not helped :( I will close/reopen and see if that fixes the CI problem. |
It might be related to the snapshot URL check failing with a timeout. I restarted it. Let's see if it helps. |
Schutzbot's alive!! 🎉 |
This reverts commit bc04bfc.
The
remove_tmpfiles()
helper is nice but it is also problematic because it creates extra output after the command was run and created output. E.g. a test failure on centos stream9 [0]Here the output from "stat" is not the last output because the rempve_tmpfiles runs
systemd-tmpfiles --clean --remove
which produces some noisy output after stat was run.This was found by @thozza (thanks!) and discussed in osbuild PR#1785.
There are various ways to fix this, the one is to use the
--graceful
option of systemd-tmpfiles. However that only got added in systemd v256 and centos-stream9 has v252 so that is sadly not an option.Plus even when avaialble it will produce some informational output like
Another way would be to sent the output from systemd-tmpfiles cleanup to /dev/null. Not really great as we will not know about real problems or warnings that we should care about.
None of the option above is good. So I started looking at the tmpfiles.d rules and the cleanup and why we are doing it. It was added relatively recently in #1458 and after some medidiation not having it seems to do no harm (details below). The tl;dr is that the buildroot is created inside bubblewrap and the dirs that
--clean
and--remove
touch are already tmpdirs created just for the buildroot so the cleanup in the runner is redundant (and because the cleanup is now run for each buidlroot.run() command there might be unintended conequences but the current rules seem to not have any).In detail, the tmpfiles_cleanup() does two things:
--clean
It will remove files that are older then the given age in tmpfiles.d. The tmpfiles in centos9 give me the following ages:Given that we run our commands inside a bubblewrap environment and give it a fresh /run, /tmp, /var [1] there really should be no long lived things and even if there are they are cleaned up from the buildroot itself
--remove
It will remove files marked for removal in tmpdfiles.d. Running it on a centos9 env it yields for me:which is also covered by the bwrap cleanup.
[0] https://artifacts.dev.testing-farm.io/2d07b8f3-5f52-4e61-b1fa-5328a0ff1058/#artifacts-/plans/unit-tests
[1] https://github.com/osbuild/osbuild/blob/main/osbuild/buildroot.py#L218