Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runners: revert "runners: clean up temp files before exiting the runner" #1788

Merged
merged 1 commit into from
May 20, 2024

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented May 15, 2024

This reverts commit bc04bfc.

The remove_tmpfiles() helper is nice but it is also problematic because it creates extra output after the command was run and created output. E.g. a test failure on centos stream9 [0]

            r = root.run(["stat", "--format=%a", "/var/tmp"], monitor)
            assert r.returncode == 0
>           assert r.stdout.strip().split("\n")[-1] == "1777"
E           AssertionError: assert '/usr/lib/tmp... such process' == '1777'
E
E             - 1777
E             + /usr/lib/tmpfiles.d/rpcbind.conf:2: Failed to resolve user 'rpc': No such process

Here the output from "stat" is not the last output because the rempve_tmpfiles runs systemd-tmpfiles --clean --remove which produces some noisy output after stat was run.

This was found by @thozza (thanks!) and discussed in osbuild PR#1785.

There are various ways to fix this, the one is to use the --graceful option of systemd-tmpfiles. However that only got added in systemd v256 and centos-stream9 has v252 so that is sadly not an option.

Plus even when avaialble it will produce some informational output like

All rules containing unresolvable specifiers will be skipped.

Another way would be to sent the output from systemd-tmpfiles cleanup to /dev/null. Not really great as we will not know about real problems or warnings that we should care about.

None of the option above is good. So I started looking at the tmpfiles.d rules and the cleanup and why we are doing it. It was added relatively recently in #1458 and after some medidiation not having it seems to do no harm (details below). The tl;dr is that the buildroot is created inside bubblewrap and the dirs that --clean and --remove touch are already tmpdirs created just for the buildroot so the cleanup in the runner is redundant (and because the cleanup is now run for each buidlroot.run() command there might be unintended conequences but the current rules seem to not have any).

In detail, the tmpfiles_cleanup() does two things:

  1. --clean It will remove files that are older then the given age in tmpfiles.d. The tmpfiles in centos9 give me the following ages:
$ systemd-tmpfiles --cat-config|grep -E '[0-9]+d$'
d /var/lib/systemd/pstore 0755 root root 14d
d /var/lib/systemd/coredump 0755 root root 3d
q /tmp 1777 root root 10d
q /var/tmp 1777 root root 30d
D! /tmp/.X11-unix 1777 root root 10d
D! /tmp/.ICE-unix 1777 root root 10d
D! /tmp/.XIM-unix 1777 root root 10d
D! /tmp/.font-unix 1777 root root 10d

Given that we run our commands inside a bubblewrap environment and give it a fresh /run, /tmp, /var [1] there really should be no long lived things and even if there are they are cleaned up from the buildroot itself

  1. --remove It will remove files marked for removal in tmpdfiles.d. Running it on a centos9 env it yields for me:
$ systemd-tmpfiles --cat-config|grep -E '^[rRD]'
R /var/tmp/dnf*/locks/*
r /var/cache/dnf/download_lock.pid
r /var/cache/dnf/metadata_lock.pid
r /var/lib/dnf/rpmdb_lock.pid
r /var/log/log_lock.pid
r! /forcefsck
r! /fastboot
r! /forcequotacheck
D! /var/lib/containers/storage/tmp 0700 root root
D! /run/podman 0700 root root
D! /var/lib/cni/networks
R! /var/tmp/container_images*
D     /run/rpcbind 0700  rpc  rpc  -  -
D /run/sudo/ts 0700 root root
R! /tmp/systemd-private-*
R! /var/tmp/systemd-private-*
r! /var/lib/systemd/coredump/.#*
D! /tmp/.X11-unix 1777 root root 10d
D! /tmp/.ICE-unix 1777 root root 10d
D! /tmp/.XIM-unix 1777 root root 10d
D! /tmp/.font-unix 1777 root root 10d
r! /tmp/.X[0-9]*-lock

which is also covered by the bwrap cleanup.

[0] https://artifacts.dev.testing-farm.io/2d07b8f3-5f52-4e61-b1fa-5328a0ff1058/#artifacts-/plans/unit-tests
[1] https://github.com/osbuild/osbuild/blob/main/osbuild/buildroot.py#L218

Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

@achilleas-k achilleas-k enabled auto-merge (rebase) May 16, 2024 14:34
Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks for doing such thorough research and providing detailed explanation ❤️

This reverts commit bc04bfc.

The `remove_tmpfiles()` helper is nice but it is also problematic
because it creates extra output after the command was run and
created output. E.g. a test failure on centos stream9 [0]
```
            r = root.run(["stat", "--format=%a", "/var/tmp"], monitor)
            assert r.returncode == 0
>           assert r.stdout.strip().split("\n")[-1] == "1777"
E           AssertionError: assert '/usr/lib/tmp... such process' == '1777'
E
E             - 1777
E             + /usr/lib/tmpfiles.d/rpcbind.conf:2: Failed to resolve user 'rpc': No such process
```
Here the output from "stat" is not the last output because the
rempve_tmpfiles runs `systemd-tmpfiles --clean --remove` which
produces some noisy output after stat was run.

This was found by @thozza (thanks!) and discussed in osbuild PR#1785.

There are various ways to fix this, the one is to use the
`--graceful` option of systemd-tmpfiles. However that only got added in
systemd v256 and centos-stream9 has v252 so that is sadly not an option.

Plus even when avaialble it will produce some informational output like
```
All rules containing unresolvable specifiers will be skipped.
```

Another way would be to sent the output from systemd-tmpfiles cleanup
to /dev/null. Not really great as we will not know about real problems
or warnings that we should care about.

None of the option above is good. So I started looking at the tmpfiles.d
rules and the cleanup and why we are doing it. It was added relatively
recently in osbuild#1458 and after
some medidiation not having it seems to do no harm (details below). The
tl;dr is that the buildroot is created inside bubblewrap and the
dirs that `--clean` and `--remove` touch are already tmpdirs created
just for the buildroot so the cleanup in the runner is redundant
(and because the cleanup is now run for each buidlroot.run() command
there *might* be unintended conequences but the current rules seem
to not have any).

In detail, the tmpfiles_cleanup() does two things:
1. `--clean`
It will remove files that are older then the given age
in tmpfiles.d. The tmpfiles in centos9 give me the following ages:
```
$ systemd-tmpfiles --cat-config|grep -E '[0-9]+d$'
d /var/lib/systemd/pstore 0755 root root 14d
d /var/lib/systemd/coredump 0755 root root 3d
q /tmp 1777 root root 10d
q /var/tmp 1777 root root 30d
D! /tmp/.X11-unix 1777 root root 10d
D! /tmp/.ICE-unix 1777 root root 10d
D! /tmp/.XIM-unix 1777 root root 10d
D! /tmp/.font-unix 1777 root root 10d
```
Given that we run our commands inside a bubblewrap environment and
give it a fresh /run, /tmp, /var [1] there really should be no long
lived things and even if there are they are cleaned up from the
buildroot itself

2. `--remove`
It will remove files marked for removal in tmpdfiles.d. Running
it on a centos9 env it yields for me:
```
$ systemd-tmpfiles --cat-config|grep -E '^[rRD]'
R /var/tmp/dnf*/locks/*
r /var/cache/dnf/download_lock.pid
r /var/cache/dnf/metadata_lock.pid
r /var/lib/dnf/rpmdb_lock.pid
r /var/log/log_lock.pid
r! /forcefsck
r! /fastboot
r! /forcequotacheck
D! /var/lib/containers/storage/tmp 0700 root root
D! /run/podman 0700 root root
D! /var/lib/cni/networks
R! /var/tmp/container_images*
D     /run/rpcbind 0700  rpc  rpc  -  -
D /run/sudo/ts 0700 root root
R! /tmp/systemd-private-*
R! /var/tmp/systemd-private-*
r! /var/lib/systemd/coredump/.#*
D! /tmp/.X11-unix 1777 root root 10d
D! /tmp/.ICE-unix 1777 root root 10d
D! /tmp/.XIM-unix 1777 root root 10d
D! /tmp/.font-unix 1777 root root 10d
r! /tmp/.X[0-9]*-lock
```
which is also covered by the bwrap cleanup.

[0] https://artifacts.dev.testing-farm.io/2d07b8f3-5f52-4e61-b1fa-5328a0ff1058/#artifacts-/plans/unit-tests
[1] https://github.com/osbuild/osbuild/blob/main/osbuild/buildroot.py#L218
@thozza
Copy link
Member

thozza commented May 17, 2024

I clicked the rebase button, since the Gitlab CI was confused. The PR branch was never pushed to Gitlab and thus it didn't run. I didn't find any obvious way to trigger the Trigger GitLab CI job manually.

@mvo5
Copy link
Contributor Author

mvo5 commented May 17, 2024

I clicked the rebase button, since the Gitlab CI was confused. The PR branch was never pushed to Gitlab and thus it didn't run. I didn't find any obvious way to trigger the Trigger GitLab CI job manually.

Thanks! It seems to have not helped :( I will close/reopen and see if that fixes the CI problem.

@mvo5 mvo5 closed this May 17, 2024
auto-merge was automatically disabled May 17, 2024 11:09

Pull request was closed

@mvo5 mvo5 reopened this May 17, 2024
@achilleas-k
Copy link
Member

It might be related to the snapshot URL check failing with a timeout. I restarted it. Let's see if it helps.

@achilleas-k
Copy link
Member

Schutzbot's alive!! 🎉

@bcl bcl merged commit 2a17756 into osbuild:main May 20, 2024
59 of 60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants