Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/RFC: Add support for modifying sysroots without a daemon #3006

Closed
wants to merge 3 commits into from

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Jul 20, 2021

Hack day project. This is really rough but functional. Came out of discussions in #2854. The way this works is we add a new "mux" layer which handles either calling through D-Bus or running the transaction code directly. On the "transaction" side, we factor out callimpl functions which just take in simpler parameters to untie it a bit from the daemon stuff. It's actually not that much new code. It's more a lot of reshuffling code we already have.

Some demos. For example, adding config files needed for the rootfs at installation time:

[root@cosa-devsh ~]# coreos-installer install /dev/vda
[root@cosa-devsh ~]# mount /dev/disk/by-label/root /mnt
[root@cosa-devsh ~]# mount /dev/disk/by-label/boot /mnt/boot
[root@cosa-devsh ~]# systemctl mask --now rpm-ostreed # not needed; just to verify that we're not using the daemon in the following commands
[root@cosa-devsh ~]# rpm-ostree ex initramfs-etc --track /etc/multipath.conf --sysroot /mnt
Checking out tree 39936e0... done
Writing OSTree commit... done
[root@cosa-devsh ~]# rpm-ostree status --sysroot /mnt
Deployments:
  fedora:fedora/x86_64/coreos/testing-devel
                   Version: 34.20210716.dev.0 (2021-07-16T15:30:30Z)
                BaseCommit: 39936e04bf12f50017300d01abee8073eef8923f7c6a277280df1366a3599259
              GPGSignature: (unsigned)
              InitramfsEtc: /etc/multipath.conf

  fedora:fedora/x86_64/coreos/testing-devel
                   Version: 34.20210716.dev.0 (2021-07-16T15:30:30Z)
                    Commit: 39936e04bf12f50017300d01abee8073eef8923f7c6a277280df1366a3599259
              GPGSignature: (unsigned)
[root@cosa-devsh ~]# reboot

Another example is running rpm-ostree install from the initramfs and rebooting so that the packages are installed from first boot:

sh-5.1# mount /dev/disk/by-label/boot /sysroot/boot
sh-5.1# # should use bwrap for this
sh-5.1# for mnt in dev proc sys; do mount --bind /$mnt /sysroot/$mnt; done
sh-5.1# chroot /sysroot
sh-5.1# unset INVOCATION_ID
sh-5.1# load_policy -i
sh-5.1# echo 'nameserver 10.0.2.3' > /etc/resolv.conf
sh-5.1# rpm-ostree install strace --sysroot /
Checking out tree 0f28eb2... done
Enabled rpm-md repositories: fedora-cisco-openh264 updates fedora updates-archive
Updating metadata for 'fedora-cisco-openh264'... done
Updating metadata for 'updates'... done
Updating metadata for 'fedora'... done
Updating metadata for 'updates-archive'... done
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264'; generated: 2021-02-23T00:49:00Z solvables: 4
rpm-md repo 'updates'; generated: 2021-07-20T00:55:02Z solvables: 15683
rpm-md repo 'fedora'; generated: 2021-04-23T10:47:57Z solvables: 63586
rpm-md repo 'updates-archive'; generated: 2021-07-20T01:26:43Z solvables: 20515
Resolving dependencies... done
Will download: 1 package (1.2 MB)
Downloading from 'fedora'... done
Importing packages... done
Checking out packages... done
Running pre scripts... 0 done
Running post scripts... done
Running posttrans scripts... 4 done
Writing rpmdb... done
Writing OSTree commit... done
note: Deploying commit ef1f5c9394d9296c7618af33f4b5eb2843b6a38ef2f1fd882a20e655daed2796 which contains content in /var/lib that will be ignored.
Changes queued for next boot. Run "systemctl reboot" to start a reboot
sh-5.1# rpm-ostree status --sysroot /
Deployments:
  0f28eb244bafd3b249fc38c66b2679e36d20dd570972baf811da95f0afdb32f4
                   Version: v2021.7-3-g7e93eada85 (2021-07-20T16:08:06Z)
           LayeredPackages: strace

  0f28eb244bafd3b249fc38c66b2679e36d20dd570972baf811da95f0afdb32f4
                   Version: v2021.7-3-g7e93eada85 (2021-07-20T16:08:06Z)
sh-5.1#
exit
sh-5.1# reboot
...
[root@cosa-devsh ~]# rpm-ostree status
State: idle
Deployments:
● 0f28eb244bafd3b249fc38c66b2679e36d20dd570972baf811da95f0afdb32f4
                   Version: v2021.7-3-g7e93eada85 (2021-07-20T16:08:06Z)
           LayeredPackages: strace

  0f28eb244bafd3b249fc38c66b2679e36d20dd570972baf811da95f0afdb32f4
                   Version: v2021.7-3-g7e93eada85 (2021-07-20T16:08:06Z)

(Obviously, apply-live would be nicer to avoid the reboot, though need to unwind more things there first.)

There's a bunch of details I'm glossing over here and maybe that initramfs example isn't feasible, though I found it interesting as a POC to show the possibilities of supporting sysroots like this.

@openshift-ci
Copy link

openshift-ci bot commented Jul 20, 2021

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jlebon
Copy link
Member Author

jlebon commented Jul 20, 2021

It's actually not that much new code. It's more a lot of reshuffling code we already have.

On that topic, although there isn't any C code being converted to Rust here, I think this would help the cause because those callimpl functions then become easier targets for oxidation (by virtue of being purer)

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem here is that I'd like to make increasing use of systemd in the daemon. We already added isolation.rs which goes in that direction, and if we go this way we will need to make a hard decision: do we enhance that code (like the livefs bits) to understand non-default sysroots, or do we skip using systemd entirely? I think for use in the initramfs, we can clearly still make use of systemd. But some of this work is also related to eventually supporting running in a podman/docker container, where we won't have systemd.

@@ -418,12 +418,12 @@ rpmostree_run_script_in_bwrap_container (int rootfs_fd,
{
stdout_fd = sd_journal_stream_fd (id, LOG_INFO, 0);
if (stdout_fd < 0)
return glnx_prefix_error (error, "While creating stdout stream fd");
return glnx_throw (error, "While creating stdout stream fd: %s", g_strerror (-stdout_fd));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this can be split out now.

Comment on lines -521 to -522
if (progress)
ostree_async_progress_finish (progress);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Shouldn't we include the _finish() call elsewhere though? Is this related?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We _finish() at a higher level. We need to be consistent about calling _finish() at the same scope level as ostree_async_progress_new() so it's clear they all match up. Have to split out that patch.

@cgwalters
Copy link
Member

cgwalters commented Jul 20, 2021

Tangentially related to this...I've mentioned this elsewhere before but basically I've been thinking that the direction we should go in is changing the daemon to fork off the transaction as a subprocess. It'd start us down the road to solving a whole pile of issues, of which the biggest is making cancellation reliably work - we can just kill the subprocess while keeping the DBus proxy up.

If we go that route, it would make implementing this easier as we could fork off in the same way outside of DBus.

@jlebon
Copy link
Member Author

jlebon commented Jul 20, 2021

One problem here is that I'd like to make increasing use of systemd in the daemon. We already added isolation.rs which goes in that direction, and if we go this way we will need to make a hard decision: do we enhance that code (like the livefs bits) to understand non-default sysroots, or do we skip using systemd entirely? I think for use in the initramfs, we can clearly still make use of systemd. But some of this work is also related to eventually supporting running in a podman/docker container, where we won't have systemd.

Yeah, that's actually what I was hitting when trying --apply-live in the initramfs demo. This mostly worked:

diff --cc rust/src/isolation.rs
index 18c3916b,18c3916b..cc81d155
--- a/rust/src/isolation.rs
+++ b/rust/src/isolation.rs
@@@ -33,7 -33,7 +33,14 @@@ pub(crate) struct UnitConfig<'a>
  #[context("Running systemd worker")]
  pub(crate) fn run_systemd_worker_sync(cfg: &UnitConfig) -> Result<()> {
      if !crate::utils::running_in_systemd() {
--        return Err(anyhow!("Not running under systemd"));
++        // XXX
++        let mut cmd = Command::new(cfg.exec_args[0]);
++        cmd.args(&cfg.exec_args[1..]);
++        let status = cmd.status()?;
++        if !status.success() {
++            return Err(anyhow!("{}", status));
++        }
++        return Ok(())
      }
      let mut cmd = Command::new("systemd-run");
      cmd.args(BASE_ARGS);

But really, we don't actually need the systemd-tmpfiles bit since it'll run anyway when we switchroot, only the ostree admin unlock --transient one (but actually, one random thought I've had is to not pull in overlayfs at all since the sysroot isn't in use yet -- so we could theoretically change the sysroot to the new deployment root, but then we'd need to teach this to libostree to avoid having to bind-mount over /proc/cmdline to fool it).

Overall, the approach I was aiming for was basically rpm-ostree would have two modes:

  1. booted mode: free use of systemd, journal, D-Bus, etc...
  2. non-booted mode: no assumption of systemd, journal, D-Bus, etc... i.e. no host service assumptions -- this would keep it as flexible as possible for various use cases. The mux code for example already keys off of sd_booted() to detect this.

So yeah, this indeed would require adapting various code bits to handle both cases, which adds complexity.

Tangentially related to this...I've mentioned this elsewhere before but basically I've been thinking that the direction we should go in is changing the daemon to fork off the transaction as a subprocess. It'd start us down the road to solving a whole pile of issues, of which the biggest is making cancellation reliably work - we can just kill the subprocess while keeping the DBus proxy up.

If we go that route, it would make implementing this easier as we could fork off in the same way outside of DBus.

Yeah, that's an interesting idea. Related to this, I did briefly look at using the transaction API even in the "direct" mode, but didn't want to deal with dropping the GDBusMethodInvocation assumptions there. It didn't feel like it was worth the added complexity vs just being able to call the implementing function directly. And with the rpmostree_output abstraction we already have in place, progress output doesn't really require many tweaks at all to work in that mode.

@jlebon
Copy link
Member Author

jlebon commented Jul 22, 2021

So how do we feel about this approach? Worth cleaning up and pursuing? We'll incur some churn in splitting out the callimpl functions, but it's mostly a mechanical thing and I think there's a lot of potential there (and again, I think this will then make oxidizing the callimpl functions much easier).

@cgwalters
Copy link
Member

What if we reworked things such that the single DBus invocation was UpdateDeployment? Then we'd have a single mux call.

@openshift-ci
Copy link

openshift-ci bot commented Jul 28, 2021

@jlebon: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jlebon
Copy link
Member Author

jlebon commented Jul 29, 2021

What if we reworked things such that the single DBus invocation was UpdateDeployment? Then we'd have a single mux call.

Hmm, so you mean e.g. fold InitramfsEtc into UpdateDeployment? Yeah, I think that'd work to start. It's definitely the trickiest one to mux. The other smaller ones are much easier so I don't think it'd be adding much complexity, though I understand the appeal of keeping it minimal. We can always expand it in the future if needed. (And of course, there's no need to mux the "legacy" APIs like PkgChange -- this PR already handles this by preferring UpdateDeployment in the non-D-Bus case when using e.g. rpm-ostree install).

jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Dec 15, 2021
A papercut right now is that when using Tang, one must use kernel
arguments to specify network configuration because network is required
on every boot.

This breaks the UX we're trying to emphasize, which is towards
specifying networking settings using NM keyfiles *once* and having it
apply everywhere.

Let's fix this by automatically detecting this case and enabling
`initramfs-etc` tracking of NM keyfiles on first boot. This will allow
users to stay on the NM keyfile path and have it transparently work even
when using Tang.

The new service does this from the real root. Long-term, I'd like to
investigate doing it earlier in the initramfs (this is related to
coreos/rpm-ostree#3006).

Modifications to NM keyfiles in the real root are automatically synced
on every new deployment, or explicitly when using
`rpm-ostree ex initramfs-etc --force-sync`.
@openshift-ci
Copy link

openshift-ci bot commented Jun 6, 2023

@jlebon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/build-clang 42ea874 link true /test build-clang
ci/prow/images 42ea874 link true /test images
ci/prow/clang-analyzer 42ea874 link true /test clang-analyzer
ci/prow/fcos-e2e 42ea874 link true /test fcos-e2e
ci/prow/kola-upgrade 42ea874 link true /test kola-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jlebon
Copy link
Member Author

jlebon commented May 29, 2024

Given #4972, I'm not sure this makes sense anymore.

@jlebon jlebon closed this May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants