Fedora 39 (CoreOS) cri-o installation issues #7635

Open
plevart opened this issue Dec 23, 2023 · 25 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@plevart

plevart commented Dec 23, 2023

What happened?

I have been successfully installing cri-o on top of Fedora CoreOS from the binary tarball(s) published on GitHub. I still do, using a technique called package layering (https://github.com/coreos/layering-examples). I build a custom CoreOS image, in the form of an OCI image derived from the base CoreOS image, that additionally contains cri-o and the other software needed to run a k8s node.
Recently, I tried to use podman on such a system. It failed with error:

$ podman run -it --rm fedora:39
Error: conmon failed: exit status 1

While researching, I found that installing cri-o 1.28.2 from the tarball overwrites /usr/bin/conmon, which comes from an rpm package that is part of the base CoreOS image: conmon-2.1.8-2.fc39.x86_64. The binary installed from the cri-o tarball reports itself as:

$ /usr/bin/conmon --version
conmon version 2.1.8
commit: a3c233a4c262a9362d5ee7e87e64fd9d84efb581-dirty

...while the binary that is part of the conmon-2.1.8-2.fc39.x86_64 package says the following:

$ /usr/bin/conmon --version
conmon version 2.1.8
commit:

They mostly look like the same version, but are they? The binary from the cri-o tarball is much bigger:

-rwxr-xr-x. 2 root root 1910408 Jan  1  1970 /usr/bin/conmon

...compared to the binary from the Fedora rpm package:

-rwxr-xr-x. 3 root root 158584 Jan  1  1970 /usr/bin/conmon

But this is probably because the tarball binary is statically linked and the rpm version is dynamically linked.

Investigating further, I found that the cri-o binary tarball might have other conflicts with rpm packages installed on the base CoreOS image from the Fedora repo. I replaced the binary tarball distribution with an rpm package published on the official k8s repo (https://pkgs.k8s.io/addons:/cri-o:/stable:/v1.28/rpm/). This revealed that the conflict goes deeper. At first it seems to involve only the conmon package, but other Fedora packages installed on base CoreOS depend on conmon. Transitively this leads to the following chain: conmon, containers-common, containers-common-extra, skopeo, podman, toolbox, ... but it doesn't stop there. If I wanted to remove those packages first before installing cri-o, I would also have to remove the vital CoreOS package rpm-ostree, because it depends on skopeo.

So my question is: "Is it even possible to construct a consistent system without conflicts with a combination of cri-o and Fedora packages that would run on CoreOS?". I noticed that there is a cri-o package in the Fedora 39 updates repo, but it is only version 1.27.2. I need 1.28.x ... I think it would be possible if there was a cri-o packaging that would not include conmon.

What did you expect to happen?

To be able to install cri-o on CoreOS and not introduce conflicts while doing that.

How can we reproduce it (as minimally and precisely as possible)?

Use the following Containerfile:

FROM quay.io/fedora/fedora-coreos@sha256:31560c0a6191967ff8b601684a2e82a8f9945255d81f7cdf1e6801506909fe34

ARG CRI_O_TAG=v1.28.2

COPY crio.network.conf /etc/crio/crio.conf.d/20-crio.network.conf

RUN echo -e "\nInstalling cri-o ${CRI_O_TAG}...\n" && \
        mkdir /tmp/crio_dl.$$$$ && \
        curl --fail --retry 5 --retry-delay 3 --silent --show-error \
        -o /tmp/crio_dl.$$$$/crio.tar.gz \
        https://storage.googleapis.com/cri-o/artifacts/cri-o.amd64.${CRI_O_TAG}.tar.gz && \
        cd /tmp/crio_dl.$$$$ && \
        tar -xpvzf crio.tar.gz && \
        cd cri-o && \
        sed -ie 's|/usr/local|/usr|' contrib/crio.service && \
        (grep -v cni-plugins < install | PREFIX=/usr bash) && \
    # remove locally built rpms, temporary dirs and var content afterwards
    rm -rf /local-pkgs /tmp/crio_dl.$$$$ /var/* && \
    ostree container commit

With the following additional resource crio.network.conf:

[crio.network]
# Paths to directories where CNI plugin binaries are located.
# Besides default location below /opt (which is symlinked to /var/opt on CoreOS - this is where calico will install its plugins),
# we also configure /usr/libexec/cni (which is a place where they get installed by package containernetworking-plugins - already part of base CoreOS install)
plugin_dirs = [
      "/opt/cni/bin/",
      "/usr/libexec/cni/"
]

...to build an OCI image. Publish the image to an image registry, then rebase a CoreOS host onto it:

rpm-ostree rebase ostree-unverified-registry:myregistry/myimage:tag

After rebooting, try out the podman command etc. As you can see, the above installation of cri-o is already customized to use the CNI plugins from the preinstalled CoreOS package containernetworking-plugins, while skipping the installation of the plugins bundled in the tarball. I can see that the tarball installation, among other files, also installs the following two:

+ install -D -m 755 -t /usr/bin bin/conmon
+ install -D -m 755 -t /usr/bin bin/conmonrs

Would you recommend that I somehow "skip" the installation of conmon so that the CoreOS packaged version remains intact, and then try to use it with cri-o and see if it works? Maybe even use conmonrs instead?

Anything else we need to know?

No response

CRI-O and Kubernetes version

$ crio -version
crio version 1.28.2
Version:        1.28.2
GitCommit:      e7be4e160f3cc3810b3f6c9fbf225697d772a9ad
GitCommitDate:  2023-11-01T17:38:03Z
GitTreeState:   clean
BuildDate:      1970-01-01T00:00:00Z
GoVersion:      go1.20.10
Compiler:       gc
Platform:       linux/amd64
Linkmode:       static
BuildTags:      
  static
  netgo
  osusergo
  exclude_graphdriver_btrfs
  exclude_graphdriver_devicemapper
  seccomp
  apparmor
  selinux
LDFlags:          unknown
SeccompEnabled:   true
AppArmorEnabled:  false
$ kubectl version --output=json
{
  "clientVersion": {
    "major": "1",
    "minor": "28",
    "gitVersion": "v1.28.4",
    "gitCommit": "bae2c62678db2b5053817bc97181fcc2e8388103",
    "gitTreeState": "clean",
    "buildDate": "2023-11-15T16:58:22Z",
    "goVersion": "go1.20.11",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "kustomizeVersion": "v5.0.4-0.20230601165947-6ce0bf390ce3"
}

OS version

# On Linux:
$ cat /etc/os-release 
NAME="Fedora Linux"
VERSION="39.20231119.3.0 (CoreOS)"
ID=fedora
VERSION_ID=39
VERSION_CODENAME=""
PLATFORM_ID="platform:f39"
PRETTY_NAME="Fedora CoreOS 39.20231119.3.0"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:39"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=39
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=39
SUPPORT_END=2024-05-14
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='39.20231119.3.0'

$ uname -a
Linux k8s1 6.5.11-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov  8 22:37:57 UTC 2023 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

@plevart plevart added the kind/bug Categorizes issue or PR as related to a bug. label Dec 23, 2023
@plevart
Author

plevart commented Dec 23, 2023

I can inform you that I tried building a CoreOS image where installation from cri-o 1.28.2 tarball skipped installing conmon. The end result was that podman is ok with that - it doesn't report "Error: conmon failed: exit status 1" any more. And cri-o is happy too. k8s on top of cri-o starts-up and all pods are green. So it seems cri-o works with conmon supplied by Fedora rpm package, but podman doesn't work with conmon supplied by cri-o tarball. I wonder why that would be?

@kwilczynski
Member

@plevart, thank you for getting in touch, and I am sorry to hear you have issues!

To summarise the above:

  • You have a Fedora CoreOS (FCOS) image you want to customise
  • You prefer to upgrade the CRI-O version using a tarball release from GitHub over using an RPM package
  • When installed from a tarball, there is a problem where binaries from the tarball are causing problems
    • Podman and CRI-O would break

Would this be correct?

I also have a question: why do it this way? Why not use the RPM package? Any issues?

@plevart
Author

plevart commented Dec 26, 2023

@kwilczynski ...almost correct. To answer your questions:

  • Yes, I have a base CoreOS image in the form of an OCI image and I want to add cri-o to it (among other things), then use the resulting image to provision k8s nodes. This works perfectly. Even the upgrade of k8s is simplified that way - I create another image with upgraded cri-o and k8s packages (kubeadm, kubelet, kubectl), then I just "rebase" k8s node(s), one by one, to that image, reboot and then "kubeadm upgrade" the nodes.
  • I prefer to create custom CoreOS images from the cri-o tarball because CoreOS is a Fedora-based OS that uses rpms as packages but doesn't support rpm modules. cri-o is released as rpm modules for Fedora, where each minor release is a separate stream. This is fine for Fedora itself, but installing such a module on CoreOS involves too much hacking. I found tarball installation on CoreOS much more straightforward, at least until now, when I found it ships a conflicting conmon binary.
  • Yes, tarball installation overwrites conmon binary that is part of the base CoreOS install and that binary doesn't work with podman. cri-o works without issues though.

As to why I don't use the RPM package: I tried the RPM package released on the official k8s repo, https://pkgs.k8s.io/addons:/cri-o:/stable:/v1.28/rpm/, but this package does not look like it is meant for Fedora, as it includes conflicting files (conmon) and therefore fails to install on top of a CoreOS base that already contains the conmon rpm package. I can't uninstall the conmon package, as it is a dependency of many other packages that are part of CoreOS, including the vital package rpm-ostree.
If there is non-modularized rpm cri-o package version available for Fedora, I'd like to know where it is published. Then I would switch to using that instead of tarball. Currently I use the following "hack" to selectively install files from tarball and this is good enough for now:

grep -v cni-plugins < install | grep -Ev 'conmon$' | PREFIX=/usr bash
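To see what this filter actually removes, here is a minimal sketch that runs the same two grep stages over a few sample lines. The sample lines are hypothetical stand-ins modelled on the install trace quoted earlier in this issue (the real install script in the tarball may word them differently), and both function names are made up for illustration. The point it shows is that the anchored pattern 'conmon$' drops the conmon line but keeps conmonrs.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the tarball's install script; the real file
# may contain different lines.
sample_install_script() {
  cat <<'EOF'
install -D -m 755 -t /usr/bin bin/conmon
install -D -m 755 -t /usr/bin bin/conmonrs
install -D -m 755 -t /opt/cni/bin cni-plugins/bridge
install -D -m 755 -t /usr/bin bin/crio
EOF
}

# Same two filter stages as the one-liner above:
#   1. drop every line that mentions cni-plugins
#   2. drop every line that ends in "conmon" (conmonrs does not match)
filter_install_script() {
  grep -v cni-plugins | grep -Ev 'conmon$'
}

filtered="$(sample_install_script | filter_install_script)"
printf '%s\n' "$filtered"
```

With the real tarball, the filtered script would be piped into `PREFIX=/usr bash` exactly as in the one-liner above.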

@plevart
Author

plevart commented Dec 26, 2023

Hint: would it be difficult for you to release Fedora rpm(s) in non-modularized repo(s) in addition to modules? The official k8s repositories structure already seems to support that. With one repo per minor release. This would finally enable straightforward installation on CoreOS. Either by package layering or by creating custom CoreOS images.

@haircommander
Member

hey @plevart we are actually discussing doing just that with Brad Smith from the fedora community. We're working on proposals for F40 to provide a package for kube, crictl and cri-o using non-modularized packages.

@haircommander
Member

see https://fedoraproject.org/wiki/Changes/VersionedKubernetesPackages

@afbjorklund
Contributor

afbjorklund commented Jan 7, 2024

Here was my workaround, to be able to run both podman and cri-o (OBS) on Fedora 39 (Cloud):

mv /usr/bin/conmon /usr/libexec/podman/

The packages still conflict, just that the fedora binaries are overwritten with the static nix binaries.

rpm -Uv --nodeps --replacefiles cri-o*.rpm

And the /etc/containers/policy.json is overwritten with the variant without the RedHat GPG keys...

-{
-    "default": [
-        {
-            "type": "insecureAcceptAnything"
-        }
-    ],
-    "transports": {
-        "docker": {
-	    "registry.access.redhat.com": [
-		{
-		    "type": "signedBy",
-		    "keyType": "GPGKeys",
-		    "keyPath": "/etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release"
-		}
-	    ],
-	    "registry.redhat.io": [
-		{
-		    "type": "signedBy",
-		    "keyType": "GPGKeys",
-		    "keyPath": "/etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release"
-		}
-	    ]
-	},
-        "docker-daemon": {
-	    "": [
-		{
-		    "type": "insecureAcceptAnything"
-		}
-	    ]
-	}
-    }
-}
+{ "default": [{ "type": "insecureAcceptAnything" }] }

mv /usr/bin/conmon /usr/libexec/crio/

Afterwards, the package database is a bit sad.

# rpm -V conmon containers-common crun
Unsatisfied dependencies for conmon-2:2.1.8-2.fc39.x86_64:
	conmon conflicts with (installed) cri-o-1.29.1-150500.1.1.x86_64
missing     /usr/bin/conmon (replaced)
S.5....T.    /usr/libexec/crio/conmon
Unsatisfied dependencies for containers-common-4:1-95.fc39.noarch:
	containers-common conflicts with (installed) cri-o-1.29.1-150500.1.1.x86_64
S.5......  c /etc/containers/policy.json
Unsatisfied dependencies for crun-1.12-1.fc39.x86_64:
	crun conflicts with (installed) cri-o-1.29.1-150500.1.1.x86_64
.........    /usr/bin/crun (replaced)

@afbjorklund
Contributor

> I can inform you that I tried building a CoreOS image where installation from cri-o 1.28.2 tarball skipped installing conmon. The end result was that podman is ok with that - it doesn't report "Error: conmon failed: exit status 1" any more. And cri-o is happy too. k8s on top of cri-o starts-up and all pods are green. So it seems cri-o works with conmon supplied by Fedora rpm package, but podman doesn't work with conmon supplied by cri-o tarball. I wonder why that would be?

If you run conmon with the arguments provided by podman --debug, you will see that the option parsing changed:

conmon: option parsing failed: Unknown option --network-backend

@haircommander
Member

I've tried giving this a look, and I think it's going to be tricky. we basically have to hardcode the version of conmon, crun, and c/common in the cri-o bundle somehow, so the rpm can read it and set a Provides: conmon = $conmonVersion etc. Even if we do that, the podman rpm may have some strict expectations about what version of things to have.

Another option is we can separate the cri-o rpm from the cri-o-extras (or other name) rpm. A user who has access to their own conmon/crun/common could use those, but one who doesn't could install the cri-o-extras rpm. WDYT @saschagrunert ?

@afbjorklund
Contributor

afbjorklund commented Jan 15, 2024

I thought that is why there were two versions of the conmon helper, one for podman and one cri-o?

It is somewhat unfortunate that the podman conmon (the one in system) installs itself under libexec/crio,
and does not install libexec/podman. But I guess that is something for the Fedora packagers to sort out.

And it makes sense, as long as you run the old crio in fedora and not the new crio from kubernetes

@haircommander
Member

the conmons should be compatible so my hope is you could run either crio in this scheme

@vrutkovs
Contributor

This is blocking OKD 4.16 update to CRI-O 1.29

@afbjorklund
Contributor

afbjorklund commented Feb 5, 2024

Apparently the new version of conmon (2.1.10) doesn't work for the new version of podman (4.9.2), either.

Error: conmon failed: exit status 1

@afbjorklund
Contributor

@plevart: did you file an issue with the podman project about this? I can't seem to find it.

I found out that the error was due to the network_config_dir now being empty, after netavark addition

@saschagrunert
Member

The latest prerelease packages should be isolated from any system binary because we moved conmon, runc, crun and the policy.json into CRI-O owned files. Can you give those a try?

We have no official version release yet with the new changes, though.

@vrutkovs
Contributor

vrutkovs commented Feb 8, 2024

Prerelease works perfectly for OKD, thanks!

@plevart
Author

plevart commented Feb 8, 2024

@afbjorklund

> I thought that is why there were two versions of the conmon helper, one for podman and one cri-o?
>
> It is somewhat unfortunate that the podman conmon (the one in system) installs itself under libexec/crio,
> and does not install libexec/podman. But I guess that is something for the Fedora packagers to sort out.
>
> And it makes sense, as long as you run the old crio in fedora and not the new crio from kubernetes

The "podman conmon" is a Fedora RPM package (conmon-2.1.8-2.fc39.x86_64 in my original post) that installs its binary as /usr/bin/conmon. The cri-o I'm using is a tarball install, and I use PREFIX=/usr while installing it, so it also wants to install conmon to /usr/bin/conmon. I also tried installing the new k8s repo based cri-o RPM, which conflicts with Fedora conmon-2.1.8-2.fc39.x86_64 at exactly this path, /usr/bin/conmon. So neither side tries to use libexec/crio or libexec/podman.

But cri-o tarball install can be hacked to not install conmon so the "podman" version can be used with cri-o. This is my current workaround.

I would encourage future cri-o RPM packages in the k8s repo to ship a private copy of conmon that installs into libexec/crio and is used from there regardless of what PATH points to. Please also test for other possible conflicts between the cri-o RPM package from the k8s repo and Fedora podman, skopeo, buildah, toolbox and their dependencies.
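As a rough illustration of what "use it from there regardless of PATH" could look like on the configuration side, the sketch below writes a CRI-O drop-in that points at a private conmon. It assumes the `conmon` key under `[crio.runtime]` is honoured by the CRI-O version in use (newer releases may document per-runtime settings for the monitor binary instead, so check the crio.conf documentation for your version) and that a conmon binary actually exists at /usr/libexec/crio/conmon. The drop-in file name and the `CRIO_CONF_DIR` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: CRIO_CONF_DIR defaults to a scratch directory so nothing on the
# host is touched. On a real node the drop-in directory would be
# /etc/crio/crio.conf.d (the directory the Containerfile in this issue copies into).
CRIO_CONF_DIR="${CRIO_CONF_DIR:-$(mktemp -d)}"
CRIO_DROP_IN="$CRIO_CONF_DIR/30-crio.conmon.conf"   # hypothetical file name

mkdir -p "$CRIO_CONF_DIR"
cat > "$CRIO_DROP_IN" <<'EOF'
[crio.runtime]
# Assumption: a private conmon binary has been installed at this path.
conmon = "/usr/libexec/crio/conmon"
EOF

echo "wrote $CRIO_DROP_IN"
```

Setting `CRIO_CONF_DIR=/etc/crio/crio.conf.d` before running would write to the real location; leaving it unset keeps the sketch side-effect free.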

@afbjorklund
Contributor

afbjorklund commented Feb 8, 2024

When I explicitly configured the location of the network_config_dir, then podman started working again.

I think it can end up as empty ("") now, due to the possibility of being able to use either of cni or netavark?

# Path to the directory where network configuration files are located.
# For the CNI backend the default is "/etc/cni/net.d" as root
# and "$HOME/.config/cni/net.d" as rootless.
# For the netavark backend "/etc/containers/networks" is used as root
# and "$graphroot/networks" as rootless.
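For anyone wanting to try the same workaround, a minimal sketch of setting the directory explicitly follows. It assumes the key lives in the `[network]` table of containers.conf and that drop-in files are read from /etc/containers/containers.conf.d; the value shown is the netavark root default from the comment quoted above, not necessarily the value used in this comment. The file name and the `CONF_DIR` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: CONF_DIR defaults to a scratch directory so the host is not
# modified. On a real system the drop-in directory is assumed to be
# /etc/containers/containers.conf.d.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
DROP_IN="$CONF_DIR/50-network-config-dir.conf"   # hypothetical file name

mkdir -p "$CONF_DIR"
cat > "$DROP_IN" <<'EOF'
[network]
# Set explicitly so the value cannot end up as an empty string.
network_config_dir = "/etc/containers/networks"
EOF

echo "wrote $DROP_IN"
```

For the CNI backend the quoted comment gives "/etc/cni/net.d" as the root default instead.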

@afbjorklund
Contributor

afbjorklund commented Feb 8, 2024

Previously I ended up with an empty string, which caused an off-by-one in the conmon parameters

--exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg netavark

So it sounds like a bug in Podman.
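One way to read the off-by-one is sketched below with a toy parser. This is not conmon's real option parser and not Podman's code. It only assumes that every word of the exit command is wrapped as `--exit-command-arg <word>` and that the empty directory value is lost somewhere before parsing. Under that assumption, `--network-backend` slides into option position and is rejected, which matches the "Unknown option --network-backend" message quoted earlier. All function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy parser (NOT conmon's real one): --exit-command-arg consumes the next
# word as its value; any other word in option position is rejected.
toy_parse() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --exit-command-arg)
        if [ "$#" -lt 2 ]; then
          echo "Missing value for $1"
          return 1
        fi
        shift 2
        ;;
      *)
        echo "Unknown option $1"
        return 1
        ;;
    esac
  done
  echo "ok"
}

# Wrap every word (including empty ones) as: --exit-command-arg <word>
wrap_args() {
  wrapped=()
  local word
  for word in "$@"; do
    wrapped+=(--exit-command-arg "$word")
  done
}

# Case 1: the directory is set, so options and values stay paired.
wrap_args --network-config-dir /etc/containers/networks --network-backend netavark
with_dir="$(toy_parse "${wrapped[@]}" || true)"

# Case 2: the directory is empty and the empty word gets dropped on the way.
wrap_args --network-config-dir "" --network-backend netavark
kept=()
for word in "${wrapped[@]}"; do
  if [ -n "$word" ]; then
    kept+=("$word")
  fi
done
empty_dropped="$(toy_parse "${kept[@]}" || true)"

echo "with directory:  $with_dir"
echo "empty directory: $empty_dropped"
```

Whether the empty value is really dropped, and where, is not established in this thread; the sketch only shows that a single missing word is enough to produce this kind of error.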

@plevart
Author

plevart commented Mar 3, 2024

Hi,
I noticed that from cri-o 1.29.x onwards, the conmon executable is installed into /usr/bin/crio-conmon, so no more conflict with Fedora conmon. I can remove the hack that skips the installation of conmon from cri-o tarball. Thanks!

Do you happen to know whether this is also true for a cri-o RPM package published on k8s repos? I will check that myself tomorrow.
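A quick way to check this on a node is sketched below: it reports whether each of the two helper names exists as an executable in a given directory. The function name is hypothetical; the two file names and the /usr/bin default come from this thread.

```shell
#!/usr/bin/env bash
# Report which conmon helpers are present as executables in a directory.
# Defaults to /usr/bin, where both binaries are expected per this thread.
check_conmon_layout() {
  local bindir="${1:-/usr/bin}"
  local name
  for name in conmon crio-conmon; do
    if [ -x "$bindir/$name" ]; then
      echo "$name: present"
    else
      echo "$name: missing"
    fi
  done
}

check_conmon_layout "${BIN_DIR:-/usr/bin}"
```

On a host with both the Fedora conmon package and cri-o 1.29.x installed, both names should be reported as present if the two no longer collide.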

@haircommander
Member

yup it also applies there!

github-actions bot commented Apr 4, 2024

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 4, 2024
@kwilczynski
Member

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 2, 2024
@kwilczynski
Member

/assign saschagrunert
/assign haircommander
/assign kwilczynski

github-actions bot commented Jun 2, 2024

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 2, 2024