
Ansible operator-sdk v1.5.0 with updated kube-rbac-proxy:v0.8.0 fails to run with permission denied #4684

Closed · slopezz opened this issue Mar 19, 2021 · 14 comments
Labels: language/ansible (Issue is related to an Ansible operator project), triage/support (Indicates an issue that is a support question)

slopezz commented Mar 19, 2021

Bug Report

After upgrading an Ansible operator to operator-sdk v1.5.0, the operator's controller-manager pod never reaches Running because of an error in the kube-rbac-proxy container (which operator-sdk v1.5.0 upgraded from v0.5.0 to v0.8.0):

Generated from kubelet on ip-10-96-4-241.ec2.internal
Error: container create failed: time="2021-03-19T15:51:12Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

I've seen related issues for Go (#4402, kubernetes-sigs/kubebuilder#1978); for example, on Go operators v1.5.0 the scaffolding is:

  • Adding USER 65532:65532 to the Dockerfile
  • Adding securityContext: allowPrivilegeEscalation: false to both the proxy and manager containers (see the sketch below):
securityContext:
  allowPrivilegeEscalation: false
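
For context, a minimal sketch of where that container-level setting lands in the scaffolded manifests (container names taken from the defaults used later in this thread; this is a sketch, not the full files):

# config/manager/manager.yaml (manager container, sketch)
      containers:
        - name: manager
          securityContext:
            allowPrivilegeEscalation: false

# config/default/manager_auth_proxy_patch.yaml (kube-rbac-proxy container, sketch)
      containers:
      - name: kube-rbac-proxy
        securityContext:
          allowPrivilegeEscalation: false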

But on Go operators v1.5.0, where this error does not appear, kube-rbac-proxy:v0.5.0 is still being used.

What did you do?

Create operator-sdk scaffolding using operator-sdk v1.5.0.

What did you expect to see?

Controller-manager containers running OK.

What did you see instead? Under which circumstances?

Controller-manager kube-rbac-proxy container failing because of error:

Generated from kubelet on ip-10-96-4-241.ec2.internal
Error: container create failed: time="2021-03-19T15:51:12Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

Environment

Operator type:

/language ansible

Kubernetes cluster type: Openshift v4.7.0

$ operator-sdk version

operator-sdk version: "v1.5.0", commit: "98f30d59ade2d911a7a8c76f0169a7de0dec37a0", kubernetes version: "1.19.4", go version: "go1.15.5", GOOS: "linux", GOARCH: "amd64"

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-dispatcher", GitCommit:"fd22db44e150011eccc8729db223945384460143", GitTreeState:"clean", BuildDate:"2020-07-24T07:27:52Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0+bd9e442", GitCommit:"bd9e4421804c212e6ac7ee074050096f08dda543", GitTreeState:"clean", BuildDate:"2021-02-11T23:05:38Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

I tried updating the container securityContext without success; the error persisted.

Finally, I solved this error by downgrading kube-rbac-proxy from v0.8.0 to v0.5.0 (actually, the Go operator-sdk v1.5.0 stays on v0.5.0; it seems that only the Ansible operator v1.5.0 has upgraded to v0.8.0, introducing the bug).
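
For reference, a minimal sketch of that workaround in the scaffolded auth-proxy patch (same file and image path shown in the diffs later in this thread):

# config/default/manager_auth_proxy_patch.yaml (workaround: pin the proxy back to v0.5.0)
      containers:
      - name: kube-rbac-proxy
        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
        args:
        - "--secure-listen-address=0.0.0.0:8443"
        - "--upstream=http://127.0.0.1:8080/"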

Additional context


estroz commented Mar 19, 2021

FWIW, the default Go operator project for v1.5.0 uses kube-rbac-proxy v0.8.0, and all operator types pass CI.

Does this happen with a newly-initialized operator, or to some operator that is trying to upgrade?
Can you try adding runAsNonRoot: true (which is currently in master) to your Deployment, like so:

# config/manager/manager.yaml

spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  replicas: 1
  template:
    metadata:
      labels:
        control-plane: controller-manager
    spec:
+     securityContext:
+       runAsNonRoot: true
      containers:

/triage support

openshift-ci-robot added the triage/support label on Mar 19, 2021

slopezz commented Mar 22, 2021

> FWIW, the default Go operator project for v1.5.0 uses kube-rbac-proxy v0.8.0, and all operator types pass CI. Does this happen with a newly-initialized operator, or to some operator that is trying to upgrade? Can you try adding runAsNonRoot: true (which is currently in master) to your Deployment, like so: [config/manager/manager.yaml snippet quoted above]

Hi @estroz. This is an Ansible operator that I'm upgrading from operator-sdk v0.18.1 to v1.5.0 using the recommended migration path of initializing a project from scratch. I'm using OpenShift 4.7, which has more security restrictions than vanilla Kubernetes; I've checked, and it seems you are using kind in the CI?

I have tested your suggestion of adding that specific securityContext, and it didn't work:

$ git diff
diff --git a/config/default/manager_auth_proxy_patch.yaml b/config/default/manager_auth_proxy_patch.yaml
index 58dade9..92e80ff 100644
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -10,7 +10,7 @@ spec:
     spec:
       containers:
       - name: kube-rbac-proxy
-        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
         args:
         - "--secure-listen-address=0.0.0.0:8443"
         - "--upstream=http://127.0.0.1:8080/"
diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index fb3d02a..9409fc7 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -23,6 +23,8 @@ spec:
         control-plane: controller-manager
     spec:
       serviceAccountName: controller-manager
+      securityContext:
+        runAsNonRoot: true
       containers:
         - name: manager
           args:

Then deploy the changes:

$ make deploy 
cd config/manager && /home/slopez/bin/kustomize edit set image controller=quay.io/3scale/prometheus-exporter-operator:v0.3.0-alpha.11
/home/slopez/bin/kustomize build config/default | kubectl apply -f -
namespace/prometheus-exporter-operator-system unchanged
customresourcedefinition.apiextensions.k8s.io/prometheusexporters.monitoring.3scale.net unchanged
serviceaccount/prometheus-exporter-operator-controller-manager unchanged
role.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-role unchanged
role.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-role unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-role unchanged
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-rolebinding unchanged
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-rolebinding unchanged
service/prometheus-exporter-operator-controller-manager-metrics-service unchanged
deployment.apps/prometheus-exporter-operator-controller-manager configured
servicemonitor.monitoring.coreos.com/prometheus-exporter-operator-controller-manager-metrics-monitor unchanged

And the error appears:
[screenshot: the kube-rbac-proxy container failing with the same chdir /home/nonroot permission denied error]

Regarding what I said about the Go operator v1.5.0 and kube-rbac-proxy:v0.8.0, sorry for the confusion: we had just upgraded a Go operator to v1.5.0 and forgot to upgrade kube-rbac-proxy.


slopezz commented Mar 22, 2021

@estroz Quick update with more details: we have tested the Go operator-sdk v1.5.0 with kube-rbac-proxy:v0.8.0 that I was referring to in my previous message (apart from the Ansible operator), and:

  • On OCP 4.5.7 it works
  • On OCP 4.6.17 it fails:
Events:
  Type     Reason          Age               From                                  Message
  ----     ------          ----              ----                                  -------
  Normal   Scheduled       <unknown>                                               Successfully assigned roi-test/marin3r-controller-manager-86857586df-59sfh to ip-10-96-7-225.ec2.internal
  Normal   AddedInterface  32s               multus                                Add eth0 [10.129.2.86/23]
  Normal   Pulling         32s               kubelet, ip-10-96-7-225.ec2.internal  Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0"
  Normal   Pulled          29s               kubelet, ip-10-96-7-225.ec2.internal  Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" in 2.728065908s
  Warning  Failed          28s               kubelet, ip-10-96-7-225.ec2.internal  Error: container create failed: time="2021-03-22T10:50:24Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Normal   Pulling         28s               kubelet, ip-10-96-7-225.ec2.internal  Pulling image "quay.io/3scale/marin3r:v0.8.0-dev.4"
  Normal   Pulled          25s               kubelet, ip-10-96-7-225.ec2.internal  Successfully pulled image "quay.io/3scale/marin3r:v0.8.0-dev.4" in 3.201375208s
  Normal   Created         25s               kubelet, ip-10-96-7-225.ec2.internal  Created container manager
  Normal   Started         25s               kubelet, ip-10-96-7-225.ec2.internal  Started container manager
  Warning  Failed          24s               kubelet, ip-10-96-7-225.ec2.internal  Error: container create failed: time="2021-03-22T10:50:29Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          23s               kubelet, ip-10-96-7-225.ec2.internal  Error: container create failed: time="2021-03-22T10:50:30Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Normal   Pulled          7s (x3 over 24s)  kubelet, ip-10-96-7-225.ec2.internal  Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
  Warning  Failed          7s                kubelet, ip-10-96-7-225.ec2.internal  Error: container create failed: time="2021-03-22T10:50:46Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  • On OCP 4.7.0 it fails:
Events:
  Type     Reason          Age               From                                   Message
  ----     ------          ----              ----                                   -------
  Normal   Scheduled       <unknown>                                                Successfully assigned roi-test/marin3r-controller-manager-86857586df-4n666 to ip-10-96-11-248.ec2.internal
  Normal   AddedInterface  36s               multus                                 Add eth0 [10.128.3.199/23]
  Warning  Failed          36s               kubelet, ip-10-96-11-248.ec2.internal  Error: container create failed: time="2021-03-22T10:44:53Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Normal   Pulling         36s               kubelet, ip-10-96-11-248.ec2.internal  Pulling image "quay.io/3scale/marin3r:v0.8.0-dev.4"
  Normal   Pulled          32s               kubelet, ip-10-96-11-248.ec2.internal  Successfully pulled image "quay.io/3scale/marin3r:v0.8.0-dev.4" in 3.954517265s
  Normal   Created         32s               kubelet, ip-10-96-11-248.ec2.internal  Created container manager
  Normal   Started         32s               kubelet, ip-10-96-11-248.ec2.internal  Started container manager
  Warning  Failed          31s               kubelet, ip-10-96-11-248.ec2.internal  Error: container create failed: time="2021-03-22T10:44:58Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          30s               kubelet, ip-10-96-11-248.ec2.internal  Error: container create failed: time="2021-03-22T10:44:59Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          19s               kubelet, ip-10-96-11-248.ec2.internal  Error: container create failed: time="2021-03-22T10:45:10Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Normal   Pulled          4s (x5 over 36s)  kubelet, ip-10-96-11-248.ec2.internal  Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
  Warning  Failed          4s                kubelet, ip-10-96-11-248.ec2.internal  Error: container create failed: time="2021-03-22T10:45:25Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

roivaz added a commit to 3scale-ops/marin3r that referenced this issue Mar 22, 2021
There seems to be a problem with 0.8.0 in openshift 4.6 and 4.7 where the rbac-proxy fails to start. Reverting to 0.5.0 while this issue is analyzed. Check operator-framework/operator-sdk#4684 for more details.

This reverts commit 6c4c763.
@camilamacedo86 (Contributor)

It will probably be solved with #4655 (master).
Note that it will use the service account and also have the SCC config. See: https://github.com/operator-framework/operator-sdk/blob/master/testdata/ansible/memcached-operator/config/manager/manager.yaml#L36-L37
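
A minimal sketch of the pod-level settings that link appears to point at (assuming the referenced lines are the pod securityContext and service account, matching the diffs elsewhere in this thread):

# testdata/ansible/memcached-operator/config/manager/manager.yaml (sketch)
    spec:
      securityContext:
        runAsNonRoot: true
      serviceAccountName: controller-manager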


slopezz commented Mar 22, 2021

@camilamacedo86 The outputs from my previous comment #4684 (comment) refer to a Go operator v1.5.0 that is already using this pod and container securityContext, as well as kube-rbac-proxy v0.8.0.

In addition, I have just tested those changes on the Ansible operator (using its own serviceaccount), without success; same error:

$ git diff
diff --git a/config/default/manager_auth_proxy_patch.yaml b/config/default/manager_auth_proxy_patch.yaml
index 58dade9..92e80ff 100644
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -10,7 +10,7 @@ spec:
     spec:
       containers:
       - name: kube-rbac-proxy
-        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
         args:
         - "--secure-listen-address=0.0.0.0:8443"
         - "--upstream=http://127.0.0.1:8080/"
diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index fb3d02a..36b82c3 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -23,6 +23,8 @@ spec:
         control-plane: controller-manager
     spec:
       serviceAccountName: controller-manager
+      securityContext:
+        runAsNonRoot: true
       containers:
         - name: manager
           args:
@@ -36,6 +38,8 @@ spec:
                 fieldRef:
                   fieldPath: metadata.annotations['olm.targetNamespaces']
           image: controller:latest
+          securityContext:
+            allowPrivilegeEscalation: false
           livenessProbe:
             httpGet:
               path: /healthz


$ make deploy
cd config/manager && /home/slopez/bin/kustomize edit set image controller=quay.io/3scale/prometheus-exporter-operator:v0.3.0-alpha.11
/home/slopez/bin/kustomize build config/default | kubectl apply -f -
namespace/prometheus-exporter-operator-system unchanged
customresourcedefinition.apiextensions.k8s.io/prometheusexporters.monitoring.3scale.net unchanged
serviceaccount/prometheus-exporter-operator-controller-manager unchanged
role.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-role unchanged
role.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-role unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-role unchanged
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-rolebinding unchanged
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-rolebinding unchanged
service/prometheus-exporter-operator-controller-manager-metrics-service unchanged
deployment.apps/prometheus-exporter-operator-controller-manager configured
servicemonitor.monitoring.coreos.com/prometheus-exporter-operator-controller-manager-metrics-monitor unchanged


$ oc get pods -n prometheus-exporter-operator-system
NAME                                                              READY   STATUS                 RESTARTS   AGE
prometheus-exporter-operator-controller-manager-5d8d8f69bflzl5q   2/2     Running                0          5h6m  # the one with v0.5.0
prometheus-exporter-operator-controller-manager-68588876878thxk   1/2     CreateContainerError   0          58s # new one


$ oc describe pod prometheus-exporter-operator-controller-manager-68588876878thxk -n prometheus-exporter-operator-system
...
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       3m38s                  default-scheduler  Successfully assigned prometheus-exporter-operator-system/prometheus-exporter-operator-controller-manager-68588876878thxk to ip-10-96-11-248.ec2.internal
  Normal   Started         3m36s                  kubelet            Started container manager
  Normal   AddedInterface  3m36s                  multus             Add eth0 [10.128.2.31/23]
  Warning  Failed          3m36s                  kubelet            Error: container create failed: time="2021-03-22T15:12:47Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Normal   Pulled          3m36s                  kubelet            Container image "quay.io/3scale/prometheus-exporter-operator:v0.3.0-alpha.11" already present on machine
  Normal   Created         3m36s                  kubelet            Created container manager
  Warning  Failed          3m35s                  kubelet            Error: container create failed: time="2021-03-22T15:12:48Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          3m34s                  kubelet            Error: container create failed: time="2021-03-22T15:12:49Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          3m23s                  kubelet            Error: container create failed: time="2021-03-22T15:13:00Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          3m9s                   kubelet            Error: container create failed: time="2021-03-22T15:13:14Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          2m57s                  kubelet            Error: container create failed: time="2021-03-22T15:13:26Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          2m45s                  kubelet            Error: container create failed: time="2021-03-22T15:13:38Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          2m31s                  kubelet            Error: container create failed: time="2021-03-22T15:13:52Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Warning  Failed          2m19s                  kubelet            Error: container create failed: time="2021-03-22T15:14:04Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"
  Normal   Pulled          111s (x11 over 3m36s)  kubelet            Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" already present on machine
  Warning  Failed          111s (x2 over 2m6s)    kubelet            (combined from similar events): Error: container create failed: time="2021-03-22T15:14:32Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"


estroz commented Mar 30, 2021

@slopezz try setting the user/group in the kube-rbac-proxy container

# config/default/manager_auth_proxy_patch.yaml

         ports:
         - containerPort: 8443
           name: https
+        securityContext:
+          runAsUser: 65532
+          runAsGroup: 65534
       - name: manager
         args:
         - "--health-probe-bind-address=:8081"

Got this from brancz/kube-rbac-proxy#101

@andrewazores

> @slopezz try setting the user/group in the kube-rbac-proxy container [config/default/manager_auth_proxy_patch.yaml snippet quoted above]. Got this from brancz/kube-rbac-proxy#101

I've tried this and see a failure with this message on the ReplicaSet:

  status:
    conditions:
    - lastTransitionTime: "2021-03-31T20:00:23Z"
      message: 'pods "container-jfr-operator-controller-manager-79b689cf47-" is forbidden:
        unable to validate against any security context constraint: [spec.containers[0].securityContext.runAsUser:
        Invalid value: 65532: must be in the ranges: [1000610000, 1000619999]]'
      reason: FailedCreate
      status: "True"
      type: ReplicaFailure

I'm using CRC for testing, so:

$ crc version
CodeReady Containers version: 1.24.0+5f06e84b
OpenShift version: 4.7.2 (embedded in executable)


estroz commented Mar 31, 2021

@andrewazores you may actually need to use this OCP image instead of the "upstream" one. Additionally, it looks like that image uses user 65534 instead of 65532.


andrewazores commented Mar 31, 2021

I have just tried with that alternate image but see the same "out of range" failure I posted above. From what other reading I have done, this appears to be due to security constraints applied within my cluster, although I'm unsure whether that's CRC-specific or comes from the general OpenShift version installed.
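
For what it's worth, OpenShift's restricted SCC assigns the UID from the namespace's openshift.io/sa.scc.uid-range annotation, so an SCC-compatible securityContext would omit the hard-coded runAsUser/runAsGroup and keep only the non-root assertion. A sketch of that (my assumption, and not by itself a fix for the chdir /home/nonroot error, which comes from the image's working directory):

# config/default/manager_auth_proxy_patch.yaml (sketch: let the SCC pick the UID)
      containers:
      - name: kube-rbac-proxy
        securityContext:
          allowPrivilegeEscalation: false
          runAsNonRoot: true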


slopezz commented Apr 8, 2021

@estroz I have tested the image you suggested, openshift4/ose-kube-rbac-proxy:v4.7.0, with the suggested securityContext changes (but without forcing any specific user 65534/65532, which is not permitted on OpenShift), and it works as expected; there is no error:

$ git diff
diff --git a/config/default/manager_auth_proxy_patch.yaml b/config/default/manager_auth_proxy_patch.yaml
index 58dade9..bc70b8b 100644
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -10,7 +10,7 @@ spec:
     spec:
       containers:
       - name: kube-rbac-proxy
-        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+        image: registry.redhat.io/openshift4/ose-kube-rbac-proxy:v4.7.0
         args:
         - "--secure-listen-address=0.0.0.0:8443"
         - "--upstream=http://127.0.0.1:8080/"
diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index fb3d02a..36b82c3 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -23,6 +23,8 @@ spec:
         control-plane: controller-manager
     spec:
       serviceAccountName: controller-manager
+      securityContext:
+        runAsNonRoot: true
       containers:
         - name: manager
           args:
@@ -36,6 +38,8 @@ spec:
                 fieldRef:
                   fieldPath: metadata.annotations['olm.targetNamespaces']
           image: controller:latest
+          securityContext:
+            allowPrivilegeEscalation: false
           livenessProbe:
             httpGet:
               path: /healthz


$ make deploy 
cd config/manager && /home/slopez/bin/kustomize edit set image controller=quay.io/3scale/prometheus-exporter-operator:v0.3.0-alpha.11
/home/slopez/bin/kustomize build config/default | kubectl apply -f -
namespace/prometheus-exporter-operator-system created
customresourcedefinition.apiextensions.k8s.io/prometheusexporters.monitoring.3scale.net created
serviceaccount/prometheus-exporter-operator-controller-manager created
role.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-role created
role.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-rolebinding created
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-rolebinding created
service/prometheus-exporter-operator-controller-manager-metrics-service created
deployment.apps/prometheus-exporter-operator-controller-manager created
servicemonitor.monitoring.coreos.com/prometheus-exporter-operator-controller-manager-metrics-monitor created

$ oc get pods -n prometheus-exporter-operator-system
NAME                                                              READY   STATUS    RESTARTS   AGE
prometheus-exporter-operator-controller-manager-77f956bd4785sgl   2/2     Running   0          91s

The problem is that my operator is intended to run on both vanilla Kubernetes and OpenShift, and the registry registry.redhat.io requires authentication. Inside OpenShift it works without any required change, but outside OCP it does not (example from a random VM):

# docker run registry.redhat.io/openshift4/ose-kube-rbac-proxy:v4.7.0
Unable to find image 'registry.redhat.io/openshift4/ose-kube-rbac-proxy:v4.7.0' locally
docker: Error response from daemon: Get https://registry.redhat.io/v2/openshift4/ose-kube-rbac-proxy/manifests/v4.7.0: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
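
One possible way to keep both targets working (not something from this thread; the config/openshift path is a hypothetical overlay name) would be a kustomize overlay that swaps the proxy image only for OpenShift deployments, leaving the gcr.io image in config/default for vanilla Kubernetes:

# config/openshift/kustomization.yaml (hypothetical overlay, sketch)
resources:
- ../default
images:
- name: gcr.io/kubebuilder/kube-rbac-proxy
  newName: registry.redhat.io/openshift4/ose-kube-rbac-proxy
  newTag: v4.7.0

Building with kustomize build config/openshift instead of config/default would then produce the OpenShift variant.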


sqqqrly commented Jun 8, 2021

I also hit this error using a helm-based operator.

Using either of these two images worked; v0.8.0 fails with the nonroot error. Note that the v4.7.0 tag generated an image pull error, while v4.7 is what is specified in the tutorial at [1] and works for me.

     # config/default/manager_auth_proxy_patch.yaml
     spec:
       containers:
       - name: kube-rbac-proxy
         #image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
         image: registry.redhat.io/openshift4/ose-kube-rbac-proxy:v4.7

The tutorial [1] also says to use FROM registry.redhat.io/openshift4/ose-helm-operator:v4.7 in the Dockerfile. This fails for me with unknown flag: --health-probe-bind-address. FROM quay.io/operator-framework/helm-operator:v1.8.0 works.

[1] https://docs.openshift.com/container-platform/4.7/operators/operator_sdk/helm/osdk-helm-tutorial.html#osdk-prepare-supported-images_osdk-helm-tutorial

@tlwu2013 (Contributor)

> The tutorial [1] also says to use FROM registry.redhat.io/openshift4/ose-helm-operator:v4.7 in the Dockerfile. This fails for me with unknown flag: --health-probe-bind-address. FROM quay.io/operator-framework/helm-operator:v1.8.0 works.

Hey @sqqqrly, I assume you are using SDK 1.6.0+; can you confirm?

registry.redhat.io/openshift4/ose-helm-operator:v4.7 is a downstream image based on helm-operator v1.3.0, so it doesn't work with the --health-probe-bind-address argument scaffolded by later SDK versions.
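
For illustration, this is roughly where that flag comes from in the newer scaffolding (taken from the patch snippet quoted earlier in this thread); with the v1.3.0-based downstream base image it would have to be dropped, or the base image kept in line with the SDK version that scaffolded the project:

# config/default/manager_auth_proxy_patch.yaml (manager container, sketch)
      - name: manager
        args:
        - "--health-probe-bind-address=:8081"   # scaffolded by SDK 1.6.0+; unknown flag to helm-operator v1.3.0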


sqqqrly commented Jun 16, 2021

╰─➤  operator-sdk version
operator-sdk version: "v1.8.0", commit: "d3bd87c6900f70b7df618340e1d63329c7cd651e", kubernetes version: "1.20.2", go version: "go1.16.4", GOOS: "linux", GOARCH: "amd64"

Thanks for the health probe note. I am hitting that and now I know why.

@camilamacedo86 (Contributor)

Hi @slopezz,

The downstream repo is now updated with this tag version, see:

https://github.com/openshift/ocp-release-operator-sdk/blob/master/testdata/go/v3/memcached-operator/config/default/manager_auth_proxy_patch.yaml#L13

Note that it has mock projects which are tested against OCP, which ensures that it works on OCP as well.

The next downstream release will use the latest version as well, so IMHO this can be closed.

I will close this one; however, @slopezz, if you face any issue with the next downstream release for 4.8, could you please raise a new issue and include the steps performed so that we are able to reproduce it? Also, it might be a better fit for Bugzilla, since it seems to be a vendor-specific issue rather than part of the upstream scope.

cc @fabianvf @jmrodri
