Errors inside Operator Pod #14011
The cluster looks healthy from the status you shared. A couple ideas...
|
The installation of Rook didn't go well; I had to delete the mon-*-canary Pods/Deployments because the mon pods got stuck in the Pending state. I reapplied common.yaml, but nothing happened, same error. [root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph get cm
NAME DATA AGE
ceph-file-controller-detect-version 3 148m
kube-root-ca.crt 1 155m
local-device-worker-01-de-nbg1-dc3 1 155m
local-device-worker-02-de-nbg1-dc3 1 155m
local-device-worker-03-de-nbg1-dc3 1 155m
local-device-worker-04-de-nbg1-dc3 1 155m
local-device-worker-05-de-nbg1-dc3 1 113m
local-device-worker-06-de-nbg1-dc3 1 113m
rook-ceph-csi-config 1 154m
rook-ceph-csi-detect-version 3 154m
rook-ceph-csi-mapping-config 1 154m
rook-ceph-detect-version 3 154m
rook-ceph-mon-endpoints 5 154m
rook-ceph-operator-config 33 155m
rook-ceph-pdbstatemap 2 148m
rook-config-override 1 154m
[root::bastion-01-de-nbg1-dc3]
~: |
Can you share more details? Also try a complete cleanup according to the cleanup guide and reinstall.
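A condensed sketch of that cleanup (the authoritative steps are in the Rook cleanup guide; this assumes the example manifests and the default dataDirHostPath of /var/lib/rook):
kubectl delete -f cluster.yaml   # wait until the CephCluster is fully removed
kubectl delete -f operator.yaml -f common.yaml -f crds.yaml
# then, on every storage node, wipe the Rook state directory (and zap any OSD disks):
rm -rf /var/lib/rook
|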
Now there are no Pods left in Pending status; it only happened during the bootstrap process. After I deleted the mon canary deployments, the main mon pods came up. |
So I tried a reinstall, same behaviour. This is the actual state when I apply my cluster.yaml: [root::bastion-01-de-nbg1-dc3]
~/rook/deploy/examples: kubectl --namespace rook-ceph get pods
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-2mfg7 2/2 Running 0 27s
csi-cephfsplugin-gvcfz 2/2 Running 0 27s
csi-cephfsplugin-pjdhw 2/2 Running 0 27s
csi-cephfsplugin-provisioner-d5f4bb8c4-bksjq 5/5 Running 0 27s
csi-cephfsplugin-provisioner-d5f4bb8c4-t7dr6 5/5 Running 0 27s
csi-cephfsplugin-skzt8 2/2 Running 0 27s
csi-cephfsplugin-vxh4z 2/2 Running 0 27s
csi-cephfsplugin-zc4qs 2/2 Running 0 27s
csi-rbdplugin-2mwnk 2/2 Running 0 27s
csi-rbdplugin-2z6jx 2/2 Running 0 27s
csi-rbdplugin-6mtcn 2/2 Running 0 27s
csi-rbdplugin-mhvrs 2/2 Running 0 27s
csi-rbdplugin-provisioner-b8f6cd9cf-qwl9v 5/5 Running 0 27s
csi-rbdplugin-provisioner-b8f6cd9cf-zkfcv 5/5 Running 0 27s
csi-rbdplugin-r8tqd 2/2 Running 0 27s
csi-rbdplugin-v9f24 2/2 Running 0 27s
rook-ceph-csi-detect-version-46rl8 0/1 Completed 0 45s
rook-ceph-detect-version-6bxs9 0/1 Completed 0 45s
rook-ceph-mon-a-557c6696f9-qvt5r 0/2 Pending 0 34s
rook-ceph-mon-a-canary-5b8d49c665-nb2xc 2/2 Running 0 38s
rook-ceph-mon-b-canary-6869cf4654-lvm6r 2/2 Running 0 38s
rook-ceph-mon-c-canary-6dcbcbd6d7-fnp45 2/2 Running 0 38s
rook-ceph-operator-757cdc49bb-dbwn7 1/1 Running 0 102s
rook-discover-bf2hb 1/1 Running 0 96s
rook-discover-hzztl 1/1 Running 0 96s
rook-discover-kwv62 1/1 Running 0 96s
rook-discover-lcgrs 1/1 Running 0 96s
rook-discover-s9n48 1/1 Running 0 96s
rook-discover-vlkxj 1/1 Running 0 96s
[root::bastion-01-de-nbg1-dc3]
~/rook/deploy/examples: And this is what describe says about the Pending Pod: [root::bastion-01-de-nbg1-dc3]
~/rook/deploy/examples: kubectl --namespace rook-ceph describe pod rook-ceph-mon-a-557c6696f9-qvt5r
Name: rook-ceph-mon-a-557c6696f9-qvt5r
Namespace: rook-ceph
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: rook-ceph-default
Node: <none>
Labels: app=rook-ceph-mon
app.kubernetes.io/component=cephclusters.ceph.rook.io
app.kubernetes.io/created-by=rook-ceph-operator
app.kubernetes.io/instance=a
app.kubernetes.io/managed-by=rook-ceph-operator
app.kubernetes.io/name=ceph-mon
app.kubernetes.io/part-of=rook-ceph
ceph_daemon_id=a
ceph_daemon_type=mon
mon=a
mon_cluster=rook-ceph
pod-template-hash=557c6696f9
rook.io/operator-namespace=rook-ceph
rook_cluster=rook-ceph
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/rook-ceph-mon-a-557c6696f9
Init Containers:
chown-container-data-dir:
Image: quay.io/ceph/ceph:v18.2.2
Port: <none>
Host Port: <none>
Command:
chown
Args:
--verbose
--recursive
ceph:ceph
/var/log/ceph
/var/lib/ceph/crash
/run/ceph
/var/lib/ceph/mon/ceph-a
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
/run/ceph from ceph-daemons-sock-dir (rw)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/mon/ceph-a from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ns74g (ro)
init-mon-fs:
Image: quay.io/ceph/ceph:v18.2.2
Port: <none>
Host Port: <none>
Command:
ceph-mon
Args:
--fsid=9460fe9e-2a9d-476c-be66-24fffabcfab0
--keyring=/etc/ceph/keyring-store/keyring
--default-log-to-stderr=true
--default-err-to-stderr=true
--default-mon-cluster-log-to-stderr=true
--default-log-stderr-prefix=debug
--default-log-to-file=false
--default-mon-cluster-log-to-file=false
--mon-host=$(ROOK_CEPH_MON_HOST)
--mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
--id=a
--setuser=ceph
--setgroup=ceph
--public-addr=10.233.62.5
--mkfs
Environment:
CONTAINER_IMAGE: quay.io/ceph/ceph:v18.2.2
POD_NAME: rook-ceph-mon-a-557c6696f9-qvt5r (v1:metadata.name)
POD_NAMESPACE: rook-ceph (v1:metadata.namespace)
NODE_NAME: (v1:spec.nodeName)
POD_MEMORY_LIMIT: node allocatable (limits.memory)
POD_MEMORY_REQUEST: 0 (requests.memory)
POD_CPU_LIMIT: node allocatable (limits.cpu)
POD_CPU_REQUEST: 0 (requests.cpu)
CEPH_USE_RANDOM_NONCE: true
ROOK_MSGR2: msgr2_false_encryption_false_compression_false
ROOK_CEPH_MON_HOST: <set to the key 'mon_host' in secret 'rook-ceph-config'> Optional: false
ROOK_CEPH_MON_INITIAL_MEMBERS: <set to the key 'mon_initial_members' in secret 'rook-ceph-config'> Optional: false
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
/run/ceph from ceph-daemons-sock-dir (rw)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/mon/ceph-a from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ns74g (ro)
Containers:
mon:
Image: quay.io/ceph/ceph:v18.2.2
Ports: 3300/TCP, 6789/TCP
Host Ports: 0/TCP, 0/TCP
Command:
ceph-mon
Args:
--fsid=9460fe9e-2a9d-476c-be66-24fffabcfab0
--keyring=/etc/ceph/keyring-store/keyring
--default-log-to-stderr=true
--default-err-to-stderr=true
--default-mon-cluster-log-to-stderr=true
--default-log-stderr-prefix=debug
--default-log-to-file=false
--default-mon-cluster-log-to-file=false
--mon-host=$(ROOK_CEPH_MON_HOST)
--mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
--id=a
--setuser=ceph
--setgroup=ceph
--foreground
--public-addr=10.233.62.5
--setuser-match-path=/var/lib/ceph/mon/ceph-a/store.db
--public-bind-addr=$(ROOK_POD_IP)
Liveness: exec [env -i sh -c
outp="$(ceph --admin-daemon /run/ceph/ceph-mon.a.asok mon_status 2>&1)"
rc=$?
if [ $rc -ne 0 ]; then
echo "ceph daemon health check failed with the following output:"
echo "$outp" | sed -e 's/^/> /g'
exit $rc
fi
] delay=10s timeout=5s period=10s #success=1 #failure=3
Startup: exec [env -i sh -c
outp="$(ceph --admin-daemon /run/ceph/ceph-mon.a.asok mon_status 2>&1)"
rc=$?
if [ $rc -ne 0 ]; then
echo "ceph daemon health check failed with the following output:"
echo "$outp" | sed -e 's/^/> /g'
exit $rc
fi
] delay=10s timeout=5s period=10s #success=1 #failure=6
Environment:
CONTAINER_IMAGE: quay.io/ceph/ceph:v18.2.2
POD_NAME: rook-ceph-mon-a-557c6696f9-qvt5r (v1:metadata.name)
POD_NAMESPACE: rook-ceph (v1:metadata.namespace)
NODE_NAME: (v1:spec.nodeName)
POD_MEMORY_LIMIT: node allocatable (limits.memory)
POD_MEMORY_REQUEST: 0 (requests.memory)
POD_CPU_LIMIT: node allocatable (limits.cpu)
POD_CPU_REQUEST: 0 (requests.cpu)
CEPH_USE_RANDOM_NONCE: true
ROOK_MSGR2: msgr2_false_encryption_false_compression_false
ROOK_CEPH_MON_HOST: <set to the key 'mon_host' in secret 'rook-ceph-config'> Optional: false
ROOK_CEPH_MON_INITIAL_MEMBERS: <set to the key 'mon_initial_members' in secret 'rook-ceph-config'> Optional: false
ROOK_POD_IP: (v1:status.podIP)
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
/run/ceph from ceph-daemons-sock-dir (rw)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/mon/ceph-a from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ns74g (ro)
log-collector:
Image: quay.io/ceph/ceph:v18.2.2
Port: <none>
Host Port: <none>
Command:
/bin/bash
-x
-e
-m
-c
CEPH_CLIENT_ID=ceph-mon.a
PERIODICITY=daily
LOG_ROTATE_CEPH_FILE=/etc/logrotate.d/ceph
LOG_MAX_SIZE=500M
ROTATE=7
# edit the logrotate file to only rotate a specific daemon log
# otherwise we will logrotate log files without reloading certain daemons
# this might happen when multiple daemons run on the same machine
sed -i "s|*.log|$CEPH_CLIENT_ID.log|" "$LOG_ROTATE_CEPH_FILE"
# replace default daily with given user input
sed --in-place "s/daily/$PERIODICITY/g" "$LOG_ROTATE_CEPH_FILE"
# replace rotate count, default 7 for all ceph daemons other than rbd-mirror
sed --in-place "s/rotate 7/rotate $ROTATE/g" "$LOG_ROTATE_CEPH_FILE"
if [ "$LOG_MAX_SIZE" != "0" ]; then
# adding maxsize $LOG_MAX_SIZE at the 4th line of the logrotate config file with 4 spaces to maintain indentation
sed --in-place "4i \ \ \ \ maxsize $LOG_MAX_SIZE" "$LOG_ROTATE_CEPH_FILE"
fi
while true; do
# we don't force the logrorate but we let the logrotate binary handle the rotation based on user's input for periodicity and size
logrotate --verbose "$LOG_ROTATE_CEPH_FILE"
sleep 15m
done
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/run/ceph from ceph-daemons-sock-dir (rw)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/log/ceph from rook-ceph-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ns74g (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
rook-config-override:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: rook-config-override
ConfigMapOptional: <nil>
rook-ceph-mons-keyring:
Type: Secret (a volume populated by a Secret)
SecretName: rook-ceph-mons-keyring
Optional: false
ceph-daemons-sock-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/exporter
HostPathType: DirectoryOrCreate
rook-ceph-log:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/rook-ceph/log
HostPathType:
rook-ceph-crash:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/rook-ceph/crash
HostPathType:
ceph-daemon-data:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/mon-a/data
HostPathType:
kube-api-access-ns74g:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/hostname=worker-04-de-nbg1-dc3
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m18s default-scheduler 0/9 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 5 node(s) didn't match Pod's node affinity/selector. preemption: 0/9 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules, 8 Preemption is not helpful for scheduling..
Warning FailedScheduling 2m8s default-scheduler 0/9 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 5 node(s) didn't match Pod's node affinity/selector. preemption: 0/9 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules, 8 Preemption is not helpful for scheduling..
[root::bastion-01-de-nbg1-dc3]
~/rook/deploy/examples: And these are the logs from the Pod: [root::bastion-01-de-nbg1-dc3]
~/rook/deploy/examples: kubectl --namespace rook-ceph logs pod/rook-ceph-mon-a-557c6696f9-qvt5r
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
[root::bastion-01-de-nbg1-dc3]
~/rook/deploy/examples: When I delete the mon a, b and c canary Deployments, StatefulSets and Pods, the pending Pod comes up and the installation succeeds, but the errors inside the Operator Pod are the same.
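For reference, the canary cleanup amounted to something like this (deployment names taken from the pod listing above):
kubectl --namespace rook-ceph delete deployment rook-ceph-mon-a-canary rook-ceph-mon-b-canary rook-ceph-mon-c-canary
|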
@pomland-94 |
Yes, because they are not deleted automatically. |
I think you can watch the canary logs to see why they are not deleted.
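For example, something like this (a suggestion; the deployment names are taken from the pod listing above):
kubectl --namespace rook-ceph logs deploy/rook-ceph-mon-a-canary --all-containers
kubectl --namespace rook-ceph logs deploy/rook-ceph-operator -f
|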
These are the logs from the Canary Pods: [root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph logs pod/rook-ceph-mon-a-canary-5b8d49c665-nb2xc
Defaulted container "mon" out of: mon, log-collector
[root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph logs pod/rook-ceph-mon-b-canary-6869cf4654-lvm6r
Defaulted container "mon" out of: mon, log-collector
[root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph logs pod/rook-ceph-mon-c-canary-6dcbcbd6d7-fnp45
Defaulted container "mon" out of: mon, log-collector
[root::bastion-01-de-nbg1-dc3]
~: |
This sounds like an issue with the Kubernetes platform to me. I also notice that you didn't fill out the full issue template questions that request environment information. Please add the missing info:
|
I filled out some of the info: How to reproduce it (minimal and precise):
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
|
Anyone here with an idea? |
Hmm, OK, maybe there is no solution to fix this? Then my company will move away from Rook and look for another storage solution with the features of Rook. |
@pomland-94 Did you try any other environments? Even minikube, to see if it will work for you? These environmental issues are difficult for us to troubleshoot since we cannot reproduce them in our environments. |
It looks like the ConfigMap is held by some finalizers and is not allowed to be deleted. @pomland-94, can you please share the below details? Also, let's not delete any resources manually, because Rook handles the deletion of all the resources it creates. Ensure you try it on a new machine, or clean up the existing machine per the Rook docs.
|
These are other logs, from the Jobs: [root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph describe jobs/rook-ceph-csi-detect-version
Name: rook-ceph-csi-detect-version
Namespace: rook-ceph
Selector: batch.kubernetes.io/controller-uid=54c42112-562e-4f63-87e9-8fe95521fafc
Labels: app=rook-ceph-csi-detect-version
rook-version=v1.14.2
Annotations: <none>
Controlled By: Deployment/rook-ceph-operator
Parallelism: 1
Completions: 1
Completion Mode: NonIndexed
Start Time: Tue, 23 Apr 2024 20:11:36 +0200
Pods Statuses: 0 Active (0 Ready) / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=rook-ceph-csi-detect-version
batch.kubernetes.io/controller-uid=54c42112-562e-4f63-87e9-8fe95521fafc
batch.kubernetes.io/job-name=rook-ceph-csi-detect-version
controller-uid=54c42112-562e-4f63-87e9-8fe95521fafc
job-name=rook-ceph-csi-detect-version
rook-version=v1.14.2
Service Account: rook-ceph-system
Init Containers:
init-copy-binaries:
Image: rook/ceph:v1.14.2
Port: <none>
Host Port: <none>
Command:
cp
Args:
--archive
--force
--verbose
/usr/local/bin/rook
/rook/copied-binaries
Environment: <none>
Mounts:
/rook/copied-binaries from rook-copied-binaries (rw)
Containers:
cmd-reporter:
Image: quay.io/cephcsi/cephcsi:v3.11.0
Port: <none>
Host Port: <none>
Command:
/rook/copied-binaries/rook
Args:
cmd-reporter
--command
{"cmd":["cephcsi"],"args":["--version"]}
--config-map-name
rook-ceph-csi-detect-version
--namespace
rook-ceph
Environment: <none>
Mounts:
/rook/copied-binaries from rook-copied-binaries (rw)
Volumes:
rook-copied-binaries:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Events: <none>
[root::bastion-01-de-nbg1-dc3]
~: [root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph describe jobs/rook-ceph-detect-version
Name: rook-ceph-detect-version
Namespace: rook-ceph
Selector: batch.kubernetes.io/controller-uid=dcfa5441-f5fa-443d-b91b-d3455ec471e0
Labels: app=rook-ceph-detect-version
rook-version=v1.14.2
Annotations: <none>
Controlled By: CephCluster/rook-ceph
Parallelism: 1
Completions: 1
Completion Mode: NonIndexed
Start Time: Tue, 23 Apr 2024 20:11:36 +0200
Pods Statuses: 0 Active (0 Ready) / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=rook-ceph-detect-version
batch.kubernetes.io/controller-uid=dcfa5441-f5fa-443d-b91b-d3455ec471e0
batch.kubernetes.io/job-name=rook-ceph-detect-version
controller-uid=dcfa5441-f5fa-443d-b91b-d3455ec471e0
job-name=rook-ceph-detect-version
rook-version=v1.14.2
Service Account: rook-ceph-cmd-reporter
Init Containers:
init-copy-binaries:
Image: rook/ceph:v1.14.2
Port: <none>
Host Port: <none>
Command:
cp
Args:
--archive
--force
--verbose
/usr/local/bin/rook
/rook/copied-binaries
Environment: <none>
Mounts:
/rook/copied-binaries from rook-copied-binaries (rw)
Containers:
cmd-reporter:
Image: quay.io/ceph/ceph:v18.2.2
Port: <none>
Host Port: <none>
Command:
/rook/copied-binaries/rook
Args:
cmd-reporter
--command
{"cmd":["ceph"],"args":["--version"]}
--config-map-name
rook-ceph-detect-version
--namespace
rook-ceph
Environment: <none>
Mounts:
/rook/copied-binaries from rook-copied-binaries (rw)
Volumes:
rook-copied-binaries:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Events: <none>
[root::bastion-01-de-nbg1-dc3]
~: [root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph logs job/rook-ceph-csi-detect-version -f
Defaulted container "cmd-reporter" out of: cmd-reporter, init-copy-binaries (init)
2024/04/23 18:11:37 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
2024-04-23 18:11:37.890788 I | job-reporter-cmd: running command: /usr/local/bin/cephcsi cephcsi --version
Cephcsi Version: v3.11.0
Git Commit: bc24b5eca87626d690a29effa9d7420cc0154a7a
Go Version: go1.21.5
Compiler: gc
Platform: linux/arm64
Kernel: 6.1.0-18-arm64
[root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph logs job/rook-ceph-detect-version -f
Defaulted container "cmd-reporter" out of: cmd-reporter, init-copy-binaries (init)
2024/04/23 18:11:38 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
2024-04-23 18:11:38.203899 I | job-reporter-cmd: running command: /usr/bin/ceph ceph --version
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
[root::bastion-01-de-nbg1-dc3]
~: |
[root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph get cm/rook-ceph-csi-detect-version -o yaml
apiVersion: v1
data:
retcode: "0"
stderr: ""
stdout: |
Cephcsi Version: v3.11.0
Git Commit: bc24b5eca87626d690a29effa9d7420cc0154a7a
Go Version: go1.21.5
Compiler: gc
Platform: linux/arm64
Kernel: 6.1.0-18-arm64
kind: ConfigMap
metadata:
creationTimestamp: "2024-04-23T18:11:37Z"
deletionGracePeriodSeconds: 0
deletionTimestamp: "2024-04-23T18:11:37Z"
finalizers:
- foregroundDeletion
labels:
app: rook-cmd-reporter
name: rook-ceph-csi-detect-version
namespace: rook-ceph
resourceVersion: "9191"
uid: ddd3fcc3-bf7d-4a13-b5b9-ec3f9ba37a44
[root::bastion-01-de-nbg1-dc3]
~: kubectl --namespace rook-ceph get cm/rook-ceph-detect-version -o yaml
apiVersion: v1
data:
retcode: "0"
stderr: ""
stdout: |
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
kind: ConfigMap
metadata:
creationTimestamp: "2024-04-23T18:11:38Z"
deletionGracePeriodSeconds: 0
deletionTimestamp: "2024-04-23T18:11:38Z"
finalizers:
- foregroundDeletion
labels:
app: rook-cmd-reporter
name: rook-ceph-detect-version
namespace: rook-ceph
resourceVersion: "9270"
uid: 1031113a-b480-4571-ada8-e9cdb94b1eef
[root::bastion-01-de-nbg1-dc3]
~: |
The resources all have the foregroundDeletion finalizer.
Do you have some policy in your cluster that would add this finalizer? Can you disable that policy to see if it fixes the issue?
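For example, something like this lists admission webhooks and common policy engines that could be mutating objects (a hedged suggestion; the Kyverno and Gatekeeper queries only return anything if those engines are installed):
kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations
kubectl get clusterpolicies.kyverno.io 2>/dev/null
kubectl get constraints 2>/dev/null
|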
It's a good question. I installed my Kubespray cluster with the following hardening values:
# Hardening
---
## kube-apiserver
authorization_modes: ["Node", "RBAC"]
# AppArmor-based OS
kube_apiserver_feature_gates: ["AppArmor=true"]
kube_apiserver_request_timeout: 120s
kube_apiserver_service_account_lookup: true
# enable kubernetes audit
kubernetes_audit: true
audit_log_path: "/var/log/kube-apiserver-log.json"
audit_policy_file: "{{ kube_config_dir }}/audit-policy/apiserver-audit-policy.yaml"
audit_log_maxage: 30
audit_log_maxbackups: 10
audit_log_maxsize: 100
tls_min_version: VersionTLS12
tls_cipher_suites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
# enable encryption at rest
kube_encrypt_secret_data: true
kube_encryption_resources: [secrets]
kube_encryption_algorithm: "secretbox"
kube_apiserver_enable_admission_plugins:
- EventRateLimit
- AlwaysPullImages
- ServiceAccount
- NamespaceLifecycle
- NodeRestriction
- LimitRanger
- ResourceQuota
- MutatingAdmissionWebhook
- ValidatingAdmissionWebhook
- PodNodeSelector
- PodSecurity
kube_apiserver_admission_control_config_file: true
# EventRateLimit plugin configuration
kube_apiserver_admission_event_rate_limits:
limit_1:
type: Namespace
qps: 50
burst: 100
cache_size: 2000
limit_2:
type: User
qps: 50
burst: 100
kube_profiling: true
## kube-controller-manager
# kube_controller_manager_bind_address: 127.0.0.1
kube_controller_terminated_pod_gc_threshold: 50
# AppArmor-based OS
# kube_controller_feature_gates: ["RotateKubeletServerCertificate=true"]
kube_controller_feature_gates: ["RotateKubeletServerCertificate=true", "AppArmor=true"]
## kube-scheduler
# kube_scheduler_bind_address: 127.0.0.1
# AppArmor-based OS
kube_scheduler_feature_gates: ["AppArmor=true"]
## etcd
etcd_deployment_type: kubeadm
## kubelet
kubelet_authorization_mode_webhook: true
kubelet_authentication_token_webhook: true
kube_read_only_port: 0
kubelet_rotate_server_certificates: false
kubelet_csr_approver_values:
bypassHostnameCheck: true
bypassDnsResolution: true
kubelet_protect_kernel_defaults: true
kubelet_event_record_qps: 1
kubelet_rotate_certificates: true
kubelet_streaming_connection_idle_timeout: "5m"
kubelet_make_iptables_util_chains: true
kubelet_feature_gates: ["RotateKubeletServerCertificate=true", "SeccompDefault=true"]
kubelet_seccomp_default: true
kubelet_systemd_hardening: false
# In case you have multiple interfaces in your
# control plane nodes and you want to specify the right
# IP addresses, kubelet_secure_addresses allows you
# to specify the IP from which the kubelet
# will receive the packets.
# kubelet_secure_addresses: "192.168.10.110 192.168.10.111 192.168.10.112"
# additional configurations
kube_owner: root
kube_cert_group: root
# create a default Pod Security Configuration and deny running of insecure pods
# kube_system namespace is exempted by default
kube_pod_security_use_default: true
kube_pod_security_default_enforce: restricted
kube_pod_security_exemptions_namespaces:
- kube-system
- calico-apiserver
- metrics-server
- rook-ceph
- prometheus |
The setting causing the foreground deletion is not obvious in that config. Background deletion is the K8s default, and somehow foreground is now being enforced. The topic on Cascading deletion may have some clues. |
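For reference, the deletion mode is chosen per delete request rather than cluster-wide; a minimal illustration with kubectl (the resource name here is only an example):
# background (the Kubernetes default): the owner is removed immediately and
# its dependents are garbage-collected afterwards
kubectl delete deployment example-deployment --cascade=background
# foreground: the API server puts the foregroundDeletion finalizer on the owner
# and only removes it once all dependents are gone
kubectl delete deployment example-deployment --cascade=foreground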
So I have to create the deployment with the following option?
kubectl apply -f crds.yaml -f common.yaml -f operator.yaml --cascade=foreground
kubectl apply -f cluster.yaml --cascade=foreground
That is, if I interpret this article correctly? |
No, that won't help; whatever policy is causing the foreground deletion will still apply when the Rook operator creates its resources. You need to find the policy that is forcing the foreground behavior and disable it, so the default background policy is restored.
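Since your hardening values enable Kubernetes auditing, one way to find out which client sends the foreground delete is to search the audit log (a sketch; it assumes the audit_log_path from your config above, and that your audit policy records request bodies):
grep '"propagationPolicy":"Foreground"' /var/log/kube-apiserver-log.json
The matching audit events should name the user or service account that issued the request.
|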
I can't find the policy; I searched my whole cluster but can't find anything. I also tried to create the following ClusterRoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-cascade-delete
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:controller:clusterrole-aggregation-controller
subjects:
- kind: ServiceAccount
  name: rook-ceph-default
  namespace: rook-ceph
but it seems that this does not help. |
Is this a bug report or feature request?
Deviation from expected behavior:
Errors inside the Operator Pod
Expected behavior:
Inside the Operator pod I get lots of errors every second:
How to reproduce it (minimal and precise):
Installed Rook with these commands from the tar.gz archive:
$ git clone --single-branch --branch v1.13.7 https://github.com/rook/rook.git
$ cd rook/deploy/examples
$ kubectl create -f crds.yaml -f common.yaml -f operator.yaml
$ kubectl create -f cluster.yaml
cluster.yaml, if necessary
Cluster Status to submit: