PVC creation pending due to deadline-exceeded exception on an external cluster using the official 1.10 Rook docs; persists after applying the fix for issue 8696 #11347
Comments
A few questions / comments:

1 - If I want to leverage an external Ceph cluster so that Rook can give Kubernetes the ability to use StorageClasses, PVCs, and PVs, I should be using cluster-external.yaml. (I ask because I'm unclear what cluster-external-management.yaml does: management of the cluster, beyond just consuming it? I get different errors going down that approach, but I'm under the impression it's not necessary.)
2 - When extracting the config from the source cluster, I am not specifying the RGW endpoint. Might that be the cause?
Please share the logs and the output of `kubectl describe` for the PVC.
cluster-external-management.yaml is not necessary.
Let me know exactly what you want and what you are not able to get; then I can help you investigate.
Not specifying the RGW endpoint is RGW-specific and won't be a problem here...
Hello, I have the same problem: the PVC is also Pending when I try to create it. Client Version: 4.10.0-0.okd-2022-04-23-131357; ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
There are no error logs in the operator beyond:

```
2023-01-25 21:09:11.343453 W | ceph-spec: waiting for connection info of the external cluster. retrying in 1m0s.
```

In project rook-ceph on server:
- deployment/csi-cephfsplugin-provisioner deploys registry.k8s.io/sig-storage/csi-attacher:v4.1.0, registry.k8s.io/sig-storage/csi-snapshotter:v6.2.1, registry.k8s.io/sig-storage/csi-resizer:v1.7.0, registry.k8s.io/sig-storage/csi-provisioner:v3.4.0, quay.io/cephcsi/cephcsi:v3.7.2
- deployment/csi-rbdplugin-provisioner deploys registry.k8s.io/sig-storage/csi-provisioner:v3.4.0, registry.k8s.io/sig-storage/csi-resizer:v1.7.0, registry.k8s.io/sig-storage/csi-attacher:v4.1.0, registry.k8s.io/sig-storage/csi-snapshotter:v6.2.1, quay.io/cephcsi/cephcsi:v3.7.2
- deployment/rook-ceph-operator deploys rook/ceph:v1.10.10
- daemonset/csi-rbdplugin manages registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0, quay.io/cephcsi/cephcsi:v3.7.2
- daemonset/csi-cephfsplugin manages registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0, quay.io/cephcsi/cephcsi:v3.7.2

In project rook-ceph-external: You have no services, deployment configs, or build configs.

Also, I do not have a CSI provisioner pod created, same as @egulatee's situation. BR,
@jasminstrkonjic can you please check whether https://rook.io/docs/rook/latest/Troubleshooting/ceph-csi-common-issues/ helps you identify the problem? If not, please provide the required CSI logs for debugging this.
Sorry, I missed this part. Check that you have the CSI deployments and daemonsets created; if not, I think you might need to check the SCC associated with them.
@Madhu-1 thank you very much for your help. Now I have deleted both namespaces and all resources. BR,
Everything should work as is; there is no need for that.
Now I have consolidated everything into a single namespace. The CSI provisioner pods now fail admission:

```
Error creating: pods "csi-cephfsplugin-provisioner-5b6959d889-" is forbidden: unable to validate against any security context constraint:
[spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used,
 spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used,
 spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used,
 provider "nonroot": Forbidden: not usable by user or serviceaccount,
 provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount,
 provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount,
 provider "hostnetwork": Forbidden: not usable by user or serviceaccount,
 provider "hostaccess": Forbidden: not usable by user or serviceaccount,
 provider "node-exporter": Forbidden: not usable by user or serviceaccount,
 provider "privileged": Forbidden: not usable by user or serviceaccount]

Error creating: pods "csi-rbdplugin-provisioner-d6df8c994-" is forbidden: unable to validate against any security context constraint:
[spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used,
 ... (same hostPath and provider errors as above)]
```

I have tried to give them the same permissions I gave to the operator deployment, with no luck getting them to start.
@jasminstrkonjic can you check whether you have created the required SCC for the CSI driver, https://github.com/rook/rook/blob/master/deploy/examples/operator-openshift.yaml#L53-L97? If yes, it looks like some issue with the existing SCCs in the cluster, as OpenShift chooses the SCC based on the capabilities.
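For orientation, the manifest linked above defines an SCC roughly along these lines. This is a trimmed, hedged sketch, not the authoritative file: the service-account namespace and the exact field values here are assumptions, and the real manifest should always be taken from deploy/examples/operator-openshift.yaml.

```yaml
# Sketch of an SCC granting the Rook CSI pods what the admission errors above
# complain about (hostPath volumes, host networking). Values are illustrative.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: rook-ceph-csi
allowPrivilegedContainer: true
allowHostNetwork: true
allowHostDirVolumePlugin: true   # required because the CSI pods mount hostPath volumes
allowHostPorts: true
allowHostPID: true
allowHostIPC: true
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
volumes: ["*"]                   # the real SCC enumerates specific volume types, incl. hostPath
users:
  # Service accounts used by the CSI pods; the namespace may differ in your setup.
  - system:serviceaccount:rook-ceph:rook-csi-rbd-provisioner-sa
  - system:serviceaccount:rook-ceph:rook-csi-rbd-plugin-sa
  - system:serviceaccount:rook-ceph:rook-csi-cephfs-provisioner-sa
  - system:serviceaccount:rook-ceph:rook-csi-cephfs-plugin-sa
```

OpenShift picks the most restrictive SCC that satisfies a pod's requested capabilities, which is why a missing or mismatched SCC surfaces as the "not usable by user or serviceaccount" errors above.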
Thank you, this SCC solved the problem and the CSI pods are now created. The next error is: `failed to provision volume with StorageClass "ceph-rbd": error getting secret rook-csi-rbd-provisioner in namespace rook-ceph-external: secrets "rook-csi-rbd-provisioner" not found`. Obviously some secrets have not been created... will try to fix it.
You need to change the secret namespace in the StorageClass to the namespace you are actually using.
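For context, those secret references live in the StorageClass parameters that ceph-csi reads. Below is a hedged sketch of an RBD StorageClass: the pool name is taken from elsewhere in this thread, while the clusterID, namespace, and fstype values are assumptions for illustration.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rook-ceph.rbd.csi.ceph.com    # <operator namespace>.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph                     # namespace where the CephCluster CR lives
  pool: rook_rbd_storage                   # pool name mentioned in this thread
  imageFormat: "2"
  imageFeatures: layering
  # These namespaces must match where import-external-cluster.sh created the
  # secrets -- after consolidating into a single namespace, point them there.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Since StorageClass parameters are immutable, fixing a wrong namespace means deleting the SC and recreating it, as happens later in this thread.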
Found the problem. Fixed by deleting the old SC and creating a new one with a valid namespace parameter. Thank you so much.
@Madhu-1 I have one last request to make this finally work. CephFS, however, returns permission denied in the client pod (e.g. postgres):

```
MountVolume.MountDevice failed for volume "pvc-048e93f4-298f-4ba9-a5e8-996310d95793" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph xxxxxxxx:6789,xxxxxxxx:6789,xxxxxxxx:6789:/volumes/csi/csi-vol-ddcbbf39-9d8d-11ed-8436-0a580afe0d21/13e07e0e-f48e-48c6-93ae-68b8593e1a4c /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-048e93f4-298f-4ba9-a5e8-996310d95793/globalmount -o name=csi-cephfs-provisioner,secretfile=/tmp/csi/keys/keyfile-2824771535,mds_namespace=okdfs,_netdev]
stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2023-01-26T15:28:05.375+0000 7fbd6ded7f40 -1 failed for service _ceph-mon._tcp
mount error 13 = Permission denied
```

Maybe you can give me a hint where to focus, since I do not know where to look anymore.
You should not use
|
Thank you again. Dynamic provisioning works like a charm for both storage classes, but only for a moment. Tried changing RWO/RWX on the PVC, but no luck...
Also, here are the events. Every pod with the cephfs storage class (as the default storage class) is crashing with not much info:

```
6s  Warning  BackOff  pod/mariadb-1-5gdv2  Back-off restarting failed container
```
@jasminstrkonjic, please check the pod logs/previous logs to see why it's failing. |
This happens for RWO volume, and the error comes from kubernetes/OCP, not Rook. |
The problem is only with the cephfs StorageClass. If I instantiate, for example, the postgres template, the pod ends up in an error state after a couple of CrashLoopBackOffs. The PVC and PV are successfully dynamically provisioned and bound.

```
[root@omgr01 ~]# oc debug postgresql-1-56mds
The database cluster will be initialized with locale "C".
Data page checksums are disabled.
initdb: directory "/var/lib/pgsql/data/userdata" exists but is not empty
The database cluster will be initialized with locale "C".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
```

Also, if I try to manually create a file or directory, it works fine:

```
sh-4.4$ pwd
```

I'm totally confused now. Any help would be much appreciated.
Permission denied occurs on every deployment pod that uses cephfs... I have tried with Jenkins and JBoss.
I believe this issue is a lot like this one, related to the SELinux boolean "container_use_cephfs". I can't find a proper note on how to set this up on Fedora CoreOS for OKD...
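On OKD, per-node settings are normally applied through MachineConfigs rather than by hand. The thread never shows the exact fix used, so the following is only a sketch under the assumption that flipping the SELinux boolean at boot via a systemd oneshot unit is acceptable; the MachineConfig name and the worker role label are placeholders.

```yaml
# Hypothetical MachineConfig that enables the container_use_cephfs
# SELinux boolean on every worker node at boot. Untested sketch.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-container-use-cephfs    # placeholder name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: container-use-cephfs.service
          enabled: true
          contents: |
            [Unit]
            Description=Allow containers to use CephFS (SELinux boolean)

            [Service]
            Type=oneshot
            ExecStart=/usr/sbin/setsebool -P container_use_cephfs on
            RemainAfterExit=yes

            [Install]
            WantedBy=multi-user.target
```

Applying a MachineConfig triggers a rolling reboot of the affected machine pool, so this should be planned accordingly.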
I can confirm that this was the solution for OKD 4.10 / Fedora CoreOS 35. You need to add the parameter to the cephfs StorageClass. It works! Hooray!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation. |
Is this a bug report or feature request?
Deviation from expected behavior:
PVC stuck in pending
Expected behavior:
The PVC should be created and bound.
How to reproduce it (minimal and precise):
I followed the instructions provided here: https://rook.io/docs/rook/v1.10/CRDs/Cluster/external-cluster/
Applied all Kubernetes manifests (crds, common, operator, common-external, cluster-external YAML files) via Argo CD.
Manually generated the Ceph cluster environment variables via the shell script on the source cluster.
Manually set the environment variables and ran the import-external-cluster shell script.
Created a sample PVC using the RBD SC, which remains Pending. (I will attach logs.)
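For illustration, a minimal PVC of the shape described might look like the following. The PVC name and namespace are hypothetical; the StorageClass name matches the `ceph-rbd` class referenced elsewhere in this thread.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc            # hypothetical name for illustration
  namespace: default            # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-rbd    # RBD StorageClass imported from the external cluster
```

With dynamic provisioning working, this should move from Pending to Bound within seconds; a PVC stuck in Pending usually means the provisioner pod is missing, misconfigured, or cannot reach the external cluster.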
Before I opened this issue, I tried the solutions for issue: New BlockPool / SC + Parallel RBD Volume Creation hangs and fails #8696:
1 - I ran `rbd pool init rook_rbd_storage`, then recreated the PVC. No change; the PVC remains Pending.
2 - I tried creating the toolbox, but it doesn't seem to work with external clusters.
File(s) to submit:
cluster.yaml, if necessary: using https://raw.githubusercontent.com/rook/rook/3bccf60c5fd853fb80ecd4d3e8e0d146aa7226a9/deploy/examples/cluster-external.yaml
Logs to submit:
kubectl_logs_ceph_operator.txt
kubectl_logs_rbdplugin-provisioner.txt
Cluster Status to submit:
kubectl_describe_cephcluster.txt
Output of krew commands, if necessary
`ceph status`:

```
  cluster:
    id:     a896bcd1-6089-470f-a973-85f2fabe5149
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum nuc9034,nuc9036,nuc9038,nuc9039,nuc9037 (age 45h)
    mgr: nuc9035(active, since 22h), standbys: nuc9036, nuc9034
    mds: 2/2 daemons up, 2 standby
    osd: 8 osds: 8 up (since 2d), 8 in (since 2d)

  data:
    volumes: 2/2 healthy
    pools:   7 pools, 169 pgs
    objects: 14.12k objects, 54 GiB
    usage:   161 GiB used, 20 TiB / 20 TiB avail
    pgs:     169 active+clean

  io:
    client: 600 KiB/s wr, 0 op/s rd, 135 op/s wr
```
Environment:

OS (from /etc/os-release): Ubuntu 22.04.1 LTS (Jammy Jellyfish)
Kernel (`uname -a`): Linux k3s-node1 5.15.0-1021-kvm #26-Ubuntu SMP Tue Oct 25 18:39:10 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cloud provider or hardware configuration: K3S on 7 Intel NUCs with dual NICs and VLANs
Rook version (`rook version` inside a Rook Pod): rook: v1.10.6, go: go1.18.7
Storage backend version (`ceph -v`): inside the Rook Pod: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable); ProxMox Ceph version: 16.2.9
Kubernetes version (`kubectl version`): Client v1.24.1 (go1.18.2, darwin/arm64), Kustomize v4.5.4, Server v1.25.3+k3s1 (go1.19.2, linux/amd64)
Storage backend status (`ceph health` in the Rook Ceph toolbox): unable to bring up the toolbox