When calling NodeStageVolume, a modprobe error occurs #4610

Open
gotoweb opened this issue May 9, 2024 · 4 comments
Labels
component/cephfs (Issues related to CephFS) · question (Further information is requested)

Comments

gotoweb commented May 9, 2024

Describe the bug

When the CSI plugin handles a gRPC /csi.v1.Node/NodeStageVolume request, the volume mount fails with the following error:

failed to mount volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f: an error (exit status 1) occurred while running modprobe args: [ceph] Check dmesg logs if required.
GRPC error: rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]

When I checked dmesg, I found the message Invalid ELF header magic: != \x7fELF.
I hope this isn't a bug, but it seems to be out of my control.
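
For anyone trying to reproduce this outside the CSI driver, I believe the equivalent commands on the node (or from inside the cephfs plugin pod) would be roughly the following; this is only a sketch and I have not run it on the failing node:

modprobe ceph; echo "modprobe exit code: $?"    # should fail the same way the plugin reports
dmesg | tail -n 20                              # should show the Invalid ELF header magic message
lsmod | grep '^ceph'                            # shows whether the module is already loaded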

Environment details

  • Image/version of Ceph CSI driver : docker image csi-node-driver-registrar:v2.9.3, cephcsi:canary
  • Helm chart version : N/A
  • Kernel version : 6.2.16-3-pve (proxmox)
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd) : kernel
  • Kubernetes cluster version : v1.29.4+k3s1
  • Ceph cluster version : 17.2.7

Steps to reproduce

  1. I deployed the csi plugin and cephFS driver using this manual as a reference.
  2. I created and deployed a storageclass that uses the cephfs.csi.ceph.com provisioner.
  3. I created a PVC that uses that storageclass.
  4. The provisioner works fine. All PVCs are bound.

Actual results

Pods attempting to mount the volume received the following error message from the kubelet:

Warning  FailedMount  6m9s (x18 over 26m)  kubelet            MountVolume.MountDevice failed for volume "pvc-2c77ab9f-d45b-4dde-a548-b9db686aaf7a" : rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]

I found the following messages in the cephfs plugin pod's logs:

I0509 14:27:18.135224    2769 utils.go:198] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f GRPC call: /csi.v1.Node/NodeStageVolume
I0509 14:27:18.135573    2769 utils.go:199] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/7a95dbced311aedf38c5f71b8028c278f7a6a48f70c6c2b3814b0465126aad76/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"1d88c854-9fa3-4806-b80f-5bbd29e03756","fsName":"kubernetes","storage.kubernetes.io/csiProvisionerIdentity":"1715255922910-1802-cephfs.csi.ceph.com","subvolumeName":"csi-vol-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f","subvolumePath":"/volumes/k8svolgroup/csi-vol-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f/b5aa4c29-2c53-44e1-a630-7638b5ab8a6b"},"volume_id":"0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f"}
I0509 14:27:18.140009    2769 omap.go:89] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f got omap values: (pool="cephfs.kubernetes.meta", namespace="csi", name="csi.volume.5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f"): map[csi.imagename:csi-vol-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f csi.volname:pvc-f8743f50-09f3-4837-a52a-beec09bd58a2 csi.volume.owner:clickhouse]
I0509 14:27:18.467484    2769 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I0509 14:27:18.468296    2769 nodeserver.go:313] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f cephfs: mounting volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f with Ceph kernel client
I0509 14:27:18.471796    2769 cephcmds.go:98] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f an error (exit status 1) occurred while running modprobe args: [ceph]
E0509 14:27:18.471865    2769 nodeserver.go:323] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f failed to mount volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f: an error (exit status 1) occurred while running modprobe args: [ceph] Check dmesg logs if required.
E0509 14:27:18.472265    2769 utils.go:203] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f GRPC error: rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]

The dmesg log looks like this:

Invalid ELF header magic: != \x7fELF
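
As a rough way to check what modprobe is actually feeding to the kernel, one could inspect the module file on the node (a sketch; paths assume the stock Proxmox/Debian module layout, and the compression angle is only a guess on my part):

ls /lib/modules/$(uname -r)/kernel/fs/ceph/                      # is it ceph.ko, or a compressed ceph.ko.zst / ceph.ko.xz?
od -An -tx1 -N4 /lib/modules/$(uname -r)/kernel/fs/ceph/ceph.ko  # an uncompressed module starts with 7f 45 4c 46 ("\x7fELF")
# If the host ships compressed modules and the modprobe inside the plugin container cannot
# decompress them, the kernel can end up being handed non-ELF bytes, which would match this dmesg error.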

Expected behavior

Interestingly, the volume mounts successfully on other K8s clusters that use the same Ceph cluster. The StorageClass and ConfigMap (config.json) of the cluster where the error occurs and of the cluster that works correctly are completely identical.

Here are the logs from the working cluster's cephfs plugin:

I0509 14:34:08.422119    3880 utils.go:164] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 GRPC call: /csi.v1.Node/NodeStageVolume
I0509 14:34:08.422185    3880 utils.go:165] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/275cb089e63eed1215b93768fe531957cc5ee0434b1473a781144f8ceaa8671c/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"1d88c854-9fa3-4806-b80f-5bbd29e03756","fsName":"kubernetes","storage.kubernetes.io/csiProvisionerIdentity":"1715156287441-4003-cephfs.csi.ceph.com","subvolumeName":"csi-vol-578e1493-2c49-4534-bef0-efebdd508943","subvolumePath":"/volumes/k8svolgroup/csi-vol-578e1493-2c49-4534-bef0-efebdd508943/290fab63-facd-41fa-8eb5-c05173c0cae4"},"volume_id":"0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943"}
I0509 14:34:08.435295    3880 omap.go:88] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 got omap values: (pool="cephfs.kubernetes.meta", namespace="csi", name="csi.volume.578e1493-2c49-4534-bef0-efebdd508943"): map[csi.imagename:csi-vol-578e1493-2c49-4534-bef0-efebdd508943 csi.volname:pvc-d9aebf6f-bebb-463e-8919-4efc15f5ac6d csi.volume.owner:kafka]
I0509 14:34:08.438241    3880 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I0509 14:34:08.438335    3880 nodeserver.go:312] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 cephfs: mounting volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 with Ceph kernel client
I0509 14:34:28.594111    3880 cephcmds.go:105] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 command succeeded: mount [-t ceph 192.168.123.63:6789,192.168.123.3:6789,192.168.123.101:6789:/volumes/k8svolgroup/csi-vol-578e1493-2c49-4534-bef0-efebdd508943/290fab63-facd-41fa-8eb5-c05173c0cae4 /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/275cb089e63eed1215b93768fe531957cc5ee0434b1473a781144f8ceaa8671c/globalmount -o name=admin,secretfile=/tmp/csi/keys/keyfile-1337678247,mds_namespace=kubernetes,_netdev]
I0509 14:34:28.594173    3880 nodeserver.go:252] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 cephfs: successfully mounted volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 to /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/275cb089e63eed1215b93768fe531957cc5ee0434b1473a781144f8ceaa8671c/globalmount
I0509 14:34:28.594227    3880 utils.go:171] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 GRPC response: {}

Madhu-1 (Collaborator) commented May 14, 2024

@gotoweb this might be caused by #4138?

Madhu-1 (Collaborator) commented May 14, 2024

Are you able to load the ceph module manually from the node?

gotoweb (Author) commented May 16, 2024

@Madhu-1 I don't think so. The volume mounts fine on another k8s cluster running the same version of kubelet.
I haven't tried to load the ceph module manually; I just use the built-in ceph module on Proxmox. I'll try changing the version of the driver/CSI images...
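
This is roughly how I plan to check whether the module really is built into the Proxmox kernel (a sketch; file locations assume the stock Debian/Proxmox layout):

grep CEPH_FS /boot/config-$(uname -r)                  # =y means built in, =m means a loadable module
grep ceph /lib/modules/$(uname -r)/modules.builtin     # built-in modules are listed here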

nixpanic (Member) commented

@gotoweb, the Ceph-CSI driver loads the kernel module that is provided by a host-path volume. If the module is already loaded (or built-in), it should not try to load it again.

Commit ab87045 checks for support of the cephfs filesystem; it is included in Ceph-CSI 3.11 and was backported to 3.10 with #4381.
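
(As a side note, one way to see whether the node already supports the cephfs filesystem, whether loaded or built in, is to check the kernel's filesystem list; this is just an illustration, not necessarily the exact check that commit performs:)

grep -w ceph /proc/filesystems    # an entry appears once the ceph filesystem is available to the kernel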

nixpanic added the question (Further information is requested) and component/cephfs (Issues related to CephFS) labels on May 28, 2024