New BlockPool / SC + Parallel RBD Volume Creation hangs and fails #8696
Comments
Once the upstream Ceph fix is merged, the new CSI build will fix this problem.
Thanks @DandyDeveloper! In particular, ceph-csi v3.4.0 (built with the Ceph Pacific base image) and Rook v1.7.1 (which ships with ceph-csi v3.4.0 by default) are affected by this issue.
Updated WA (recommended, will not leave any stale omap entries): see the discussion here.
Another workaround without the need for the Rook toolbox:
The above steps will also resolve the deadlock.
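One plausible shape for a toolbox-free workaround, sketched purely as an assumption (the actual steps live in the linked discussion): since the deadlock only bites when parallel requests hit an uninitialized pool, provisioning a single PVC against the new StorageClass first, and only then issuing the parallel requests, should sidestep the race. The StorageClass and PVC names below are illustrative.

```bash
# Hypothetical warm-up PVC; "rook-ceph-block" and "pool-warmup" are assumptions.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pool-warmup
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 1Gi
EOF

# The first (serial) provisioning initializes the pool; once Bound, the
# warm-up claim can be deleted.
kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/pool-warmup --timeout=5m
kubectl delete pvc pool-warmup
```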
If you are going to use the toolbox pod, an even simpler workaround is:
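The exact command isn't preserved here; a minimal sketch of what such a toolbox step could look like, assuming the new pool is named `replicapool` (initializing the pool by hand removes the uninitialized-pool race):

```bash
# From inside the rook-ceph-tools pod; the pool name is an assumption.
rbd pool init replicapool
```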
Thank you, I hit this issue a week ago and didn't know how to solve it.
@travisn we have 3 options here
cc @Rakshith-R
This will remain an issue until Ceph v16.2.7? Until then, let's pin this GitHub issue so it's more visible. If possible, it would be nice to add this to the CSI troubleshooting guide as well.
I hope ceph/ceph#43113 will be part of 16.2.7. Sounds good to update the troubleshooting guide. @Rakshith-R can you please add this to the CSI troubleshooting guide?
Definitely.
This commit adds a workaround for "Parallel RBD PVC creation hangs on new pools" to ceph-csi-troubleshooting.md. Refer: rook#8696. Signed-off-by: Rakshith R <rar@redhat.com>
This is done in order to prevent a deadlock when parallel PVC create requests are issued on a new, uninitialized RBD block pool, due to https://tracker.ceph.com/issues/52537. Fixes: rook#8696. Signed-off-by: Rakshith R <rar@redhat.com>
Why is this issue closed? I still hit this problem.
With #8923 Rook initializes the pool, so this same issue is not expected. What versions of Ceph and Rook are you running? Does it happen only at initial creation of the pool, or do you observe the hang later?
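For reference, one way to check the versions in question (a sketch; the `rook-ceph` namespace and toolbox usage are assumptions based on a default install):

```bash
# Rook operator image, which encodes the Rook version:
kubectl -n rook-ceph get deploy rook-ceph-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Ceph version as reported by the cluster, run from the toolbox pod:
ceph versions
```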
I am on Rook 1.10.8.
Is this a bug report or feature request?
Deviation from expected behavior:
After creating a new `CephBlockPool` and creating several PVCs in parallel on creation of that BlockPool / SC, the `csi-provisioner` hangs and nothing happens. This is a known upstream CSI issue with a fix here: ceph/ceph#43113
Expected behavior:
Parallel image creation should result in the PVCs being created; no problems are expected.
How to reproduce it (minimal and precise):

1. Create a `CephBlockPool` + `StorageClass` against the CSI provisioner.
2. Scale a `StatefulSet` resource relying on `volumeClaimTemplates` to 2 or more replicas (see the sketch after this list).
3. Watch your PVCs sit in `Pending` whilst the provisioner struggles to process the requests.
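A minimal sketch of step 2, assuming a StorageClass named `rook-ceph-block` backed by the newly created pool (all names here are illustrative, not taken from the report):

```bash
# Hypothetical repro: a 2-replica StatefulSet whose volumeClaimTemplates
# trigger two parallel RBD provisioning requests against the fresh pool.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: repro
spec:
  serviceName: repro
  replicas: 2
  selector:
    matchLabels:
      app: repro
  template:
    metadata:
      labels:
        app: repro
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: rook-ceph-block
        resources:
          requests:
            storage: 1Gi
EOF

# On an affected build, both PVCs stay Pending:
kubectl get pvc
```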
Workaround

Fortunately, there's a relatively simple workaround to this problem:
```bash
rbd create <pool>/test --size 1G
kubectl delete pod -l app=csi-rbdplugin-provisioner -n rook-ceph
```
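Creating a throwaway image initializes the new pool (the same thing later Rook releases do automatically via #8923 and the commits above), and restarting the provisioner pods clears the already-stuck requests. Once PVCs provision normally, the dummy image can presumably be removed (same placeholder pool name as above):

```bash
rbd rm <pool>/test
```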
Discussion about the problem here: https://rook-io.slack.com/archives/CK9CF5H2R/p1631503838341700
Environment: NR