ceph: add workaround for Parallel RBD PVC Creation hangs on new pools #8770

Status: Closed · 1 commit
10 changes: 10 additions & 0 deletions Documentation/ceph-csi-troubleshooting.md
@@ -441,3 +441,13 @@ $ rbd ls --id=csi-rbd-node -m=10.111.136.166:6789 --key=AQDpIQhg+v83EhAAgLboWIbl
```

Where `-m` is one of the mon endpoints and the `--key` is the key used by the CSI driver for accessing the Ceph cluster.

## Parallel RBD PVC creation hangs for new BlockPool

This issue is specifically present in CephCSI `v3.4.x`, used by Rook `>=v1.7.1`, and occurs when multiple PVC creation requests are issued in parallel against a newly created, uninitialized blockpool. Follow the steps below to work around the issue:
**Member:**

Do we currently have a CSI release with the fix? I'm still not clear on which version of CSI this is, or will be, fixed in.

**Member Author:**

> Do we currently have a CSI release with the fix?

@travisn No, we need a Ceph Pacific release containing the fix ceph/ceph#43113, so that a new cephcsi can be built on Pacific to resolve this issue.

> I'm still not clear in what version of CSI this is or will be fixed.

This will be an issue in CSI v3.4.0 (and also in v3.4.1, which will come out soon).

From ceph/ceph#43113 (comment), we should be able to pick up the fix with the next Ceph Pacific release in cephcsi v3.4.2 and update the workaround doc at that time.

**Member:**

@travisn @Rakshith-R Instead of documenting this, how about a new cephcsi 3.4.1 with Octopus as the base image, ceph/ceph-csi#2521 (comment)? If we do that, we don't need to document any workaround.

**Member Author:**

> @travisn @Rakshith-R instead of documenting how about new cephcsi 3.4.1 with octopus as the base image ceph/ceph-csi#2521 (comment). If we do that we don't need to document any workaround.

This is not an option, since with Ceph Octopus `deep_copy()` does not work as expected; refer to ceph/ceph-csi#2521 (comment).

Or Rook can call `rbd pool init <pool_name>` right after creation? (IMO, since the pool will be used for rbd images, there is no harm in initializing it too?)

@idryomov @travisn Do you think it's a preferable solution?

cc @Madhu-1
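The proposal above boils down to initializing the pool immediately after creating it, so the first image operations never race on an uninitialized pool. A minimal sketch of that sequence from a Ceph admin shell — the pool name `replicapool` and the PG count are placeholders, not values from this thread:

```shell
# Create the pool (name and PG count are placeholders).
ceph osd pool create replicapool 32

# Initialize it for RBD right away -- the step Rook would perform
# automatically on CephBlockPool creation under this proposal.
rbd pool init replicapool
```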

**Member:**

Sounds better instead of documentation 👍

**travisn (Member), Sep 28, 2021:**

So we can avoid this issue if Rook always calls `rbd pool init <pool_name>` immediately after pool creation? And we would just do this upon creation of CephBlockPool CRs?

**Member Author:**

> So we can avoid this issue if Rook always calls `rbd pool init <pool_name>` immediately after pool creation? And we would just do this upon creation of CephBlockPool CRs?

Yes, it just needs to be called once. If it is done right after pool creation/property setting, it will avoid this issue.

#8696 (comment)

**Member:**

> So we can avoid this issue if Rook always calls `rbd pool init <pool_name>` immediately after pool creation? And we would just do this upon creation of CephBlockPool CRs?
>
> Yes, it just needs to be called once. If it is done right after pool create/ setting property, it will avoid this issue.
>
> #8696 (comment)

@Rakshith-R Are you planning on making this change in Rook, or should somebody else pick it up?

* Execute the `rbd pool init <pool_name>` command from the [toolbox](./ceph-toolbox.md) or the ceph-csi pods (similar to [this](#rbd-commands)).
* Restart the `csi-rbdplugin-provisioner-xxx` pods:

  `kubectl -n rook-ceph delete pods -l app=csi-rbdplugin-provisioner`

After following the above steps, parallel PVC creation will work as expected.
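Putting the steps above together, the workaround can be run in one pass from the toolbox. This is a sketch: the pool name `replicapool` and the `rook-ceph-tools` deployment name are assumptions based on Rook defaults, not values from this document.

```shell
# Initialize the newly created blockpool from the toolbox pod
# (replace replicapool with your CephBlockPool's name).
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd pool init replicapool

# Restart the provisioner pods so the hung provisioning requests are retried.
kubectl -n rook-ceph delete pods -l app=csi-rbdplugin-provisioner
```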