ceph: add workaround for Parallel RBD PVC Creation hangs on new pools #8770

Closed
Rakshith-R wants to merge 1 commit

Rakshith-R (Member) commented Sep 21, 2021

This commit adds a workaround for parallel RBD PVC creation hanging on
new pools to ceph-csi-troubleshooting.md.
Refer: #8696

Signed-off-by: Rakshith R <rar@redhat.com>

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

Madhu-1 (Member) commented Sep 22, 2021

@idryomov PTAL

Rakshith-R (Member, Author) commented:

Updated the docs with a workaround (WA) that does not leave any stale resources, as discussed in ceph/ceph-csi#2521 (comment). PTAL, thanks!


## Parallel RBD PVC creation hangs for new BlockPool

This issue is specific to CephCSI `v3.4.x`, which is used by Rook `>=v1.7.1`, and occurs when multiple parallel PVC creation requests are issued against a newly created, uninitialized blockpool. Follow the steps below to work around the issue:

Reviewer (Member) commented:

Do we currently have a CSI release with the fix? I'm still not clear in what version of CSI this is or will be fixed.

Rakshith-R (Member, Author) commented:

> Do we currently have a CSI release with the fix?

@travisn
No. We need a Ceph Pacific release that contains the fix ceph/ceph#43113, so that a new CephCSI can be built on top of it to resolve this issue.

> I'm still not clear in what version of CSI this is or will be fixed.

This will be an issue in CSI v3.4.0 (and also in v3.4.1, which will come out soon).

According to ceph/ceph#43113 (comment), we should be able to pick up the fix with the next Ceph Pacific release in CephCSI v3.4.2, and update the WA doc at that time.
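
To make the version window described in this comment concrete, here is a minimal Python sketch, assuming the affected range is exactly v3.4.0 up to (but not including) v3.4.2. The v3.4.2 boundary is only the expectation stated in the comment at the time, not a confirmed release, and `parse_version` and `needs_workaround` are hypothetical helper names that do not exist in Rook or CephCSI.

```python
import re
from typing import Tuple

# First CephCSI release named as affected in the discussion.
FIRST_AFFECTED = (3, 4, 0)
# ASSUMPTION: the comment only *expects* the fix to arrive in v3.4.2.
EXPECTED_FIX = (3, 4, 2)


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v3.4.1' or '3.4.1' into (major, minor, patch)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def needs_workaround(cephcsi_tag: str) -> bool:
    """Return True if the tag falls in the assumed affected window."""
    return FIRST_AFFECTED <= parse_version(cephcsi_tag) < EXPECTED_FIX
```

Under these assumptions, `needs_workaround("v3.4.1")` is true and `needs_workaround("v3.3.1")` is false; the upper bound would need adjusting if the fix shipped in a different release.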

Reviewer (Member) commented:

@travisn @Rakshith-R instead of documenting how about new cephcsi 3.4.1 with octopus as the base image ceph/ceph-csi#2521 (comment). If we do that we don't need to document any workaround.

Rakshith-R (Member, Author) commented:

> @travisn @Rakshith-R instead of documenting how about new cephcsi 3.4.1 with octopus as the base image ceph/ceph-csi#2521 (comment). If we do that we don't need to document any workaround.

This is not an option, since deep_copy() does not work as expected with Ceph Octopus; refer to ceph/ceph-csi#2521 (comment).

Alternatively, Rook could call `rbd pool init <pool_name>` right after pool creation.
(IMO, since the pool will be used for RBD images, there is no harm in initializing it too.)

@idryomov @travisn Do you think this is a preferable solution?

cc @Madhu-1

Reviewer (Member) commented:

That sounds better than documenting a workaround 👍

travisn (Member) commented Sep 28, 2021

So we can avoid this issue if Rook always calls rbd pool init <pool_name> immediately after pool creation? And we would just do this upon creation of CephBlockPool CRs?

Rakshith-R (Member, Author) commented:

> So we can avoid this issue if Rook always calls rbd pool init <pool_name> immediately after pool creation? And we would just do this upon creation of CephBlockPool CRs?

Yes, it just needs to be called once. If it is done right after pool create/ setting property, it will avoid this issue.

#8696 (comment)
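
A minimal sketch of the ordering proposed in this thread: create the pool, then run `rbd pool init` once before any PVC requests reach it. This is illustrative Python, not Rook's implementation (Rook is written in Go). The names `create_and_init_pool` and `run_command` and the example pool name `replicapool` are hypothetical, and the sketch assumes the `ceph` and `rbd` CLIs are on the PATH with working cluster credentials.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and executes it; injectable so the
# sequencing can be exercised without a live Ceph cluster.
Runner = Callable[[Sequence[str]], None]


def run_command(argv: Sequence[str]) -> None:
    """Run a CLI command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def create_and_init_pool(pool_name: str, run: Runner = run_command) -> List[List[str]]:
    """Create a pool, then initialize it for RBD exactly once.

    Hypothetical helper: it only demonstrates that `rbd pool init`
    follows pool creation immediately.
    """
    if not pool_name:
        raise ValueError("pool_name must be non-empty")
    commands = [
        ["ceph", "osd", "pool", "create", pool_name],
        ["rbd", "pool", "init", pool_name],
    ]
    for argv in commands:
        run(argv)
    return commands
```

Passing a recording function as `run`, for example `create_and_init_pool("replicapool", run=calls.append)`, shows the two commands in order without touching a cluster.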

Reviewer (Member) commented:

> > So we can avoid this issue if Rook always calls rbd pool init <pool_name> immediately after pool creation? And we would just do this upon creation of CephBlockPool CRs?
>
> Yes, it just needs to be called once. If it is done right after pool create/ setting property, it will avoid this issue.
>
> #8696 (comment)

@Rakshith-R Are you planning to make this change in Rook, or should somebody else pick it up?

mergify bot commented Oct 5, 2021

This pull request has merge conflicts that must be resolved before it can be merged. @Rakshith-R please rebase it. https://rook.io/docs/rook/latest/development-flow.html#updating-your-fork

Rakshith-R (Member, Author) commented:

Closing this PR, since #8923 is merged and should fix the issue.

Rakshith-R closed this Oct 11, 2021