
core: treat cluster as not existing if it is allowed to uninstall with volumes #10231

Closed
wants to merge 1 commit

Conversation

Rakshith-R
Member

Description of your changes:

IsReadyToReconcile() used to report the cluster as not existing
if the cleanup policy was set to destroy the cluster and the cluster
was marked for deletion.
However, to allow proper deletion of the corresponding PVCs, deletion
of the CephBlockPool and CephFilesystem CRs needs to wait for the
RBD images and subvolumes to be deleted.
Hence, consider the cluster as not existing only when
AllowUninstallWithVolumes is also set to true.

Signed-off-by: Rakshith R rar@redhat.com

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

// If the cluster has a cleanup policy to destroy the cluster, it has been marked for deletion and it is allowed to uninstall
// with volumes, treat it as if it does not exist
if cephCluster.Spec.CleanupPolicy.HasDataDirCleanPolicy() && !cephCluster.DeletionTimestamp.IsZero() &&
cephCluster.Spec.CleanupPolicy.AllowUninstallWithVolumes {
Member

There are a couple of issues with this approach if the policy is set to allowUninstallWithVolumes: false:

  • The removal is always blocked even if there are no volumes remaining
  • All the controllers will be blocked from removal, including object stores, which don't rely on volumes

So I think we need a different approach in the finalizer for each controller that is concerned about volumes existing. For example, the pool controller should block the removal of a pool if there are existing RBD images in that pool. Or in the file controller, the removal should be blocked if there are subvolumes in that filesystem, unless allowUninstallWithVolumes: true.
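
A rough sketch of what such a per-controller guard could look like (illustrative only; the function and parameter names are hypothetical, not rook's actual code):

```go
// canDeleteBlockPool is a hypothetical sketch of the deletion guard described
// above: the CephBlockPool finalizer is only removed once the pool holds no
// RBD images, unless allowUninstallWithVolumes is set to true.
// imageNames is assumed to be the result of listing the images in the pool.
func canDeleteBlockPool(imageNames []string, allowUninstallWithVolumes bool) bool {
	if len(imageNames) > 0 && !allowUninstallWithVolumes {
		// Images still exist and a forced uninstall was not requested:
		// keep the finalizer and block deletion of the CR.
		return false
	}
	return true
}
```

The same shape would apply to the file controller, checking subvolumes in the filesystem instead of RBD images in the pool.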

Member

Agreed. Also, I am changing how some of this works with #9915, and I will have a follow-up for block pools. Travis and I talked about removing the allowUninstallWithVolumes entirely since it will no longer make sense with the dependency scheme.

Member Author

> Agreed. Also, I am changing how some of this works with #9915, and I will have a follow-up for block pools. Travis and I talked about removing the allowUninstallWithVolumes entirely since it will no longer make sense with the dependency scheme.

The IsReadyToReconcile() check comes before the dependency check that is being added in #9915.
We should really not be deleting the CephBlockPool, CephFilesystem and CephNFS CRs without checking for images, subvolumes and exports respectively, especially because we have external Kubernetes objects tied to these Ceph objects.

I propose we add another bool to IsReadyToReconcile() which will still return the cluster when the cleanup policy is set and the cluster is marked for deletion, for the above CRs.

@BlaineEXE @travisn
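
A minimal sketch of the proposed extra flag (illustrative only; the condensed boolean parameters below are assumptions and not rook's actual IsReadyToReconcile signature):

```go
// treatClusterAsExisting is a hypothetical condensation of the proposal above:
// controllers that must wait for images, subvolumes, or exports to be cleaned
// up would pass waitForResourceCleanup=true, so a cluster that is marked for
// deletion with a destroy cleanup policy is still returned to them.
func treatClusterAsExisting(markedForDeletion, destroyCleanupPolicy, waitForResourceCleanup bool) bool {
	if markedForDeletion && destroyCleanupPolicy && !waitForResourceCleanup {
		// Current behavior: pretend the cluster no longer exists.
		return false
	}
	return true
}
```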

Member

The allowUninstallWithVolumes check is currently buggy, and I think it needs to be removed. The user can already remove CephBlockPool and CephFilesystem without checking volumes, which #9915 seeks to fix. The CephCluster will block on deletion until all CephBlockPools and CephFilesystems are deleted. This creates a problem where the only volume checking is done in the CephCluster, which is forced to be the last to be deleted. Thus, the allowUninstallWithVolumes setting no longer effectively blocks deletion, and we plan to just remove it once #9915 and the follow-up for RBD are finished.

Member Author

> Agreed. Also, I am changing how some of this works with #9915, and I will have a follow-up for block pools. Travis and I talked about removing the allowUninstallWithVolumes entirely since it will no longer make sense with the dependency scheme.

> The IsReadyToReconcile() check comes before the dependency check that is being added in #9915.

> I propose we add another bool to IsReadyToReconcile() which will still return the cluster when the cleanup policy is set and the cluster is marked for deletion, for the above CRs.

Again, the check you are adding will never be reached.

Member

@Rakshith-R From Rook's perspective, this behavior is by design. I don't see what Rook should change. To summarize the uninstall scenarios:

  • If a clean uninstall is desired, the user should delete the file, pool, and other CRs first and wait for their finalizers to confirm they can be deleted. Then, lastly, the CephCluster CR should be deleted to remove the cluster, since all other CRs have already been evaluated.
  • If a forced uninstall is desired, all the CRs can be deleted at the same time, and the finalizers will basically be ignored. Deleting the CephCluster CR is a destructive action that indicates the intent to force the uninstall.

So if a non-forced uninstall is desired, the CephCluster CR must be deleted last.

Member

@umangachapagain @Rakshith-R is this the behavior in ocs-operator today for clean uninstall?

Member

> @umangachapagain @Rakshith-R is this the behavior in ocs-operator today for clean uninstall?

OCS follows the Rook implementation. CephCluster is deleted at the end and the cleanup policy is used for other decision making.

Member Author

> > @umangachapagain @Rakshith-R is this the behavior in ocs-operator today for clean uninstall?
>
> OCS follows the Rook implementation. CephCluster is deleted at the end and the cleanup policy is used for other decision making.

No, that behaviour was changed in red-hat-storage/ocs-operator#1563.

That PR together with #9041 caused this issue.

cc @umangachapagain @Madhu-1

Member

At the point the OCS operator sets the cleanup policy to destroy the cluster, it must be OK with the destruction of the cluster. If that was not intended to force deletion, it sounds like red-hat-storage/ocs-operator#1563 needs to be revisited.

@travisn
Member

travisn commented May 26, 2022

Per the discussion, I believe we can close this as by design in Rook.

@travisn travisn closed this May 26, 2022
Labels
bug ceph main ceph tag

5 participants