core: treat cluster as not existing if it is allowed to uninstall with volumes #10231

Status: Closed · wants to merge 1 commit
6 changes: 4 additions & 2 deletions pkg/operator/ceph/controller/controller_utils.go
```diff
@@ -137,8 +137,10 @@ func IsReadyToReconcile(ctx context.Context, c client.Client, namespacedName typ
 	}
 	cephCluster = clusterList.Items[0]

-	// If the cluster has a cleanup policy to destroy the cluster and it has been marked for deletion, treat it as if it does not exist
-	if cephCluster.Spec.CleanupPolicy.HasDataDirCleanPolicy() && !cephCluster.DeletionTimestamp.IsZero() {
+	// If the cluster has a cleanup policy to destroy the cluster, it has been marked for deletion and it is allowed to uninstall
+	// with volumes, treat it as if it does not exist
+	if cephCluster.Spec.CleanupPolicy.HasDataDirCleanPolicy() && !cephCluster.DeletionTimestamp.IsZero() &&
+		cephCluster.Spec.CleanupPolicy.AllowUninstallWithVolumes {
```
Member:
There are a couple of issues with this approach if the policy is set to allowUninstallWithVolumes: false:

  • The removal is always blocked, even if there are no volumes remaining
  • All the controllers will be blocked from removal, including object stores, which don't rely on volumes

So I think we need a different approach in the finalizer for each controller that is concerned about volumes existing. For example, the pool controller should block the removal of a pool if there are existing rbd images in that pool, and the file controller should block removal if there are subvolumes in that filesystem, unless allowUninstallWithVolumes: true.
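A minimal, runnable sketch of that per-controller idea, using the block pool case as the example; the cleanupPolicy struct, canRemovePoolFinalizer, and the image list are illustrative stand-ins, not Rook's actual controller code:

```go
package main

import "fmt"

// Illustrative stand-in for the relevant part of the CephCluster cleanup policy;
// the real type lives in the Rook CRD packages.
type cleanupPolicy struct {
	AllowUninstallWithVolumes bool
}

// canRemovePoolFinalizer sketches the per-controller check described above:
// block CephBlockPool deletion while rbd images remain in the pool, unless the
// uninstall is explicitly allowed to proceed with volumes left behind.
func canRemovePoolFinalizer(policy cleanupPolicy, imagesInPool []string) (bool, string) {
	if policy.AllowUninstallWithVolumes {
		return true, "uninstall with volumes explicitly allowed"
	}
	if len(imagesInPool) > 0 {
		return false, fmt.Sprintf("pool still contains %d rbd image(s); blocking deletion", len(imagesInPool))
	}
	return true, "no rbd images remain"
}

func main() {
	policy := cleanupPolicy{AllowUninstallWithVolumes: false}
	ok, reason := canRemovePoolFinalizer(policy, []string{"csi-vol-0001"})
	fmt.Println(ok, reason) // false pool still contains 1 rbd image(s); blocking deletion
}
```

The filesystem and NFS controllers could apply the same shape to subvolumes and exports, per the comment above.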

Member:
Agreed. Also, I am changing how some of this works with #9915, and I will have a follow-up for block pools. Travis and I talked about removing the allowUninstallWithVolumes entirely since it will no longer make sense with the dependency scheme.

Member Author:
> Agreed. Also, I am changing how some of this works with #9915, and I will have a follow-up for block pools. Travis and I talked about removing the allowUninstallWithVolumes entirely since it will no longer make sense with the dependency scheme.

The IsReadyToReconcile() check comes before the dependency check that's being added in #9915.
We should really not be deleting the blockpool, cephfs, and cephnfs CRs without checking for images, subvolumes, and exports respectively, especially because we have external k8s objects tied to these Ceph objects.

I propose we add another bool parameter for IsReadyToReconcile() which will still return the cluster for the above CRs when the cleanup policy is set and the cluster is marked for deletion (sketched below).

@BlaineEXE @travisn
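A runnable toy of the proposed extra bool; the clusterState struct and the returnClusterOnCleanup parameter are invented names for illustration and do not reflect Rook's actual IsReadyToReconcile signature:

```go
package main

import "fmt"

// Illustrative stand-ins for the CephCluster state that IsReadyToReconcile
// inspects (cleanup confirmation, deletion timestamp, volume policy).
type clusterState struct {
	cleanupConfirmed          bool
	markedForDeletion         bool
	allowUninstallWithVolumes bool
}

// clusterExists sketches the proposal: controllers that must still inspect Ceph
// resources during teardown (block pool, filesystem, NFS) would pass
// returnClusterOnCleanup=true so the cluster is not reported as "not existing".
func clusterExists(s clusterState, returnClusterOnCleanup bool) bool {
	if s.cleanupConfirmed && s.markedForDeletion && s.allowUninstallWithVolumes && !returnClusterOnCleanup {
		return false // current shortcut: treat the cluster as if it were gone
	}
	return true
}

func main() {
	s := clusterState{cleanupConfirmed: true, markedForDeletion: true, allowUninstallWithVolumes: true}
	fmt.Println(clusterExists(s, false)) // false: generic controllers skip further teardown work
	fmt.Println(clusterExists(s, true))  // true: pool/fs/nfs controllers still see the cluster and can check dependents
}
```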

Member:
The allowUninstallWithVolumes check is currently buggy, and I think it needs to be removed. The user can already remove CephBlockPool and CephFilesystem without checking volumes, which #9915 seeks to fix. The CephCluster will block on deletion until all CephBlockPools and CephFilesystems are deleted. This creates a problem where the only volume checking is done in the CephCluster, which is forced to be the last to be deleted. Thus, the allowUninstallWithVolumes setting no longer effectively blocks deletion, and we plan to just remove it once #9915 and the follow-up for RBD are finished.

Member Author:
> Agreed. Also, I am changing how some of this works with #9915, and I will have a follow-up for block pools. Travis and I talked about removing the allowUninstallWithVolumes entirely since it will no longer make sense with the dependency scheme.
>
> The IsReadyToReconcile() check comes before the dependency check that's being added in #9915.
>
> I propose we add another bool parameter for IsReadyToReconcile() which will still return the cluster for the above CRs when the cleanup policy is set and the cluster is marked for deletion.

Again, the check you are adding will never be reached.

Member:
@Rakshith-R From Rook's perspective this is by design; I don't see what Rook should change. To summarize the uninstall scenarios:

  • If a clean uninstall is desired, the user should delete the file, pool, and other CRs first and wait for their finalizers to confirm they can be deleted. Then, lastly, the CephCluster CR should be deleted to remove the cluster, since all other CRs have been evaluated.
  • If a forced uninstall is desired, all the CRs can be deleted at the same time, and the finalizers will basically be ignored. Deleting the CephCluster CR is a destructive action that indicates the intent to force the uninstall.

So if a non-forced uninstall is desired, the CephCluster CR must be deleted last.

Member:
@umangachapagain @Rakshith-R is this the behavior in ocs-operator today for clean uninstall?

Member:
> @umangachapagain @Rakshith-R is this the behavior in ocs-operator today for clean uninstall?

OCS follows the Rook implementation. CephCluster is deleted at the end, and the cleanup policy is used for other decision making.

Member Author:
> @umangachapagain @Rakshith-R is this the behavior in ocs-operator today for clean uninstall?
>
> OCS follows the Rook implementation. CephCluster is deleted at the end, and the cleanup policy is used for other decision making.

No, that behaviour was changed in red-hat-storage/ocs-operator#1563.

That PR, together with #9041, caused the issue.

cc @umangachapagain @Madhu-1

Member:
At the point the OCS operator sets the cleanup policy to destroy the cluster, it must be OK with the destruction of the cluster. If it was not intended to force deletion, it sounds like red-hat-storage/ocs-operator#1563 needs to be revisited.

```diff
 		logger.Infof("%q: CephCluster has a destructive cleanup policy, allowing %q to be deleted", controllerName, namespacedName)
 		return cephCluster, false, cephClusterExists, WaitForRequeueIfCephClusterNotReady
 	}
```
21 changes: 21 additions & 0 deletions pkg/operator/ceph/controller/controller_utils_test.go
```diff
@@ -173,6 +173,27 @@ func TestIsReadyToReconcile(t *testing.T) {
 		c, ready, clusterExists, _ := IsReadyToReconcile(ctx.TODO(), client, clusterName, controllerName)
 		assert.NotNil(t, c)
 		assert.False(t, ready)
 		assert.True(t, clusterExists)
 	})
+
+	t.Run("cephcluster with cleanup policy when deleted and allowed to uninstall with volumes", func(t *testing.T) {
+		cephCluster := &cephv1.CephCluster{
+			ObjectMeta: metav1.ObjectMeta{
+				Name: clusterName.Name,
+				Namespace: clusterName.Namespace,
+				DeletionTimestamp: &metav1.Time{Time: time.Now()},
+			},
+			Spec: cephv1.ClusterSpec{
+				CleanupPolicy: cephv1.CleanupPolicySpec{
+					Confirmation: cephv1.DeleteDataDirOnHostsConfirmation,
+					AllowUninstallWithVolumes: true,
+				},
+			}}
+		objects := []runtime.Object{cephCluster}
+		client := fake.NewClientBuilder().WithScheme(scheme).WithRuntimeObjects(objects...).Build()
+		c, ready, clusterExists, _ := IsReadyToReconcile(ctx.TODO(), client, clusterName, controllerName)
+		assert.NotNil(t, c)
+		assert.False(t, ready)
+		assert.False(t, clusterExists)
+	})
 }
```