
add rpc call for PeerBlockPool #2457

Open — wants to merge 4 commits into base: main

Conversation

@rewantsoni (Member) commented Feb 12, 2024

Add a new RPC call, PeerBlockPool, to enable one-way mirroring between block pools.

The RPC call is initiated by the primary cluster, which wants to mirror the pool; the server receives the request and does the following:

  1. Enable RBD Mirror Daemon
  2. Enable Mirroring on the Block Pool in the request
  3. Create the bootstrap secret and set the bootstrap secret ref on the Block Pool in the request.

Depends on:

  1. cephblockpool: allow updation of mirroring field from different components #2480
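The three server-side steps above can be sketched as a minimal handler. This is a sketch with in-memory stand-ins: the type names, field names, and `peerBlockPool` signature below are illustrative assumptions, not the PR's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// blockPool and providerServer are illustrative stand-ins for the real
// CephBlockPool object and OCSProviderServer.
type blockPool struct {
	mirroringEnabled bool
	secretRef        string
	secretToken      string
}

type providerServer struct {
	mirrorDaemonEnabled bool
	pools               map[string]*blockPool
}

// peerBlockPool follows the handler flow from the description:
// 1. enable the RBD mirror daemon,
// 2. enable mirroring on the requested pool,
// 3. create the bootstrap secret and reference it from the pool.
func (s *providerServer) peerBlockPool(poolName, bootstrapToken string) error {
	p, ok := s.pools[poolName]
	if !ok {
		return errors.New("CephBlockPool not found: " + poolName)
	}
	s.mirrorDaemonEnabled = true          // step 1
	p.mirroringEnabled = true             // step 2
	p.secretRef = "bootstrap-" + poolName // step 3: secret name referenced by the pool
	p.secretToken = bootstrapToken        // step 3: token carried in the request
	return nil
}

func main() {
	s := &providerServer{pools: map[string]*blockPool{"rbd-pool": {}}}
	fmt.Println(s.peerBlockPool("rbd-pool", "token"), s.pools["rbd-pool"].mirroringEnabled) // <nil> true
}
```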

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 12, 2024
openshift-ci bot (Contributor) commented Feb 12, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot (Contributor) commented Feb 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rewantsoni
Once this PR has been reviewed and has the lgtm label, please assign jarrpa for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rewantsoni rewantsoni force-pushed the server-client branch 5 times, most recently from 6cb2451 to 201eadd Compare February 19, 2024 13:57
@rewantsoni rewantsoni force-pushed the server-client branch 7 times, most recently from c9c775f to a6c7525 Compare February 28, 2024 08:44
@rewantsoni rewantsoni changed the title Server client add rpc call for PeerBlockPool Feb 28, 2024
@rewantsoni rewantsoni mentioned this pull request Mar 4, 2024
@leelavg (Contributor) left a comment

The rpc call is initiated from the primary cluster which wants to mirror the pool, the server receives the request and does the following

  • a bit incomplete (at least for me): by the above, do you mean the server is running in "another" cluster (maybe secondary?) and the primary cluster wants to send the info over to the secondary cluster?
  • throughout the app lifecycle, do primary and secondary change identities and become the opposite? if yes, should we only use server and client terminology for all operations?

Let me know if you don't want reviews on draft PRs. I need to review this PR again as I'm not totally up to speed w/ RDR.

services/provider/client/client.go — outdated, resolved
}, nil
}

func (c *cephRBDMirrorManager) Create(ctx context.Context) error {
Contributor:

any reason for not using CreateOrUpdate from controller-runtime's controllerutil?

Member Author:

I was thinking about the case when we run MaintenanceMode (which requires the RBD mirror to be scaled down). If a new PeerBlockPool call is received while MaintenanceMode is in progress and we use CreateOrUpdate, it will interfere with it.

Contributor:

fair enough, does the owner scale up after the maintenance mode?

Member Author:

Yes, that would be the responsibility of the controller executing the maintenance mode
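The create-only semantics being defended in this thread can be sketched with an in-memory stand-in (the `store` and `mirror` names are illustrative, not the PR's types): unlike CreateOrUpdate, an existing mirror that MaintenanceMode has scaled down is left untouched.

```go
package main

import "fmt"

// mirror stands in for a CephRBDMirror; count is its replica count.
type mirror struct{ count int }

// store stands in for the API server's object store.
type store map[string]*mirror

// ensureMirror creates the mirror with count 1 only when it is absent.
// An existing object is never modified, so a mirror scaled to 0 for
// MaintenanceMode keeps its count (CreateOrUpdate would reset it).
func ensureMirror(s store, name string) {
	if _, ok := s[name]; ok {
		return // already exists: do not touch the spec
	}
	s[name] = &mirror{count: 1}
}

func main() {
	s := store{}
	ensureMirror(s, "rbd-mirror")
	s["rbd-mirror"].count = 0 // MaintenanceMode scales the daemon down
	ensureMirror(s, "rbd-mirror")
	fmt.Println(s["rbd-mirror"].count) // 0 — the create-only path leaves it alone
}
```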

services/provider/server/cephblockpool.go — outdated, resolved
@@ -690,3 +705,34 @@ func (s *OCSProviderServer) ReportStatus(ctx context.Context, req *pb.ReportStat

return &pb.ReportStatusResponse{}, nil
}

// PeerBlockPool RPC call to send the bootstrap secret for the pool
func (s *OCSProviderServer) PeerBlockPool(ctx context.Context, req *pb.PeerBlockPoolRequest) (*pb.PeerBlockPoolResponse, error) {
Contributor:

either make this RPC idempotent or return a distinct failure for resource-already-exists requests.
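One way to satisfy the idempotency request above is to treat "already enabled" as success, so a retried PeerBlockPool call is harmless. A minimal sketch, with a plain map standing in for the real CephBlockPool objects (names are illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// enableMirroring marks a pool as mirrored. A pool that is already
// mirrored is treated as success rather than a "resource exists"
// failure, which makes the calling RPC safe to retry.
func enableMirroring(pools map[string]bool, name string) error {
	mirrored, ok := pools[name]
	if !ok {
		return errors.New("CephBlockPool not found: " + name)
	}
	if mirrored {
		return nil // already enabled: succeed instead of erroring
	}
	pools[name] = true
	return nil
}

func main() {
	pools := map[string]bool{"rbd-pool": false}
	// Calling twice yields the same successful result.
	fmt.Println(enableMirroring(pools, "rbd-pool"), enableMirroring(pools, "rbd-pool")) // <nil> <nil>
}
```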

services/provider/server/cephblockpool.go — outdated, resolved
services/provider/server/cephblockpool.go — outdated, resolved
services/provider/server/server.go — resolved
// enable mirroring on blockPool in the req
if err := s.cephBlockPoolManager.EnableBlockPoolMirroring(ctx, req.BlockPoolName); err != nil {
if kerrors.IsNotFound(err) {
return nil, status.Errorf(codes.NotFound, "Failed to find CephBlockPool resource %s: %v", req.BlockPoolName, err)
Contributor:

I believe the errors are a bit misleading (same for the remaining ones): the function asks to enable mirroring, so if you didn't find the source, why are we at this line 🤔?

By "source" I mean the block pool; why can't we return at the start of the function if we didn't find the blockpool? Basically we are checking blockpool existence indirectly three times throughout this function.

Member Author:

updated to return at the beginning of the function if the blockpool doesn't exist

@rewantsoni (Member Author) commented Mar 4, 2024

  • a bit incomplete (at-least for me), by above do you mean the server is running in "another" cluster (maybe secondary?) and primary cluster wants to send over the info to secondary cluster?

Yes, right. We will have two clusters: one running the server (secondary) and the other cluster acting as a client that sends the request (StorageClusterPeer; primary). The primary cluster sends over the contents of the bootstrap secret (that is, the pool and token fields) along with the name of the blockPool it wants to mirror.

  • through out app lifecycle, do primary and secondary change identities and become the opposite? if yes, should we only use server and client terminology for all operations?

We are implementing peering for one-way replication. For RDR, both the primary and secondary clusters will act as both a server and a client; to be more specific, the StorageClusterPeer on the primary cluster will be a client for the API server on the secondary cluster, and the StorageClusterPeer on the secondary cluster will be a client for the API server on the primary cluster.

Let me know if you don't want review on draft PRs, I need to review this PR again as I'm not totally upto speed w/ RDR.

Yes, I would like reviews on draft PRs. I have tested this PR; I still need to complete unit tests and the uninstall flow (probably as part of a different PR).

@rewantsoni rewantsoni marked this pull request as ready for review March 4, 2024 10:16
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2024
@rewantsoni rewantsoni force-pushed the server-client branch 5 times, most recently from 5b03275 to dd90597 Compare March 4, 2024 13:48
@iamniting (Member):

/test ocs-operator-bundle-e2e-aws

namespace string
}

func newCephBlockPoolManager(cl client.Client, namespace string) (*cephBlockPoolManager, error) {
Contributor:

I believe this function can be written without an error in the return type.

Member Author:

I was following the same function definition that was used for the other two managers.

services/provider/server/cephblockpool.go — outdated, resolved
Comment on lines 93 to 94
blockPoolObj := &rookCephv1.CephBlockPool{}
err := c.client.Get(ctx, types.NamespacedName{Name: blockPoolName, Namespace: c.namespace}, blockPoolObj)
Contributor:

maybe we can use utils to construct the type?

Suggested change
blockPoolObj := &rookCephv1.CephBlockPool{}
err := c.client.Get(ctx, types.NamespacedName{Name: blockPoolName, Namespace: c.namespace}, blockPoolObj)
blockPool := &rookCephv1.CephBlockPool{}
blockPool.Name = blockPoolName
blockPool.Namespace = c.namespace
err := c.client.Get(ctx, client.ObjectKeyFromObject(blockPool), blockPool)

similar at remaining places, it just saves us from constructing a struct.

Comment on lines 62 to 66
err := ctrl.SetControllerReference(cephBlockPool, bootstrapSecret, c.client.Scheme())
if err != nil {
return err
}
return nil
Contributor:

just return ctrl.SetControllerReference....

@@ -690,3 +705,34 @@ func (s *OCSProviderServer) ReportStatus(ctx context.Context, req *pb.ReportStat

return &pb.ReportStatusResponse{}, nil
}

// PeerBlockPool RPC call to send the bootstrap secret for the pool
func (s *OCSProviderServer) PeerBlockPool(ctx context.Context, req *pb.PeerBlockPoolRequest) (*pb.PeerBlockPoolResponse, error) {
Contributor:

is there any chance that two clients call this RPC referring to the same block pool? if yes, is the idea to just fail one client & make it retry?

@rewantsoni (Member Author) Mar 6, 2024:

Yes, two clients can call this RPC referring to the same block pool; in that case it will fail one and make it retry.


Comment on lines 30 to 39
err := c.client.Get(ctx, types.NamespacedName{Name: rBDMirrorName, Namespace: c.namespace}, cephRBDMirrorObj)
if err == nil {
return nil
}

if err != nil && !kerrors.IsNotFound(err) {
return err
}

cephRBDMirrorObj.Name = rBDMirrorName
cephRBDMirrorObj.Namespace = c.namespace
cephRBDMirrorObj.Spec = rookCephv1.RBDMirroringSpec{Count: 1}

err = c.client.Create(ctx, cephRBDMirrorObj)
if err != nil {
return err
}
Contributor:

maybe always try to create and fail if the error is anything other than AlreadyExists? or do if kerrors.IsNotFound(err) { create } after the Get
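The reviewer's first alternative — issue a single Create and tolerate a pre-existing object — can be sketched as follows. The fake client and sentinel error below are stand-ins; against a real Kubernetes API server you would check the returned error with apimachinery's kerrors.IsAlreadyExists in the same way.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the Kubernetes AlreadyExists status error.
var errAlreadyExists = errors.New("already exists")

// fakeClient stands in for the controller-runtime client.
type fakeClient struct{ objs map[string]struct{} }

func (c *fakeClient) Create(name string) error {
	if _, ok := c.objs[name]; ok {
		return errAlreadyExists
	}
	c.objs[name] = struct{}{}
	return nil
}

// ensureRBDMirror makes one Create call and treats "already exists" as
// success, replacing the Get-then-Create pair with a single request.
func ensureRBDMirror(c *fakeClient) error {
	if err := c.Create("rbd-mirror"); err != nil && !errors.Is(err, errAlreadyExists) {
		return err
	}
	return nil
}

func main() {
	c := &fakeClient{objs: map[string]struct{}{}}
	// Both calls succeed: the second one tolerates the existing mirror.
	fmt.Println(ensureRBDMirror(c), ensureRBDMirror(c)) // <nil> <nil>
}
```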

@leelavg (Contributor) commented Mar 6, 2024

I would like reviews on the draft PRs

  • ack, I usually open everything for review, but if it's untested I put a /hold flag, just saying

You can resolve my previously addressed comments, thanks.

@rewantsoni rewantsoni force-pushed the server-client branch 5 times, most recently from d861d4a to 08ac1ed Compare March 11, 2024 11:34
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2024
@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 18, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 26, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2024
@leelavg (Contributor) commented Apr 8, 2024

@rewantsoni this PR was marked as dependent on another PR, where a comment was made that invalidates the changes in this PR; let me know if this still needs a review, thanks.

@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 17, 2024
Comment on lines 79 to 78
if kerrors.IsNotFound(err) {
return nil, fmt.Errorf("CephBlockPool resource %q not found: %v", blockPoolName, err)
}
return nil, fmt.Errorf("failed to get CephBlockPool resource with name %q: %v", blockPoolName, err)
Contributor:

I don't see how the result differs when the resource is not found here; as such, don't make any differentiation, since the returned error already contains the reason.

cephRBDMirror := &rookCephv1.CephRBDMirror{}
cephRBDMirror.Name = rBDMirrorName
cephRBDMirror.Namespace = c.namespace
err := c.client.Get(ctx, client.ObjectKeyFromObject(cephRBDMirror), cephRBDMirror)
Contributor:

any reason for making two calls rather than a single create?

Member Author:

We need one RBDMirror for the cluster, hence I check whether it already exists and create it only if it doesn't.

// enable mirroring on blockPool in the req
if err := s.cephBlockPoolManager.EnableBlockPoolMirroring(ctx, cephBlockPool); err != nil {
return nil, status.Errorf(codes.Internal, "Failed to enable mirroring for CephBlockPool resource %s: %v", req.Pool, err)
}

// create and set secret ref on the blockPool
if err := s.cephBlockPoolManager.SetBootstrapSecretRef(
Contributor:

It's guaranteed that the call to SetBootstrapSecretRef will fail the first time, since the resource version changes due to the update in EnableBlockPoolMirroring. I believe we need to get a fresh copy every time we want to update.

Member Author:

I will check it again
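The stale-resourceVersion problem raised above can be sketched with a toy optimistic-concurrency store (all types here are illustrative fakes): the second update must re-fetch the object, because the first update bumped its resourceVersion. With client-go, retry.RetryOnConflict wraps this get-mutate-update loop.

```go
package main

import (
	"errors"
	"fmt"
)

// obj mimics a Kubernetes object guarded by optimistic concurrency.
type obj struct {
	resourceVersion int
	mirroring       bool
	secretRef       string
}

// apiServer mimics the API server's conflict check on update.
type apiServer struct{ stored obj }

func (a *apiServer) get() obj { return a.stored }

func (a *apiServer) update(o obj) error {
	if o.resourceVersion != a.stored.resourceVersion {
		return errors.New("conflict: stale resourceVersion")
	}
	o.resourceVersion++
	a.stored = o
	return nil
}

func main() {
	api := &apiServer{}

	// First update: enable mirroring (bumps resourceVersion).
	o := api.get()
	o.mirroring = true
	fmt.Println(api.update(o)) // <nil>

	// Reusing the stale copy fails, as the review predicts...
	o.secretRef = "peer-secret"
	fmt.Println(api.update(o)) // conflict: stale resourceVersion

	// ...so re-fetch before setting the secret ref.
	fresh := api.get()
	fresh.secretRef = "peer-secret"
	fmt.Println(api.update(fresh)) // <nil>
}
```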

Signed-off-by: Rewant Soni <resoni@redhat.com>
Signed-off-by: Rewant Soni <resoni@redhat.com>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2024
@rewantsoni (Member Author):

/retest-required

Signed-off-by: Rewant Soni <resoni@redhat.com>
Signed-off-by: Rewant Soni <resoni@redhat.com>
openshift-ci bot (Contributor) commented May 14, 2024

@rewantsoni: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/ocs-operator-bundle-e2e-aws — Commit: d520db2 — Required: true — Rerun command: /test ocs-operator-bundle-e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
