diff --git a/Documentation/async-disaster-recovery.md b/Documentation/async-disaster-recovery.md
new file mode 100644
index 0000000000000..5abffc006cfb6
--- /dev/null
+++ b/Documentation/async-disaster-recovery.md
@@ -0,0 +1,338 @@
---
title: Asynchronous Disaster Recovery Failover and Failback
weight: 3245
indent: true
---

# RBD Asynchronous Disaster Recovery Failover and Failback

## Table of Contents

* [Planned Migration](#planned-migration)
  * [Failover](#failover)
  * [Failback](#failback)
* [Disaster Recovery](#disaster-recovery)
  * [Failover](#failover-abrupt-shutdown)
  * [Failback](#failback-post-disaster-recovery)
* [Appendix](#appendix)
  * [Creating a VolumeReplicationClass CR](#create-a-volume-replication-class-cr)
  * [Creating a VolumeReplication CR](#create-a-volumereplication-cr)
  * [Check VolumeReplication CR status](#checking-replication-status)
  * [Backup and Restore](#backup--restore)

## Planned Migration

> Use cases: Datacenter maintenance, technology refresh, disaster avoidance, etc.

### Failover

The failover operation is the process of switching production to a backup facility (normally your recovery site). During a failover, access to the image on the primary site should be stopped. The image should then be made *primary* on the secondary cluster so that access can be resumed there.

> :memo: As mentioned in the prerequisites, a periodic or one-time backup of the application should be available for restore on the secondary site (cluster-2).

Follow the steps below for a planned migration of the workload from the primary cluster to the secondary cluster:

* Scale down all the application pods which are using the mirrored PVC on the primary cluster.
* [Take a backup](#backup--restore) of the PVC and PV objects from the primary cluster. This can be done with a backup tool such as [Velero](https://velero.io/docs/main/).
* [Update the VolumeReplication CR](#create-a-volumereplication-cr) to set `replicationState` to `secondary` on the primary site. When the operator sees this change, it passes the information down to the driver via a gRPC request to mark the dataSource as `secondary` (see the sketch after this list).
* If you are manually recreating the PVC and PV on the secondary cluster, remove the `claimRef` section in the PV objects. (See [Restore the backup on cluster-2](#restore-the-backup-on-cluster-2) for details.)
* Recreate the storageclass, PVC, and PV objects on the secondary site.
* As you are creating a static binding between the PVC and the PV, a new PV won't be created here; the PVC will bind to the existing PV.
* [Create the VolumeReplicationClass](#create-a-volume-replication-class-cr) on the secondary site.
* [Create VolumeReplication CRs](#create-a-volumereplication-cr) for all the PVCs for which mirroring is enabled.
  * `replicationState` should be `primary` for all the PVCs on the secondary site.
* [Check the VolumeReplication CR status](#checking-replication-status) to verify that the image is marked `primary` on the secondary site.
* Once the image is marked `primary`, the PVC is ready to be used. Now we can scale up the applications to use the PVC.

> :memo: **WARNING**: In the asynchronous disaster recovery use case, the secondary site does not have the complete data; it only has crash-consistent data as of the last completed snapshot interval.
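The two actions on the primary cluster can be expressed with plain `kubectl` commands. The following is only a sketch: the Deployment name `demo-app` and the `default` namespace are assumptions, while the VolumeReplication CR name `pvc-volumereplication` matches the appendix example; substitute the names used by your workload.

```bash
# Scale down the application that uses the mirrored PVC on the primary cluster
# (demo-app is a placeholder Deployment name)
kubectl --context=cluster-1 -n default scale deployment/demo-app --replicas=0

# Demote the image by setting replicationState to secondary on the primary site
kubectl --context=cluster-1 -n default patch volumereplication pvc-volumereplication \
  --type merge -p '{"spec":{"replicationState":"secondary"}}'
```
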
### Failback

A failback operation is the process of returning production to its original location after a disaster or a scheduled maintenance period. For a migration during steady-state operation, failback uses the same process as failover, with the roles of the two clusters switched.

> :memo: **Remember**: We can skip the backup and restore operations during failback if the required YAML manifests are already present on the primary cluster. Any new PVCs will still need to be restored on the primary site.

## Disaster Recovery

> Use cases: Natural disasters, power failures, system failures and crashes, etc.

### Failover (abrupt shutdown)

In case of disaster recovery, create the VolumeReplication CR at the secondary site. Since the connection to the primary site is lost, the operator automatically sends a gRPC request down to the driver to forcefully mark the dataSource as `primary` on the secondary site.

* If you are manually creating the PVC and PV on the secondary cluster, remove the `claimRef` section in the PV objects.
* Create the storageclass, PVC, and PV objects on the secondary site.
* As you are creating a static binding between the PVC and the PV, a new PV won't be created here; the PVC will bind to the existing PV.
* [Create the VolumeReplicationClass](#create-a-volume-replication-class-cr) and the [VolumeReplication CR](#create-a-volumereplication-cr) on the secondary site.
* [Check the VolumeReplication CR status](#checking-replication-status) to verify that the image is marked `primary` on the secondary site.
* Once the image is marked `primary`, the PVC is ready to be used. Now we can scale up the applications to use the PVC.

### Failback (post-disaster recovery)

Once the failed primary cluster is recovered and you want to fail back from the secondary site, follow the steps below:

* Scale down the running applications (if any) on the primary site. Ensure that all persistent volumes in use by the workload are no longer in use on the primary cluster.
* [Update the VolumeReplication CR](#create-a-volumereplication-cr) `replicationState` from `primary` to `secondary` on the primary site.
* Scale down the applications on the secondary site.
* [Update the VolumeReplication CR](#create-a-volumereplication-cr) `replicationState` from `primary` to `secondary` on the secondary site.
* On the primary site, [verify that the VolumeReplication status](#checking-replication-status) reports the volume as ready to use.
* Once the volume is marked ready to use, change `replicationState` from `secondary` to `primary` on the primary site.
* Scale up the applications again on the primary site.

## Appendix

The guide below assumes that we have a PVC (`rbd-pvc`) in `Bound` state, created using a *StorageClass* with the `Retain` reclaimPolicy.

```bash
kubectl get pvc --context=cluster-1
```

> ```bash
> NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
> rbd-pvc   Bound    pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec   1Gi        RWO            csi-rbd-sc     44s
> ```

### Create a Volume Replication Class CR

In this case, we create a Volume Replication Class on cluster-1. The original code block was truncated; the manifest below is a representative reconstruction, so adjust the provisioner and secret parameters to match your Rook deployment.

```yaml
cat <<EOF | kubectl apply -f - --context=cluster-1
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
  name: rbd-volumereplicationclass
spec:
  provisioner: rook-ceph.rbd.csi.ceph.com
  parameters:
    mirroringMode: snapshot
    schedulingInterval: "12m"
    # secret used by the csi rbd provisioner
    replication.storage.openshift.io/replication-secret-name: rook-csi-rbd-provisioner
    replication.storage.openshift.io/replication-secret-namespace: rook-ceph
EOF
```

> :bulb: **Note:** The `schedulingInterval` can be specified in minutes, hours or days using the suffix `m`, `h` or `d` respectively. The optional `schedulingStartTime` can be specified using the ISO 8601 time format.
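To confirm that the class was created, list it. *VolumeReplicationClass* is cluster scoped, so no namespace is needed:

```bash
# Verify the VolumeReplicationClass exists on cluster-1
kubectl get volumereplicationclass rbd-volumereplicationclass --context=cluster-1
```
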
### Create a VolumeReplication CR

* Once the VolumeReplicationClass is created, create a VolumeReplication CR for the PVC which we intend to replicate to the secondary cluster. The spec below mirrors the fields echoed in the status output further down:

```yaml
cat <<EOF | kubectl apply -f - --context=cluster-1
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplication
metadata:
  name: pvc-volumereplication
spec:
  volumeReplicationClass: rbd-volumereplicationclass
  replicationState: primary
  dataSource:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: rbd-pvc # the PVC to replicate
EOF
```

> :memo: *VolumeReplication* is a namespace-scoped object. Thus, it should be created in the same namespace as the PVC.

### Checking Replication Status

`replicationState` is the state of the volume being referenced. Possible values are `primary`, `secondary`, and `resync`.

* `primary` denotes that the volume is the primary.
* `secondary` denotes that the volume is a secondary.
* `resync` denotes that the volume needs to be resynced.

To check the VolumeReplication CR status:

```bash
kubectl get volumereplication pvc-volumereplication --context=cluster-1 -oyaml
```

> ```yaml
> ...
> spec:
>   dataSource:
>     apiGroup: ""
>     kind: PersistentVolumeClaim
>     name: rbd-pvc
>   replicationState: primary
>   volumeReplicationClass: rbd-volumereplicationclass
> status:
>   conditions:
>   - lastTransitionTime: "2021-05-04T07:39:00Z"
>     message: ""
>     observedGeneration: 1
>     reason: Promoted
>     status: "True"
>     type: Completed
>   - lastTransitionTime: "2021-05-04T07:39:00Z"
>     message: ""
>     observedGeneration: 1
>     reason: Healthy
>     status: "False"
>     type: Degraded
>   - lastTransitionTime: "2021-05-04T07:39:00Z"
>     message: ""
>     observedGeneration: 1
>     reason: NotResyncing
>     status: "False"
>     type: Resyncing
>   lastCompletionTime: "2021-05-04T07:39:00Z"
>   lastStartTime: "2021-05-04T07:38:59Z"
>   message: volume is marked primary
>   observedGeneration: 1
>   state: Primary
> ```

### Backup & Restore

Here, we take a backup of the PVC and PV objects on one site so that they can be restored later on the peer cluster.

#### **Take backup on cluster-1**

* Take a backup of the PVC `rbd-pvc`:

```bash
kubectl --context=cluster-1 get pvc rbd-pvc -oyaml > pvc-backup.yaml
```

* Take a backup of the PV corresponding to the PVC:

```bash
kubectl --context=cluster-1 get pv/pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec -oyaml > pv-backup.yaml
```

> :bulb: We can also take the backup using external tools like **Velero**. Refer to the [Velero documentation](https://velero.io/docs/main/) for more information.

#### **Restore the backup on cluster-2**

* Create the storageclass on the secondary cluster:

```bash
kubectl create -f examples/rbd/storageclass.yaml --context=cluster-2
```

> ```bash
> storageclass.storage.k8s.io/csi-rbd-sc created
> ```

* Create the VolumeReplicationClass on the secondary cluster. As above, the original code block was truncated; the manifest below is a representative reconstruction:

```bash
cat <<EOF | kubectl apply -f - --context=cluster-2
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
  name: rbd-volumereplicationclass
spec:
  provisioner: rook-ceph.rbd.csi.ceph.com
  parameters:
    mirroringMode: snapshot
    schedulingInterval: "12m"
    # secret used by the csi rbd provisioner
    replication.storage.openshift.io/replication-secret-name: rook-csi-rbd-provisioner
    replication.storage.openshift.io/replication-secret-namespace: rook-ceph
EOF
```

> ```bash
> volumereplicationclass.replication.storage.openshift.io/rbd-volumereplicationclass created
> ```

* If Persistent Volumes and Claims are created manually on the secondary cluster, remove the `claimRef` from the backed-up PV objects in the YAML files, so that the PV can bind to the new claim on the secondary cluster (a scripted alternative is sketched below).

```yaml
...
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: rbd-pvc
    namespace: default
    resourceVersion: "64252"
    uid: 65dc0aac-5e15-4474-90f4-7a3532c621ec
  csi:
...
```
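Editing the backed-up YAML by hand works, but the `claimRef` can also be dropped non-interactively. This sketch assumes [mikefarah's `yq`](https://github.com/mikefarah/yq) v4 is installed and that the backup file is named `pv-backup.yaml` as above:

```bash
# Remove the claimRef stanza in place so the PV can bind to the recreated PVC
yq eval --inplace 'del(.spec.claimRef)' pv-backup.yaml
```
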
* Apply the PersistentVolume backup from the primary cluster:

```bash
kubectl create -f pv-backup.yaml --context=cluster-2
```

> ```bash
> persistentvolume/pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec created
> ```

* Apply the PersistentVolumeClaim backup from the primary cluster:

```bash
kubectl create -f pvc-backup.yaml --context=cluster-2
```

> ```bash
> persistentvolumeclaim/rbd-pvc created
> ```

```bash
kubectl get pvc --context=cluster-2
```

> ```bash
> NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
> rbd-pvc   Bound    pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec   1Gi        RWO            csi-rbd-sc     44s
> ```

diff --git a/Documentation/planned-migration-and-dr.md b/Documentation/planned-migration-and-dr.md
new file mode 100644
index 0000000000000..b531dbf5aa15e
--- /dev/null
+++ b/Documentation/planned-migration-and-dr.md
@@ -0,0 +1,13 @@
---
title: Disaster Recovery Overview
weight: 3240
---

# Planned Migration and Disaster Recovery with Rook

Rook v1.6.0 comes with volume replication support and Ceph-CSI v3.3.0, which allow users to perform disaster recovery and planned migration of clusters.

The following documents help configure the clusters and walk through the failover and failback procedures for the disaster recovery and planned migration use cases:

* [Configuring clusters with DR](rbd-mirroring.md): How to create mirroring-enabled RBD pools, bootstrap the peers, deploy the rbd-mirror daemon, and enable the CSI replication sidecars.
* [Async DR Failover and Failback Steps](async-disaster-recovery.md): Step-by-step failover and failback procedures for planned migration and disaster recovery.

diff --git a/Documentation/rbd-mirroring.md b/Documentation/rbd-mirroring.md
new file mode 100644
index 0000000000000..ed440cdcac58a
--- /dev/null
+++ b/Documentation/rbd-mirroring.md
@@ -0,0 +1,247 @@
---
title: RBD Mirroring
weight: 3242
indent: true
---

# RBD Mirroring

[RBD mirroring](https://docs.ceph.com/en/latest/rbd/rbd-mirroring/) is asynchronous replication of RBD images between multiple Ceph clusters. This capability is available in two modes:

* Journal-based: Every write to the RBD image is first recorded to the associated journal before modifying the actual image. The remote cluster will read from this journal and replay the updates to its local image.
* Snapshot-based: This mode uses periodically scheduled or manually created RBD image mirror-snapshots to replicate crash-consistent RBD images between clusters.

## Table of Contents

* [Create RBD Pools](#create-rbd-pools)
* [Bootstrap Peers](#bootstrap-peers)
* [Configure the RBDMirror Daemon](#configure-the-rbdmirror-daemon)
* [Enable CSI Replication Sidecars](#enable-csi-replication-sidecars)
* [Volume Replication Custom Resources](#volume-replication-custom-resources)

## Create RBD Pools

In this section, we create RBD pools with mirroring enabled for the DR use case.

> :memo: **Note:** It is also feasible to edit existing pools and enable them for replication.
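For an existing pool, a merge patch along these lines should work; the pool name `replicapool` is only an example, so substitute the name of your pool:

```bash
# Enable snapshot-based mirroring on an existing CephBlockPool (example name: replicapool)
kubectl -n rook-ceph patch cephblockpool replicapool --type merge \
  -p '{"spec":{"mirroring":{"enabled":true,"mode":"image"}}}'
```
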
Execute the following steps on each peer cluster to create mirror-enabled pools:

* Create an RBD pool that is enabled for mirroring by adding the `spec.mirroring` section to the CephBlockPool CR:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: mirroredpool
  namespace: rook-ceph
spec:
  replicated:
    size: 1
  mirroring:
    enabled: true
    mode: image
    # schedule(s) of snapshot
    snapshotSchedules:
      - interval: 24h # daily snapshots
        startTime: 14:00:00-05:00
```

```bash
kubectl create -f pool-mirrored.yaml
```

> ```bash
> cephblockpool.ceph.rook.io/mirroredpool created
> ```

* Repeat the steps on the peer cluster.

> :memo: **WARNING**: The pool name must be the same across the cluster peers for RBD replication to function.

For more information on the CephBlockPool CR, please refer to the [ceph-pool-crd documentation](ceph-pool-crd.md#mirroring).

## Bootstrap Peers

In order for the rbd-mirror daemon to discover its peer cluster, the peer must be registered and a user account must be created.

The following steps enable bootstrapping peers to discover and authenticate to each other:

* For bootstrapping a peer cluster, its bootstrap secret is required. To determine the name of the secret that contains the bootstrap token, execute the following command on the remote cluster (cluster-2):

```bash
kubectl get cephblockpool.ceph.rook.io/mirroredpool -n rook-ceph --context=cluster-2 -ojsonpath='{.status.info.rbdMirrorBootstrapPeerSecretName}'
```

> ```bash
> pool-peer-token-mirroredpool
> ```

Here, `pool-peer-token-mirroredpool` is the desired bootstrap secret name.

* The secret `pool-peer-token-mirroredpool` contains all the information related to the token and needs to be injected into the peer. To fetch the decoded token:

```bash
kubectl get secret -n rook-ceph pool-peer-token-mirroredpool --context=cluster-2 -o jsonpath='{.data.token}' | base64 -d
```

> ```bash
> eyJmc2lkIjoiNGQ1YmNiNDAtNDY3YS00OWVkLThjMGEtOWVhOGJkNDY2OTE3IiwiY2xpZW50X2lkIjoicmJkLW1pcnJvci1wZWVyIiwia2V5IjoiQVFDZ3hmZGdxN013R0JBQWZzcUtCaGpZVjJUZDRxVzJYQm5kemc9PSIsIm1vbl9ob3N0IjoiW3YyOjE5Mi4xNjguMzkuMzY6MzMwMCx2MToxOTIuMTY4LjM5LjM2OjY3ODldIn0=
> ```

* Get the site name from the secondary cluster (cluster-2):

```bash
kubectl get cephblockpools.ceph.rook.io mirroredpool -n rook-ceph --context=cluster-2 -o jsonpath='{.status.mirroringInfo.site_name}'
```

> ```bash
> 5a91d009-9e8b-46af-b311-c51aff3a7b49
> ```

* With this decoded value, create a secret on the primary site (cluster-1), using the site name of the peer as the secret name:

```bash
kubectl -n rook-ceph create secret generic --context=cluster-1 5a91d009-9e8b-46af-b311-c51aff3a7b49 --from-literal=token=eyJmc2lkIjoiNGQ1YmNiNDAtNDY3YS00OWVkLThjMGEtOWVhOGJkNDY2OTE3IiwiY2xpZW50X2lkIjoicmJkLW1pcnJvci1wZWVyIiwia2V5IjoiQVFDZ3hmZGdxN013R0JBQWZzcUtCaGpZVjJUZDRxVzJYQm5kemc9PSIsIm1vbl9ob3N0IjoiW3YyOjE5Mi4xNjguMzkuMzY6MzMwMCx2MToxOTIuMTY4LjM5LjM2OjY3ODldIn0= --from-literal=pool=mirroredpool
```

> ```bash
> secret/5a91d009-9e8b-46af-b311-c51aff3a7b49 created
> ```

* This completes the bootstrap process for cluster-1 to be peered with cluster-2.
* Repeat the process with cluster-2 in place of cluster-1 to complete the bootstrap process across both peer clusters (a scripted version of the exchange is sketched at the end of this section).

For more details, refer to the official rbd mirror documentation on [how to create a bootstrap peer](https://docs.ceph.com/en/latest/rbd/rbd-mirroring/#bootstrap-peers).
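The token exchange above can also be scripted so that the literal token never has to be copied by hand. This is a sketch of the same steps, assuming both kubeconfig contexts are reachable from one machine:

```bash
# Fetch the bootstrap token and site name from the remote cluster (cluster-2)
TOKEN=$(kubectl get secret -n rook-ceph pool-peer-token-mirroredpool --context=cluster-2 \
  -o jsonpath='{.data.token}' | base64 -d)
SITE_NAME=$(kubectl get cephblockpools.ceph.rook.io mirroredpool -n rook-ceph --context=cluster-2 \
  -o jsonpath='{.status.mirroringInfo.site_name}')

# Register the peer on the primary cluster (cluster-1)
kubectl -n rook-ceph create secret generic "${SITE_NAME}" --context=cluster-1 \
  --from-literal=token="${TOKEN}" --from-literal=pool=mirroredpool
```
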
## Configure the RBDMirror Daemon

Replication is handled by the rbd-mirror daemon. The rbd-mirror daemon is responsible for pulling image updates from the remote peer cluster and applying them to the image within the local cluster.

Creation of the rbd-mirror daemon(s) is done through the custom resource definitions (CRDs), as follows:

* Create mirror.yaml to deploy the rbd-mirror daemon:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephRBDMirror
metadata:
  name: my-rbd-mirror
  namespace: rook-ceph
spec:
  # the number of rbd-mirror daemons to deploy
  count: 1
  peers:
    secretNames:
      # list of Kubernetes Secrets containing the peer token
      - "5a91d009-9e8b-46af-b311-c51aff3a7b49"
```

* Create the RBD mirror daemon:

```bash
kubectl create -f mirror.yaml -n rook-ceph --context=cluster-1
```

> ```bash
> cephrbdmirror.ceph.rook.io/my-rbd-mirror created
> ```

* Validate that the `rbd-mirror` daemon pod is up:

```bash
kubectl get pods -n rook-ceph --context=cluster-1
```

> ```bash
> rook-ceph-rbd-mirror-a-6985b47c8c-dpv4k 1/1 Running 0 10s
> ```

* Verify that the daemon health is OK:

```bash
kubectl get cephblockpools.ceph.rook.io mirroredpool -n rook-ceph -o jsonpath='{.status.mirroringStatus.summary}'
```

> ```bash
> {"daemon_health":"OK","health":"OK","image_health":"OK","states":{"replaying":1}}
> ```

* Repeat the above steps on the peer cluster.

For more information on how to set up the Ceph RBDMirror CRD, refer to the [rook documentation](https://rook.io/docs/rook/master/ceph-rbd-mirror-crd.html).

## Enable CSI Replication Sidecars

To achieve RBD Mirroring, the `csi-omap-generator` and `volume-replication` containers need to be deployed in the RBD provisioner pods.

* **Omap Generator**: The omap generator is a sidecar container that, when deployed with the CSI provisioner pod, generates the internal CSI omaps between the PV and the RBD image. This is required because static PVs are transferred across peer clusters in the DR use case, and the omaps preserve the PVC-to-storage mappings.

* **Volume Replication Operator**: The Volume Replication Operator is a Kubernetes operator that provides common and reusable APIs for storage disaster recovery. It is based on the [csi-addons/spec](https://github.com/csi-addons/spec) specification and can be used by any storage provider. For more details, refer to the [volume replication operator](https://github.com/csi-addons/volume-replication-operator).

Execute the following steps on each peer cluster to enable the OMap generator and Volume Replication sidecars:

* Edit the `rook-ceph-operator-config` configmap:

```bash
kubectl edit cm rook-ceph-operator-config -n rook-ceph
```

Add the following configuration if not present:

```yaml
data:
  CSI_ENABLE_OMAP_GENERATOR: "true"
  CSI_ENABLE_VOLUME_REPLICATION: "true"
```

* After updating the configmap with those settings, two new sidecars should now start automatically in the CSI provisioner pod (see the check below).
* Repeat the steps on the peer cluster.
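One way to confirm that the operator restarted the provisioner pods with the new sidecars is to list the container names. This sketch assumes the default `rook-ceph` namespace and the `app=csi-rbdplugin-provisioner` label that Rook applies to the RBD provisioner pods; the omap generator and volume replication containers should appear alongside the existing sidecars, though exact container names can vary between releases.

```bash
# List the container names of the RBD provisioner pods
kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner \
  -o jsonpath='{.items[*].spec.containers[*].name}' | tr ' ' '\n' | sort -u
```
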
## Volume Replication Custom Resources

The Volume Replication Operator follows the controller pattern and provides extended APIs for storage disaster recovery. The extended APIs are provided via Custom Resource Definitions (CRDs). It provides support for two custom resources:

* **VolumeReplicationClass**: *VolumeReplicationClass* is a cluster-scoped resource that contains driver-related configuration parameters. It holds the storage admin information required for the volume replication operator.

* **VolumeReplication**: *VolumeReplication* is a namespaced resource that contains a reference to the storage object to be replicated and to the VolumeReplicationClass corresponding to the driver providing replication.

> :bulb: For more information, please refer to the [volume-replication-operator](https://github.com/csi-addons/volume-replication-operator).

diff --git a/cluster/examples/kubernetes/ceph/pool-mirrored.yaml b/cluster/examples/kubernetes/ceph/pool-mirrored.yaml
new file mode 100644
index 0000000000000..fce91ab5d3c04
--- /dev/null
+++ b/cluster/examples/kubernetes/ceph/pool-mirrored.yaml
@@ -0,0 +1,20 @@
#################################################################################################################
# Create a mirroring enabled Ceph pool. Only a single OSD is required.
# kubectl create -f pool-mirrored.yaml
#################################################################################################################

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: mirroredpool
  namespace: rook-ceph
spec:
  replicated:
    size: 1
  mirroring:
    enabled: true
    mode: image
    # schedule(s) of snapshot
    snapshotSchedules:
      - interval: 24h # daily snapshots
        startTime: 14:00:00-05:00