DR Orchestrator Framework Overview

The DR Orchestrator Framework automates the manual steps that are followed during the cross-region disaster recovery (DR) of your databases on AWS. It provides a centralized, manifest-driven payload to trigger a single-click cross-region DR within the same account. The decision to perform a failover still requires human intervention and is out of scope for the automation.

With a disaster recovery strategy for databases on AWS, you can implement a DR solution for the AWS databases in your organization. This solution can be automated with an event-driven architecture that performs the failover of the database instances. This approach reduces human error caused by manual coordination, provides the shortest possible RTO (recovery time objective) in some cases, and facilitates repeatable testing of your DR solution. Without automation, in the event of an outage in your primary AWS Region, resources in your organization would need to manually execute run-book steps, such as promoting Amazon RDS read replicas in the secondary AWS Region to serve as the new primary. AWS provides boto3 APIs for these actions, which can form the building blocks of an automated solution for your organization's needs.

Detailed documentation of the DR Orchestrator Framework solution is available in the blog post Automate your DR solution for relational databases on AWS.

1) Deploy DR Orchestrator Framework

The DR Orchestrator Framework contains the resources, such as AWS Step Functions state machines and AWS Lambda functions, that need to be deployed in order to automate the failover and failback of AWS databases.

Follow the instructions below to deploy the DR Orchestrator Framework:

Clone the git repository to your local directory

git clone https://github.com/aws-samples/aws-cross-region-dr-databases.git

Prerequisites

  1. Create the archive files for Lambda Functions (e.g. lambda_function.zip or dependencies.zip)

    • Go to the DR-Orchestration-artifacts directory and run the script below:

      cd <YOUR-LOCAL-GIT-FOLDER>/DR-Orchestration-artifacts
      bash scripts/deploy-orchestrator-sh.sh
  2. Create two S3 buckets, one in the Primary (us-east-1) Region and one in the Secondary (us-west-2) Region (see the example commands after this list):

    • dr-orchestrator-xxxxxx-us-east-1
    • dr-orchestrator-xxxxxx-us-west-2

    Note: Replace xxxxxx with a unique name. Update the TemplateStoreS3BucketName value accordingly in the parameter files.

  3. Create two subnets for the Lambda functions in your VPC in both the Primary (us-east-1) and the Secondary (us-west-2) Regions:

    • subnet-XXXXXXX
    • subnet-YYYYYYY
  4. Create a Security Group in your VPC in both the Primary (us-east-1) and the Secondary (us-west-2) Regions:

    • sg-XXXXXXXXXXXX
  5. Upload the DR-Orchestration-artifacts directory, including all files, to the S3 buckets in both Regions (see the example commands after this list).

  6. Ensure that you are using subnet IDs, a Security Group, and a VPC ID from the same VPC where you have deployed your AWS databases (RDS, Aurora, and ElastiCache). Otherwise, you have to add the subnet IDs to the Interface Endpoints created by the DR Orchestrator.

  7. Ensure the AWS databases are available in the same AWS Region.
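Steps 2 and 5 can be scripted with the AWS CLI. A minimal sketch, assuming the bucket naming from step 2 and an upload under the DR-Orchestration-artifacts prefix (adjust the prefix if your parameter files expect a different layout):

  # Step 2: create the artifact buckets in both Regions (replace xxxxxx with your unique name)
  aws s3 mb s3://dr-orchestrator-xxxxxx-us-east-1 --region us-east-1
  aws s3 mb s3://dr-orchestrator-xxxxxx-us-west-2 --region us-west-2

  # Step 5: upload the DR-Orchestration-artifacts directory, including all files, to both buckets
  cd <YOUR-LOCAL-GIT-FOLDER>
  aws s3 sync DR-Orchestration-artifacts s3://dr-orchestrator-xxxxxx-us-east-1/DR-Orchestration-artifacts
  aws s3 sync DR-Orchestration-artifacts s3://dr-orchestrator-xxxxxx-us-west-2/DR-Orchestration-artifacts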

1.1) Deploy DR Orchestrator resources in Primary Region (US-EAST-1)

  • Modify the Orchestrator-Deployer-parameters-us-east-1.json file and update the following parameter as per your VPC:

    • TemplateStoreS3BucketName
  • Go to the DR-Orchestration-artifacts/cloudformation directory and run the command below:

    cd <YOUR-LOCAL-GIT-FOLDER>/DR-Orchestration-artifacts/cloudformation
    aws cloudformation deploy \
    --region us-east-1 --stack-name dr-orchestrator \
    --template-file Orchestrator-Deployer.yaml \
    --parameter-overrides file://Orchestrator-Deployer-parameters-us-east-1.json \
    --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
    --disable-rollback
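
You can optionally confirm that the stack deployed successfully before moving on; a standard CloudFormation status check (not specific to this framework):

    aws cloudformation describe-stacks \
    --region us-east-1 --stack-name dr-orchestrator \
    --query "Stacks[0].StackStatus"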

1.2) Deploy DR Orchestrator resources in Secondary Region (US-WEST-2)


  • Modify the Orchestrator-Deployer-parameters-us-west-2.json file and update the following parameter as per your VPC:

    • TemplateStoreS3BucketName
  • Go to the DR-Orchestration-artifacts/cloudformation directory and run the command below:

    cd <YOUR-LOCAL-GIT-FOLDER>/DR-Orchestration-artifacts/cloudformation
    aws cloudformation deploy \
    --region us-west-2 \
    --stack-name dr-orchestrator \
    --template-file Orchestrator-Deployer.yaml \
    --parameter-overrides file://Orchestrator-Deployer-parameters-us-west-2.json \
    --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
    --disable-rollback

Testing

Test Case #1: Perform Switchover (managed planned failover) for Amazon Aurora global databases (MySQL)

By using switchover, you can change the Region of your primary cluster on a routine basis. This approach is intended for controlled scenarios, such as operational maintenance and other planned operational procedures. Refer to https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database-disaster-recovery.html for more information.
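
For reference, the switchover triggered by this test case corresponds to the Aurora SwitchoverGlobalCluster operation. A minimal sketch of the equivalent AWS CLI call (the orchestrator performs this for you; the identifiers below are the same placeholders used in the payloads later in this section):

  aws rds switchover-global-cluster \
  --region us-west-2 \
  --global-cluster-identifier dr-globaldb-cluster-mysql \
  --target-db-cluster-identifier arn:aws:rds:us-west-2:xxxxxxxxxxx:cluster:dbcluster-02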

To test a planned switchover or failover, follow the steps given below:

  • Go to the DR region (let's say, us-west-2)
  • Start execution of the dr-orchestrator-stepfunction-FAILOVER Step Function
  • Use the payload below as input:
{
  "StatePayload": [
    {
      "layer": 1,
      "resources": [
        {
          "resourceType": "PlannedFailoverAurora",
          "resourceName": "Switchover (planned failover) of Amazon Aurora global databases (MySQL)",
          "parameters": {
            "GlobalClusterIdentifier": "!Import dr-globaldb-cluster-mysql-global-identifier",
            "DBClusterIdentifier": "!Import dr-globaldb-cluster-mysql-cluster-identifier" 
          }
        }
      ]
    }
  ]
}
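
If you prefer the CLI over the console, the execution can also be started with aws stepfunctions start-execution. A sketch, assuming the payload above is saved as failover-payload.json and <account-id> is replaced with your AWS account ID:

  aws stepfunctions start-execution \
  --region us-west-2 \
  --state-machine-arn arn:aws:states:us-west-2:<account-id>:stateMachine:dr-orchestrator-stepfunction-FAILOVER \
  --input file://failover-payload.json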

Note: In case the Amazon Aurora global database was not deployed using the App-Stack CFT, you can provide the parameter values directly, as shown below:

{
  "StatePayload": [
    {
      "layer": 1,
      "resources": [
        {
          "resourceType": "PlannedFailoverAurora",
          "resourceName": "Switchover (planned failover) of Amazon Aurora global databases (MySQL)",
          "parameters": {
            "GlobalClusterIdentifier": "dr-globaldb-cluster-mysql",
            "DBClusterIdentifier": "arn:aws:rds:us-west-2:xxxxxxxxxxx:cluster:dbcluster-02"
          }
        }
      ]
    }
  ]
}

Test Case #2: Perform manual unplanned failover ("detach and promote") for Amazon Aurora global databases (MySQL) - unplanned outage

On very rare occasions, your Aurora global database might experience an unexpected outage in its primary AWS Region. If this happens, your primary Aurora DB cluster and its writer node aren't available, and the replication between the primary and secondary DB clusters stops. Refer to https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database-disaster-recovery.html for more information.
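
For reference, the "detach and promote" operation used in this scenario corresponds to the RemoveFromGlobalCluster operation, which detaches the secondary cluster from the global database and promotes it to a standalone, writable cluster. A minimal sketch of the equivalent AWS CLI call (the orchestrator performs this for you; identifiers are placeholders):

  aws rds remove-from-global-cluster \
  --region us-west-2 \
  --global-cluster-identifier dr-globaldb-cluster-mysql \
  --db-cluster-identifier arn:aws:rds:us-west-2:xxxxxxxxxxx:cluster:dbcluster-02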

To perform a manual unplanned failover, follow the steps given below:

  • Go to the DR region (let's say, us-west-2)
  • Start execution of the dr-orchestrator-stepfunction-FAILOVER Step Function
  • Use the payload below as input:
{
  "StatePayload": [
    {
      "layer": 1,
      "resources": [
        {
          "resourceType": "UnPlannedFailoverAurora",
          "resourceName": "Performing unplanned failover for Amazon Aurora global databases (MySQL)",
          "parameters": {
            "GlobalClusterIdentifier": "!Import dr-globaldb-cluster-mysql-global-identifier",
            "DBClusterIdentifier": "!Import dr-globaldb-cluster-mysql-cluster-identifier",
            "ClusterRegion": "!Import dr-globaldb-cluster-mysql-cluster-region"
          }
        }
      ]
    }
  ]
}

Test Case #3: Perform Failback [Create Secondary cluster] for Amazon Aurora global databases (MySQL)

After the original topology has been restored following a rare unplanned outage, you may want to re-create the secondary cluster in the original primary Region.

To create the secondary cluster, follow the steps given below:

  • Go to the original Primary Region (let's say, us-east-1)
  • Start execution of the dr-orchestrator-stepfunction-FAILBACK Step Function
  • Use the payload below as input:
{
  "StatePayload": [
    {
      "layer": 1,
      "resources": [
        {
          "resourceType": "CreateAuroraSecondaryDBCluster",
          "resourceName": "To create secondary Aurora MySQL Global Database Cluster",
          "parameters": {
            "GlobalClusterIdentifier": "!Import dr-globaldb-cluster-mysql-global-identifier",
            "DBClusterIdentifier": "!Import dr-globaldb-cluster-mysql-cluster-identifier",
            "DBClusterName": "!Import dr-globaldb-cluster-mysql-cluster-name",
            "SourceDBClusterIdentifier": "!Import dr-globaldb-cluster-mysql-source-cluster-identifier",
            "DBInstanceIdentifier": "!Import dr-globaldb-cluster-mysql-instance-identifier",
            "Port": "!Import dr-globaldb-cluster-mysql-port",
            "DBInstanceClass": "!Import dr-globaldb-cluster-mysql-instance-class",
            "DBSubnetGroupName": "!Import dr-globaldb-cluster-mysql-subnet-group-name",
            "VpcSecurityGroupIds": "!Import dr-globaldb-cluster-mysql-vpc-security-group-ids",
            "Engine": "!Import dr-globaldb-cluster-mysql-engine",
            "EngineVersion": "!Import dr-globaldb-cluster-mysql-engine-version",
            "KmsKeyId": "!Import dr-globaldb-cluster-mysql-KmsKeyId",
            "SourceRegion": "!Import dr-globaldb-cluster-mysql-source-region",
            "ClusterRegion": "!Import dr-globaldb-cluster-mysql-cluster-region",
            "BackupRetentionPeriod": "7",
            "MonitoringInterval": "60",
            "StorageEncrypted": "True",
            "EnableIAMDatabaseAuthentication": "True",
            "DeletionProtection": "True",
            "CopyTagsToSnapshot": "True",
            "AutoMinorVersionUpgrade": "True",
            "MonitoringRoleArn": "!Import rds-mysql-instance-RDSMonitoringRole"
          }
        }
      ]
    }
  ]
}
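
For reference, re-creating the secondary cluster corresponds to calling CreateDBCluster in the original primary Region with GlobalClusterIdentifier set, followed by CreateDBInstance. A minimal sketch of the equivalent AWS CLI calls (the orchestrator performs this for you, using the parameters from the payload above; the values shown are placeholders):

  aws rds create-db-cluster \
  --region us-east-1 \
  --db-cluster-identifier <new-secondary-cluster-identifier> \
  --global-cluster-identifier dr-globaldb-cluster-mysql \
  --engine aurora-mysql \
  --engine-version <engine-version> \
  --kms-key-id <kms-key-id> \
  --db-subnet-group-name <db-subnet-group-name> \
  --vpc-security-group-ids <security-group-id>

  aws rds create-db-instance \
  --region us-east-1 \
  --db-instance-identifier <new-instance-identifier> \
  --db-cluster-identifier <new-secondary-cluster-identifier> \
  --db-instance-class <db-instance-class> \
  --engine aurora-mysql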

Once the secondary cluster has been created, if you want to fail back your global database to the original primary Region, you can perform a switchover operation when it makes the most sense for your business and workload. Follow the steps mentioned under Test Case #1.

Note: Use the payload below as input for an Amazon RDS instance.

{
  "StatePayload": [
    {
      "layer": 1,
      "resources": [
        {
          "resourceType": "CreateRDSReadReplica",
          "resourceName": "Create RDS for MySQL Read Replica",
          "parameters": {
            "RDSInstanceIdentifier": "!Import rds-mysql-instance-identifier",
            "TargetClusterIdentifier": "!Import rds-mysql-instance-global-arn",
            "SourceRDSInstanceIdentifier": "!Import rds-mysql-instance-source-identifier",
            "SourceRegion": "!Import rds-mysql-instance-SourceRegion",
            "MultiAZ": "!Import rds-mysql-instance-MultiAZ",
            "DBInstanceClass": "!Import rds-mysql-instance-DBInstanceClass",
            "DBSubnetGroup": "!Import rds-mysql-instance-DBSubnetGroup",
            "DBSecurityGroup": "!Import rds-mysql-instance-DBSecurityGroup",
            "KmsKeyId": "!Import rds-mysql-instance-KmsKeyId",
            "BackupRetentionPeriod": "7",
            "MonitoringInterval": "60",
            "StorageEncrypted": "True",
            "EnableIAMDatabaseAuthentication": "True",
            "DeletionProtection": "True",
            "CopyTagsToSnapshot": "True",
            "AutoMinorVersionUpgrade": "True",
            "Port": "!Import rds-mysql-instance-DBPortNumber",
            "MonitoringRoleArn": "!Import rds-mysql-instance-RDSMonitoringRole"
          }
        }
      ]
    }
  ]
}

Test Case #4: Perform cross-region failover for 1) Amazon RDS for MySQL Read Replica and 2) Amazon ElastiCache for Redis Cluster together

{
  "StatePayload": [
    {
      "layer": 1,
      "resources": [
        {
          "resourceType": "PromoteRDSReadReplica",
          "resourceName": "Promote RDS for MySQL Read Replica",
          "parameters": {
            "RDSInstanceIdentifier": "!Import rds-mysql-instance-identifier",
            "TargetClusterIdentifier": "!Import rds-mysql-instance-global-arn"
          }
        },
        {
          "resourceType": "FailoverElastiCacheCluster",
          "resourceName": "Failover ElastiCache Cluster",
          "parameters": {
            "GlobalReplicationGroupId": "!Import demo-redis-cluster-global-replication-group-id",
            "TargetRegion": "!Import demo-redis-cluster-target-region",
            "TargetReplicationGroupId": "!Import demo-redis-cluster-target-replication-group-id"
          }
        }
      ]
    }
  ]
}
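
For reference, the two resource types in this payload correspond to the PromoteReadReplica (Amazon RDS) and FailoverGlobalReplicationGroup (Amazon ElastiCache) operations. A minimal sketch of the equivalent AWS CLI calls (the orchestrator performs these for you; identifiers are placeholders):

  # Promote the cross-region RDS for MySQL read replica in the DR Region
  aws rds promote-read-replica \
  --region us-west-2 \
  --db-instance-identifier <rds-mysql-replica-identifier>

  # Fail over the ElastiCache for Redis global datastore to the DR Region
  aws elasticache failover-global-replication-group \
  --global-replication-group-id <global-replication-group-id> \
  --primary-region us-west-2 \
  --primary-replication-group-id <target-replication-group-id>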

Troubleshooting

Issue #1: Error while deploying DR Orchestrator

Resource handler returned message: "Error occurred while GetObject. S3 Error Code: NoSuchKey. S3 Error Message: The specified key does not exist. (Service: Lambda, Status Code: 400, Request ID: 3f96c885-ba1b-4453-96b9-a625d0a30221)" (RequestToken: a52bba65-ee8c-9488-cf6c-709717b9a279, HandlerErrorCode: InvalidRequest)

Solution: Make sure you have followed all the steps mentioned in the Prerequisites section, in particular that the DR-Orchestration-artifacts files were uploaded to the S3 buckets in both Regions.
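
To verify, list the contents of the artifact bucket and confirm that the templates and Lambda archive files referenced by the stack are present (the bucket name below follows the naming from the Prerequisites and is a placeholder):

  aws s3 ls s3://dr-orchestrator-xxxxxx-us-east-1/ --recursive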

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
