Amazon RDS for SQL Server makes it easy to set up, operate, and scale SQL Server deployments in the Amazon Web Services (AWS) Cloud. Enterprise workloads that depend on Amazon RDS for SQL Server need an effective disaster recovery (DR) strategy to stay up and running if an unexpected event occurs.
This post discusses how to create a reliable cross-Region DR strategy for Amazon RDS for SQL Server.
DR strategy considerations
When you create a DR strategy for Amazon RDS for SQL Server in the AWS Cloud, you should consider several factors.
First, you need to understand how DR changes in the AWS Cloud. On premises, customers typically set up synchronous replication for SQL Server within a single data center to protect against hardware failure, and set up either asynchronous replication or snapshots to a different data center or site. AWS gives you the flexibility to place instances and store data in multiple geographic Regions and across multiple Availability Zones (AZs) within each Region. Amazon RDS for SQL Server provides a Multi-AZ deployment, which replicates data synchronously across AZs. This highly available Multi-AZ deployment is usually sufficient as a DR strategy within one Region. In addition, with the in-Region read replica capability of Amazon RDS for SQL Server, you can use a read replica in a different AZ as a warm standby, which provides another in-Region DR option.
Before you consider a cross-Region DR strategy, evaluate whether an in-Region DR solution meets your needs.
Recovery point objective (RPO), recovery time objective (RTO), and cost are the three key metrics to consider when developing your DR strategy. Based on these three metrics, you can define a DR strategy anywhere from cold DR (backup and restore) to hot DR (active-active). A reliable and effective cross-Region DR strategy keeps your business in operation with little or no disruption even if an entire Region goes offline.
Cross-Region DR strategy
This post covers two approaches to cross-Region DR: snapshot and restore, and continuous replication.
Snapshot and restore
If you have less stringent RTO and RPO requirements for your Amazon RDS for SQL Server instances, cross-Region snapshot and restore is one of the most cost-effective cross-Region DR strategies.
In the source Region of your Amazon RDS for SQL Server instance, you perform the following actions:
- Create snapshots of your Amazon RDS for SQL Server based upon a pre-defined schedule.
- Copy the snapshots to your DR Region. The frequency of snapshot copying is determined based on the RPO requirement.
When you test your DR plan, or execute it during a disaster, you restore the copied snapshot to a new Amazon RDS for SQL Server instance in the DR Region.
The time it takes to create a manual snapshot on your primary RDS SQL Server instance is one of the factors that determine the minimum RPO possible in your environment. The time it takes to create a snapshot varies with the size of your databases. The first snapshot of a DB instance contains the data for the full DB instance. Subsequent snapshots of the same DB instance are incremental, which means that only the data that has changed since your most recent snapshot is saved. If you have a very small RPO that would require more frequent DB snapshots than you can create and copy in time, you need to consider a different DR strategy, such as cross-Region continuous replication.
Depending on the Regions involved and the amount of data to be copied, a cross-Region snapshot copy can take hours to complete. You need to take this into account when you estimate the RPO of this DR strategy. You can copy a snapshot that has been encrypted using an AWS KMS encryption key. If you copy an encrypted snapshot across Regions, you can’t use the same KMS encryption key for the copy as for the source snapshot, because KMS keys are Region-specific. Instead, you must specify a KMS key that is valid in the destination Region.
The RTO of this DR strategy depends on how quickly you can restore a DB instance from the copied snapshot.
The cost of this DR strategy is mainly the storage cost for the snapshots and the data transfer cost to copy them from your primary Region to your DR Region.
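To make the RPO estimate concrete, the following is a minimal sketch of the arithmetic. It assumes that a snapshot reflects the database as of the moment the snapshot starts, and that only a fully copied snapshot is usable in the DR Region. The function name and the sample durations are hypothetical; substitute values you measure in your own environment.

```python
from datetime import timedelta


def worst_case_rpo(snapshot_interval, snapshot_duration, copy_duration):
    """Upper bound on data loss for a snapshot-and-copy DR plan.

    Worst case: the disaster strikes just before the newest copy finishes,
    so the freshest usable copy is from the previous cycle. That copy holds
    data as of one interval, plus one create time, plus one copy time ago.
    """
    return snapshot_interval + snapshot_duration + copy_duration


# Hypothetical numbers: a snapshot every 4 hours, 20 minutes to create,
# and 90 minutes to copy to the DR Region.
rpo = worst_case_rpo(
    snapshot_interval=timedelta(hours=4),
    snapshot_duration=timedelta(minutes=20),
    copy_duration=timedelta(minutes=90),
)
print(rpo)  # 5:50:00
```

If the result is larger than your RPO requirement, you can shorten the snapshot interval, but the create and copy durations set a floor that scheduling alone can't remove.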
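One way to learn your actual restore time is to measure it during a DR test. The following sketch restores a copied snapshot and reports how long the instance took to become available. It assumes `rds_dr` is a boto3 RDS client created for the DR Region, for example `boto3.client("rds", region_name="us-west-2")`. The function name, identifiers, and waiter settings are illustrative assumptions rather than recommended values.

```python
import time


def restore_in_dr_region(rds_dr, snapshot_id, new_instance_id, instance_class):
    """Restore a snapshot to a new DB instance and time the restore.

    Returns (endpoint_address, elapsed_seconds). The elapsed time is one
    component of the RTO; application cutover is not included.
    """
    started = time.monotonic()

    rds_dr.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id,
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceClass=instance_class,
    )

    # Poll until the new instance is available. The polling limits below
    # (30 seconds x 240 attempts = 2 hours) are an assumption for this sketch.
    rds_dr.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=new_instance_id,
        WaiterConfig={"Delay": 30, "MaxAttempts": 240},
    )

    described = rds_dr.describe_db_instances(DBInstanceIdentifier=new_instance_id)
    endpoint = described["DBInstances"][0]["Endpoint"]["Address"]
    return endpoint, time.monotonic() - started
```

Recording the elapsed time from each DR test gives you a measured figure to compare against your RTO target.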
To automate this DR plan, you can use a few AWS tools, such as AWS Lambda and Amazon CloudWatch Events. For instance, you can schedule events to trigger Lambda functions to create snapshots of your source RDS SQL Server and then copy these to a target DR Region.
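The following is a minimal sketch of such a Lambda function, written with boto3. The environment variable names (`SOURCE_REGION`, `DR_REGION`, `DB_INSTANCE_ID`, `DR_KMS_KEY_ID`) and the snapshot naming scheme are assumptions made for illustration. The helper takes the two RDS clients as arguments so that you can exercise it without AWS access.

```python
import os
from datetime import datetime, timezone


def snapshot_and_copy(rds_src, rds_dr, instance_id, source_region,
                      dr_kms_key_id=None, now=None):
    """Create a manual snapshot in the source Region and copy it to the DR Region."""
    now = now or datetime.now(timezone.utc)
    snapshot_id = f"{instance_id}-dr-{now:%Y%m%d-%H%M}"

    created = rds_src.create_db_snapshot(
        DBInstanceIdentifier=instance_id,
        DBSnapshotIdentifier=snapshot_id,
    )
    # A copy can't start until the snapshot is complete.
    rds_src.get_waiter("db_snapshot_available").wait(
        DBSnapshotIdentifier=snapshot_id
    )

    # The copy is requested through the client for the destination Region.
    copy_args = {
        "SourceDBSnapshotIdentifier": created["DBSnapshot"]["DBSnapshotArn"],
        "TargetDBSnapshotIdentifier": snapshot_id,
        "SourceRegion": source_region,
    }
    if dr_kms_key_id:
        # For an encrypted snapshot, this must be a key in the DR Region.
        copy_args["KmsKeyId"] = dr_kms_key_id
    rds_dr.copy_db_snapshot(**copy_args)
    return snapshot_id


def lambda_handler(event, context):
    import boto3  # imported here so the helper above stays testable without AWS

    source_region = os.environ["SOURCE_REGION"]
    rds_src = boto3.client("rds", region_name=source_region)
    rds_dr = boto3.client("rds", region_name=os.environ["DR_REGION"])
    snapshot_id = snapshot_and_copy(
        rds_src,
        rds_dr,
        instance_id=os.environ["DB_INSTANCE_ID"],
        source_region=source_region,
        dr_kms_key_id=os.environ.get("DR_KMS_KEY_ID"),
    )
    return {"snapshot_id": snapshot_id}
```

For large databases, waiting for the snapshot inside a single invocation may exceed the Lambda timeout. In that case, a likely alternative is to split the work into two functions, one that creates the snapshot and one that copies it after the snapshot completes.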
Continuous replication
To meet very aggressive RPO and RTO requirements, your DR strategy needs continuous replication from your source RDS SQL Server instance to a target RDS SQL Server instance in your DR Region.
As a managed service, Amazon RDS for SQL Server helps automate time-consuming tasks such as patching, backups, and high availability. However, to ensure the availability and durability of the instance, Amazon RDS for SQL Server doesn’t grant sysadmin permissions or expose all available engine features. SQL Server replication (MS Replication) is one such feature that, as of this writing, isn’t yet available in Amazon RDS for SQL Server. However, you can use AWS Database Migration Service (AWS DMS) for continuous replication. AWS DMS requires MS-CDC (change data capture) to be enabled on the Amazon RDS for SQL Server instance. For instructions on enabling CDC and setting up AWS DMS for Amazon RDS for SQL Server, see Introducing Ongoing Replication from Amazon RDS for SQL Server Using AWS Database Migration Service.
The RPO of this DR strategy depends on how quickly AWS DMS consumes the captured changes from the source Amazon RDS for SQL Server instance and applies them to the target. In this DR strategy, the target RDS for SQL Server instance is active, so you can achieve a near-zero RTO.
However, this active target DB instance incurs additional costs, including the SQL Server license cost. In addition, because AWS DMS transfers data to a different Region in this DR strategy, there is a data transfer cost. For more information, see AWS Database Migration Service pricing.
Several factors affect the performance of this DR strategy, such as the availability of your primary RDS SQL Server instance, the available network throughput, the number of changes captured, and the memory, CPU, and I/O capacity on the source and target. For more information about performance considerations, see Best practices for AWS Database Migration Service.
The MS-CDC feature that AWS DMS requires is supported on both Standard and Enterprise Editions from SQL Server 2016 SP1 onward. For older versions, only Enterprise Edition supports this feature.
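In AWS DMS terms, continuous replication corresponds to a replication task that performs an initial full load and then applies ongoing changes. The following sketch creates such a task with boto3. It assumes the source endpoint, target endpoint, and replication instance already exist, and that `dms` is a boto3 DMS client. The function name, task identifier, and the single "include every table in one schema" rule are illustrative assumptions.

```python
import json


def create_dr_replication_task(dms, task_id, source_endpoint_arn,
                               target_endpoint_arn, replication_instance_arn,
                               schema="dbo"):
    """Create a full-load-plus-CDC task and return its ARN."""
    # Selection rule: replicate every table in the given schema.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    response = dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_endpoint_arn,
        TargetEndpointArn=target_endpoint_arn,
        ReplicationInstanceArn=replication_instance_arn,
        # Copy existing data first, then keep applying captured changes.
        MigrationType="full-load-and-cdc",
        TableMappings=json.dumps(table_mappings),
    )
    return response["ReplicationTask"]["ReplicationTaskArn"]
```

Creating the task doesn't start it. You start it separately, for example from the AWS DMS console, after the task reaches a ready state.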
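The edition and version rule can be expressed as a small check. The following sketch assumes the RDS engine names `sqlserver-ee` (Enterprise) and `sqlserver-se` (Standard), an engine version string of the form `13.00.4422.0.v1`, and that SQL Server 2016 SP1 corresponds to build 13.0.4001. Verify these assumptions against your own instances before relying on the result.

```python
# Assumption: SQL Server 2016 SP1 is major version 13, build 4001.
SQL_2016_SP1 = (13, 4001)


def cdc_supported(engine, engine_version):
    """Return True if this edition and version supports MS-CDC."""
    parts = engine_version.split(".")
    major, build = int(parts[0]), int(parts[2])

    if engine == "sqlserver-ee":
        # Enterprise Edition supports CDC on older versions as well.
        return True
    if engine == "sqlserver-se":
        # Standard Edition supports CDC from SQL Server 2016 SP1 onward.
        return (major, build) >= SQL_2016_SP1
    # Other editions are not covered by the rule above.
    return False


print(cdc_supported("sqlserver-se", "13.00.4422.0.v1"))  # True
print(cdc_supported("sqlserver-se", "12.00.6293.0.v1"))  # False
```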
Testing your cross-Region DR strategy
To make sure that your cross-Region DR strategy works, you need to test it regularly. Regular DR testing enables you to identify potential issues or gaps in your DR strategy and take corrective action. After each test, document the results and look for ways to improve your DR strategy. Because your business and application requirements may change over time, you should adapt your DR strategy to meet these changes.
Regularly testing and continuously improving your DR strategy helps you be prepared and meet your DR objectives if an actual disaster occurs.
This post described two primary approaches to creating a cross-Region DR strategy for Amazon RDS for SQL Server, based on RPO, RTO, and cost. Future enhancements to Amazon RDS for SQL Server will make your cross-Region DR strategy even more reliable and efficient. For more information, see Amazon RDS for SQL Server.
About the Authors
Changbin Gong is a Senior Solutions Architect at Amazon Web Services (AWS). He engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services. In his spare time, Changbin enjoys reading, running, and traveling.
Rajeev Thottathil is a Senior Database Solutions Architect at Amazon Web Services.