Understanding Disaster Recovery in AWS
What Is a Disaster?
A disaster is any event that negatively impacts your system’s availability, performance, or business continuity. It could be physical damage, cyber attacks, accidental deletions, or regional outages.
The Goal of Disaster Recovery (DR)
DR is about being ready before disaster hits. It’s the process of designing systems that can recover quickly and efficiently with minimal data loss. In AWS, DR isn’t just about data backups—it’s about designing resilient architectures and using cloud-native tools to get your systems back online fast.
Key Concepts: RTO and RPO
Two terms you’ll see again and again:
- RPO (Recovery Point Objective): How much data you can afford to lose (measured in time). E.g. “We can tolerate losing the last 5 minutes of data.”
- RTO (Recovery Time Objective): How long it takes to get back up and running. E.g. “We must be online within 30 minutes.”
Traditional vs Cloud DR Scenarios
- On-Prem → On-Prem: Traditional, costly, and inflexible.
- On-Prem → AWS: Hybrid DR using AWS as a backup and recovery environment.
- AWS Region A → AWS Region B: Full cloud-native DR strategy, offering high automation and rapid recovery.
AWS Disaster Recovery Strategies
Backup and Restore
This is the simplest and most cost-effective option. Backups are stored in S3 or S3 Glacier and can be replicated across regions. Restoration takes the longest of the four strategies.
- RTO: Hours to days
- RPO: Minutes to hours
- Use Case: Non-critical systems or startups minimizing costs
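If you want to automate this pattern, a minimal boto3 sketch (volume ID and regions are placeholders) could snapshot an EBS volume and copy it into a second region:

```python
import boto3

# Hypothetical IDs/regions for illustration only
SOURCE_REGION = "us-east-1"
DR_REGION = "eu-west-1"
VOLUME_ID = "vol-0123456789abcdef0"

ec2 = boto3.client("ec2", region_name=SOURCE_REGION)
ec2_dr = boto3.client("ec2", region_name=DR_REGION)

# 1. Snapshot the volume in the primary region
snap = ec2.create_snapshot(
    VolumeId=VOLUME_ID,
    Description="Nightly DR backup",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the completed snapshot into the DR region
copy = ec2_dr.copy_snapshot(
    SourceRegion=SOURCE_REGION,
    SourceSnapshotId=snap["SnapshotId"],
    Description="DR copy of nightly backup",
)
print("DR snapshot:", copy["SnapshotId"])
```

In a real setup you would schedule this (or use Data Lifecycle Manager / AWS Backup) rather than running it by hand.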
Pilot Light
A minimal version of your environment is always running in AWS—just enough to power the most critical functions.
- RTO: Tens of minutes
- RPO: Minutes
- Use Case: Businesses needing faster recovery without the cost of a full standby environment
Warm Standby
A scaled-down version of the full system runs in AWS, ready to scale up quickly.
- RTO: Minutes
- RPO: Sub-minute to minutes
- Use Case: Medium-to-high availability systems
Multi-Site / Hot Site
Production workloads run in two or more locations simultaneously.
- RTO: Seconds to minutes
- RPO: Near zero
- Use Case: Mission-critical applications that require maximum uptime
AWS Tips for Better DR
Backups
- Use EBS snapshots, RDS backups, and S3 versioning
- Implement S3 lifecycle policies and Cross Region Replication (see the sketch after this list)
- Use Snowball or Storage Gateway for large on-premises backups
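For the lifecycle tip above, here is a hedged sketch of a rule that tiers backup objects to Glacier after 30 days and expires them after a year (bucket name and prefix are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; adjust retention to your compliance needs
s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```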
High Availability
- Deploy multi-AZ and multi-region setups where possible
- Route 53 can route traffic to a healthy region (see the failover sketch after this list)
- Use Site-to-Site VPN as a backup to Direct Connect
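As a rough illustration of the Route 53 point above, a health check plus a primary/secondary failover record pair might look like this in boto3 (hosted zone ID, domain names, and endpoints are placeholders):

```python
import boto3

r53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0000000000000000000"  # placeholder hosted zone

# Health check against the primary region's endpoint
hc = r53.create_health_check(
    CallerReference="primary-hc-001",
    HealthCheckConfig={
        "Type": "HTTPS",
        "FullyQualifiedDomainName": "primary.example.com",
        "Port": 443,
        "ResourcePath": "/health",
        "RequestInterval": 30,
        "FailureThreshold": 3,
    },
)

# PRIMARY answers while the health check passes; SECONDARY takes over when it fails
r53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "CNAME",
                    "SetIdentifier": "primary",
                    "Failover": "PRIMARY",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "primary.example.com"}],
                    "HealthCheckId": hc["HealthCheck"]["Id"],
                },
            },
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "CNAME",
                    "SetIdentifier": "secondary",
                    "Failover": "SECONDARY",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "dr.example.com"}],
                },
            },
        ]
    },
)
```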
Replication
- RDS cross-region replicas, Aurora Global Databases
- Continuous replication from on-prem to AWS with DMS
- File-level replication via Storage Gateway
Automation
- Use CloudFormation or Elastic Beanstalk to spin up infrastructure
- Set up CloudWatch Alarms to trigger failover or reboot EC2 (sketched after this list)
- Lambda can automate customized recovery workflows
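For example, a CloudWatch alarm can invoke the built-in EC2 recover action when the system status check fails. A minimal sketch with a placeholder instance ID and region:

```python
import boto3

REGION = "us-east-1"
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder

cw = boto3.client("cloudwatch", region_name=REGION)

# Recover the instance onto healthy hardware when the system status check fails
cw.put_metric_alarm(
    AlarmName=f"recover-{INSTANCE_ID}",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[f"arn:aws:automate:{REGION}:ec2:recover"],
)
```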
Embracing Chaos
Borrow a page from Netflix’s “Simian Army”: test DR by simulating failures. This helps you find and fix weaknesses before a real disaster.
AWS Services for Migration and Recovery
AWS Database Migration Service (DMS)
DMS is designed to migrate databases with minimal downtime. You can move data between on-premises and AWS, or between AWS services—supporting both homogeneous migrations (e.g. PostgreSQL to PostgreSQL) and heterogeneous migrations (e.g. Oracle to MySQL).
Key Features
- Supports one-time migrations or ongoing replication (great for minimizing cutover time)
- Works with most commercial and open-source DBs (Oracle, SQL Server, MySQL, PostgreSQL, etc.)
- You don’t need to install agents on the source or target databases
How It Works
- DMS uses a replication instance, which is an EC2 instance under the hood (you manage the specs, AWS manages the patching)
- The replication instance connects to your source and target databases and performs the migration tasks
- You configure endpoints for source and destination, then create and manage migration tasks (full load, ongoing changes, or both)
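A hedged boto3 sketch of that setup (identifiers, hostnames, and credentials are placeholders; in practice you would also supply a subnet group and security groups):

```python
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Replication instance that will run the migration tasks
ri = dms.create_replication_instance(
    ReplicationInstanceIdentifier="dr-replication-instance",
    ReplicationInstanceClass="dms.t3.medium",
    AllocatedStorage=100,
)

# Source endpoint (on-prem Oracle) and target endpoint (RDS MySQL)
source = dms.create_endpoint(
    EndpointIdentifier="onprem-oracle",
    EndpointType="source",
    EngineName="oracle",
    ServerName="oracle.corp.example.com",  # placeholder hostname
    Port=1521,
    DatabaseName="ORCL",
    Username="dms_user",
    Password="***",
)
target = dms.create_endpoint(
    EndpointIdentifier="rds-mysql",
    EndpointType="target",
    EngineName="mysql",
    ServerName="mydb.abc123.us-east-1.rds.amazonaws.com",  # placeholder
    Port=3306,
    Username="admin",
    Password="***",
)
```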
When Source and Target Engines Differ
If the source and target databases use different engines, you need to use the AWS Schema Conversion Tool (SCT). DMS only moves the data—not the schema (like tables, indexes, stored procedures, etc.).
Examples:
- Oracle → MySQL: Use SCT to convert schema, then DMS to migrate the data
- SQL Server → PostgreSQL: Same deal—SCT handles schema translation
SCT will tell you which parts of the schema can be auto-converted and where manual work is required (especially if you’re using vendor-specific functions).
Example: Ongoing Replication from On-Prem Oracle to RDS MySQL
Let’s say your Oracle database is running in your corporate data center, and you want to continuously replicate data into an Amazon RDS for MySQL instance. Here’s a high-level setup:
- Set up network connectivity: You need connectivity between AWS and your data center. Typically, use a Site-to-Site VPN or AWS Direct Connect.
- Provision a DMS replication instance: This is an EC2 instance managed by DMS. Place it in a VPC that can reach both source and target.
- Install Oracle client drivers on the replication instance (automated by AWS if you select the right engine version).
- Create source and target endpoints: DMS needs login credentials and connection details for both the Oracle DB and RDS MySQL.
- Use SCT to convert the Oracle schema to MySQL: Apply the converted schema to the RDS target DB before data replication starts.
- Create a DMS task: Set it to do a full load + ongoing replication (using Oracle’s redo logs for change data capture).
- Monitor and validate: Use DMS’ validation tools to compare source/target data and ensure accuracy.
This setup allows you to keep the on-prem Oracle database live while the data is streamed into RDS. When you’re ready, you can cut over to the AWS-hosted DB with minimal disruption.
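Continuing the DMS sketch above (ARNs below stand in for the endpoints and replication instance created earlier), the migration task itself would request a full load plus change data capture:

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Placeholder ARNs from the endpoints/instance created in the previous sketch
SOURCE_ARN = "arn:aws:dms:us-east-1:123456789012:endpoint:ONPREMORACLE"
TARGET_ARN = "arn:aws:dms:us-east-1:123456789012:endpoint:RDSMYSQL"
INSTANCE_ARN = "arn:aws:dms:us-east-1:123456789012:rep:DRREPLICATION"

# Full load of existing data, then ongoing change data capture from the redo logs
task = dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-mysql-cdc",
    SourceEndpointArn=SOURCE_ARN,
    TargetEndpointArn=TARGET_ARN,
    ReplicationInstanceArn=INSTANCE_ARN,
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-app-schema",
            "object-locator": {"schema-name": "APP", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)

# Start once the task reports a ready status
dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```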
Limitations and Considerations
- Replication latency depends on the network between the replication instance and your source DB
- DDL changes (e.g. new columns) aren’t automatically handled unless explicitly enabled
- Some data types and vendor-specific functions may not convert cleanly during schema conversion
- Always test and rehearse migration workflows before doing it live
RDS & Aurora Migrations
Aurora is fully compatible with MySQL and PostgreSQL, which makes migrations fairly straightforward — whether you’re coming from RDS or from an external database. The general tools used for migration include RDS snapshots, read replica promotion, DMS (for live migrations), and sometimes direct S3-based restores.
Migrating from RDS to Aurora
If you’re already using RDS for MySQL or PostgreSQL, migrating to Aurora involves minimal effort. You have two main options:
- Option 1: Restore from RDS Snapshots
You can take a snapshot of your RDS instance and restore it directly as a new Aurora database. This is a simple lift-and-shift and works well for one-time migrations.
- Option 2: Promote an Aurora Read Replica
You can create an Aurora Read Replica from your RDS MySQL or PostgreSQL database. Once replication is caught up (i.e. replication lag is zero), you can promote the replica to be a standalone Aurora cluster.
This method is ideal if you want near-zero downtime, but it does take time and incurs extra cost while both databases are running.
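A hedged sketch of the snapshot route (Option 1), assuming placeholder identifiers and a MySQL engine version that Aurora can ingest:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Snapshot the existing RDS MySQL instance (placeholder identifiers)
snap = rds.create_db_snapshot(
    DBInstanceIdentifier="legacy-mysql",
    DBSnapshotIdentifier="legacy-mysql-pre-aurora",
)
rds.get_waiter("db_snapshot_available").wait(
    DBSnapshotIdentifier="legacy-mysql-pre-aurora"
)

# Restore the snapshot as a new Aurora MySQL cluster (DB snapshots are referenced by ARN)
rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="aurora-mysql-cluster",
    SnapshotIdentifier=snap["DBSnapshot"]["DBSnapshotArn"],
    Engine="aurora-mysql",
)

# An Aurora cluster still needs at least one instance to serve queries
rds.create_db_instance(
    DBInstanceIdentifier="aurora-mysql-instance-1",
    DBClusterIdentifier="aurora-mysql-cluster",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-mysql",
)
```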
Migrating from External MySQL to Aurora MySQL
You’ve got a couple of solid paths here:
- Option 1: S3-Based Migration Using Percona XtraBackup
- Use Percona XtraBackup to take a backup of your source MySQL database.
- Upload the backup files to an S3 bucket.
- Use that S3 bucket to restore into a new Aurora MySQL database.
This is a faster method compared to logical dumps and is ideal for larger datasets.
- Option 2: mysqldump Utility
- Use mysqldump to export your data.
- Import it into a newly created Aurora MySQL instance.
This is easier but much slower — good for smaller databases or dev environments.
- Option 3: Use AWS DMS
If both source and target databases are live and network-accessible, you can use AWS Database Migration Service (DMS) for continuous replication. DMS works well if you want to keep both environments in sync for a period of time (e.g. for testing before full cutover).
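For the XtraBackup route (Option 1 above), Aurora can ingest the backup files straight from S3. A hedged sketch with placeholder bucket, IAM role, and credentials:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Restore a Percona XtraBackup stored in S3 into a new Aurora MySQL cluster
rds.restore_db_cluster_from_s3(
    DBClusterIdentifier="aurora-from-xtrabackup",
    Engine="aurora-mysql",
    MasterUsername="admin",
    MasterUserPassword="***",             # placeholder
    SourceEngine="mysql",
    SourceEngineVersion="8.0.32",         # version of the source MySQL server
    S3BucketName="my-xtrabackup-bucket",  # placeholder bucket
    S3Prefix="backups/2024-06-01",
    S3IngestionRoleArn="arn:aws:iam::123456789012:role/aurora-s3-restore",
)
```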
Migrating from External PostgreSQL to Aurora PostgreSQL
PostgreSQL migrations follow a similar structure:
- Option 1: Snapshot-Like Migration via S3
- Take a PostgreSQL backup.
- Upload it to S3.
- Use the aws_s3 extension in Aurora PostgreSQL to import the backup.
This is efficient and allows you to leverage S3 as an intermediate storage layer.
- Option 2: Use AWS DMS
As with MySQL, you can use DMS to perform live migration from an external PostgreSQL source into Aurora PostgreSQL. Continuous replication helps with low-downtime or blue/green deployment strategies.
- Option 3: Aurora Read Replica and Promotion (for RDS PostgreSQL)
This works exactly like the MySQL version — you create an Aurora Read Replica from your RDS PostgreSQL, and once replication lag hits zero, promote it to its own cluster.
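One way to express the replica-and-promote flow with boto3, assuming placeholder identifiers and source ARN (promote only once replication lag has reached zero):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# 1. Create an Aurora PostgreSQL cluster that replicates from the RDS instance
rds.create_db_cluster(
    DBClusterIdentifier="aurora-pg-replica",
    Engine="aurora-postgresql",
    ReplicationSourceIdentifier=(
        "arn:aws:rds:us-east-1:123456789012:db:legacy-postgres"  # placeholder
    ),
)
rds.create_db_instance(
    DBInstanceIdentifier="aurora-pg-replica-1",
    DBClusterIdentifier="aurora-pg-replica",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-postgresql",
)

# 2. Later, once replica lag is zero, detach it as a standalone cluster
rds.promote_read_replica_db_cluster(DBClusterIdentifier="aurora-pg-replica")
```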
Architectural Considerations
- Aurora Global Databases: These allow a primary DB in one region with read replicas in others. They’re great for cross-region DR and global apps. In the event of a failure, you can promote a secondary region to be the new primary.
- Backtracking (Aurora MySQL only): Aurora MySQL supports backtracking, which lets you roll back your DB to a previous state without restoring from backups. This is great for recovering from logical errors without full restore downtime.
- The Aurora Read Replica promotion process isn’t instantaneous — plan for the time and cost.
- DMS requires that both source and target databases are accessible and supported. It won’t convert incompatible schema types — for that, use AWS Schema Conversion Tool (SCT).
- Always monitor replication lag if you’re using read replicas. Promotion before catching up can result in data loss.
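Replica lag is exposed as a CloudWatch metric (AuroraReplicaLag for Aurora replicas), so you can poll or alarm on it before promoting. A minimal sketch with a placeholder instance identifier:

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch", region_name="us-east-1")

# Average Aurora replica lag (milliseconds) over the last 15 minutes
stats = cw.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="AuroraReplicaLag",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "aurora-pg-replica-1"}],
    StartTime=datetime.now(timezone.utc) - timedelta(minutes=15),
    EndTime=datetime.now(timezone.utc),
    Period=60,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], "ms")
```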
On-Premise Migration Strategies
If you’re running workloads in an on-premises data center and want to either migrate to AWS or set up disaster recovery capabilities, AWS offers several tools and strategies to help you make that transition smoothly.
Virtual Machine (VM) Migrations to AWS
You can migrate your existing virtual machines from on-prem into AWS and even bring them back again if needed:
- Amazon Linux 2 as a VM: You can download Amazon Linux 2 in .iso format and run it on your on-prem hypervisor (e.g. VMware, KVM, VirtualBox, or Microsoft Hyper-V). This is useful for consistency between dev/test environments and AWS.
- VM Import/Export: Use this to import your existing on-prem VMs into Amazon EC2 and run them there as instances. You can also export them back out to your data center if needed.
- Disaster Recovery Repo: Build a DR strategy by storing critical VM images in AWS as a cold standby, ready to launch into EC2 during a failure event.
Planning and Assessing Your Migration
Before diving into the actual move, AWS provides tools to help you assess what’s running in your data center:
- AWS Application Discovery Service: Automatically collects detailed info on your on-prem servers—like CPU usage, network activity, and software inventory. It helps you plan what to migrate and how to size your AWS resources.
- Server Utilization & Dependency Mapping: Understand which services talk to each other and track how your infrastructure performs, which is essential for avoiding surprises post-migration.
- AWS Migration Hub: Use this to centralize and track the status of all your migration projects across different AWS services.
Migrating On-Prem Databases
- AWS Database Migration Service (DMS): This service lets you replicate data between:
- On-premise and AWS
- AWS to AWS (e.g. between regions)
- AWS back to on-premise (useful for DR testing or hybrid scenarios)
DMS supports a wide variety of engines, including Oracle, MySQL, PostgreSQL, SQL Server, and even DynamoDB. If your source and target databases use different engines (e.g. Oracle to Aurora MySQL), you’ll need SCT to convert the schema and code objects (like stored procedures and triggers) before you can migrate the data.
Migrating Entire Servers
- AWS Server Migration Service (SMS): This service performs incremental replication of your live on-premises servers (including OS and application state) into AWS. It’s ideal for large-scale server migration projects and can help minimize downtime during cutover.
AWS Backup
AWS Backup is a fully managed, centralized backup service that helps you automate and consolidate backups across AWS services—without the complexity of writing custom scripts or managing scattered backup processes manually.
Why Use AWS Backup?
Instead of setting up individual backup solutions for EC2, RDS, or S3, AWS Backup lets you manage everything from one place. It supports a broad set of AWS services:
- Compute & Storage: EC2, EBS, S3
- Databases: RDS (all engines), Aurora, DynamoDB, DocumentDB, Neptune
- File Systems: EFS, FSx (for Windows and Lustre)
- Hybrid Storage: AWS Storage Gateway (Volume Gateway)
It also supports cross-region backups, allowing you to store copies in different AWS Regions for disaster recovery. Even better—it supports cross-account backups, which is a great practice for isolating backup data from the source environment to protect against accidental or malicious deletion.
Backup Plans: Automate Everything
AWS Backup uses Backup Plans, which are essentially blueprints that define how and when backups happen. Plans are flexible and tag-driven, which means you can apply rules to resources automatically based on tags (like Environment=Production).
A typical backup plan includes:
- Backup frequency: Choose from predefined intervals (e.g. every 12 hours, daily, weekly) or define your own with cron expressions.
- Backup windows: Define when the backup operation should run.
- Retention periods: Choose how long to keep backups (from days to years—or forever).
- Transition to cold storage: Automatically move older backups to cheaper storage tiers (like Glacier), based on your cost and compliance needs.
- Point-in-Time Recovery (PITR): Available for supported services like RDS and DynamoDB, so you can restore to a precise moment just before a failure.
Whether you’re backing up critical databases, file systems, or entire EC2 instances, these features make it easy to design a robust and compliant backup lifecycle.
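A hedged sketch of a daily backup plan with a cold-storage transition and a tag-driven selection (plan name, vault, and role ARN are placeholders):

```python
import boto3

backup = boto3.client("backup", region_name="us-east-1")

# Daily backups, moved to cold storage after 30 days, deleted after a year
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "production-daily",
        "Rules": [
            {
                "RuleName": "daily-0300-utc",
                "TargetBackupVaultName": "Default",
                "ScheduleExpression": "cron(0 3 * * ? *)",
                "StartWindowMinutes": 60,
                "CompletionWindowMinutes": 360,
                "Lifecycle": {
                    "MoveToColdStorageAfterDays": 30,
                    "DeleteAfterDays": 365,
                },
            }
        ],
    }
)

# Attach every resource tagged Environment=Production to the plan
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "by-environment-tag",
        "IamRoleArn": "arn:aws:iam::123456789012:role/aws-backup-default-role",
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": "Environment",
                "ConditionValue": "Production",
            }
        ],
    },
)
```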
Vault Lock
Security is a major concern in disaster recovery planning, and AWS Backup doesn’t cut corners here. AWS Backup Vault Lock enforces a WORM (Write Once, Read Many) model, which makes sure that once a backup is created, it can’t be modified or deleted—not even by the root user.
Vault Lock protects your backups from:
- Accidental deletions
- Malicious tampering
- Unintended retention changes
Once Vault Lock is enabled, your backup data is truly immutable—making it a powerful tool against ransomware and internal threats.
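Enabling Vault Lock is a single API call. A minimal sketch with a placeholder vault name and retention values (once the ChangeableForDays grace period expires, the lock itself becomes immutable):

```python
import boto3

backup = boto3.client("backup", region_name="us-east-1")

backup.put_backup_vault_lock_configuration(
    BackupVaultName="production-vault",  # placeholder vault
    MinRetentionDays=30,    # recovery points can't be deleted earlier than this
    MaxRetentionDays=365,   # nor retained longer than this
    ChangeableForDays=3,    # after 3 days the lock configuration becomes immutable
)
```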
AWS Application Migration Service (MGN)
MGN lets you rehost applications from pretty much any source—physical servers, VMs, or cloud-hosted systems—into AWS EC2.
But before you migrate, it helps to know what you’re dealing with. That’s where the Application Discovery Service steps in.
Why use it? To gather insight into your existing environment—especially when you don’t have a full inventory or want to map dependencies.
Two ways to discover:
- Agentless Discovery Connector (usually deployed in vCenter):
- Collects inventory, VM configurations, performance stats (CPU, memory, disk)
- Agent-based Discovery Agent:
- Deeper insight: system configs, processes, and network connections between systems
Once collected, all this data shows up in AWS Migration Hub, where you can plan and track your migration project centrally.
Now you can proceed with the migration itself using AWS Application Migration Service (MGN). What it does:
- Replicates your source machines into AWS with continuous block-level replication
- After testing, it spins up EC2 instances from the replicated volumes
- The original server can remain online until cutover (minimizing downtime)
- Supports Linux and Windows, with wide compatibility
Why MGN?
- Lift-and-shift solution that simplifies migrating applications to AWS
- Fully managed and agent-based
- More cost-effective and scalable than the legacy SMS
- Handles complex environments without having to rearchitect immediately
- Converts physical, virtual and cloud-based servers to run natively on AWS
- Minimal downtime, reduced costs
This is ideal when you want to migrate fast, without changing how the app is built.
Bonus Tip: You can combine MGN with CloudWatch Alarms, Systems Manager, or Lambda to automate post-migration steps like installing agents or patching.
Transferring Large Datasets into AWS
Transferring huge datasets (think 100s of TBs) into AWS needs careful planning. The options range from quick-and-easy internet transfers to physical appliance shipping. Choose based on time, bandwidth, and use case. Let’s say you need to move 200TB to AWS. With just a 100 Mbps line:
- Over internet/Site-to-Site VPN: Easy to start, but expect ~185 days
- Over AWS Direct Connect (1 Gbps): Faster, but setup takes time — roughly 18.5 days of transfer
- With AWS Snowball: Hardware appliances sent to you — typically 1 week for full cycle
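The rough arithmetic behind those estimates (decimal units, full link utilization, no protocol overhead):

```python
def transfer_days(dataset_tb: float, link_mbps: float) -> float:
    """Naive transfer time: dataset size in bits divided by the line rate."""
    bits = dataset_tb * 1e12 * 8          # terabytes -> bits (decimal units)
    seconds = bits / (link_mbps * 1e6)    # megabits per second -> bits per second
    return seconds / 86_400               # seconds -> days

print(transfer_days(200, 100))    # ~185 days over a 100 Mbps line
print(transfer_days(200, 1_000))  # ~18.5 days over 1 Gbps Direct Connect
```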
How do these options stack up against each other?
1. Over the Internet / VPN
- Good for small or ongoing trickles of data
- Encryption and secure tunneling required
- Slow and not ideal for huge one-time moves
2. AWS Direct Connect
- Dedicated network line to AWS
- 1–10 Gbps (or higher) speeds
- Takes time to provision, but once up, it’s reliable and private
3. AWS Snow Family (Snowball / Snowmobile)
- AWS ships you physical devices to load data on-site
- Snowball Edge (up to 80TB per device) — rugged, secure, efficient
- Great for “big bang” migrations
- Supports offline encryption and tracking
- Use 2–3 in parallel for faster turnaround
Ongoing syncs?
- Use DataSync or DMS over Direct Connect or VPN for incremental data flows
Pro Tip: You can chain Snowball for the initial load and then switch to DMS or DataSync for ongoing replication.
VMware Cloud on AWS
VMware Cloud on AWS lets you run your existing VMware environments in the AWS Cloud without needing to refactor. It’s a powerful bridge for enterprises that want hybrid cloud flexibility. Many enterprises rely on VMware to run their data centers. Rewriting every app for the cloud can be costly and slow. But what if you could just lift those VMware-based apps as-is into AWS and keep using the same tools? That’s what VMware Cloud on AWS does.
Key Features
- Seamlessly extend your vSphere-based workloads into AWS
- Use familiar VMware tools (vCenter, vMotion, NSX, vSAN) — just now on EC2-backed infrastructure
- AWS provides the underlying compute, storage, and networking
Use Cases
- Cloud Bursting: Temporarily expand capacity into AWS during peak loads
- Disaster Recovery: Use VMware Cloud on AWS as your DR target for on-prem workloads
- Data Center Extension or Exit: Migrate in stages, or decommission on-prem data centers gradually
Bonus: Integrated with other AWS services — use S3 for backups, connect with AWS Direct Connect, or run analytics on data using native AWS services like Athena or Redshift.
Real-world scenario: A financial services firm running mission-critical apps on vSphere wants to expand to a second region for resilience but doesn’t want to rewrite the app. With VMware Cloud on AWS, they replicate their VMs and fail over easily — all without retraining staff.