Posted on Aug 4

Zero-Downtime RDS to Aurora Serverless v2 Migration: A Step-by-Step Guide

Migrating from RDS to Aurora Serverless v2 can reduce database costs by up to 40% while improving performance and scalability. In this guide, I'll walk you through a production-tested migration strategy that ensures zero downtime for your applications.

Why Aurora Serverless v2?

Aurora Serverless v2 offers several advantages over traditional RDS:

Auto-scaling: Scales compute capacity from 0.5 to 128 ACUs in seconds
Cost Efficiency: Pay only for the capacity you use
High Availability: Built-in fault tolerance across multiple AZs
Performance: Up to 5x faster than standard MySQL

Prerequisites

Before starting the migration, ensure you have:

RDS instance running MySQL 5.7+ or PostgreSQL 10+
AWS CLI configured with appropriate permissions
Terraform installed (for infrastructure as code)
Application connection strings that can be updated
Backup of your current database

Migration Strategy Overview

Our zero-downtime approach involves:

Creating an Aurora read replica from RDS
Promoting the replica to a standalone cluster
Enabling Serverless v2 on the cluster
Switching application traffic with minimal disruption

Step 1: Assess Your Current RDS Setup

First, gather metrics to properly size your Aurora Serverless v2 cluster:

# Get current RDS metrics aws cloudwatch get-metric-statistics \ --namespace AWS/RDS \ --metric-name CPUUtilization \ --dimensions Name=DBInstanceIdentifier,Value=your-rds-instance \ --start-time 2024-01-01T00:00:00Z \ --end-time 2024-01-07T00:00:00Z \ --period 3600 \ --statistics Maximum,Average

Key metrics to analyze:

CPU utilization patterns
Connection count
IOPS requirements
Storage size

Step 2: Plan Aurora Serverless v2 Capacity

Based on your RDS metrics, calculate the required ACU range:

# terraform/aurora-serverless-v2.tf locals { # ACU calculation based on RDS instance type # db.r5.large = 2 vCPUs, 16 GB RAM ≈ 4-16 ACUs min_acu = 2 max_acu = 16 } resource "aws_rds_cluster" "aurora_serverless_v2" { cluster_identifier = "my-app-aurora-cluster" engine = "aurora-mysql" engine_mode = "provisioned" engine_version = "8.0.mysql_aurora.3.02.0" database_name = "myapp" master_username = "admin" master_password = random_password.db_password.result serverlessv2_scaling_configuration { max_capacity = local.max_acu min_capacity = local.min_acu } backup_retention_period = 7 preferred_backup_window = "03:00-04:00" enabled_cloudwatch_logs_exports = ["error", "general", "slowquery"] tags = { Environment = "production" ManagedBy = "terraform" } }

Step 3: Create Aurora Read Replica

Create an Aurora read replica from your RDS instance:

# First, create a snapshot of RDS resource "aws_db_snapshot" "rds_snapshot" { db_instance_identifier = "existing-rds-instance" db_snapshot_identifier = "pre-migration-snapshot" } # Create Aurora cluster from snapshot resource "aws_rds_cluster" "aurora_from_snapshot" { cluster_identifier = "aurora-migration-cluster" engine = "aurora-mysql" engine_version = "8.0.mysql_aurora.3.02.0" snapshot_identifier = aws_db_snapshot.rds_snapshot.id # Enable binary logging for replication enabled_cloudwatch_logs_exports = ["audit", "error", "general", "slowquery"] lifecycle { ignore_changes = [snapshot_identifier] } }

Step 4: Implement the Migration

4.1 Set Up Continuous Replication

Configure DMS for continuous replication:

# scripts/setup_dms_replication.py import boto3 import time dms = boto3.client('dms') def create_replication_instance(): response = dms.create_replication_instance( ReplicationInstanceIdentifier='rds-to-aurora-migration', ReplicationInstanceClass='dms.r5.large', AllocatedStorage=100, MultiAZ=True, Tags=[ {'Key': 'Purpose', 'Value': 'RDS-Aurora-Migration'}, ] ) return response['ReplicationInstance']['ReplicationInstanceArn'] def create_migration_task(source_endpoint, target_endpoint, rep_instance_arn): response = dms.create_replication_task( ReplicationTaskIdentifier='rds-aurora-continuous-sync', SourceEndpointArn=source_endpoint, TargetEndpointArn=target_endpoint, ReplicationInstanceArn=rep_instance_arn, MigrationType='full-load-and-cdc', TableMappings='''{ "rules": [{ "rule-type": "selection", "rule-id": "1", "rule-name": "1", "object-locator": { "schema-name": "%", "table-name": "%" }, "rule-action": "include" }] }''' ) return response # Monitor replication lag def check_replication_lag(): response = dms.describe_replication_tasks( Filters=[ { 'Name': 'replication-task-id', 'Values': ['rds-aurora-continuous-sync'] } ] ) task = response['ReplicationTasks'][0] stats = task['ReplicationTaskStats'] print(f"Tables loaded: {stats['TablesLoaded']}") print(f"Tables loading: {stats['TablesLoading']}") print(f"Full load progress: {stats['FullLoadProgressPercent']}%") return stats

4.2 Application Cutover Strategy

Implement a connection manager for seamless cutover:

# app/db_connection_manager.py import os import pymysql from datetime import datetime class DatabaseConnectionManager: def __init__(self): self.use_aurora = os.environ.get('USE_AURORA', 'false').lower() == 'true' self.rds_endpoint = os.environ.get('RDS_ENDPOINT') self.aurora_endpoint = os.environ.get('AURORA_ENDPOINT') def get_connection(self): endpoint = self.aurora_endpoint if self.use_aurora else self.rds_endpoint connection = pymysql.connect( host=endpoint, user=os.environ.get('DB_USER'), password=os.environ.get('DB_PASSWORD'), database=os.environ.get('DB_NAME'), connect_timeout=5, read_timeout=10, write_timeout=10, max_allowed_packet=64 * 1024 * 1024 ) # Log connection for monitoring  print(f"Connected to: {endpoint} at {datetime.now()}") return connection def health_check(self): try: conn = self.get_connection() with conn.cursor() as cursor: cursor.execute("SELECT 1") result = cursor.fetchone() conn.close() return True except Exception as e: print(f"Health check failed: {str(e)}") return False

4.3 Gradual Traffic Migration

Use Route 53 weighted routing for gradual migration:

# terraform/route53_weighted.tf resource "aws_route53_record" "database_weighted_rds" { zone_id = data.aws_route53_zone.main.zone_id name = "db.internal.myapp.com" type = "CNAME" ttl = "60" weighted_routing_policy { weight = var.rds_traffic_weight # Start at 100, gradually reduce } set_identifier = "rds" records = [aws_db_instance.rds.endpoint] } resource "aws_route53_record" "database_weighted_aurora" { zone_id = data.aws_route53_zone.main.zone_id name = "db.internal.myapp.com" type = "CNAME" ttl = "60" weighted_routing_policy { weight = var.aurora_traffic_weight # Start at 0, gradually increase } set_identifier = "aurora" records = [aws_rds_cluster.aurora_serverless_v2.endpoint] }

Step 5: Post-Migration Optimization

5.1 Auto-scaling Configuration

Fine-tune Aurora Serverless v2 scaling:

# scripts/optimize_aurora_scaling.py import boto3 import json from datetime import datetime, timedelta cloudwatch = boto3.client('cloudwatch') rds = boto3.client('rds') def analyze_acu_usage(cluster_id, days=7): """Analyze ACU usage patterns to optimize scaling""" end_time = datetime.utcnow() start_time = end_time - timedelta(days=days) response = cloudwatch.get_metric_statistics( Namespace='AWS/RDS', MetricName='ServerlessDatabaseCapacity', Dimensions=[ {'Name': 'DBClusterIdentifier', 'Value': cluster_id} ], StartTime=start_time, EndTime=end_time, Period=3600, # 1 hour  Statistics=['Average', 'Maximum', 'Minimum'] ) # Calculate optimal ACU range  data_points = response['Datapoints'] if data_points: avg_acu = sum(dp['Average'] for dp in data_points) / len(data_points) max_acu = max(dp['Maximum'] for dp in data_points) # Recommend settings with 20% headroom  recommended_min = max(0.5, avg_acu * 0.5) recommended_max = min(128, max_acu * 1.2) print(f"Current usage analysis:") print(f"Average ACU: {avg_acu:.2f}") print(f"Peak ACU: {max_acu:.2f}") print(f"Recommended min ACU: {recommended_min:.2f}") print(f"Recommended max ACU: {recommended_max:.2f}") return { 'min_acu': recommended_min, 'max_acu': recommended_max } def update_scaling_configuration(cluster_id, min_acu, max_acu): """Update Aurora Serverless v2 scaling configuration""" response = rds.modify_db_cluster( DBClusterIdentifier=cluster_id, ServerlessV2ScalingConfiguration={ 'MinCapacity': min_acu, 'MaxCapacity': max_acu }, ApplyImmediately=True ) print(f"Updated scaling configuration for {cluster_id}") print(f"New range: {min_acu} - {max_acu} ACUs") return response

5.2 Performance Monitoring

Set up comprehensive monitoring:

# terraform/aurora_monitoring.tf resource "aws_cloudwatch_dashboard" "aurora_serverless_v2" { dashboard_name = "aurora-serverless-v2-monitoring" dashboard_body = jsonencode({ widgets = [ { type = "metric" width = 12 height = 6 properties = { metrics = [ ["AWS/RDS", "ServerlessDatabaseCapacity", "DBClusterIdentifier", aws_rds_cluster.aurora_serverless_v2.id], [".", "CPUUtilization", ".", "."], [".", "DatabaseConnections", ".", "."] ] period = 300 stat = "Average" region = var.aws_region title = "Aurora Serverless v2 Metrics" } }, { type = "metric" width = 12 height = 6 properties = { metrics = [ ["AWS/RDS", "ReadLatency", "DBClusterIdentifier", aws_rds_cluster.aurora_serverless_v2.id], [".", "WriteLatency", ".", "."], [".", "ReadThroughput", ".", "."], [".", "WriteThroughput", ".", "."] ] period = 300 stat = "Average" region = var.aws_region title = "Database Performance" } } ] }) } # Alarms for critical metrics resource "aws_cloudwatch_metric_alarm" "high_cpu" { alarm_name = "aurora-serverless-v2-high-cpu" comparison_operator = "GreaterThanThreshold" evaluation_periods = "2" metric_name = "CPUUtilization" namespace = "AWS/RDS" period = "300" statistic = "Average" threshold = "80" alarm_description = "This metric monitors Aurora CPU utilization" dimensions = { DBClusterIdentifier = aws_rds_cluster.aurora_serverless_v2.id } alarm_actions = [aws_sns_topic.alerts.arn] }

Results and Lessons Learned

After completing the migration for our production workload:

Performance Improvements

Query performance: 35% faster on average
Connection time: Reduced from 250ms to 45ms
Failover time: Improved from 60s to <30s

Cost Savings

Compute costs: Reduced by 42% due to auto-scaling
Storage costs: 15% reduction with Aurora storage optimization
Overall savings: $3,200/month for our workload

Key Lessons

Test scaling patterns thoroughly before production cutover
Monitor replication lag closely during migration
Use connection pooling to maximize efficiency
Start conservative with ACU settings, then optimize

Common Pitfalls to Avoid

Don't skip the replication lag monitoring
Ensure all application connection strings are updated
Test failover scenarios before going live
Keep the old RDS instance for at least 7 days post-migration

Conclusion

Migrating from RDS to Aurora Serverless v2 requires careful planning but delivers significant benefits. The zero-downtime approach ensures business continuity while the auto-scaling capabilities of Serverless v2 provide both cost savings and performance improvements.

Have you migrated to Aurora Serverless v2? What challenges did you face? Share your experiences in the comments!

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.

DEV Community