Skip to content

Backup Recovery

Temp edited this page Oct 3, 2025 · 2 revisions

Backup & Recovery Tools

Backup planning, restore validation, and schedule optimization for mission-critical PostgreSQL databases.


📊 Overview

4 enterprise backup tools for comprehensive backup and recovery planning:

Tool Purpose Category
backup_logical Plan logical backups (pg_dump) Backup Planning
backup_physical Assess physical backup readiness Backup Planning
restore_validate Validate backup restore procedures Recovery Validation
backup_schedule_optimize Optimize backup schedules Backup Strategy

🔧 Tool Details

backup_logical

Analyze database for logical backup planning using pg_dump. Provides size estimates, recommended options, and timing predictions.

Parameters:

  • include_schema (boolean, optional): Include schema-only analysis
  • include_data (boolean, optional): Include data-only analysis
  • table_filter (string, optional): Filter by specific table pattern

Returns:

  • database_info: Database name, size, and statistics
  • size_analysis: Expected backup size and compression estimates
  • timing_estimates: Backup duration predictions
  • recommended_options: Suggested pg_dump command options
  • warnings: Potential issues to address before backup

Example:

result = backup_logical( include_schema=True, include_data=True, table_filter="orders%" ) # Returns: { # "database_info": { # "database": "production_db", # "total_size_bytes": 50000000000, # "total_size_gb": 46.57, # "table_count": 125, # "total_rows": 15000000 # }, # "size_analysis": { # "estimated_uncompressed_bytes": 48000000000, # "estimated_uncompressed_gb": 44.7, # "estimated_compressed_bytes": 12000000000, # "estimated_compressed_gb": 11.2, # "compression_ratio_percent": 75.0, # "largest_tables": [ # {"table": "orders", "size_mb": 12500, "row_count": 5000000}, # {"table": "order_items", "size_mb": 8900, "row_count": 8000000} # ] # }, # "timing_estimates": { # "estimated_duration_minutes": 45, # "estimated_start_time": "02:00", # "estimated_end_time": "02:45", # "backup_window_required_minutes": 60 # }, # "recommended_options": { # "format": "custom", # "compression": "9", # "parallel_jobs": 4, # "sample_command": "pg_dump -Fc -Z9 -j4 -f backup.dump production_db" # }, # "warnings": [ # { # "severity": "medium", # "issue": "Large tables detected", # "recommendation": "Consider parallel dumps for tables > 1GB" # } # ] # }

Use Cases:

  • Backup size estimation
  • Backup window planning
  • pg_dump command optimization
  • Cost estimation for backup storage

backup_physical

Assess readiness for physical backups (pg_basebackup, PITR). Checks WAL archiving, replication slots, and configuration.

Parameters:

  • check_wal_archiving (boolean, optional): Check WAL archiving configuration
  • check_replication_slots (boolean, optional): Check replication slot status

Returns:

  • backup_readiness: Overall readiness assessment
  • wal_status: WAL archiving configuration and status
  • replication_slots: Active and inactive slots
  • configuration: Relevant PostgreSQL configuration
  • recommendations: Steps to improve backup readiness

Example:

result = backup_physical( check_wal_archiving=True, check_replication_slots=True ) # Returns: { # "backup_readiness": { # "overall_status": "ready", # "wal_archiving": "enabled", # "replication_slots": "healthy", # "continuous_archiving": True # }, # "wal_status": { # "wal_level": "replica", # "archive_mode": "on", # "archive_command": "cp %p /mnt/wal_archive/%f", # "current_wal_lsn": "0/5A2F3C0", # "last_archived_wal": "000000010000000000000059", # "archiving_lag_seconds": 2.5 # }, # "replication_slots": { # "total_count": 2, # "active_count": 2, # "inactive_count": 0, # "slots": [ # { # "slot_name": "replica1_slot", # "slot_type": "physical", # "active": True, # "restart_lsn": "0/5A2F000", # "confirmed_flush_lsn": null # } # ] # }, # "configuration": { # "max_wal_senders": 10, # "max_replication_slots": 10, # "wal_keep_size": "1024MB", # "wal_sender_timeout": "60s" # }, # "recommendations": [ # { # "priority": "info", # "recommendation": "WAL archiving properly configured", # "action": "None required" # } # ] # }

Use Cases:

  • PITR setup verification
  • Replication monitoring
  • Backup infrastructure health checks
  • Disaster recovery planning

restore_validate

Validate backup restore procedures and estimate recovery times.

Parameters:

  • backup_type (string, required): Type of backup (logical or physical)
  • estimated_backup_size_gb (number, optional): Estimated backup size in GB
  • parallel_restore_jobs (integer, optional): Number of parallel jobs for restore

Returns:

  • validation_results: Restore procedure validation
  • timing_estimates: Recovery time objectives (RTO)
  • requirements: System requirements for restore
  • recommendations: Best practices for restore procedures

Example:

# Validate logical restore result = restore_validate( backup_type="logical", estimated_backup_size_gb=45.0, parallel_restore_jobs=4 ) # Returns: { # "validation_results": { # "backup_type": "logical", # "restore_method": "pg_restore", # "parallel_support": True, # "parallel_jobs": 4 # }, # "timing_estimates": { # "estimated_restore_minutes": 60, # "estimated_index_rebuild_minutes": 20, # "total_rto_minutes": 80, # "total_rto_hours": 1.33, # "backup_size_gb": 45.0 # }, # "requirements": { # "disk_space_required_gb": 67.5, # "buffer_percentage": 50, # "temp_space_gb": 22.5, # "memory_recommended_gb": 8 # }, # "recommendations": [ # { # "priority": "high", # "category": "performance", # "recommendation": "Use parallel restore with -j4 for 4-core systems", # "command": "pg_restore -j4 -d database backup.dump" # }, # { # "priority": "medium", # "category": "disk_space", # "recommendation": "Ensure 67.5 GB free disk space before restore" # } # ] # } # Validate physical restore result = restore_validate( backup_type="physical", estimated_backup_size_gb=120.0 ) # Returns: { # "validation_results": { # "backup_type": "physical", # "restore_method": "pg_basebackup + WAL replay", # "pitr_capable": True # }, # "timing_estimates": { # "estimated_restore_minutes": 30, # "estimated_wal_replay_minutes": 10, # "total_rto_minutes": 40, # "total_rto_hours": 0.67, # "backup_size_gb": 120.0 # }, # "requirements": { # "disk_space_required_gb": 180.0, # "buffer_percentage": 50, # "wal_archive_space_gb": 60.0, # "memory_recommended_gb": 16 # }, # "recommendations": [...] # }

Use Cases:

  • Disaster recovery planning
  • RTO/RPO calculation
  • Restore testing preparation
  • Infrastructure sizing

backup_schedule_optimize

Optimize backup schedules based on database activity patterns and business requirements.

Parameters:

  • full_backup_frequency_hours (integer, required): Desired full backup frequency
  • incremental_backup_frequency_hours (integer, optional): Incremental backup frequency
  • analyze_activity_patterns (boolean, optional): Analyze database activity for optimal timing

Returns:

  • recommended_schedule: Optimized backup schedule
  • activity_analysis: Database activity patterns
  • resource_impact: Expected resource usage
  • retention_recommendations: Backup retention policies

Example:

result = backup_schedule_optimize( full_backup_frequency_hours=24, incremental_backup_frequency_hours=4, analyze_activity_patterns=True ) # Returns: { # "recommended_schedule": { # "full_backups": { # "frequency_hours": 24, # "recommended_start_time": "02:00", # "estimated_duration_minutes": 45, # "daily_schedule": ["02:00"] # }, # "incremental_backups": { # "frequency_hours": 4, # "recommended_start_times": ["06:00", "10:00", "14:00", "18:00", "22:00"], # "estimated_duration_minutes": 5 # } # }, # "activity_analysis": { # "peak_hours": [9, 10, 11, 14, 15, 16], # "low_activity_hours": [1, 2, 3, 4, 5], # "recommended_backup_window": "02:00-04:00", # "transaction_rate_per_hour": { # "peak": 125000, # "average": 45000, # "low": 5000 # } # }, # "resource_impact": { # "full_backup": { # "cpu_impact_percent": 15, # "io_impact_percent": 25, # "network_bandwidth_mbps": 150 # }, # "incremental_backup": { # "cpu_impact_percent": 5, # "io_impact_percent": 8, # "network_bandwidth_mbps": 50 # } # }, # "retention_recommendations": { # "full_backups": { # "daily_retention_days": 7, # "weekly_retention_weeks": 4, # "monthly_retention_months": 12 # }, # "incremental_backups": { # "retention_days": 7 # }, # "estimated_storage_gb": { # "per_full_backup": 45, # "per_incremental": 2, # "total_required": 405 # } # } # }

Use Cases:

  • Backup strategy optimization
  • RPO/RTO alignment
  • Cost optimization (storage, bandwidth)
  • Compliance planning (retention policies)

🎯 Common Workflows

Initial Backup Planning

# 1. Assess physical backup readiness physical = backup_physical( check_wal_archiving=True, check_replication_slots=True ) # 2. Plan logical backups logical = backup_logical( include_schema=True, include_data=True ) # 3. Optimize schedule schedule = backup_schedule_optimize( full_backup_frequency_hours=24, incremental_backup_frequency_hours=4, analyze_activity_patterns=True ) # 4. Validate restore procedures validate = restore_validate( backup_type="logical", estimated_backup_size_gb=logical["size_analysis"]["estimated_compressed_gb"], parallel_restore_jobs=4 )

Backup Health Check

# 1. Check physical backup infrastructure physical_status = backup_physical( check_wal_archiving=True, check_replication_slots=True ) # 2. Verify logical backup can complete logical_status = backup_logical( include_schema=True, include_data=True ) # 3. Test restore validation if physical_status["backup_readiness"]["overall_status"] == "ready": restore_test = restore_validate( backup_type="physical", estimated_backup_size_gb=120.0 )

Disaster Recovery Planning

# 1. Document backup sizes logical_plan = backup_logical(include_schema=True, include_data=True) backup_size = logical_plan["size_analysis"]["estimated_compressed_gb"] # 2. Calculate RTO restore_plan = restore_validate( backup_type="logical", estimated_backup_size_gb=backup_size, parallel_restore_jobs=4 ) rto_minutes = restore_plan["timing_estimates"]["total_rto_minutes"] # 3. Determine backup frequency for RPO schedule = backup_schedule_optimize( full_backup_frequency_hours=24, incremental_backup_frequency_hours=1, # 1-hour RPO analyze_activity_patterns=True )

📊 Backup Best Practices

1. The 3-2-1 Rule

  • 3 copies of your data
  • 2 different media types
  • 1 copy offsite

2. Regular Testing

# Test restores monthly restore_validate(backup_type="logical", estimated_backup_size_gb=50) # Verify backup integrity weekly backup_logical(include_schema=True, include_data=True)

3. Monitoring & Alerting

# Use with Monitoring tools from Monitoring import alert_threshold_set, monitor_real_time # Monitor backup duration backup_alert = alert_threshold_set( metric_type="transaction_age", warning_threshold=1800, # 30 minutes critical_threshold=3600 # 60 minutes )

4. Retention Policies

Recommended Retention:

  • Daily: 7 days
  • Weekly: 4 weeks
  • Monthly: 12 months
  • Yearly: 7 years (compliance)

5. Documentation

Keep records of:

  • Backup schedules and frequencies
  • Restore procedures (runbooks)
  • Contact information for emergencies
  • Storage locations and credentials

🚨 Troubleshooting

Backup Too Large

# Identify large tables result = backup_logical(include_data=True) large_tables = result["size_analysis"]["largest_tables"] # Consider: # - Archiving old data # - Partitioning large tables # - Separate backup strategy for large tables

WAL Archiving Not Working

result = backup_physical(check_wal_archiving=True) wal_status = result["wal_status"] # Check: # - archive_command syntax # - Filesystem permissions # - Disk space on archive location

Restore Too Slow

result = restore_validate( backup_type="logical", estimated_backup_size_gb=100, parallel_restore_jobs=8 # Increase parallelism ) # Consider: # - More CPU cores for parallel restore # - Faster storage (SSD) # - Restore indexes separately after data

📚 Related Documentation


🔗 External Resources


See Home for more tool categories.

Clone this wiki locally