AWS-based API Health Monitoring System with multi-region support, real-time alerting, and comprehensive metrics. Production-ready infrastructure as code using Terraform, Lambda, CloudWatch, and DynamoDB.

Copubah/automated-api-health-monitoring

Automated API Health Monitoring System

A comprehensive AWS-based solution for monitoring API performance and uptime across multiple regions with automated alerting and visualization.

Architecture Overview

This system provides:

  • Scheduled Monitoring: EventBridge triggers Lambda functions every 5 minutes
  • Multi-Region Support: Monitor APIs from us-east-1, eu-west-1, and ap-south-1
  • Performance Tracking: Record latency, status codes, and availability metrics
  • Automated Alerting: CloudWatch Alarms with SNS notifications
  • Data Storage: Metrics in CloudWatch, detailed logs in DynamoDB
  • Visualization: CloudWatch Dashboards for trend analysis
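As a rough illustration of the storage split described above (metrics in CloudWatch, detailed records in DynamoDB), the sketch below shapes one check result for both stores. The layout of the result dictionary, the metric names (`Latency`, `Availability`), the dimension names, and the DynamoDB attribute names are all assumptions made for this example; the actual `src/monitor_api.py` may use different ones.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_metric_data(result, region):
    """Shape one check result as a MetricData list for CloudWatch PutMetricData."""
    # Dimension names are hypothetical; they let dashboards split by endpoint and region.
    dimensions = [
        {"Name": "Endpoint", "Value": result["name"]},
        {"Name": "Region", "Value": region},
    ]
    return [
        {
            "MetricName": "Latency",
            "Dimensions": dimensions,
            "Value": result["latency_ms"],
            "Unit": "Milliseconds",
        },
        {
            # 1 for a successful check, 0 for a failed one; averaging gives a success rate.
            "MetricName": "Availability",
            "Dimensions": dimensions,
            "Value": 1.0 if result["success"] else 0.0,
            "Unit": "Count",
        },
    ]


def to_history_item(result, region, checked_at=None):
    """Shape the same result as a DynamoDB item (attribute names are hypothetical)."""
    checked_at = checked_at or datetime.now(timezone.utc)
    return {
        "endpoint_name": result["name"],
        "checked_at": checked_at.isoformat(),
        "region": region,
        "status_code": result["status_code"],
        # Floats are converted to Decimal before being stored as DynamoDB numbers.
        "latency_ms": Decimal(str(result["latency_ms"])),
        "success": result["success"],
    }
```

With boto3, the first return value could be passed as `MetricData` to `put_metric_data` and the second as `Item` to a table's `put_item`. The conversion to `Decimal` is there because the boto3 DynamoDB resource does not accept Python floats.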

Components

Infrastructure (Terraform)

  • main.tf - Core AWS resources
  • variables.tf - Configuration parameters
  • outputs.tf - Resource outputs
  • terraform.tfvars.example - Example configuration

Lambda Function

  • src/monitor_api.py - API monitoring logic
  • src/requirements.txt - Python dependencies

Configuration

  • config/api_endpoints.json - API endpoints to monitor

Quick Start

  1. Prerequisites

    • AWS CLI configured
    • Terraform installed
    • Python 3.9+
  2. Setup

    # Clone and navigate to project
    cd automated-api-health-monitoring

    # Configure variables
    cp terraform.tfvars.example terraform.tfvars
    # Edit terraform.tfvars with your settings

    # Deploy infrastructure
    terraform init
    terraform plan
    terraform apply
  3. Configuration

    • Update config/api_endpoints.json with your APIs
    • Configure SNS notification endpoints
    • Customize CloudWatch alarm thresholds

Features

Core Monitoring

  • HTTP/HTTPS endpoint monitoring
  • Response time measurement
  • Status code tracking
  • Multi-region deployment
  • Custom metrics in CloudWatch
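The core check (request an endpoint, time the response, record the status code) can be sketched with the standard library alone. This is a minimal illustration, not the code in `src/monitor_api.py`: the function name `check_endpoint`, the injectable `opener` parameter, and the shape of the returned dictionary are assumptions. The endpoint fields mirror those used in `config/api_endpoints.json`.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(endpoint, opener=urllib.request.urlopen):
    """Probe one endpoint and return its status code, latency, and success flag."""
    request = urllib.request.Request(
        endpoint["url"], method=endpoint.get("method", "GET")
    )
    status_code = None
    error = None
    started = time.perf_counter()
    try:
        with opener(request, timeout=endpoint.get("timeout", 30)) as response:
            status_code = response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise, but still carry a status code worth recording.
        status_code = exc.code
    except OSError as exc:
        # URLError and timeouts are OSError subclasses: DNS failures, refused
        # connections, and timeouts end up here with no status code.
        error = str(exc)
    latency_ms = (time.perf_counter() - started) * 1000.0
    return {
        "name": endpoint["name"],
        "status_code": status_code,
        "latency_ms": round(latency_ms, 2),
        "success": status_code == endpoint.get("expected_status", 200),
        "error": error,
    }
```

Passing the opener as a parameter is a design choice for this sketch: it keeps the function testable without network access, since a test can supply a stand-in that returns a canned response or raises an error.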

Alerting & Notifications

  • CloudWatch Alarms for latency and availability
  • SNS email notifications
  • Slack webhook support
  • Configurable thresholds
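One way Slack webhook support could work is a small Lambda subscribed to the SNS topic that reformats alarm notifications. The sketch below only builds the request body. It assumes the standard SNS-to-Lambda event shape and the fields CloudWatch includes in alarm notifications (`AlarmName`, `NewStateValue`, `NewStateReason`, `Region`); the function name and message format are hypothetical, and the repository may implement this differently.

```python
import json


def slack_payload_from_sns(event):
    """Turn an SNS-delivered CloudWatch alarm notification into a Slack message body."""
    record = event["Records"][0]["Sns"]
    # CloudWatch publishes the alarm details as a JSON string in the SNS message.
    alarm = json.loads(record["Message"])
    state = alarm.get("NewStateValue", "UNKNOWN")
    text = "[{}] {} in {}: {}".format(
        state,
        alarm.get("AlarmName", "unknown alarm"),
        alarm.get("Region", "unknown region"),
        alarm.get("NewStateReason", "no reason given"),
    )
    # Slack incoming webhooks accept a JSON object with a "text" field.
    return json.dumps({"text": text}).encode("utf-8")
```

The returned bytes could then be sent as the body of a POST to the webhook URL with a `Content-Type: application/json` header. The URL itself is a secret and would be a natural candidate for the Secrets Manager configuration mentioned under Security Considerations.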

Data & Visualization

  • DynamoDB for detailed monitoring history
  • CloudWatch Dashboards
  • S3 for log archival
  • Trend analysis capabilities

Security & Best Practices

  • IAM roles with least privilege
  • Encrypted data storage
  • VPC endpoints for private APIs
  • Resource tagging

Architecture Diagram

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   EventBridge   │───▶│   Lambda (API    │───▶│   CloudWatch    │
│   (Scheduler)   │    │    Monitor)      │    │   (Metrics)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │    DynamoDB      │    │   CloudWatch    │
                       │   (History)      │    │     Alarms      │
                       └──────────────────┘    └─────────────────┘
                                                        │
                                                        ▼
                                               ┌─────────────────┐
                                               │       SNS       │
                                               │ (Notifications) │
                                               └─────────────────┘

Monitoring Metrics

  • Availability: Percentage of successful checks
  • Latency: Response time in milliseconds
  • Status Codes: Distribution of HTTP response codes
  • Error Rate: Percentage of failed requests
  • Regional Performance: Comparison of the same metrics across regions
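To make the metric definitions above concrete, here is a small sketch that derives them from a batch of check results. The record fields (`region`, `success`, `latency_ms`, `status_code`) and the function names are assumptions for illustration; in the deployed system these aggregates would more likely come from CloudWatch statistics than from application code.

```python
from collections import Counter, defaultdict


def summarize(results):
    """Compute availability, error rate, mean latency, and status code counts."""
    total = len(results)
    if total == 0:
        return {
            "availability_pct": None,
            "error_rate_pct": None,
            "avg_latency_ms": None,
            "status_codes": {},
        }
    successes = sum(1 for r in results if r["success"])
    latencies = [r["latency_ms"] for r in results if r["latency_ms"] is not None]
    return {
        "availability_pct": 100.0 * successes / total,
        "error_rate_pct": 100.0 * (total - successes) / total,
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        "status_codes": dict(Counter(r["status_code"] for r in results)),
    }


def summarize_by_region(results):
    """Group results by region so the same metrics can be compared across regions."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["region"]].append(r)
    return {region: summarize(items) for region, items in grouped.items()}
```

Availability and error rate are complements here because every check is counted as either a success or a failure.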

Cost Optimization

  • Lambda functions use ARM64 architecture for better price/performance
  • DynamoDB on-demand pricing for variable workloads
  • CloudWatch log retention policies
  • S3 lifecycle policies for log archival

Security Considerations

  • All IAM roles follow the principle of least privilege
  • Encryption at rest for DynamoDB and S3
  • VPC endpoints for private API monitoring
  • Secrets Manager for sensitive configuration

Customization

Adding New APIs

Edit config/api_endpoints.json:

{
  "endpoints": [
    {
      "name": "my-api",
      "url": "https://api.example.com/health",
      "method": "GET",
      "timeout": 30,
      "expected_status": 200
    }
  ]
}
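Before deploying a changed endpoint list, it can help to validate it locally. The following sketch parses the format shown above and rejects obviously malformed entries. Which keys are required, and the defaults applied to the others, are assumptions taken from the example values rather than documented behaviour of the monitor.

```python
import json

# Assumed for this sketch: only name and url are mandatory.
REQUIRED_KEYS = ("name", "url")
# Assumed defaults, copied from the example entry in this README.
DEFAULTS = {"method": "GET", "timeout": 30, "expected_status": 200}


def parse_endpoints(raw_json):
    """Parse the endpoint config and return a list of entries with defaults applied."""
    config = json.loads(raw_json)
    endpoints = []
    for index, entry in enumerate(config.get("endpoints", [])):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"endpoint #{index} is missing keys: {missing}")
        if not entry["url"].startswith(("http://", "https://")):
            raise ValueError(f"endpoint #{index} has a non-HTTP URL: {entry['url']}")
        # Values from the file override the defaults.
        endpoints.append({**DEFAULTS, **entry})
    return endpoints
```

A typical use would be reading `config/api_endpoints.json` with `open(...).read()` and passing the contents to `parse_endpoints` in a pre-deploy check.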

Adjusting Thresholds

Modify variables in terraform.tfvars:

latency_threshold_ms = 2000
error_rate_threshold = 5

Troubleshooting

Common Issues

  1. Lambda Timeout: Increase timeout in variables.tf
  2. Permission Errors: Check IAM role policies
  3. Network Issues: Verify VPC configuration for private APIs

Monitoring the Monitor

  • CloudWatch Logs for Lambda execution details
  • X-Ray tracing for performance analysis
  • DynamoDB metrics for storage health

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes following AWS best practices
  4. Test thoroughly
  5. Submit a pull request

License

MIT License - see LICENSE file for details
