A comprehensive AWS-based solution for monitoring API performance and uptime across multiple regions with automated alerting and visualization.
This system provides:
- Scheduled Monitoring: EventBridge triggers Lambda functions every 5 minutes
- Multi-Region Support: Monitor APIs from us-east-1, eu-west-1, and ap-south-1
- Performance Tracking: Record latency, status codes, and availability metrics
- Automated Alerting: CloudWatch Alarms with SNS notifications
- Data Storage: Metrics in CloudWatch, detailed logs in DynamoDB
- Visualization: CloudWatch Dashboards for trend analysis
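As a rough illustration of the kind of check the scheduled Lambda might run, the sketch below probes one endpoint and records its status code, latency, and availability. It is not the contents of src/monitor_api.py: the function name `check_endpoint`, the `opener` parameter, and the shape of the returned dictionary are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url, method="GET", timeout=30, expected_status=200,
                   opener=urllib.request.urlopen):
    """Probe one endpoint and return status code, latency, and availability.

    `opener` is a hypothetical injection point so the check can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    request = urllib.request.Request(url, method=method)
    started = time.perf_counter()
    try:
        with opener(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        status = err.code
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        status = None
    latency_ms = (time.perf_counter() - started) * 1000.0
    return {
        "url": url,
        "status_code": status,
        "latency_ms": round(latency_ms, 2),
        "available": status == expected_status,
    }
```

Treating an unreachable endpoint as `status_code = None` keeps network failures distinguishable from HTTP error responses when the results are stored or aggregated later.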
- main.tf: Core AWS resources
- variables.tf: Configuration parameters
- outputs.tf: Resource outputs
- terraform.tfvars.example: Example configuration
- src/monitor_api.py: API monitoring logic
- src/requirements.txt: Python dependencies
- config/api_endpoints.json: API endpoints to monitor
Prerequisites
- AWS CLI configured
- Terraform installed
- Python 3.9+
Setup

    # Clone and navigate to project
    cd automated-api-health-monitoring

    # Configure variables
    cp terraform.tfvars.example terraform.tfvars
    # Edit terraform.tfvars with your settings

    # Deploy infrastructure
    terraform init
    terraform plan
    terraform apply

Configuration
- Update config/api_endpoints.json with your APIs
- Configure SNS notification endpoints
- Customize CloudWatch alarm thresholds
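Before deploying, it can help to sanity-check the endpoint list. The sketch below parses a document shaped like config/api_endpoints.json and applies defaults for optional fields. The function name `load_endpoints`, the choice of which fields are required, and the default values are assumptions for illustration, not behaviour confirmed by the project.

```python
import json

# Assumption: only "name" and "url" are mandatory; the rest fall back to defaults.
REQUIRED_FIELDS = ("name", "url")
DEFAULTS = {"method": "GET", "timeout": 30, "expected_status": 200}


def load_endpoints(raw_json):
    """Parse the endpoint config and return a list of normalized entries."""
    config = json.loads(raw_json)
    endpoints = []
    for entry in config.get("endpoints", []):
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            raise ValueError(f"endpoint entry is missing fields: {missing}")
        if not entry["url"].startswith(("http://", "https://")):
            raise ValueError(f"unsupported URL scheme: {entry['url']}")
        # Values from the file override the defaults.
        endpoints.append({**DEFAULTS, **entry})
    return endpoints


if __name__ == "__main__":
    sample = '{"endpoints": [{"name": "my-api", "url": "https://api.example.com/health"}]}'
    print(load_endpoints(sample))
```

Failing fast on a malformed entry is a design choice for this sketch; a monitor could instead skip bad entries and log them.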
- HTTP/HTTPS endpoint monitoring
- Response time measurement
- Status code tracking
- Multi-region deployment
- Custom metrics in CloudWatch
- CloudWatch Alarms for latency and availability
- SNS email notifications
- Slack webhook support
- Configurable thresholds
- DynamoDB for detailed monitoring history
- CloudWatch Dashboards
- S3 for log archival
- Trend analysis capabilities
- IAM roles with least privilege
- Encrypted data storage
- VPC endpoints for private APIs
- Resource tagging
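To make the "custom metrics in CloudWatch" feature more concrete, the sketch below builds a `MetricData` payload in the shape accepted by the CloudWatch `PutMetricData` API. The metric names, dimension names, and the namespace in the trailing comment are hypothetical; the project may use different ones.

```python
from datetime import datetime, timezone


def build_metric_data(result, endpoint_name, region):
    """Turn one check result into CloudWatch MetricData entries.

    `result` is assumed to be a dict with "latency_ms" and "available" keys.
    """
    dimensions = [
        {"Name": "Endpoint", "Value": endpoint_name},
        {"Name": "Region", "Value": region},
    ]
    timestamp = datetime.now(timezone.utc)
    return [
        {
            "MetricName": "Latency",  # hypothetical metric name
            "Dimensions": dimensions,
            "Timestamp": timestamp,
            "Value": float(result["latency_ms"]),
            "Unit": "Milliseconds",
        },
        {
            "MetricName": "Availability",  # hypothetical metric name
            "Dimensions": dimensions,
            "Timestamp": timestamp,
            "Value": 1.0 if result["available"] else 0.0,
            "Unit": "Count",
        },
    ]


# With boto3 installed and credentials configured, the payload could be
# published roughly like this (the namespace is an assumption):
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="ApiHealthMonitoring",
#       MetricData=build_metric_data(result, "my-api", "us-east-1"),
#   )
```

Emitting availability as 1 or 0 per check means the average of that metric over a period can be read as a success rate.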

    ┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
    │   EventBridge   │───▶│   Lambda (API    │───▶│   CloudWatch    │
    │   (Scheduler)   │    │     Monitor)     │    │    (Metrics)    │
    └─────────────────┘    └──────────────────┘    └─────────────────┘
                                    │                       │
                                    ▼                       ▼
                           ┌──────────────────┐    ┌─────────────────┐
                           │     DynamoDB     │    │   CloudWatch    │
                           │    (History)     │    │     Alarms      │
                           └──────────────────┘    └─────────────────┘
                                                            │
                                                            ▼
                                                   ┌─────────────────┐
                                                   │       SNS       │
                                                   │ (Notifications) │
                                                   └─────────────────┘

- Availability: Success rate percentage
- Latency: Response time in milliseconds
- Status Codes: HTTP response codes distribution
- Error Rate: Failed requests percentage
- Regional Performance: Cross-region comparison
- Lambda functions use ARM64 architecture for better price/performance
- DynamoDB on-demand pricing for variable workloads
- CloudWatch log retention policies
- S3 lifecycle policies for log archival
- All IAM roles follow least privilege principle
- Encryption at rest for DynamoDB and S3
- VPC endpoints for private API monitoring
- Secrets Manager for sensitive configuration
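The availability, latency, status code, and error rate metrics listed earlier can be derived from a batch of individual check results. The sketch below shows one way to compute them; the function name `summarize` and the result fields (`available`, `latency_ms`, `status_code`) are assumptions for this example, and in the deployed system CloudWatch statistics may perform this aggregation instead.

```python
from collections import Counter


def summarize(results):
    """Aggregate check results into availability, error rate, and latency."""
    total = len(results)
    if total == 0:
        # No data points: report nothing rather than a misleading 0% or 100%.
        return {
            "availability_pct": None,
            "error_rate_pct": None,
            "avg_latency_ms": None,
            "status_codes": {},
        }
    successes = sum(1 for r in results if r["available"])
    return {
        "availability_pct": 100.0 * successes / total,
        "error_rate_pct": 100.0 * (total - successes) / total,
        "avg_latency_ms": sum(r["latency_ms"] for r in results) / total,
        "status_codes": dict(Counter(r["status_code"] for r in results)),
    }
```

Running the same function over results grouped by region would give the cross-region comparison.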
Edit config/api_endpoints.json:

    {
      "endpoints": [
        {
          "name": "my-api",
          "url": "https://api.example.com/health",
          "method": "GET",
          "timeout": 30,
          "expected_status": 200
        }
      ]
    }

Modify variables in terraform.tfvars:

    latency_threshold_ms = 2000
    error_rate_threshold = 5

- Lambda Timeout: Increase timeout in variables.tf
- Permission Errors: Check IAM role policies
- Network Issues: Verify VPC configuration for private APIs
- CloudWatch Logs for Lambda execution details
- X-Ray tracing for performance analysis
- DynamoDB metrics for storage health
- Fork the repository
- Create a feature branch
- Make changes following AWS best practices
- Test thoroughly
- Submit a pull request
MIT License - see LICENSE file for details