Hey everyone! Today I'm diving deep into ECS services and auto-scaling. After setting up the load balancer on Day 4, it's time to deploy my FastAPI application with intelligent scaling that responds to real traffic patterns.
What We're Building Today
- ECS Service that keeps containers running and healthy
- Smart Auto-Scaling that maintains optimal performance (1-5 containers)
- FastAPI Application with multiple endpoints for testing
- CloudWatch Monitoring with email alerts
- Load Testing Endpoints to validate scaling behavior
The Complete ECS Service with Auto-Scaling
Note: We already created the ECS cluster in our previous setup, so we'll focus on the service configuration.
Here's the full ECS service configuration with intelligent auto-scaling:
```yaml
# infra/ecs/ecs_service.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Creating ECS Service for Learning Purpose'

Parameters:
  TeamNameValue:
    Type: String
    Description: TeamName Tag Value
    Default: "awslearner"
  EnvironmentValue:
    Type: String
    Description: Environment Tag Value
    Default: "dev"
  ServiceName:
    Type: String
    Description: Name of the ECS service
    Default: "learner-svc"
  ExecutionRoleName:
    Type: String
    Description: Name of the service execution role
    Default: "learner-ecs-role"
  TaskRoleName:
    Type: String
    Description: Name of the task role
    Default: "learner-ecs-task-exc-role"
  ImageARN:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Image ARN
    Default: "/learner/imagearn/value"
  ECSCluster:
    Type: String
    Description: ECS Cluster Name
    Default: "learner-cluster"
  PublicSubnetIds:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Subnet ID
    Default: "/learner/public/subnetids"
  SecurityGroup:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Security Group
    Default: "/learner/public/sgid"
  TargetGroupArn:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Target Group ARN
    Default: "/learner/target/value"
  AlertEmail:
    Type: String
    Description: Email address for alerts
    Default: <Provide Email Address>

Resources:
  # Storage for your container logs
  ECSLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /ecs/${ServiceName}-logs
      RetentionInDays: 14
      Tags:
        - Key: Name
          Value: !Sub /ecs/${ServiceName}-logs
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Blueprint that tells ECS how to run your containers
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub ${ServiceName}-task
      Cpu: 256
      Memory: 512
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${ExecutionRoleName}
      TaskRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${TaskRoleName}
      ContainerDefinitions:
        - Name: !Sub ${ServiceName}-container
          Image: !Ref ImageARN
          PortMappings:
            - ContainerPort: 80
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref ECSLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-task
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Service that keeps your containers running and healthy
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster
      ServiceName: !Sub ${ServiceName}-service
      TaskDefinition: !Ref TaskDefinition
      LaunchType: FARGATE
      DesiredCount: 1
      PropagateTags: SERVICE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets: !Split
            - ","
            - !Ref PublicSubnetIds
          SecurityGroups:
            - !Ref SecurityGroup
      LoadBalancers:
        - ContainerName: !Sub ${ServiceName}-container
          ContainerPort: 80
          TargetGroupArn: !Ref TargetGroupArn
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-service
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Defines scaling limits for your containers (1-5 tasks)
  AutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    DependsOn: ECSService
    Properties:
      ServiceNamespace: ecs
      ResourceId: !Sub "service/${ECSCluster}/${ServiceName}-service"
      ScalableDimension: ecs:service:DesiredCount
      MinCapacity: 1
      MaxCapacity: 5

  # Target tracking scaling policy - maintains CPU around 40%
  AutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${EnvironmentValue}-${ServiceName}-target-tracking
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 40.0                # Target 40% CPU utilization
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleOutCooldown: 120            # Wait 2 minutes before scaling up
        ScaleInCooldown: 300             # Wait 5 minutes before scaling down
        DisableScaleIn: false

  # Email notification system for alerts
  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub ${EnvironmentValue}-${ServiceName}-alerts
      DisplayName: !Sub "${ServiceName} ECS Alerts"
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentValue}-${ServiceName}-alerts
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Connects your email to the alert system
  AlertSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: email
      TopicArn: !Ref AlertTopic
      Endpoint: !Ref AlertEmail

  # Alert when CPU is high at maximum capacity
  CriticalCPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-CriticalCPU-AtMaxCapacity
      AlarmDescription: !Sub "CRITICAL: ${ServiceName} CPU >70% at max capacity (5 tasks)"
      MetricName: CPUUtilization
      Namespace: AWS/ECS
      Statistic: Average
      Period: 60
      EvaluationPeriods: 2
      Threshold: 70
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster

  # Alert when you reach maximum number of containers
  MaxCapacityAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-MaxCapacity-Reached
      AlarmDescription: !Sub "WARNING: ${ServiceName} reached maximum capacity (5 tasks)"
      MetricName: RunningTaskCount
      Namespace: AWS/ECS
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 5
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster
```
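The template resolves the image URI, subnet IDs, security group, and target group ARN from SSM Parameter Store (all created on earlier days). If you want to double-check those parameters exist before deploying, a quick sketch like this works — the names below are the template defaults, so adjust them if yours differ:

```bash
# Confirm the SSM parameters the template expects are present
# (names match the template defaults above; adjust if yours differ)
aws ssm get-parameters \
  --names "/learner/imagearn/value" \
          "/learner/public/subnetids" \
          "/learner/public/sgid" \
          "/learner/target/value" \
  --query "InvalidParameters"
# An empty list [] means every parameter was found
```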
FastAPI Application with Testing Endpoints
Here's my FastAPI application with endpoints designed to test different scenarios:
```python
# source/app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import logging
import time
import hashlib
import boto3
import threading
import multiprocessing
import uvicorn

app = FastAPI()
logger = logging.getLogger("ecs_service")
active_requests = 0

# Initialize ECS client for monitoring
try:
    ecs_client = boto3.client('ecs')
except Exception as e:
    logger.warning(f"Could not initialize ECS client: {e}")
    ecs_client = None


class SubmitData(BaseModel):
    name: str = "User"


@app.get("/")
def home():
    """Simple welcome message"""
    return "Hello from ECS Fargate Service!"


@app.get("/api/health")
def health():
    """Health check endpoint for load balancer"""
    return {"status": "healthy"}


@app.post("/api/submit")
def api_submit(data: SubmitData):
    """Accepts user data and returns personalized message"""
    logger.info(f"Data received: {data.model_dump()}")
    return {"message": f"Happy learning, {data.name}!", "data": data.model_dump()}


@app.get("/api/load")
def generate_load():
    """Generates heavy CPU load for 60 seconds to test auto-scaling"""
    def cpu_intensive_task():
        start_time = time.time()
        while time.time() - start_time < 60:
            for _ in range(10000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()
            sum(range(1000))

    # Use multiple threads to maximize CPU usage
    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 2):
        thread = threading.Thread(target=cpu_intensive_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    logger.info("CPU load generation completed")
    return {"status": "load_generated", "duration": "60s", "threads": cpu_count * 2}


@app.get("/api/quickload")
def quick_load():
    """Generates 10-second CPU burst for quick scaling tests"""
    def burst_task():
        start_time = time.time()
        while time.time() - start_time < 10:
            for _ in range(50000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()

    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 3):
        thread = threading.Thread(target=burst_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    return {"status": "burst_completed", "duration": "10s"}


@app.get("/api/scalinginfo")
def get_scaling_info():
    """Returns current ECS service scaling status"""
    if not ecs_client:
        return {"error": "ECS client not available"}
    try:
        response = ecs_client.describe_services(
            cluster='learner-cluster',
            services=['learner-svc-service']
        )
        if response['services']:
            service = response['services'][0]
            return {
                "cluster": "learner-cluster",
                "service": "learner-svc-service",
                "desired_count": service['desiredCount'],
                "running_count": service['runningCount'],
                "pending_count": service['pendingCount'],
                "status": service['status'],
                "active_requests": active_requests
            }
        else:
            return {"error": "Service not found"}
    except Exception as e:
        logger.error(f"Error getting scaling info: {str(e)}")
        return {"error": "Unable to fetch scaling information"}


@app.get("/api/error")
def trigger_error():
    """Triggers 500 error for testing error handling"""
    logger.error("Intentional error triggered")
    raise HTTPException(status_code=500, detail="Internal server error")


@app.get("/api/notfound")
def not_found():
    """Triggers 404 error for testing not found responses"""
    logger.warning("Resource not found")
    raise HTTPException(status_code=404, detail="Resource not found")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Requirements File
```text
# source/requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
boto3==1.34.0
pydantic==2.5.0
```
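If you want to smoke-test the app locally before baking it into the image, something along these lines should work (assuming you run it from the source/ directory; the /api/scalinginfo endpoint needs AWS credentials to return real data):

```bash
# Install dependencies and run the app locally for a quick smoke test
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 8000

# In another terminal, hit the health endpoint
curl http://localhost:8000/api/health
```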
Deployment Commands
Important: Before deploying the ECS service, we need to build and push our container image using the CodeBuild project we set up in Day 4.
Deploy in this order:
```bash
# 1. First, build and push your container image
#    This uses the CodeBuild project from Day 4
aws codebuild start-build --project-name learner-project

# Wait for the build to complete (check in AWS Console or CLI)
# This typically takes 2-3 minutes

# 2. Deploy ECS service with auto-scaling
aws cloudformation deploy \
  --template-file infra/ecs/ecs_service.yaml \
  --stack-name AWSLearner-ECS-Stack \
  --capabilities CAPABILITY_NAMED_IAM
```
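Once the stack finishes, it's worth confirming the service actually reaches a steady state before moving on to load testing. A small verification sketch (cluster and service names are the template defaults):

```bash
# Wait until the service reaches its desired number of healthy tasks
aws ecs wait services-stable \
  --cluster learner-cluster \
  --services learner-svc-service

# Quick look at desired vs running task counts
aws ecs describe-services \
  --cluster learner-cluster \
  --services learner-svc-service \
  --query "services[0].{desired:desiredCount,running:runningCount,status:status}"
```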
Auto-Scaling Scenarios
1. Check Current Status
```bash
curl http://your-alb-url/api/scalinginfo
```
Response:
{ "desired_count": 1, "running_count": 1, "pending_count": 0, "status": "ACTIVE" }
2. Trigger Heavy Load
```bash
curl http://your-alb-url/api/load
```
This creates 60 seconds of intense CPU load. Watch CloudWatch metrics to see:
- CPU utilization spike to 80-90%
- Auto-scaling trigger after 2 minutes
- New containers start (desired_count increases)
- Load distributes across containers
- CPU drops back to target 40%
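If you prefer the CLI over the console, the scaling activity history shows each scale-out and scale-in decision as it happens. A rough sketch, using the resource ID from the template:

```bash
# Recent scaling decisions for the service (newest first)
aws application-autoscaling describe-scaling-activities \
  --service-namespace ecs \
  --resource-id service/learner-cluster/learner-svc-service \
  --max-results 10

# Poll the task counts every 30 seconds while the load runs
watch -n 30 "aws ecs describe-services \
  --cluster learner-cluster \
  --services learner-svc-service \
  --query 'services[0].[desiredCount,runningCount]'"
```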
3. Quick Burst Test
```bash
curl http://your-alb-url/api/quickload
```
Perfect for testing rapid scaling response with a 10-second burst.
Postman Testing
- GET /api/health - Health check status
- POST /api/submit - Data submission with JSON body
- GET /api/scalinginfo - Current container status
- GET /api/load - Load generation response
- GET /api/quickload - Quick burst response
- GET /api/error - Error handling test
- GET /api/notfound - 404 error test
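If you'd rather use curl than Postman, the POST endpoint expects a small JSON body — roughly like this (the ALB URL is a placeholder):

```bash
curl -X POST http://your-alb-url/api/submit \
  -H "Content-Type: application/json" \
  -d '{"name": "Utkarsh"}'
# Expected response: {"message": "Happy learning, Utkarsh!", "data": {"name": "Utkarsh"}}
```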
Note: To see auto-scaling in action, you'll need to hit the load endpoints several times in parallel (multiple terminals or browser tabs) to generate enough traffic to push CPU past the threshold — a simple loop like the one below works well.
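One way to generate that kind of parallel traffic is a shell loop that fires a batch of concurrent requests at the burst endpoint (again, the ALB URL is a placeholder; tune the request count to your liking):

```bash
# Fire 20 concurrent burst requests to push CPU past the scaling threshold
for i in $(seq 1 20); do
  curl -s "http://your-alb-url/api/quickload" > /dev/null &
done
wait
echo "All burst requests completed"
```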
What Each Endpoint Does
| Endpoint | Purpose | Response Time |
|---|---|---|
| /api/health | Load balancer health check | Instant |
| /api/submit | Data processing test | Instant |
| /api/load | Sustained CPU load (60s) | 60 seconds |
| /api/quickload | CPU burst (10s) | 10 seconds |
| /api/scalinginfo | Current container status | Instant |
| /api/error | Error handling test | Instant |
| /api/notfound | 404 error test | Instant |
Auto-Scaling Behavior
How it works:
- Target: Maintains 40% CPU utilization
- Scale Out: Adds containers when CPU > 40% (2-minute cooldown)
- Scale In: Removes containers when CPU < 40% (5-minute cooldown)
- Limits: 1-5 containers
- Alerts: Email notifications at critical thresholds
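You can confirm the policy came out the way the template intended straight from the CLI — a quick sketch:

```bash
# Inspect the target tracking policy attached to the service
aws application-autoscaling describe-scaling-policies \
  --service-namespace ecs \
  --resource-id service/learner-cluster/learner-svc-service \
  --query "ScalingPolicies[0].TargetTrackingScalingPolicyConfiguration"
```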
Scaling Timeline:
- 0-2 minutes: High CPU detected, evaluation period
- 2-4 minutes: New container launching
- 4-6 minutes: Container healthy, receiving traffic
- 6+ minutes: Load distributed, CPU normalizes
Key Learnings
- Target tracking scaling is much smarter than threshold-based scaling
- Cooldown periods prevent rapid scaling that could cause instability
- Email alerts provide peace of mind without constant monitoring
- Load testing endpoints are essential for validating your setup
- ECS Fargate eliminates server management completely
What's Next in This Series?
In this comprehensive series, we've learned how to deploy a complete containerized application from VPC to ECS service using Fargate. We covered:
- VPC Setup with multi-AZ networking
- Security Groups and IAM roles
- ECR Repository for container images
- CodeBuild Pipeline for CI/CD
- Application Load Balancer for traffic distribution
- ECS Service with intelligent auto-scaling
The auto-scaling system feels really robust now. I can throw traffic at it, watch it scale intelligently, and get notified if anything needs attention. Perfect foundation for a production workload!
Complete Day 5 Learning Summary
What we accomplished in Day 5:
Infrastructure Built
- ECS Service with Fargate launch type (256 CPU, 512 MB RAM)
- Task Definition with proper IAM roles and logging
- Auto-Scaling Target (1-5 containers) with target tracking policy
- CloudWatch Alarms for critical CPU and max capacity alerts
- SNS Email Notifications for real-time monitoring
Application Features
- 7 FastAPI Endpoints for comprehensive testing
- Load Testing Capabilities (60s sustained + 10s burst)
- Real-time Monitoring with ECS service status
- Error Handling and health checks
- Structured Logging to CloudWatch
Auto-Scaling Intelligence
- Target Tracking: Maintains 40% CPU utilization
- Smart Cooldowns: 2min scale-out, 5min scale-in
- Proportional Scaling: Responds to load intensity
- Email Alerts: Critical thresholds and capacity warnings
Key Takeaway: We now have a fully automated, scalable containerized application that can handle real-world traffic patterns while maintaining cost efficiency and operational visibility.
💻 About Me
Hi! I'm Utkarsh, a Cloud Specialist & AWS Community Builder who loves turning complex AWS topics into fun chai-time stories ☕