Hey everyone! Today I'm diving deep into ECS services and auto-scaling. After setting up the load balancer on Day 4, it's time to deploy my FastAPI application with intelligent scaling that responds to real traffic patterns.
What We're Building Today
- ECS Service that keeps containers running and healthy
- Smart Auto-Scaling that maintains optimal performance (1-5 containers)
- FastAPI Application with multiple endpoints for testing
- CloudWatch Monitoring with email alerts
- Load Testing Endpoints to validate scaling behavior
The Complete ECS Service with Auto-Scaling
Note: We already created the ECS cluster in our previous setup, so we'll focus on the service configuration.
Here's the full ECS service configuration with intelligent auto-scaling:
```yaml
# infra/ecs/ecs_service.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Creating ECS Service for Learning Purpose'

Parameters:
  TeamNameValue:
    Type: String
    Description: TeamName Tag Value
    Default: "awslearner"
  EnvironmentValue:
    Type: String
    Description: Environment Tag Value
    Default: "dev"
  ServiceName:
    Type: String
    Description: Name of the ECS service
    Default: "learner-svc"
  ExecutionRoleName:
    Type: String
    Description: Name of the service execution role
    Default: "learner-ecs-role"
  TaskRoleName:
    Type: String
    Description: Name of the task role
    Default: "learner-ecs-task-exc-role"
  ImageARN:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Image ARN
    Default: "/learner/imagearn/value"
  ECSCluster:
    Type: String
    Description: ECS Cluster Name
    Default: "learner-cluster"
  PublicSubnetIds:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Subnet ID
    Default: "/learner/public/subnetids"
  SecurityGroup:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Security Group
    Default: "/learner/public/sgid"
  TargetGroupArn:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Target Group ARN
    Default: "/learner/target/value"
  AlertEmail:
    Type: String
    Description: Email address for alerts
    Default: <Provide Email Address>

Resources:
  # Storage for your container logs
  ECSLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /ecs/${ServiceName}-logs
      RetentionInDays: 14
      Tags:
        - Key: Name
          Value: !Sub /ecs/${ServiceName}-logs
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Blueprint that tells ECS how to run your containers
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub ${ServiceName}-task
      Cpu: 256
      Memory: 512
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${ExecutionRoleName}
      TaskRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${TaskRoleName}
      ContainerDefinitions:
        - Name: !Sub ${ServiceName}-container
          Image: !Ref ImageARN
          PortMappings:
            - ContainerPort: 80
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref ECSLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-task
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Service that keeps your containers running and healthy
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster
      ServiceName: !Sub ${ServiceName}-service
      TaskDefinition: !Ref TaskDefinition
      LaunchType: FARGATE
      DesiredCount: 1
      PropagateTags: SERVICE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets: !Split
            - ","
            - !Ref PublicSubnetIds
          SecurityGroups:
            - !Ref SecurityGroup
      LoadBalancers:
        - ContainerName: !Sub ${ServiceName}-container
          ContainerPort: 80
          TargetGroupArn: !Ref TargetGroupArn
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-service
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Defines scaling limits for your containers (1-5 tasks)
  AutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    DependsOn: ECSService
    Properties:
      ServiceNamespace: ecs
      ResourceId: !Sub "service/${ECSCluster}/${ServiceName}-service"
      ScalableDimension: ecs:service:DesiredCount
      MinCapacity: 1
      MaxCapacity: 5

  # Target tracking scaling policy - maintains CPU around 40%
  AutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${EnvironmentValue}-${ServiceName}-target-tracking
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 40.0                # Target 40% CPU utilization
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleOutCooldown: 120            # Wait 2 minutes before scaling up
        ScaleInCooldown: 300             # Wait 5 minutes before scaling down
        DisableScaleIn: false

  # Email notification system for alerts
  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub ${EnvironmentValue}-${ServiceName}-alerts
      DisplayName: !Sub "${ServiceName} ECS Alerts"
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentValue}-${ServiceName}-alerts
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Connects your email to the alert system
  AlertSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: email
      TopicArn: !Ref AlertTopic
      Endpoint: !Ref AlertEmail

  # Alert when CPU is high at maximum capacity
  CriticalCPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-CriticalCPU-AtMaxCapacity
      AlarmDescription: !Sub "CRITICAL: ${ServiceName} CPU >70% at max capacity (5 tasks)"
      MetricName: CPUUtilization
      Namespace: AWS/ECS
      Statistic: Average
      Period: 60
      EvaluationPeriods: 2
      Threshold: 70
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster

  # Alert when you reach maximum number of containers
  MaxCapacityAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-MaxCapacity-Reached
      AlarmDescription: !Sub "WARNING: ${ServiceName} reached maximum capacity (5 tasks)"
      MetricName: RunningTaskCount
      Namespace: AWS/ECS
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 5
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster
```
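The template resolves the image URI, subnet IDs, security group, and target group ARN from SSM Parameter Store (all created on earlier days). If you want to double-check those parameters exist before deploying, a quick sketch like this works — the names below are the template defaults, so adjust them if yours differ:

```bash
# Confirm the SSM parameters the template expects are present
# (names match the template defaults above; adjust if yours differ)
aws ssm get-parameters \
  --names "/learner/imagearn/value" \
          "/learner/public/subnetids" \
          "/learner/public/sgid" \
          "/learner/target/value" \
  --query "InvalidParameters"
# An empty list [] means every parameter was found
```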
FastAPI Application with Testing Endpoints
Here's my FastAPI application with endpoints designed to test different scenarios:
```python
# source/app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import logging
import time
import hashlib
import boto3
import threading
import multiprocessing
import uvicorn

app = FastAPI()
logger = logging.getLogger("ecs_service")
active_requests = 0

# Initialize ECS client for monitoring
try:
    ecs_client = boto3.client('ecs')
except Exception as e:
    logger.warning(f"Could not initialize ECS client: {e}")
    ecs_client = None


class SubmitData(BaseModel):
    name: str = "User"


@app.get("/")
def home():
    """Simple welcome message"""
    return "Hello from ECS Fargate Service!"


@app.get("/api/health")
def health():
    """Health check endpoint for load balancer"""
    return {"status": "healthy"}


@app.post("/api/submit")
def api_submit(data: SubmitData):
    """Accepts user data and returns personalized message"""
    logger.info(f"Data received: {data.model_dump()}")
    return {"message": f"Happy learning, {data.name}!", "data": data.model_dump()}


@app.get("/api/load")
def generate_load():
    """Generates heavy CPU load for 60 seconds to test auto-scaling"""
    def cpu_intensive_task():
        start_time = time.time()
        while time.time() - start_time < 60:
            for _ in range(10000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()
            sum(range(1000))

    # Use multiple threads to maximize CPU usage
    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 2):
        thread = threading.Thread(target=cpu_intensive_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    logger.info("CPU load generation completed")
    return {"status": "load_generated", "duration": "60s", "threads": cpu_count * 2}


@app.get("/api/quickload")
def quick_load():
    """Generates 10-second CPU burst for quick scaling tests"""
    def burst_task():
        start_time = time.time()
        while time.time() - start_time < 10:
            for _ in range(50000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()

    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 3):
        thread = threading.Thread(target=burst_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    return {"status": "burst_completed", "duration": "10s"}


@app.get("/api/scalinginfo")
def get_scaling_info():
    """Returns current ECS service scaling status"""
    if not ecs_client:
        return {"error": "ECS client not available"}
    try:
        response = ecs_client.describe_services(
            cluster='learner-cluster',
            services=['learner-svc-service']
        )
        if response['services']:
            service = response['services'][0]
            return {
                "cluster": "learner-cluster",
                "service": "learner-svc-service",
                "desired_count": service['desiredCount'],
                "running_count": service['runningCount'],
                "pending_count": service['pendingCount'],
                "status": service['status'],
                "active_requests": active_requests
            }
        else:
            return {"error": "Service not found"}
    except Exception as e:
        logger.error(f"Error getting scaling info: {str(e)}")
        return {"error": "Unable to fetch scaling information"}


@app.get("/api/error")
def trigger_error():
    """Triggers 500 error for testing error handling"""
    logger.error("Intentional error triggered")
    raise HTTPException(status_code=500, detail="Internal server error")


@app.get("/api/notfound")
def not_found():
    """Triggers 404 error for testing not found responses"""
    logger.warning("Resource not found")
    raise HTTPException(status_code=404, detail="Resource not found")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Requirements File
```text
# source/requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
boto3==1.34.0
pydantic==2.5.0
```
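If you want to smoke-test the app locally before baking it into the image, something along these lines should work (assuming you run it from the source/ directory; the /api/scalinginfo endpoint needs AWS credentials to return real data):

```bash
# Install dependencies and run the app locally for a quick smoke test
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 8000

# In another terminal, hit the health endpoint
curl http://localhost:8000/api/health
```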
Deployment Commands
Important: Before deploying the ECS service, we need to build and push our container image using the CodeBuild project we set up in Day 4.
Deploy in this order:
```bash
# 1. First, build and push your container image
#    This uses the CodeBuild project from Day 4
aws codebuild start-build --project-name learner-project

# Wait for the build to complete (check in AWS Console or CLI)
# This typically takes 2-3 minutes

# 2. Deploy ECS service with auto-scaling
aws cloudformation deploy \
  --template-file infra/ecs/ecs_service.yaml \
  --stack-name AWSLearner-ECS-Stack \
  --capabilities CAPABILITY_NAMED_IAM
```
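Once the stack finishes, it's worth confirming the service actually reaches a steady state before moving on to load testing. A small verification sketch (cluster and service names are the template defaults):

```bash
# Wait until the service reaches its desired number of healthy tasks
aws ecs wait services-stable \
  --cluster learner-cluster \
  --services learner-svc-service

# Quick look at desired vs running task counts
aws ecs describe-services \
  --cluster learner-cluster \
  --services learner-svc-service \
  --query "services[0].{desired:desiredCount,running:runningCount,status:status}"
```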
Auto-Scaling Scenarios
1. Check Current Status
```bash
curl http://your-alb-url/api/scalinginfo
```
Response:
{ "desired_count": 1, "running_count": 1, "pending_count": 0, "status": "ACTIVE" }
2. Trigger Heavy Load
```bash
curl http://your-alb-url/api/load
```
This creates 60 seconds of intense CPU load. Watch CloudWatch metrics to see:
- CPU utilization spike to 80-90%
- Auto-scaling trigger after 2 minutes
- New containers start (desired_count increases)
- Load distributes across containers
- CPU drops back to target 40%
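If you prefer the CLI over the console, the scaling activity history shows each scale-out and scale-in decision as it happens. A rough sketch, using the resource ID from the template:

```bash
# Recent scaling decisions for the service (newest first)
aws application-autoscaling describe-scaling-activities \
  --service-namespace ecs \
  --resource-id service/learner-cluster/learner-svc-service \
  --max-results 10

# Poll the task counts every 30 seconds while the load runs
watch -n 30 "aws ecs describe-services \
  --cluster learner-cluster \
  --services learner-svc-service \
  --query 'services[0].[desiredCount,runningCount]'"
```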
3. Quick Burst Test
```bash
curl http://your-alb-url/api/quickload
```
Perfect for testing rapid scaling response with a 10-second burst.
Postman Testing
- GET /api/health - Health check status
- POST /api/submit - Data submission with JSON body
- GET /api/scalinginfo - Current container status
- GET /api/load - Load generation response
- GET /api/quickload - Quick burst response
- GET /api/error - Error handling test
- GET /api/notfound - 404 error test
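If you'd rather use curl than Postman, the POST endpoint expects a small JSON body — roughly like this (the ALB URL is a placeholder):

```bash
curl -X POST http://your-alb-url/api/submit \
  -H "Content-Type: application/json" \
  -d '{"name": "Utkarsh"}'
# Expected response: {"message": "Happy learning, Utkarsh!", "data": {"name": "Utkarsh"}}
```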
Note: To see auto-scaling in action, you'll need to hit the load endpoints several times in parallel (multiple terminals or browser tabs) to generate enough traffic to push CPU past the threshold — a simple loop like the one below works well.
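One way to generate that kind of parallel traffic is a shell loop that fires a batch of concurrent requests at the burst endpoint (again, the ALB URL is a placeholder; tune the request count to your liking):

```bash
# Fire 20 concurrent burst requests to push CPU past the scaling threshold
for i in $(seq 1 20); do
  curl -s "http://your-alb-url/api/quickload" > /dev/null &
done
wait
echo "All burst requests completed"
```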
What Each Endpoint Does
| Endpoint | Purpose | Response Time |
|---|---|---|
| /api/health | Load balancer health check | Instant |
| /api/submit | Data processing test | Instant |
| /api/load | Sustained CPU load (60s) | 60 seconds |
| /api/quickload | CPU burst (10s) | 10 seconds |
| /api/scalinginfo | Current container status | Instant |
| /api/error | Error handling test | Instant |
| /api/notfound | 404 error test | Instant |
Auto-Scaling Behavior
How it works:
- Target: Maintains 40% CPU utilization
- Scale Out: Adds containers when CPU > 40% (2-minute cooldown)
- Scale In: Removes containers when CPU < 40% (5-minute cooldown)
- Limits: 1-5 containers
- Alerts: Email notifications at critical thresholds
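You can confirm the policy came out the way the template intended straight from the CLI — a quick sketch:

```bash
# Inspect the target tracking policy attached to the service
aws application-autoscaling describe-scaling-policies \
  --service-namespace ecs \
  --resource-id service/learner-cluster/learner-svc-service \
  --query "ScalingPolicies[0].TargetTrackingScalingPolicyConfiguration"
```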
Scaling Timeline:
- 0-2 minutes: High CPU detected, evaluation period
- 2-4 minutes: New container launching
- 4-6 minutes: Container healthy, receiving traffic
- 6+ minutes: Load distributed, CPU normalizes
Key Learnings
- Target tracking scaling is much smarter than threshold-based scaling
- Cooldown periods prevent rapid scaling that could cause instability
- Email alerts provide peace of mind without constant monitoring
- Load testing endpoints are essential for validating your setup
- ECS Fargate eliminates server management completely
What's Next in This Series?
In this comprehensive series, we've learned how to deploy a complete containerized application from VPC to ECS service using Fargate. We covered:
- VPC Setup with multi-AZ networking
- Security Groups and IAM roles
- ECR Repository for container images
- CodeBuild Pipeline for CI/CD
- Application Load Balancer for traffic distribution
- ECS Service with intelligent auto-scaling
The auto-scaling system feels really robust now. I can throw traffic at it, watch it scale intelligently, and get notified if anything needs attention. Perfect foundation for a production workload!
Complete Day 5 Learning Summary
What we accomplished in Day 5:
Infrastructure Built
- ECS Service with Fargate launch type (256 CPU, 512 MB RAM)
- Task Definition with proper IAM roles and logging
- Auto-Scaling Target (1-5 containers) with target tracking policy
- CloudWatch Alarms for critical CPU and max capacity alerts
- SNS Email Notifications for real-time monitoring
Application Features
- 7 FastAPI Endpoints for comprehensive testing
- Load Testing Capabilities (60s sustained + 10s burst)
- Real-time Monitoring with ECS service status
- Error Handling and health checks
- Structured Logging to CloudWatch
Auto-Scaling Intelligence
- Target Tracking: Maintains 40% CPU utilization
- Smart Cooldowns: 2min scale-out, 5min scale-in
- Proportional Scaling: Responds to load intensity
- Email Alerts: Critical thresholds and capacity warnings
Key Takeaway: We now have a fully automated, scalable containerized application that can handle real-world traffic patterns while maintaining cost efficiency and operational visibility.
💻 About Me
Hi! I'm Utkarsh, a Cloud Specialist & AWS Community Builder who loves turning complex AWS topics into fun chai-time stories ☕