Utkarsh Rastogi for AWS Community Builders


AWS VPC to ECS - Day 5: ECS Service with Smart Auto-Scaling

Hey everyone! Today I'm diving deep into ECS services and auto-scaling. After setting up the load balancer on Day 4, it's time to deploy my FastAPI application with intelligent scaling that responds to real traffic patterns.

What We're Building Today

  • ECS Service that keeps containers running and healthy
  • Smart Auto-Scaling that maintains optimal performance (1-5 containers)
  • FastAPI Application with multiple endpoints for testing
  • CloudWatch Monitoring with email alerts
  • Load Testing Endpoints to validate scaling behavior

The Complete ECS Service with Auto-Scaling

Note: We already created the ECS cluster in our previous setup, so we'll focus on the service configuration.

Here's the full ECS service configuration with intelligent auto-scaling:

# infra/ecs/ecs_service.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Creating ECS Service for Learning Purpose'

Parameters:
  TeamNameValue:
    Type: String
    Description: TeamName Tag Value
    Default: "awslearner"
  EnvironmentValue:
    Type: String
    Description: Environment Tag Value
    Default: "dev"
  ServiceName:
    Type: String
    Description: Name of the ECS service
    Default: "learner-svc"
  ExecutionRoleName:
    Type: String
    Description: Name of the service execution role
    Default: "learner-ecs-role"
  TaskRoleName:
    Type: String
    Description: Name of the task role
    Default: "learner-ecs-task-exc-role"
  ImageARN:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Image ARN
    Default: "/learner/imagearn/value"
  ECSCluster:
    Type: String
    Description: ECS Cluster Name
    Default: "learner-cluster"
  PublicSubnetIds:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Subnet ID
    Default: "/learner/public/subnetids"
  SecurityGroup:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Security Group
    Default: "/learner/public/sgid"
  TargetGroupArn:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Target Group ARN
    Default: "/learner/target/value"
  AlertEmail:
    Type: String
    Description: Email address for alerts
    Default: <Provide Email Address>

Resources:
  # Storage for your container logs
  ECSLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /ecs/${ServiceName}-logs
      RetentionInDays: 14
      Tags:
        - Key: Name
          Value: !Sub /ecs/${ServiceName}-logs
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Blueprint that tells ECS how to run your containers
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub ${ServiceName}-task
      Cpu: 256
      Memory: 512
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${ExecutionRoleName}
      TaskRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${TaskRoleName}
      ContainerDefinitions:
        - Name: !Sub ${ServiceName}-container
          Image: !Ref ImageARN
          PortMappings:
            - ContainerPort: 80
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref ECSLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-task
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Service that keeps your containers running and healthy
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster
      ServiceName: !Sub ${ServiceName}-service
      TaskDefinition: !Ref TaskDefinition
      LaunchType: FARGATE
      DesiredCount: 1
      PropagateTags: SERVICE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets: !Split
            - ","
            - !Ref PublicSubnetIds
          SecurityGroups:
            - !Ref SecurityGroup
      LoadBalancers:
        - ContainerName: !Sub ${ServiceName}-container
          ContainerPort: 80
          TargetGroupArn: !Ref TargetGroupArn
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-service
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Defines scaling limits for your containers (1-5 tasks)
  AutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    DependsOn: ECSService
    Properties:
      ServiceNamespace: ecs
      ResourceId: !Sub "service/${ECSCluster}/${ServiceName}-service"
      ScalableDimension: ecs:service:DesiredCount
      MinCapacity: 1
      MaxCapacity: 5

  # Target tracking scaling policy - maintains CPU around 40%
  AutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${EnvironmentValue}-${ServiceName}-target-tracking
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 40.0  # Target 40% CPU utilization
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleOutCooldown: 120  # Wait 2 minutes after a scale-out before scaling out again
        ScaleInCooldown: 300   # Wait 5 minutes after a scale-in before scaling in again
        DisableScaleIn: false

  # Email notification system for alerts
  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub ${EnvironmentValue}-${ServiceName}-alerts
      DisplayName: !Sub "${ServiceName} ECS Alerts"
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentValue}-${ServiceName}-alerts
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Connects your email to the alert system
  AlertSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: email
      TopicArn: !Ref AlertTopic
      Endpoint: !Ref AlertEmail

  # Alert when CPU is high at maximum capacity
  # Note: this alarm watches CPU only; it does not itself check the task count
  CriticalCPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-CriticalCPU-AtMaxCapacity
      AlarmDescription: !Sub "CRITICAL: ${ServiceName} CPU >70% at max capacity (5 tasks)"
      MetricName: CPUUtilization
      Namespace: AWS/ECS
      Statistic: Average
      Period: 60
      EvaluationPeriods: 2
      Threshold: 70
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster

  # Alert when you reach maximum number of containers
  # Note: RunningTaskCount is a Container Insights metric, so this alarm
  # assumes Container Insights is enabled on the cluster
  MaxCapacityAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-MaxCapacity-Reached
      AlarmDescription: !Sub "WARNING: ${ServiceName} reached maximum capacity (5 tasks)"
      MetricName: RunningTaskCount
      Namespace: ECS/ContainerInsights
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 5
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster
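To make the alarm settings concrete, here is a minimal Python sketch of how the critical CPU alarm's rule (Period 60, EvaluationPeriods 2, Threshold 70, GreaterThanThreshold) can be read: the two most recent one-minute averages must both be strictly above 70. The function name `alarm_fires` is my own for illustration, and the sketch ignores CloudWatch's missing-data handling, so treat it as a mental model rather than CloudWatch's actual evaluation logic.

```python
from typing import Sequence


def alarm_fires(datapoints: Sequence[float], threshold: float = 70.0,
                evaluation_periods: int = 2) -> bool:
    """Return True if the latest `evaluation_periods` datapoints all exceed the threshold.

    Simplified model of GreaterThanThreshold with EvaluationPeriods=2.
    Each datapoint stands for one 60-second average CPU value.
    """
    if len(datapoints) < evaluation_periods:
        return False  # not enough data to evaluate yet
    recent = datapoints[-evaluation_periods:]
    return all(value > threshold for value in recent)


print(alarm_fires([50, 75, 80]))  # True: the last two minutes are both above 70
print(alarm_fires([80, 75, 60]))  # False: the latest minute dropped below 70
```

Note that the rule only looks at CPU. Nothing in it checks whether the service is already at 5 tasks, so the email can arrive while the service is still scaling out.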

FastAPI Application with Testing Endpoints

Here's my FastAPI application with endpoints designed to test different scenarios:

# source/app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import logging
import time
import hashlib
import boto3
import threading
import multiprocessing
import uvicorn

# Emit INFO-level messages; without this only WARNING and above are shown
logging.basicConfig(level=logging.INFO)

app = FastAPI()
logger = logging.getLogger("ecs_service")
active_requests = 0  # placeholder counter; no endpoint updates it yet

# Initialize ECS client for monitoring
try:
    ecs_client = boto3.client('ecs')
except Exception as e:
    logger.warning(f"Could not initialize ECS client: {e}")
    ecs_client = None


class SubmitData(BaseModel):
    name: str = "User"


@app.get("/")
def home():
    """Simple welcome message"""
    return "Hello from ECS Fargate Service!"


@app.get("/api/health")
def health():
    """Health check endpoint for load balancer"""
    return {"status": "healthy"}


@app.post("/api/submit")
def api_submit(data: SubmitData):
    """Accepts user data and returns personalized message"""
    logger.info(f"Data received: {data.model_dump()}")
    return {"message": f"Happy learning, {data.name}!", "data": data.model_dump()}


@app.get("/api/load")
def generate_load():
    """Generates heavy CPU load for 60 seconds to test auto-scaling"""
    def cpu_intensive_task():
        start_time = time.time()
        while time.time() - start_time < 60:
            for _ in range(10000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()
                sum(range(1000))

    # Use multiple threads to maximize CPU usage
    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 2):
        thread = threading.Thread(target=cpu_intensive_task)
        thread.start()
        threads.append(thread)

    # Block until every worker thread has finished its 60 seconds
    for thread in threads:
        thread.join()

    logger.info("CPU load generation completed")
    return {"status": "load_generated", "duration": "60s", "threads": cpu_count * 2}


@app.get("/api/quickload")
def quick_load():
    """Generates 10-second CPU burst for quick scaling tests"""
    def burst_task():
        start_time = time.time()
        while time.time() - start_time < 10:
            for _ in range(50000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()

    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 3):
        thread = threading.Thread(target=burst_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    return {"status": "burst_completed", "duration": "10s"}


@app.get("/api/scalinginfo")
def get_scaling_info():
    """Returns current ECS service scaling status"""
    if not ecs_client:
        return {"error": "ECS client not available"}

    try:
        # Cluster and service names match the template defaults
        response = ecs_client.describe_services(
            cluster='learner-cluster',
            services=['learner-svc-service']
        )

        if response['services']:
            service = response['services'][0]
            return {
                "cluster": "learner-cluster",
                "service": "learner-svc-service",
                "desired_count": service['desiredCount'],
                "running_count": service['runningCount'],
                "pending_count": service['pendingCount'],
                "status": service['status'],
                "active_requests": active_requests
            }
        else:
            return {"error": "Service not found"}
    except Exception as e:
        logger.error(f"Error getting scaling info: {str(e)}")
        return {"error": "Unable to fetch scaling information"}


@app.get("/api/error")
def trigger_error():
    """Triggers 500 error for testing error handling"""
    logger.error("Intentional error triggered")
    raise HTTPException(status_code=500, detail="Internal server error")


@app.get("/api/notfound")
def not_found():
    """Triggers 404 error for testing not found responses"""
    logger.warning("Resource not found")
    raise HTTPException(status_code=404, detail="Resource not found")


if __name__ == "__main__":
    # Local run only. The task definition maps container port 80, so the
    # container's start command (not shown in this post) must serve on port 80.
    uvicorn.run(app, host="0.0.0.0", port=8000)

Requirements File

# source/requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
boto3==1.34.0
pydantic==2.5.0

Deployment Commands

Important: Before deploying the ECS service, we need to build and push our container image using the CodeBuild project we set up in Day 4.

Deploy in this order:

# 1. First, build and push your container image
# This uses the CodeBuild project from Day 4
aws codebuild start-build --project-name learner-project

# Wait for the build to complete (check in AWS Console or CLI)
# This typically takes 2-3 minutes

# 2. Deploy ECS service with auto-scaling
aws cloudformation deploy \
  --template-file infra/ecs/ecs_service.yaml \
  --stack-name AWSLearner-ECS-Stack \
  --capabilities CAPABILITY_NAMED_IAM
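If you'd rather script the "wait for the build" step than watch the console, here is a rough Python sketch that polls CodeBuild until the build finishes. The helper name `wait_for_build`, the 15-second poll interval, and the 15-minute timeout are my own assumptions; the client is passed in so the helper can be tried with a stub before pointing it at a real account.

```python
import time


def wait_for_build(client, build_id: str, poll_seconds: float = 15.0,
                   timeout_seconds: float = 900.0, sleep=time.sleep) -> str:
    """Poll CodeBuild until the build leaves IN_PROGRESS, then return its final status."""
    waited = 0.0
    while True:
        build = client.batch_get_builds(ids=[build_id])["builds"][0]
        status = build["buildStatus"]
        if status != "IN_PROGRESS":
            return status  # e.g. SUCCEEDED or FAILED
        if waited >= timeout_seconds:
            raise TimeoutError(f"Build {build_id} still running after {timeout_seconds}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   codebuild = boto3.client("codebuild")
#   build_id = codebuild.start_build(projectName="learner-project")["build"]["id"]
#   print(wait_for_build(codebuild, build_id))  # deploy only if this prints SUCCEEDED
```

Running the CloudFormation deploy only after a `SUCCEEDED` status avoids starting the service against an image tag that has not been pushed yet.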

Auto-Scaling Scenarios

1. Check Current Status

curl http://your-alb-url/api/scalinginfo 

Response:

{
  "cluster": "learner-cluster",
  "service": "learner-svc-service",
  "desired_count": 1,
  "running_count": 1,
  "pending_count": 0,
  "status": "ACTIVE",
  "active_requests": 0
}
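When you poll this endpoint during a test, it helps to have a quick check for "has the service settled?". Here's a small Python sketch, assuming the field names returned by /api/scalinginfo above. The helper name `is_steady` is just for illustration.

```python
import json


def is_steady(info: dict) -> bool:
    """True when the service is ACTIVE, nothing is pending, and running == desired."""
    return (
        info.get("status") == "ACTIVE"
        and info.get("pending_count") == 0
        and info.get("running_count") == info.get("desired_count")
    )


sample = json.loads(
    '{"desired_count": 1, "running_count": 1, "pending_count": 0, "status": "ACTIVE"}'
)
print(is_steady(sample))  # True
```

During a scale-out you would expect this to return False for a while (desired_count jumps first, running_count catches up once the new task is up).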

2. Trigger Heavy Load

curl http://your-alb-url/api/load 

This creates 60 seconds of intense CPU load. Watch CloudWatch metrics to see:

  • CPU utilization spike to 80-90%
  • Auto-scaling trigger once average CPU has stayed above the 40% target for a few minutes
  • New containers start (desired_count increases)
  • Load distributes across containers
  • CPU settles back toward the 40% target

3. Quick Burst Test

curl http://your-alb-url/api/quickload 

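Handy for a quick 10-second CPU burst. A single burst is unlikely to move the one-minute CPU average enough to trigger scaling on its own, so repeat it or run several in parallel.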


Postman Testing

  • GET /api/health - Health check status
  • POST /api/submit - Data submission with JSON body
  • GET /api/scalinginfo - Current container status
  • GET /api/load - Load generation response
  • GET /api/quickload - Quick burst response
  • GET /api/error - Error handling test
  • GET /api/notfound - 404 error test


Note: To see auto-scaling in action, you'll need to hit the load endpoints multiple times or use multiple browser tabs/terminals simultaneously to generate enough traffic that triggers the CPU threshold.
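Instead of juggling browser tabs, you can fire the requests in parallel from a small script. The sketch below uses only the Python standard library. The function names and the default of 8 parallel requests are my own choices, and `your-alb-url` is the same placeholder used in the curl examples.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url: str, timeout: float = 90.0) -> int:
    """GET the URL and return the HTTP status code, including error codes."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. a 504 if the load balancer gives up first


def fire_concurrent(url: str, requests: int = 8, fetch=fetch_status) -> list:
    """Send `requests` GETs at the same time and collect their status codes."""
    with ThreadPoolExecutor(max_workers=requests) as pool:
        return list(pool.map(fetch, [url] * requests))


# Usage (replace the placeholder host with your ALB DNS name):
#   print(fire_concurrent("http://your-alb-url/api/load", requests=8))
```

One thing to watch: /api/load holds the connection for about 60 seconds, which is right at the ALB's default idle timeout. If that setting was left at its default, some calls may come back as 504 even though the container still did the CPU work.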

What Each Endpoint Does

Endpoint         | Purpose                    | Response Time
/api/health      | Load balancer health check | Instant
/api/submit      | Data processing test       | Instant
/api/load        | Sustained CPU load (60s)   | 60 seconds
/api/quickload   | CPU burst (10s)            | 10 seconds
/api/scalinginfo | Current container status   | Instant
/api/error       | Error handling test        | Instant
/api/notfound    | 404 error test             | Instant

Auto-Scaling Behavior

How it works:

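  • Target: Keeps average CPU utilization across the service around 40%
  • Scale Out: Adds containers when average CPU stays above 40% (2-minute cooldown between scale-out actions)
  • Scale In: Removes containers gradually when average CPU stays below the target for a sustained period (5-minute cooldown between scale-in actions)
  • Limits: 1-5 containers
  • Alerts: Email notifications when CPU exceeds 70% or the task count reaches 5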
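A useful way to reason about the points above is a simple proportional estimate: if the tasks you have now are running at some average CPU, how many tasks would bring that average back to 40%? The Python sketch below is a simplified mental model under that assumption, not the exact algorithm Application Auto Scaling uses. The function name is hypothetical.

```python
import math


def estimate_desired_count(current_tasks: int, average_cpu: float,
                           target_cpu: float = 40.0,
                           min_tasks: int = 1, max_tasks: int = 5) -> int:
    """Estimate the task count that would bring average CPU back to the target.

    Simplified proportional model, clamped to the 1-5 task limits.
    """
    raw = math.ceil(current_tasks * average_cpu / target_cpu)
    return max(min_tasks, min(max_tasks, raw))


print(estimate_desired_count(1, 90.0))  # 3: one task at 90% needs about 2.25 tasks
print(estimate_desired_count(5, 90.0))  # 5: capped at the maximum capacity
print(estimate_desired_count(3, 10.0))  # 1: light load falls back to the minimum
```

The second case is the one the critical CPU alert is meant for: the service is already at 5 tasks and cannot scale any further.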

Scaling Timeline (approximate; actual timings depend on metric delay and task startup time):

  1. 0-2 minutes: High CPU detected, evaluation period
  2. 2-4 minutes: New container launching
  3. 4-6 minutes: Container healthy, receiving traffic
  4. 6+ minutes: Load distributed, CPU normalizes

Key Learnings

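  1. Target tracking scaling is simpler to operate than hand-written threshold alarms: you set one target value and Application Auto Scaling manages the CloudWatch alarms for you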
  2. Cooldown periods prevent rapid scaling that could cause instability
  3. Email alerts provide peace of mind without constant monitoring
  4. Load testing endpoints are essential for validating your setup
  5. ECS Fargate eliminates server management completely

What's Next in This Series?

In this comprehensive series, we've learned how to deploy a complete containerized application from VPC to ECS service using Fargate. We covered:

  • VPC Setup with multi-AZ networking
  • Security Groups and IAM roles
  • ECR Repository for container images
  • CodeBuild Pipeline for CI/CD
  • Application Load Balancer for traffic distribution
  • ECS Service with intelligent auto-scaling

The auto-scaling system feels really robust now. I can throw traffic at it, watch it scale intelligently, and get notified if anything needs attention. Perfect foundation for a production workload!


Complete Day 5 Learning Summary

What we accomplished in Day 5:

Infrastructure Built

  • ECS Service with Fargate launch type (256 CPU, 512 MB RAM)
  • Task Definition with proper IAM roles and logging
  • Auto-Scaling Target (1-5 containers) with target tracking policy
  • CloudWatch Alarms for critical CPU and max capacity alerts
  • SNS Email Notifications for real-time monitoring

Application Features

  • 7 FastAPI Endpoints for comprehensive testing
  • Load Testing Capabilities (60s sustained + 10s burst)
  • Real-time Monitoring with ECS service status
  • Error Handling and health checks
  • Container Logging to CloudWatch Logs (awslogs driver)

Auto-Scaling Intelligence

  • Target Tracking: Maintains 40% CPU utilization
  • Smart Cooldowns: 2min scale-out, 5min scale-in
  • Proportional Scaling: Responds to load intensity
  • Email Alerts: Critical thresholds and capacity warnings

Key Takeaway: We now have a fully automated, scalable containerized application that can handle real-world traffic patterns while maintaining cost efficiency and operational visibility.


💻 About Me

Hi! I'm Utkarsh, a Cloud Specialist & AWS Community Builder who loves turning complex AWS topics into fun chai-time stories
