AWS Open Source Blog
Using AWS Distro for OpenTelemetry Collector for cross-account metrics collection on Amazon ECS
In November 2020, we announced OpenTelemetry support on AWS with AWS Distro for OpenTelemetry (ADOT), a secure, production-ready, AWS-supported distribution of the Cloud Native Computing Foundation (CNCF) OpenTelemetry project. With ADOT, you can instrument applications to send correlated metrics and traces to multiple AWS solutions, such as our Amazon Managed Service for Prometheus (AMP) and Partner monitoring solutions.
Many customers run their applications in separate AWS accounts, and even separate AWS Regions, and would like a central place for observability. In a previous article, we explained how to collect metrics across multiple accounts with Amazon Elastic Kubernetes Service (Amazon EKS). The scenario here is similar, except that we use the ADOT agent to collect application and platform metrics for workloads running on Amazon Elastic Container Service (Amazon ECS), our native container orchestration platform, and send them to an AMP workspace.
Setup overview
To resolve this challenge, we will use the following structure.
On the workload accounts:
- Create an IAM role to be used by Amazon ECS tasks.
On the central monitoring account:
- Create an AMP workspace.
- Create an IAM role that allows cross-account access to AMP.
On the workload accounts:
- Grant Amazon ECS tasks permission to assume the cross-account IAM role.
- Set up the application and the AWS Distro for OpenTelemetry agent.
- Create an Amazon ECS cluster and run the application.
On the central monitoring account:
- Visualize metrics with Amazon Managed Grafana.
The entire architecture looks like the following:
Workload account: ECS role setup
Logged into the workload account, we create an IAM role that will be used later by Amazon ECS tasks. This role will later be trusted by the central monitoring account and granted assume-role permissions.
cat > task-assume-role.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role --role-name ecs-xaccount-task-role \
  --assume-role-policy-document file://task-assume-role.json \
  --region eu-west-1
Monitoring account setup
Logged into the central monitoring account, we create an AMP workspace with the following AWS CLI command:
aws amp create-workspace --alias ecs-xaccount-metrics-demo --region eu-west-1
Alternatively, we can use the AWS console and navigate to the AMP service.
We can now create an IAM role with write permissions to the AMP workspace. To grant access to multiple accounts, populate the "AWS" array with the appropriate IAM role ARNs:
WORKLOAD_ACCOUNT_ID=

cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::$WORKLOAD_ACCOUNT_ID:role/ecs-xaccount-task-role"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {}
    }
  ]
}
EOF

# Note: You might encounter an error if the ecs-xaccount-task-role
# does not exist in the workload account.
aws iam create-role \
  --role-name ECS-AMP-Central-Role \
  --assume-role-policy-document file://policy.json \
  --query 'Role.RoleName' \
  --output text

aws iam attach-role-policy --role-name ECS-AMP-Central-Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
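When several workload accounts need access, the "AWS" array could be generated rather than edited by hand. The following is a minimal sketch, not part of the original walkthrough: render_trust_policy is a hypothetical helper name, and the account IDs in the usage comment are placeholders. It assumes every workload account uses the same ecs-xaccount-task-role role name.

```shell
# Hypothetical helper: prints a trust policy that lets ecs-xaccount-task-role
# in each account ID passed as an argument assume the central role.
# Note: plain sh has no local variables, so these names are global.
render_trust_policy() {
  principals=""
  for account_id in "$@"; do
    arn="arn:aws:iam::${account_id}:role/ecs-xaccount-task-role"
    if [ -z "$principals" ]; then
      principals="\"${arn}\""
    else
      principals="${principals}, \"${arn}\""
    fi
  done
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": [ ${principals} ] },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Example with placeholder account IDs:
# render_trust_policy 111111111111 222222222222 > policy.json
```

The generated file can then be passed to aws iam create-role in the same way as the hand-written policy.json.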
Workload account
Note: You can repeat instructions in this section for as many workload accounts as needed.
Logged into the workload account, we grant AssumeRole permissions to the role created previously:
# Set the central account id
CENTRAL_ACCOUNT_ID=

cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": "arn:aws:iam::${CENTRAL_ACCOUNT_ID}:role/ECS-AMP-Central-Role"
    }
  ]
}
EOF

POLICY_ARN=$(aws iam create-policy --policy-name xaccount-amp-write \
  --policy-document file://policy.json | jq -r '.Policy.Arn')

aws iam attach-role-policy --role-name ecs-xaccount-task-role \
  --policy-arn $POLICY_ARN
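Because the heredoc expands CENTRAL_ACCOUNT_ID when the file is written, an empty or mistyped value would silently produce a malformed role ARN. One way to guard against that is a small check before generating the policy. This is a sketch only: is_account_id is a hypothetical helper, and it relies on AWS account IDs being 12 digits.

```shell
# Hypothetical guard: succeeds only when the argument is a 12-digit number.
is_account_id() {
  case "$1" in
    "" | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "${#1}" -eq 12 ]
}

# Example use before writing policy.json:
# is_account_id "$CENTRAL_ACCOUNT_ID" || { echo "Set CENTRAL_ACCOUNT_ID first" >&2; exit 1; }
```

The same check could be applied to WORKLOAD_ACCOUNT_ID in the scripts that use it.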
Workload configuration
Next, we set up a sample application that exposes Prometheus metrics:
- Configure the aws-otel-collector to scrape the application and ECS metrics.
- Build Docker images and host them on Amazon Elastic Container Registry (Amazon ECR).
- Configure and create an Amazon ECS cluster, and run everything using ecs-cli.
The layout should be organized as follows:
├── aws-otel-collector
│   ├── Dockerfile
│   └── config.yaml
├── demo-app
│   ├── Dockerfile
│   └── main.go
├── docker-compose.yml
└── ecs-params.yml
To set up Amazon ECS, we need Docker and ecs-cli. On Linux, ecs-cli can be installed like this:
sudo curl -Lo /usr/local/bin/ecs-cli https://amazon-ecs-cli.s3.amazonaws.com/ecs-cli-linux-amd64-latest
# make the downloaded binary executable
sudo chmod +x /usr/local/bin/ecs-cli
Now, let’s create the sample application that exposes a /metrics Prometheus endpoint:
mkdir demo-app
cd demo-app/

cat > main.go <<EOF
package main

import (
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"net/http"
)

func main() {
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8000", nil)
}
EOF
Next, create a Dockerfile for the application:
cat > Dockerfile <<EOF
FROM golang:1.18 as builder
WORKDIR /go/src/app
COPY . .
RUN go mod init demo
RUN go get .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest
WORKDIR /app
RUN apk --no-cache add ca-certificates
COPY --from=builder /go/src/app/app .
EXPOSE 8000
CMD ["./app"]
EOF
And finally, the following script will create an ECR repository, build the application image, and push the image to Amazon ECR:
APP_REPOSITORY=$(aws ecr create-repository --repository-name demo-app \
  --query repository.repositoryUri --output text)

docker build . -t demo-app

aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin $APP_REPOSITORY

docker tag demo-app:latest $APP_REPOSITORY
docker push $APP_REPOSITORY
cd -
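Before relying on the collector, it can be worth confirming that the image actually serves metrics. The sketch below is an optional sanity check, not part of the original walkthrough. It assumes the image is run locally with port 8000 published, and count_samples is a hypothetical helper that counts sample lines in the Prometheus text exposition format, skipping comment lines (which start with #) and blank lines.

```shell
# Hypothetical helper: reads Prometheus text exposition format on stdin and
# prints the number of sample lines (lines that are neither comments nor blank).
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Example (assumes the image built above is running locally):
# docker run -d -p 8000:8000 demo-app
# curl -s http://localhost:8000/metrics | count_samples
```

A count of zero would suggest that the endpoint is not exposing any metrics and that the collector will have nothing to scrape.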
Now, let’s configure the AWS Distro for OpenTelemetry Collector. We will create a custom configuration, called a pipeline, to collect data. A pipeline defines the path that data follows in the collector: reception, then further processing or modification, and finally export through exporters.
We will collect metrics from the application's /metrics endpoint and use the ecs-metrics-receiver (configured as awsecscontainermetrics) to read ECS task metrics from the ECS task metadata endpoint. Visit the documentation to learn more about ecs-metrics-receiver and other configuration options.
We will export the collected metrics to the AMP workspace created in the monitoring account using the prometheusremotewrite exporter, authenticated through the sigv4auth extension. We will provide both the AMP remote_write endpoint and the IAM role to assume, in our case ECS-AMP-Central-Role.
Edit the WORKSPACE_ID and CENTRAL_ACCOUNT_ID variables and run the following script to create the pipeline:
WORKSPACE_ID=
CENTRAL_ACCOUNT_ID=

mkdir aws-otel-collector
cd aws-otel-collector

cat > config.yaml <<EOF
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "prometheus-demo-app"
          static_configs:
            - targets: [ 0.0.0.0:8000 ]
  awsecscontainermetrics:
    collection_interval: 20s

processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes

exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/remote_write
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: debug

extensions:
  sigv4auth:
    service: "aps"
    assume_role:
      arn: arn:aws:iam::$CENTRAL_ACCOUNT_ID:role/ECS-AMP-Central-Role
      sts_region: us-west-2

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]
EOF
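The remote_write endpoint in this configuration is assembled from the Region and the workspace ID, and an unset WORKSPACE_ID would yield a URL that only fails later, at export time. A minimal sketch of building and checking the endpoint up front follows. amp_remote_write_url is a hypothetical helper, and the workspace ID in the usage comment is a placeholder, not a real workspace.

```shell
# Hypothetical helper: prints the AMP remote_write endpoint for a Region and
# workspace ID, following the URL pattern used in config.yaml above.
# Returns non-zero if either argument is missing.
amp_remote_write_url() {
  region="$1"
  workspace_id="$2"
  if [ -z "$region" ] || [ -z "$workspace_id" ]; then
    echo "usage: amp_remote_write_url REGION WORKSPACE_ID" >&2
    return 1
  fi
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/remote_write\n' \
    "$region" "$workspace_id"
}

# Example with a placeholder workspace ID:
# amp_remote_write_url eu-west-1 ws-example
```

Failing early in the script is usually easier to diagnose than an export error in the collector logs.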
From the latest version of the aws-otel-collector image, create a custom image with our configuration to host on Amazon ECR:
cat > Dockerfile <<EOF
FROM public.ecr.aws/aws-observability/aws-otel-collector:latest
COPY config.yaml /etc/ecs/otel-config.yaml
CMD ["--config=/etc/ecs/otel-config.yaml"]
EOF
Finally, build and push the image:
COLLECTOR_REPOSITORY=$(aws ecr create-repository --repository-name aws-otel-collector \
  --query repository.repositoryUri --output text)

docker build . -t aws-otel-collector

aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin $COLLECTOR_REPOSITORY

docker tag aws-otel-collector:latest $COLLECTOR_REPOSITORY
docker push $COLLECTOR_REPOSITORY
cd -
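Both push scripts hand the full repository URI to docker login. Since docker login authenticates against a registry rather than a single repository, the host portion of the URI is sufficient, and one login covers both repositories in the same account and Region. A small sketch of extracting that host with shell parameter expansion is shown here; registry_host is a hypothetical helper name.

```shell
# Hypothetical helper: strips everything from the first "/" onward, leaving
# only the registry host of an ECR repository URI.
registry_host() {
  printf '%s\n' "${1%%/*}"
}

# Example (a single login for both repositories):
# aws ecr get-login-password --region eu-west-1 | \
#   docker login --username AWS --password-stdin "$(registry_host "$COLLECTOR_REPOSITORY")"
```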
Run application: Set up Amazon ECS
Amazon ECS needs an execution role, a set of permissions to run our tasks. Run the following script to create it:
cat > task-execution-assume-role.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role --role-name ecs-xaccount-task-execution-role \
  --assume-role-policy-document file://task-execution-assume-role.json \
  --region eu-west-1

aws iam --region eu-west-1 attach-role-policy --role-name ecs-xaccount-task-execution-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
Set the WORKLOAD_ACCOUNT_ID variable and run the following script to create a docker-compose file:
WORKLOAD_ACCOUNT_ID=

cat > docker-compose.yml <<EOF
version: "3"
services:
  aws-otel-collector:
    image: $WORKLOAD_ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com/aws-otel-collector:latest
    environment:
      - AWS_REGION=eu-west-1
    logging:
      driver: awslogs
      options:
        awslogs-group: ecs-xaccount-metrics-demo
        awslogs-region: eu-west-1
        awslogs-stream-prefix: aws-otel-collector
  prometheus-demo-app:
    image: $WORKLOAD_ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com/demo-app
    ports:
      - "8000:8000"
    depends_on:
      - aws-otel-collector
    logging:
      driver: awslogs
      options:
        awslogs-group: ecs-xaccount-metrics-demo
        awslogs-region: eu-west-1
        awslogs-stream-prefix: demo-app
EOF
Using ecs-cli, we will create an Amazon ECS cluster:
ecs-cli configure --cluster ecs-xaccount-metrics-demo \
  --default-launch-type FARGATE \
  --config-name ecs-xaccount-metrics-demo \
  --region eu-west-1

ecs-cli up --cluster-config ecs-xaccount-metrics-demo
After a few minutes, the cluster should be created with all necessary associated resources. Copy the VPC ID from the output of the preceding command into VPC_ID and get the default security group associated with the VPC:
VPC_ID=

# The query is quoted so that the shell does not treat [0] as a glob pattern.
aws ec2 describe-security-groups --filters Name=vpc-id,Values=$VPC_ID \
  --region eu-west-1 \
  --query 'SecurityGroups[0].GroupId' \
  --output text
Edit the ecs-params.yml file needed by ecs-cli, and replace the subnet IDs and security group with the values from the previous outputs:
version: 1
task_definition:
  ecs_network_mode: awsvpc
  task_role_arn: ecs-xaccount-task-role
  task_execution_role: ecs-xaccount-task-execution-role
  task_size:
    mem_limit: 0.5GB
    cpu_limit: 256
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - "subnet-"
        - "subnet-"
      security_groups:
        - "sg-"
      assign_public_ip: ENABLED
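Instead of editing the file by hand, ecs-params.yml could also be generated from the values returned by the earlier commands. The following is a sketch under that assumption: render_ecs_params is a hypothetical helper, and the subnet and security group IDs in the usage comment are placeholders.

```shell
# Hypothetical helper: prints an ecs-params.yml with the given two subnet IDs
# and security group ID filled in. All other values match the file above.
render_ecs_params() {
  subnet_a="$1"
  subnet_b="$2"
  security_group="$3"
  cat <<EOF
version: 1
task_definition:
  ecs_network_mode: awsvpc
  task_role_arn: ecs-xaccount-task-role
  task_execution_role: ecs-xaccount-task-execution-role
  task_size:
    mem_limit: 0.5GB
    cpu_limit: 256
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - "${subnet_a}"
        - "${subnet_b}"
      security_groups:
        - "${security_group}"
      assign_public_ip: ENABLED
EOF
}

# Example with placeholder IDs:
# render_ecs_params subnet-aaa subnet-bbb sg-ccc > ecs-params.yml
```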
Finally, run the following script to deploy the application:
ecs-cli compose --project-name ecs-xaccount-metrics-demo \
  service up \
  --cluster-config ecs-xaccount-metrics-demo \
  --create-log-groups
After a few minutes, the Amazon ECS service should be up and running. You can check the logs of the aws-otel-collector on the Amazon CloudWatch Logs console, in the log group ecs-xaccount-metrics-demo.
Monitoring account: Visualize metrics
Back in the monitoring account, let’s visualize our metrics using an Amazon Managed Grafana workspace. Refer to the documentation to set up Amazon Managed Grafana.
We can view metrics coming from the application endpoint:
And the Amazon ECS cluster metrics:
Clean up
Workload account
WORKLOAD_ACCOUNT_ID=

# stop and delete the ECS service
ecs-cli compose --project-name ecs-xaccount-metrics-demo service down --cluster-config ecs-xaccount-metrics-demo

# delete the ECS cluster
ecs-cli down --cluster-config ecs-xaccount-metrics-demo

# delete the task role
aws iam detach-role-policy --role-name ecs-xaccount-task-role --policy-arn arn:aws:iam::$WORKLOAD_ACCOUNT_ID:policy/xaccount-amp-write
aws iam delete-policy --policy-arn arn:aws:iam::$WORKLOAD_ACCOUNT_ID:policy/xaccount-amp-write
aws iam delete-role --role-name ecs-xaccount-task-role

# delete the task execution role
aws iam detach-role-policy --role-name ecs-xaccount-task-execution-role --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam delete-role --role-name ecs-xaccount-task-execution-role
Central account
WORKSPACE_ID=

# delete role
aws iam detach-role-policy --role-name ECS-AMP-Central-Role --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
aws iam delete-role --role-name ECS-AMP-Central-Role

# delete workspace
aws amp delete-workspace --workspace-id $WORKSPACE_ID
Conclusion
In this post, we explained how to use the AWS Distro for OpenTelemetry (ADOT) agent to collect application and platform metrics for workloads running on Amazon ECS.
You can use ADOT on other platforms, such as Amazon EKS, Amazon Elastic Compute Cloud (Amazon EC2), or on-premises. Additionally, you can use ADOT to collect distributed trace data and have multiple heterogeneous workload accounts send metrics centrally to AMP and other platforms. You can also set up private connectivity with VPC endpoints and VPC peering, according to your needs.
Visit the ADOT, AMP, and Amazon Managed Grafana sites to learn more.