Observability in EKS Anywhere

Monitoring, Logging, and Tracing for EKS Anywhere Clusters.

1 - Overview

Overview of observability in EKS Anywhere

Most Kubernetes-conformant observability tools can be used with EKS Anywhere. You can optionally use the EKS Connector to view your EKS Anywhere cluster resources in the Amazon EKS console; see the Connect to console page for details. EKS Anywhere includes the AWS Distro for OpenTelemetry (ADOT) and Prometheus for metrics and tracing as EKS Anywhere Curated Packages. You can use popular tooling such as Fluent Bit for logging, and can track the progress of logging support for ADOT on the AWS Observability roadmap. For more information on EKS Anywhere Curated Packages, see the Package Management Overview.
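
As a hedged sketch of how one of these curated packages is typically installed (the package name adot and the cluster name mgmt are example values; confirm the available package names and flags in the Package Management documentation for your release):

eksctl anywhere generate package adot --cluster mgmt > adot-package.yaml
# Review and edit adot-package.yaml as needed, then install the package:
eksctl anywhere create packages -f adot-package.yaml
# Check the installation status of the package:
eksctl anywhere get packages --cluster mgmt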

AWS Integrations

AWS offers comprehensive monitoring, logging, alarming, and dashboard capabilities through services such as Amazon CloudWatch, Amazon Managed Service for Prometheus (AMP), and Amazon Managed Grafana (AMG). With CloudWatch, you can take advantage of a highly scalable, AWS-native centralized logging and monitoring solution for EKS Anywhere clusters. With AMP and AMG, you can monitor your containerized applications running on EKS Anywhere clusters at scale with the popular Prometheus and Grafana interfaces.

Resources

  1. Verify EKS Anywhere cluster status
  2. Use the EKS Connector to view EKS Anywhere clusters and resources in the EKS console
  3. Use Fluent Bit and Container Insights to send metrics and logs to CloudWatch
  4. Use ADOT to send metrics to AMP and AMG
  5. Expose metrics for EKS Anywhere components

2 - Verify EKS Anywhere cluster status

Verify the status of EKS Anywhere clusters

Check cluster nodes

To verify that the expected number of cluster nodes are present and running, use kubectl to confirm that the nodes are Ready.

Worker nodes are named using the cluster name followed by the worker node group name. In the example below, the cluster name is mgmt and the worker node group name is md-0. The other nodes shown in the response are control plane or etcd nodes.

kubectl get nodes 
NAME                              STATUS   ROLES           AGE     VERSION
mgmt-clrt4                        Ready    control-plane   3d22h   v1.27.1-eks-61789d8
mgmt-md-0-5557f7c7bxsjkdg-l2kpt   Ready    <none>          3d22h   v1.27.1-eks-61789d8

Check cluster machines

To verify that the expected number of cluster machines are present and running, use the kubectl command to show that the machines are Running.

The machine objects are named using the cluster name as a prefix and there should be one created for each node in your cluster. In the example below, the command was run against a management cluster with a single attached workload cluster. When the command is run against a management cluster, all machines for the management cluster and attached workload clusters are shown.

kubectl get machines -A 
NAMESPACE     NAME                              CLUSTER   NODENAME                          PROVIDERID                                       PHASE     AGE     VERSION
eksa-system   mgmt-clrt4                        mgmt      mgmt-clrt4                        vsphere://421a801c-ac46-f47e-de1f-f070ef990c4d   Running   3d22h   v1.27.1-eks-1-27-4
eksa-system   mgmt-md-0-5557f7c7bxsjkdg-l2kpt   mgmt      mgmt-md-0-5557f7c7bxsjkdg-l2kpt   vsphere://421a4b9b-c457-fc4d-458a-d5092f981c5d   Running   3d22h   v1.27.1-eks-1-27-4
eksa-system   w01-7hzfh                         w01       w01-7hzfh                         vsphere://421a642b-f4ef-5764-47f9-5b56efcf8a4b   Running   15h     v1.27.1-eks-1-27-4
eksa-system   w01-etcd-z2ggk                    w01                                         vsphere://421ac003-3a1a-7dd9-ac83-bd0c75370cc4   Running   15h
eksa-system   w01-md-0-799ffd7946x5gz8w-p94mt   w01       w01-md-0-799ffd7946x5gz8w-p94mt   vsphere://421a7b77-ca57-dc78-18bf-f361081a2c5e   Running   15h     v1.27.1-eks-1-27-4

Check cluster components

To verify cluster components are present and running, use the kubectl command to show that the system Pods are Running. The number of Pods may vary based on the infrastructure provider (vSphere, bare metal, Snow, Nutanix, CloudStack), and whether the cluster is a workload cluster or a management cluster.

kubectl get pods -A 
NAMESPACE                           NAME                                                              READY   STATUS    RESTARTS      AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-8665b88c65-v982t       1/1     Running   0             3d22h
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-67595c55d8-z7627   1/1     Running   0             3d22h
capi-system                         capi-controller-manager-88bdd56b4-wnk66                          1/1     Running   0             3d22h
capv-system                         capv-controller-manager-644d9864dc-hbrcz                         1/1     Running   1 (16h ago)   3d22h
cert-manager                        cert-manager-548579646f-4tgb2                                    1/1     Running   0             3d22h
cert-manager                        cert-manager-cainjector-cbb6df554-w5fjx                          1/1     Running   0             3d22h
cert-manager                        cert-manager-webhook-54f748c89b-qnfr2                            1/1     Running   0             3d22h
eksa-packages                       ecr-credential-provider-package-4c7mk                            1/1     Running   0             3d22h
eksa-packages                       ecr-credential-provider-package-nvlkb                            1/1     Running   0             3d22h
eksa-packages                       eks-anywhere-packages-784c6fc8b9-2t5nr                           1/1     Running   0             3d22h
eksa-system                         eksa-controller-manager-76f484bd5b-x6qld                         1/1     Running   0             3d22h
etcdadm-bootstrap-provider-system   etcdadm-bootstrap-provider-controller-manager-6bcdd4f5d7-wvqw8   1/1     Running   0             3d22h
etcdadm-controller-system           etcdadm-controller-controller-manager-6f96f5d594-kqnfw           1/1     Running   0             3d22h
kube-system                         cilium-lbqdt                                                     1/1     Running   0             3d22h
kube-system                         cilium-operator-55c4778776-jvrnh                                 1/1     Running   0             3d22h
kube-system                         cilium-operator-55c4778776-wjjrk                                 1/1     Running   0             3d22h
kube-system                         cilium-psqm2                                                     1/1     Running   0             3d22h
kube-system                         coredns-69797695c4-kdtjc                                         1/1     Running   0             3d22h
kube-system                         coredns-69797695c4-r25vv                                         1/1     Running   0             3d22h
kube-system                         etcd-mgmt-clrt4                                                  1/1     Running   0             3d22h
kube-system                         kube-apiserver-mgmt-clrt4                                        1/1     Running   0             3d22h
kube-system                         kube-controller-manager-mgmt-clrt4                               1/1     Running   0             3d22h
kube-system                         kube-proxy-588gj                                                 1/1     Running   0             3d22h
kube-system                         kube-proxy-hrksw                                                 1/1     Running   0             3d22h
kube-system                         kube-scheduler-mgmt-clrt4                                        1/1     Running   0             3d22h
kube-system                         kube-vip-mgmt-clrt4                                              1/1     Running   0             3d22h
kube-system                         vsphere-cloud-controller-manager-7vzjx                           1/1     Running   0             3d22h
kube-system                         vsphere-cloud-controller-manager-cqfs5                           1/1     Running   0             3d22h

Check control plane components

You can verify the control plane is present and running by filtering Pods by the control-plane=controller-manager label.

kubectl get pod -A -l control-plane=controller-manager 
NAMESPACE                           NAME                                                              READY   STATUS    RESTARTS      AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-8665b88c65-v982t       1/1     Running   0             3d21h
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-67595c55d8-z7627   1/1     Running   0             3d21h
capi-system                         capi-controller-manager-88bdd56b4-wnk66                          1/1     Running   0             3d21h
capv-system                         capv-controller-manager-644d9864dc-hbrcz                         1/1     Running   1 (15h ago)   3d21h
eksa-packages                       eks-anywhere-packages-784c6fc8b9-2t5nr                           1/1     Running   0             3d21h
etcdadm-bootstrap-provider-system   etcdadm-bootstrap-provider-controller-manager-6bcdd4f5d7-wvqw8   1/1     Running   0             3d21h
etcdadm-controller-system           etcdadm-controller-controller-manager-6f96f5d594-kqnfw           1/1     Running   0             3d21h

Check workload clusters from management clusters

Set up the CLUSTER_NAME and KUBECONFIG environment variables for the management cluster:

export CLUSTER_NAME=mgmt
export KUBECONFIG=${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig
Check control plane resources for all clusters

Use the command below to check the status of cluster control plane resources. This is useful to verify clusters with multiple control plane nodes after an upgrade. The status for the management cluster and all attached workload clusters is shown.

kubectl get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io -n eksa-system 
NAME   CLUSTER   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
mgmt   mgmt      true          true                   1          1       1                       3d22h   v1.27.1-eks-1-27-4
w01    w01       true          true                   1          1       1         0             16h     v1.27.1-eks-1-27-4

Use the command below to check the status of a cluster resource. This is useful to verify cluster health after any mutating cluster lifecycle operation. The status for the management cluster and all attached workload clusters is shown.

kubectl get clusters.cluster.x-k8s.io -A \
  -o=custom-columns=NAME:.metadata.name,CONTROLPLANE-READY:.status.controlPlaneReady,INFRASTRUCTURE-READY:.status.infrastructureReady,MANAGED-EXTERNAL-ETCD-INITIALIZED:.status.managedExternalEtcdInitialized,MANAGED-EXTERNAL-ETCD-READY:.status.managedExternalEtcdReady
NAME   CONTROLPLANE-READY   INFRASTRUCTURE-READY   MANAGED-EXTERNAL-ETCD-INITIALIZED   MANAGED-EXTERNAL-ETCD-READY
mgmt   true                 true                   <none>                              <none>
w01    true                 true                   true                                true
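
You can also inspect the EKS Anywhere cluster objects themselves to see their reconciliation status. A hedged example (the resource group anywhere.eks.amazonaws.com is assumed here; confirm it with kubectl api-resources | grep anywhere):

kubectl get clusters.anywhere.eks.amazonaws.com -A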

3 - Connect EKS Anywhere clusters to the EKS console

Connect an EKS Anywhere cluster to the EKS console

The EKS Connector lets you connect your EKS Anywhere cluster to the EKS console. The connected console displays the EKS Anywhere cluster, its configuration, workloads, and their status. EKS Connector is a software agent that runs on your EKS Anywhere cluster and registers the cluster with the EKS console.

Visit the EKS Connector documentation for details on how to configure and run the EKS Connector.
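
At a high level, the registration flow looks roughly like the following sketch (the cluster name and role ARN are placeholders, and the manifest download step is described in the EKS Connector documentation):

# Register the cluster with EKS; the role must be set up for the EKS Connector agent.
aws eks register-cluster \
  --name my-eksa-cluster \
  --connector-config roleArn=arn:aws:iam::111122223333:role/eks-connector-agent-role,provider=EKS_ANYWHERE
# The response contains an activation ID and code. Fill them into the
# eks-connector manifest from the EKS Connector docs and apply it to the cluster:
kubectl apply -f eks-connector.yaml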

4 - Configure Fluent Bit for CloudWatch

Using Fluent Bit for logging with EKS Anywhere clusters and CloudWatch

Fluent Bit is an open source, multi-platform log processor and forwarder which allows you to collect data/logs from different sources, then unify and send them to multiple destinations. It’s fully compatible with Docker and Kubernetes environments. Due to its lightweight nature, using Fluent Bit as the log forwarder for EKS Anywhere clusters enables you to stream application logs into Amazon CloudWatch Logs efficiently and reliably.

You can additionally use CloudWatch Container Insights to collect, aggregate, and summarize metrics and logs from your containerized applications and microservices running on EKS Anywhere clusters. CloudWatch automatically collects metrics for many resources, such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly. You can also set CloudWatch alarms on metrics that Container Insights collects.

On this page, we show how to set up Fluent Bit and Container Insights to send logs and metrics from your EKS Anywhere clusters to CloudWatch.

Prerequisites

  • An AWS Account (see AWS documentation to get started)
  • An EKS Anywhere cluster with IAM Roles for Service Accounts (IRSA) enabled: With IRSA, an IAM role can be associated with a Kubernetes service account. This service account can provide AWS permissions to the containers in any Pod that uses the service account, which enables the containers to securely communicate with AWS services. This removes the need to hardcode AWS security credentials as environment variables on your nodes. See the IRSA configuration page for details.

Before setting up Fluent Bit, first create an IAM Policy and Role to send logs to CloudWatch.

Step 1: Create IAM Policy

  1. Go to IAM Policy in the AWS console.

  2. Click on JSON as shown below:

    Observability Create Policy

  3. Create the policy below on the IAM console, then click Create Policy as shown:

 { "Version": "2012-10-17", "Statement": [ { "Sid": "EKSAnywhereLogging", "Effect": "Allow", "Action": "cloudwatch:*", "Resource": "*" } ] } 

Step 2: Create IAM Role

  1. Go to IAM Role in the AWS console.

  2. Follow the steps as shown below:

    Observability Role Creation

    In Identity Provider, enter the OIDC provider you created as a part of IRSA configuration.

    In Audience, select sts.amazonaws.com. Click on Next.

  3. Under Permissions, select the policy you created in Step 1 (Create IAM Policy).

    Observability Select Permission

  4. Provide a Role name EKSAnywhereLogging and click Next.

  5. Copy the ARN as shown below and save it locally for the next step.

    Observability Copy ARN
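
The console steps above can also be scripted. A hedged AWS CLI sketch (the account ID, OIDC provider URL, and policy ARN are placeholders that must match your IRSA configuration and the policy from Step 1):

# Trust policy allowing service accounts in the cluster to assume this role
# through the cluster's OIDC provider.
cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/<OIDC_PROVIDER_URL>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<OIDC_PROVIDER_URL>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
aws iam create-role --role-name EKSAnywhereLogging \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name EKSAnywhereLogging \
  --policy-arn arn:aws:iam::111122223333:policy/EKSAnywhereLogging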

Step 3: Install Fluent Bit

  1. Create the amazon-cloudwatch namespace using this command:

    kubectl create namespace amazon-cloudwatch 
  2. Create the Service Accounts for cloudwatch-agent and fluent-bit in the amazon-cloudwatch namespace. This step uses the Role ARN you saved earlier; replace $RoleARN with that value.

    cat << EOF | kubectl apply -f -
    # create cwagent service account and role binding
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: cloudwatch-agent
      namespace: amazon-cloudwatch
      annotations:
        # set this with value of OIDC_IAM_ROLE
        eks.amazonaws.com/role-arn: "$RoleARN"
        # optional: Defaults to "sts.amazonaws.com" if not set
        eks.amazonaws.com/audience: "sts.amazonaws.com"
        # optional: When set to "true", adds AWS_STS_REGIONAL_ENDPOINTS env var to containers
        eks.amazonaws.com/sts-regional-endpoints: "true"
        # optional: Defaults to 86400 for expirationSeconds if not set
        # Note: This value can be overwritten if specified in the pod
        # annotation as shown in the next step.
        eks.amazonaws.com/token-expiration: "86400"
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: fluent-bit
      namespace: amazon-cloudwatch
      annotations:
        # set this with value of OIDC_IAM_ROLE
        eks.amazonaws.com/role-arn: "$RoleARN"
        # optional: Defaults to "sts.amazonaws.com" if not set
        eks.amazonaws.com/audience: "sts.amazonaws.com"
        # optional: When set to "true", adds AWS_STS_REGIONAL_ENDPOINTS env var to containers
        eks.amazonaws.com/sts-regional-endpoints: "true"
        # optional: Defaults to 86400 for expirationSeconds if not set
        # Note: This value can be overwritten if specified in the pod
        # annotation as shown in the next step.
        eks.amazonaws.com/token-expiration: "86400"
    EOF

    The above command creates two Service Accounts:

    serviceaccount/cloudwatch-agent created
    serviceaccount/fluent-bit created
  3. Now deploy Fluent Bit in your EKS Anywhere cluster to scrape and send logs to CloudWatch:

    kubectl apply -f "https://anywhere.eks.amazonaws.com/manifests/fluentbit.yaml" 

    You should see the following output:

    clusterrole.rbac.authorization.k8s.io/cloudwatch-agent-role changed
    clusterrolebinding.rbac.authorization.k8s.io/cloudwatch-agent-role-binding changed
    configmap/cwagentconfig changed
    daemonset.apps/cloudwatch-agent changed
    configmap/fluent-bit-cluster-info changed
    clusterrole.rbac.authorization.k8s.io/fluent-bit-role changed
    clusterrolebinding.rbac.authorization.k8s.io/fluent-bit-role-binding changed
    configmap/fluent-bit-config changed
    daemonset.apps/fluent-bit changed
  4. You can verify the DaemonSets have been deployed with the following command:

    kubectl -n amazon-cloudwatch get daemonsets 
  • If you are running the EKS Connector, you can also verify the status of the DaemonSets by logging into the AWS console and navigating to Amazon EKS -> Cluster -> Resources -> DaemonSets.

    Observability Verify DaemonSet

Step 4: Deploy a test application

Deploy a simple test application to verify your setup is working properly.
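
For example, a minimal nginx Deployment and Service such as the following (a hedged sketch, not an official test application; the image and names are example values) will produce HTTP access logs that you can search for in the next step:

cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-nginx
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-nginx
  template:
    metadata:
      labels:
        app: sample-nginx
    spec:
      containers:
      - name: nginx
        image: public.ecr.aws/nginx/nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: sample-nginx
  namespace: default
spec:
  selector:
    app: sample-nginx
  ports:
  - port: 80
    targetPort: 80
EOF
# Generate a few requests so nginx writes access log entries:
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl:latest -- \
  curl -s http://sample-nginx.default.svc.cluster.local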

Step 5: View cluster logs and metrics

CloudWatch Logs

  1. Open the CloudWatch console. The link opens the console and displays your current available log groups.

  2. Choose the EKS Anywhere cluster name that you want to view logs for. The log group name format is /aws/containerinsights/my-EKS-Anywhere-cluster/cluster.

    Observability Container Insights

    Log group name /aws/containerinsights/my-EKS-Anywhere-cluster/application has log sources from /var/log/containers.

    Log group name /aws/containerinsights/my-EKS-Anywhere-cluster/dataplane has log sources for kubelet.service, kubeproxy.service, and docker.service.

  3. To view the deployed test application logs, click on the application log group, then click Search All.

    Observability Container Insights

  4. Type HTTP 1.1 200 in the search box and press enter. You should see logs as shown below:

    Observability Container Insights
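
You can run a similar search from the CLI. A hedged example (the log group name assumes a cluster named my-EKS-Anywhere-cluster, as above, and the filter pattern is an example that matches HTTP access log lines):

# List the Container Insights log groups for your clusters:
aws logs describe-log-groups --log-group-name-prefix /aws/containerinsights/
# Search the application log group for HTTP access log lines:
aws logs filter-log-events \
  --log-group-name /aws/containerinsights/my-EKS-Anywhere-cluster/application \
  --filter-pattern '"HTTP/1.1"' \
  --max-items 10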

CloudWatch Container Insights

  1. Open the CloudWatch console. The link opens the Container Insights performance monitoring console and displays a dropdown to select your EKS clusters.

    Observability Container Insights

For more details on CloudWatch Logs, refer to What is Amazon CloudWatch Logs?

5 - Expose metrics for EKS Anywhere components

Expose metrics for EKS Anywhere components

Some Kubernetes system components, such as kube-controller-manager, kube-scheduler, kube-proxy, and etcd (stacked), expose metrics only on localhost by default. To expose metrics for these components so that other monitoring systems such as Prometheus can scrape them, you can deploy a proxy as a DaemonSet on the host network of the nodes. The proxy Pods also need to be configured with control plane tolerations so that they can be scheduled on the control plane nodes.

For etcd metrics, the steps outlined below apply only to a stacked etcd setup. For unstacked (external) etcd, metrics are already exposed on the https://<etcd-machine-ip>:2379/metrics endpoint and can be scraped by Prometheus directly, without deploying a proxy.
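
For example, a minimal Prometheus scrape_config sketch for external etcd (assuming the etcd client CA and certificates have been mounted into the Prometheus container at the paths shown) might look like:

scrape_configs:
  - job_name: etcd
    scheme: https
    static_configs:
      - targets: ['<etcd-machine-ip>:2379']
    tls_config:
      # assumed mount paths for the etcd client certificates
      ca_file: /etc/prometheus/etcd/ca.crt
      cert_file: /etc/prometheus/etcd/client.crt
      key_file: /etc/prometheus/etcd/client.key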

Configure Proxy

To configure a proxy for exposing metrics on an EKS Anywhere cluster, you can perform the following steps:

  1. Create a config map to store the proxy configuration.

    Below is an example ConfigMap if you use HAProxy as the proxy server.

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metrics-proxy
    data:
      haproxy.cfg: |
        defaults
          mode http
          timeout connect 5000ms
          timeout client 5000ms
          timeout server 5000ms
          default-server maxconn 10

        frontend kube-proxy
          bind \${NODE_IP}:10249
          http-request deny if !{ path /metrics }
          default_backend kube-proxy
        backend kube-proxy
          server kube-proxy 127.0.0.1:10249 check

        frontend kube-controller-manager
          bind \${NODE_IP}:10257
          http-request deny if !{ path /metrics }
          default_backend kube-controller-manager
        backend kube-controller-manager
          server kube-controller-manager 127.0.0.1:10257 ssl verify none check

        frontend kube-scheduler
          bind \${NODE_IP}:10259
          http-request deny if !{ path /metrics }
          default_backend kube-scheduler
        backend kube-scheduler
          server kube-scheduler 127.0.0.1:10259 ssl verify none check

        frontend etcd
          bind \${NODE_IP}:2381
          http-request deny if !{ path /metrics }
          default_backend etcd
        backend etcd
          server etcd 127.0.0.1:2381 check
    EOF
  2. Create a DaemonSet for the proxy and mount the ConfigMap volume onto the proxy Pods.

    Below is an example configuration for the HAProxy DaemonSet.

    cat << EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: metrics-proxy
    spec:
      selector:
        matchLabels:
          app: metrics-proxy
      template:
        metadata:
          labels:
            app: metrics-proxy
        spec:
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          hostNetwork: true
          containers:
            - name: haproxy
              image: public.ecr.aws/eks-anywhere/kubernetes-sigs/kind/haproxy:v0.20.0-eks-a-54
              env:
                - name: NODE_IP
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: status.hostIP
              ports:
                - name: kube-proxy
                  containerPort: 10249
                - name: kube-ctrl-mgr
                  containerPort: 10257
                - name: kube-scheduler
                  containerPort: 10259
                - name: etcd
                  containerPort: 2381
              volumeMounts:
                - mountPath: "/usr/local/etc/haproxy"
                  name: haproxy-config
          volumes:
            - configMap:
                name: metrics-proxy
              name: haproxy-config
    EOF
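
    As a quick check (a suggested verification, not part of the original manifests), confirm that the proxy Pods are running on your nodes, including the control plane nodes:

    kubectl get daemonset metrics-proxy
    kubectl get pods -l app=metrics-proxy -o wide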

Configure Client Permissions

  1. Create a new cluster role for the client to access the metrics endpoint of the components.

    cat << EOF | kubectl apply -f -
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: metrics-reader
    rules:
      - nonResourceURLs:
          - "/metrics"
        verbs:
          - get
    EOF
  2. Create a new cluster role binding to bind the above cluster role to the client pod’s service account.

    cat << EOF | kubectl apply -f -
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: metrics-reader-binding
    subjects:
      - kind: ServiceAccount
        name: default
        namespace: default
    roleRef:
      kind: ClusterRole
      name: metrics-reader
      apiGroup: rbac.authorization.k8s.io
    EOF
  3. Verify that the metrics are exposed to the client pods by running the following commands:

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - command:
            - /bin/sleep
            - infinity
          image: curlimages/curl:latest
          name: test-container
          env:
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
    EOF
    kubectl exec -it test-pod -- sh
    export TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10257/metrics"
    curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10259/metrics"
    curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10249/metrics"
    curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:2381/metrics"
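
Once this check succeeds, a Prometheus server running in the cluster can scrape the proxied endpoints with its own service account token. A hedged sketch of a static scrape job (the control plane node IP is a placeholder, the Prometheus service account must be bound to the metrics-reader ClusterRole above, and the authorization block assumes a reasonably recent Prometheus release):

scrape_configs:
  - job_name: kube-controller-manager
    authorization:
      # service account token mounted into the Prometheus Pod
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    static_configs:
      - targets: ['<control-plane-node-ip>:10257']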