Kartik Dudeja

OpenTelemetry in Action on Kubernetes: Part 9 - Cluster-Level Observability with OpenTelemetry Agent + Gateway

Welcome to the grand finale of our observability series! So far, we’ve added visibility into our application through logs, metrics, and traces — all flowing beautifully into Grafana via the OpenTelemetry Collector.

But there’s still one big puzzle piece left: the Kubernetes cluster itself.

In this final part, we’ll:

  • Collect host and node-level metrics using hostmetrics
  • Deploy a centralized Collector in Deployment mode (gateway)
  • Introduce ServiceAccount for permissions
  • Collect cluster-state metrics (nodes, pods, deployments, and more) using the k8s_cluster receiver
  • Use the debug exporter to troubleshoot data pipelines
  • And finally, conclude the series with a high-level recap



Why Cluster-Level Observability Matters

While we've focused on application telemetry so far, it's just one piece of the puzzle. For full visibility, we must also observe the Kubernetes cluster itself — the infrastructure running our apps.

Cluster observability helps us:

  • Monitor node health and resource usage
  • Track control plane performance (API server, scheduler, etc.)
  • Understand pod scheduling and evictions
  • Improve scaling decisions
  • Troubleshoot infrastructure-level issues
  • Strengthen security and governance

In short, without visibility into the cluster, you're flying blind. This part of the series ensures you're watching not just the app, but the platform beneath it.

Add hostmetrics Receiver in the Agent

We’ll start by updating our otel-collector-agent (running as a DaemonSet) to use the hostmetrics receiver. This receiver collects system-level metrics from each node, such as CPU, memory, disk, filesystem, network, and load.

Config – otel-collector-agent-configmap.yaml

receivers:
  # otlp receiver carried over from the earlier parts of the series
  # (shown with the standard gRPC/HTTP endpoints); the metrics pipeline below still references it
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 1m
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      load: {}
      filesystem: {}
      network: {}
      system: {}

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 15
  batch:
    send_batch_size: 1000
    timeout: 5s

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    enable_open_metrics: true
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    # collect metrics from the otlp and hostmetrics receivers and expose them in a Prometheus-compatible format
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheus]

Each hostmetrics receiver runs inside the agent pod on every node, giving us node-specific insights.
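A quick way to sanity-check the agent is to port-forward its Prometheus endpoint and look for system_* series. This is a rough sketch: it assumes the DaemonSet is named otel-collector-agent and exposes port 8889 as in the config above, and exact metric names can vary slightly between collector versions.

# Port-forward one agent pod's Prometheus exporter port (8889, as configured above;
# assumes the DaemonSet is named otel-collector-agent)
kubectl -n observability port-forward daemonset/otel-collector-agent 8889:8889

# In another terminal: confirm host metrics are being exposed
curl -s http://localhost:8889/metrics | grep '^system_' | head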

Deploy the OpenTelemetry Gateway

1. Why Deployment Mode?

  • Deployment Mode is used for centralized collection, aggregation, and export of telemetry data.
  • Unlike the DaemonSet agent, which runs on each node, a Deployment collector can scrape and process cluster-wide metrics.

2. Create a ServiceAccount, ClusterRole, and ClusterRoleBinding

To use the k8s_cluster receiver, the collector must have permission to access Kubernetes objects like nodes, pods, namespaces, etc.

What is a ServiceAccount in Kubernetes?

A ServiceAccount in Kubernetes is an identity used by pods to authenticate and interact securely with the Kubernetes API. While every pod gets a default ServiceAccount, you often need to create custom ones with specific RBAC (Role-Based Access Control) permissions for security and least privilege.

In our case, the OpenTelemetry Collector needs to read cluster state—like nodes, pods, and namespaces—to collect metrics using the k8s_cluster receiver. So, we create a dedicated ServiceAccount and bind it to a ClusterRole with read-only access to those resources. This ensures our collector can operate properly without over-privileging it.

# otel-collector-gateway-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector-gateway-sa
  namespace: observability
  labels:
    app: otel-collector-gateway
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-gateway-role
  labels:
    app: otel-collector-gateway
rules:
  - apiGroups:
      - ""
    resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - daemonsets
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - jobs
      - cronjobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-gateway-binding
  labels:
    app: otel-collector-gateway
subjects:
  - kind: ServiceAccount
    name: otel-collector-gateway-sa
    namespace: observability
roleRef:
  kind: ClusterRole
  name: otel-collector-gateway-role
  apiGroup: rbac.authorization.k8s.io

Apply it:

kubectl -n observability apply -f otel-collector-gateway-serviceaccount.yaml 
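Before wiring up the collector, you can verify that the binding grants what the k8s_cluster receiver needs by impersonating the ServiceAccount with kubectl auth can-i (the ServiceAccount name matches the manifest above):

# Both commands should print "yes" once the ClusterRoleBinding is in place
kubectl auth can-i list pods --as=system:serviceaccount:observability:otel-collector-gateway-sa
kubectl auth can-i watch nodes --as=system:serviceaccount:observability:otel-collector-gateway-sa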

3. OpenTelemetry Collector Config with k8s_cluster Receiver

Create the config file as a ConfigMap.

# otel-collector-gateway-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-gateway-config
  namespace: observability
  labels:
    app: otel-collector-gateway
data:
  otel-collector-config.yaml: |
    receivers:
      k8s_cluster:
        auth_type: "serviceAccount"
        collection_interval: 30s

    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 15
      batch:
        send_batch_size: 1000
        timeout: 5s

    exporters:
      debug:
        verbosity: detailed
      prometheus:
        endpoint: "0.0.0.0:8889"
        enable_open_metrics: true
        resource_to_telemetry_conversion:
          enabled: true

    service:
      pipelines:
        metrics:
          receivers: [k8s_cluster]
          processors: [memory_limiter, batch]
          exporters: [prometheus]

Apply it:

kubectl -n observability apply -f otel-collector-gateway-configmap.yaml 

4. Deploy the OpenTelemetry Collector

# otel-collector-gateway-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gateway
  namespace: observability
  labels:
    app: otel-collector-gateway
spec:
  replicas: 1
  revisionHistoryLimit: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # Allow 25% more pods than desired during update
      maxUnavailable: 25%  # Allow 25% of desired pods to be unavailable during update
  selector:
    matchLabels:
      app: otel-collector-gateway
  template:
    metadata:
      labels:
        app: otel-collector-gateway
    spec:
      serviceAccountName: otel-collector-gateway-sa
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/otel-collector-config.yaml"]
          volumeMounts:
            - name: config-volume
              mountPath: /conf
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
            limits:
              cpu: 50m
              memory: 128Mi
      volumes:
        - name: config-volume
          configMap:
            name: otel-collector-gateway-config

Apply it:

kubectl -n observability apply -f otel-collector-gateway-deployment.yaml 
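A quick check that the rollout completed before moving on (the deployment name is the one defined above):

# Wait for the gateway deployment to become available
kubectl -n observability rollout status deployment/otel-collector-gateway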

5. Expose Collector to Prometheus

# otel-collector-gateway-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-gateway
  namespace: observability
  labels:
    app: otel-collector-gateway
spec:
  selector:
    app: otel-collector-gateway
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - name: otlp-http
      port: 4318
      targetPort: 4318
      protocol: TCP
    - name: prometheus
      port: 8889
      targetPort: 8889
      protocol: TCP
  type: ClusterIP

Apply:

kubectl -n observability apply -f otel-collector-gateway-service.yaml 
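You may have noticed the Service also exposes the OTLP ports (4317 and 4318). The gateway config in this post only runs the k8s_cluster receiver, but if you later want the node agents to forward their telemetry through the gateway instead of exporting directly, the wiring would roughly look like the sketch below. This is an optional pattern, not one of the manifests applied in this series; the Service DNS name comes from the manifest above.

# On the agent (DaemonSet) config: export OTLP to the gateway Service
exporters:
  otlp:
    endpoint: otel-collector-gateway.observability.svc.cluster.local:4317
    tls:
      insecure: true # assumes plain gRPC inside the cluster

# On the gateway (Deployment) config: accept OTLP and reference it in the pipelines
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318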

Then add this to your Prometheus scrape_configs:

- job_name: 'otel-collector-gateway'
  static_configs:
    - targets: ['otel-collector-gateway.observability.svc.cluster.local:8889']
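Once Prometheus picks up the new scrape config, a simple query in the Prometheus UI (or Grafana Explore) confirms the target is healthy; the job label matches the job_name above:

# Returns 1 for each healthy scrape of the gateway
up{job="otel-collector-gateway"}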

Test and Verify

Check deployment status:

kubectl -n observability get all -l app=otel-collector-gateway 

(Screenshot: kubectl get all output for the otel-collector-gateway resources)
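You can also peek at the metrics the gateway exposes before Prometheus ever scrapes them. A rough check, assuming the Service and port defined earlier; exact metric names depend on the collector version, but the k8s_cluster metrics typically start with k8s_:

# Port-forward the gateway's Prometheus exporter port
kubectl -n observability port-forward svc/otel-collector-gateway 8889:8889

# In another terminal: cluster-state metrics such as pod phase and node conditions
curl -s http://localhost:8889/metrics | grep '^k8s_' | head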

Special Mention: Debug Exporter - Your Observability Wingman

The debug exporter in OpenTelemetry Collector is a lightweight and incredibly helpful tool for developers and DevOps engineers when building or troubleshooting telemetry pipelines.

Instead of exporting telemetry data (like logs, metrics, and traces) to a backend system like Prometheus or Jaeger, the debug exporter simply prints the data to the Collector's stdout. This means:

  • You can see exactly what telemetry data is being received and processed—live in the logs.
  • It helps validate instrumentation quickly, without setting up full observability backends.
  • It's especially useful when you're testing new receivers, processors, or pipelines, and want a quick look at the output.

When to Use

  • Local testing or dev environments.
  • Debugging broken data flow—if Prometheus or Jaeger isn’t showing what you expect.
  • Learning how OpenTelemetry transforms and routes telemetry data.

Example Configuration Snippet

exporters:
  debug:
    verbosity: detailed # outputs full content of each signal

Then, reference it in your pipeline like this:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger, debug]

This ensures traces are sent to Jaeger and also printed to the console, which is great for double-checking what's actually flowing through the pipeline.
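Since the debug exporter writes to stdout, reading its output is just a matter of tailing the collector's logs. For the gateway Deployment created earlier, that looks like:

# Follow the collector's stdout to watch the debug exporter output live
kubectl -n observability logs deployment/otel-collector-gateway --follow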


Conclusion: You Now Have Full Observability!

Over the past 9 parts, you’ve:

  • Containerized a real ML application
  • Instrumented it with OpenTelemetry
  • Collected traces, logs, and metrics
  • Deployed observability tools in Kubernetes
  • Visualized everything in Grafana
  • Monitored the entire Kubernetes cluster with Agent + Gateway mode

You’ve essentially built a production-grade observability platform from scratch — without cloud vendor lock-in.


Missed the previous article?

Check out Part 8: Visualize Everything, Building a Unified Observability Dashboard with Grafana to see how we got here.


{ "author" : "Kartik Dudeja", "email" : "kartikdudeja21@gmail.com", "linkedin" : "https://linkedin.com/in/kartik-dudeja", "github" : "https://github.com/Kartikdudeja" } 
