OpenShift CoreDNS

Metrics, Dashboards, Alerts and more for OpenShift CoreDNS Integration in Sysdig Monitor.

This integration is enabled by default.

Versions supported: > v4.8

This integration is out-of-the-box, so it doesn’t require any exporter.

This integration has 13 metrics.

Timeseries generated: CoreDNS generates ~230 timeseries per dns-default pod

List of Alerts

Alert | Description | Format
[OpenShift CoreDNS] Process Down | CoreDNS has disappeared from target discovery. | Prometheus
[OpenShift CoreDNS] High Failed Responses | CoreDNS is returning failed responses. | Prometheus
[OpenShift CoreDNS] High Latency | CoreDNS responses latency is higher than 60ms. | Prometheus
[OpenShift CoreDNS] Panics Observed | CoreDNS Panics Observed. | Prometheus

List of Dashboards

OpenShift v4 CoreDNS

If you are using Prometheus Remote Write, you need to add the following metric relabel config so that the _sysdig_integration_openshift_coredns label is set on the metrics:

  - action: replace
    source_labels: [ __address__ ]
    target_label: _sysdig_integration_openshift_coredns
    replacement: true

The dashboard provides information on OpenShift CoreDNS.

List of Metrics

Metric name
coredns_cache_hits_total
coredns_cache_misses_total
coredns_dns_request_duration_seconds_bucket
coredns_dns_request_size_bytes_bucket
coredns_dns_requests_total
coredns_dns_response_size_bytes_bucket
coredns_dns_responses_total
coredns_forward_request_duration_seconds_bucket
coredns_panics_total
coredns_plugin_enabled
go_goroutines
process_cpu_seconds_total
process_resident_memory_bytes

Prerequisites

None.

Installation

Installing an exporter is not required for this integration.

Monitoring and Troubleshooting OpenShift CoreDNS

Because OpenShift 4.x ships with both Prometheus and CoreDNS ready to use, no additional installation is required. OpenShift CoreDNS metrics are exposed over HTTPS on port 9154.

Here are some useful queries to run and metrics to monitor when troubleshooting CoreDNS on OpenShift 4.

CoreDNS Panics

Number of Panics

To check the number of CoreDNS panics, use the following query:

sum(coredns_panics_total) 

If this number is growing, check the CoreDNS pod logs.
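Because coredns_panics_total is a counter, its absolute value matters less than whether it has recently increased. The following is a minimal sketch of a query that surfaces only pods with new panics. The 10-minute window is an arbitrary assumption, and the kube_cluster_name and kube_pod_name labels are assumed to be present as in the other queries on this page.

```promql
# Panics recorded per pod over the last 10 minutes (the window is an assumption).
# The "> 0" filter drops pods that had no new panics in that window.
sum(increase(coredns_panics_total[10m])) by (kube_cluster_name, kube_pod_name) > 0
```

An empty result means no pod reported a new panic during the window.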

DNS Requests

By Type

To break down the DNS request rate by request type, use the following query:

sum(rate(coredns_dns_requests_total[$__interval])) by (type, kube_cluster_name, kube_pod_name)

By Protocol

To break down the DNS request rate by protocol, use the following query:

sum(rate(coredns_dns_requests_total[$__interval])) by (proto, kube_cluster_name, kube_pod_name)

By Zone

To break down the DNS request rate by zone, use the following query:

sum(rate(coredns_dns_requests_total[$__interval])) by (zone, kube_cluster_name, kube_pod_name)

By Latency

This metric helps detect degradation in the service. The following query returns the 99th percentile of request latency per server and zone, which you can compare against the average.

histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by(server, zone, le)) 
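Only the _bucket series of this histogram appears in the metric list above, so a true average (which would need the _sum and _count series) may not be available. As a hedged alternative, the median can be read from the same buckets and used as the comparison point for the 99th percentile. The second query is a sketch that applies the 60 ms threshold mentioned in the High Latency alert description. The actual alert expression may differ.

```promql
# Median (50th percentile) latency from the same buckets, for comparison with p99.
histogram_quantile(0.5, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (server, zone, le))

# Servers and zones whose p99 latency exceeds 60 ms (0.06 seconds).
# The threshold comes from the alert description; the expression itself is an assumption.
histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (server, zone, le)) > 0.06
```

A p99 that sits far above the median points at slow outliers rather than a uniformly slow service.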

Error Rate

Watch this metric closely. It counts DNS-over-HTTPS responses, and you can filter it by HTTP status code: 200, 404, 400, 500.

sum by (server, status)(coredns_dns_https_responses_total)
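The coredns_dns_https_responses_total metric is not among the metrics listed under List of Metrics, so it may not be collected with the default job. A comparable view can be sketched from coredns_dns_responses_total, assuming it carries the rcode label that CoreDNS normally attaches to DNS responses. The choice of SERVFAIL and REFUSED as "failed" response codes and the 5-minute window are assumptions, not the definition used by the High Failed Responses alert.

```promql
# Fraction of DNS responses per pod that carried a failure response code.
# rcode values and the 5m window are illustrative assumptions.
sum(rate(coredns_dns_responses_total{rcode=~"SERVFAIL|REFUSED"}[5m])) by (kube_cluster_name, kube_pod_name)
/
sum(rate(coredns_dns_responses_total[5m])) by (kube_cluster_name, kube_pod_name)
```

The result is a ratio between 0 and 1. Pods with no failed responses in the window return no series rather than 0.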

Cache

Cache Hit

To check the cache hit rate, use the following query:

sum(rate(coredns_cache_hits_total[$__interval])) by (type,kube_cluster_name,kube_pod_name) 

Cache Miss

To check the cache miss rate, use the following query:

sum(rate(coredns_cache_misses_total[$__interval])) by(server,kube_cluster_name,kube_pod_name) 
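Hit and miss rates are easier to interpret as a single ratio. The following is a minimal sketch that combines the two metrics above into a cache hit ratio per pod. The 5-minute window is an assumption; in a dashboard you could use $__interval as in the queries above.

```promql
# Cache hit ratio per pod: hits / (hits + misses).
sum(rate(coredns_cache_hits_total[5m])) by (kube_cluster_name, kube_pod_name)
/
(
  sum(rate(coredns_cache_hits_total[5m])) by (kube_cluster_name, kube_pod_name)
  +
  sum(rate(coredns_cache_misses_total[5m])) by (kube_cluster_name, kube_pod_name)
)
```

A ratio close to 1 means most lookups are served from the cache, while a falling ratio means more queries are being forwarded upstream.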

Agent Configuration

The default agent job for this integration is as follows:

- job_name: openshift-dns-default
  honor_labels: true
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  scheme: https
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - source_labels: [__meta_kubernetes_pod_phase]
    action: keep
    regex: Running
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_name
    separator: '/'
    regex: 'openshift-dns/dns-default.+'
  - source_labels:
    - __address__
    action: keep
    regex: (.*:9154)
  - source_labels:
    - __meta_kubernetes_pod_name
    action: replace
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
  - action: replace
    source_labels: [ __address__ ]
    target_label: _sysdig_integration_openshift_coredns
    replacement: true
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (coredns_cache_hits_total|coredns_cache_misses_total|coredns_dns_request_duration_seconds_bucket|coredns_dns_request_size_bytes_bucket|coredns_dns_requests_total|coredns_dns_response_size_bytes_bucket|coredns_dns_responses_total|coredns_forward_request_duration_seconds_bucket|coredns_panics_total|coredns_plugin_enabled|go_goroutines|process_cpu_seconds_total|process_resident_memory_bytes)
    action: keep