Kubernetes CoreDNS: What it is and how to monitor

CoreDNS is the DNS add-on component for Kubernetes. It runs on control plane nodes and is critical for cluster operation and responsiveness. Understanding how to monitor CoreDNS and which metrics matter is essential for operations teams.

Why DNS matters

DNS is one of the most sensitive and important services in any architecture. Applications, microservices, services, and hosts all rely on DNS. DNS resolves domain names and maps internal or external services to Pod IPs. Maintaining DNS records for Pods is especially important because Pod IPs can change without warning.

If you run workloads on Kubernetes and need to monitor CoreDNS, this article explains how to scrape CoreDNS metrics with Prometheus, which metrics to check, and what they mean.

What is Kubernetes CoreDNS?

Starting with Kubernetes 1.11, CoreDNS was introduced as a replacement for kube-dns after DNS-based service discovery reached general availability. CoreDNS is the de facto DNS engine for Kubernetes clusters. Written in Go, CoreDNS is widely adopted for its flexibility.

kube-dns historically implemented DNS functionality in a single Pod that contained three different containers: kubedns, dnsmasq, and a sidecar. Briefly:

kubedns: SkyDNS implementation responsible for DNS resolution inside the cluster. It watches Kubernetes A records and provides appropriate DNS records.
dnsmasq: Provides DNS caching for SkyDNS queries.
sidecar: Exports metrics and performs health checks for the DNS service.

CoreDNS addressed several issues present in kube-dns. dnsmasq introduced some security concerns that required patches, and kube-dns required three containers to provide all functions. CoreDNS provides the DNS functionality in a single container and resolves several issues related to stub domains for external services.

CoreDNS exposes its metrics endpoint on port 9153 and it can be accessed from Pods or host networks in the cluster.

# kubectl get ep kube-dns -n kube-system -o json | jq -r ".subsets" [ { "addresses": [ { "ip": "192.169.107.100", "nodeName": "k8s-control-2.lab.example.com", "targetRef": { "kind": "Pod", "name": "coredns-565d847f94-rz4b6", "namespace": "kube-system", "uid": "c1b62754-4740-49ca-b506-3f40fb681778" } }, { "ip": "192.169.203.46", "nodeName": "k8s-control-3.lab.example.com", "targetRef": { "kind": "Pod", "name": "coredns-565d847f94-8xqxg", "namespace": "kube-system", "uid": "bec3ca63-f09a-4007-82e9-0e147e8587de" } } ], "ports": [ { "name": "dns-tcp", "port": 53, "protocol": "TCP" }, { "name": "dns", "port": 53, "protocol": "UDP" }, { "name": "metrics", "port": 9153, "protocol": "TCP" } ] } ]

How to access CoreDNS metrics

Each CoreDNS Pod exposes a metrics endpoint on port 9153. You can curl that endpoint to retrieve metrics at /metrics.

Manual access

Once you know the endpoint or IP of a CoreDNS instance, try accessing port 9153.

# curl http://192.169.203.46:9153/metrics # HELP coredns_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which CoreDNS was built. # TYPE coredns_build_info gauge coredns_build_info{goversion="go1.18.2",revision="45b0a11",version="1.9.3"} 1 # HELP coredns_cache_entries The number of elements in the cache. # TYPE coredns_cache_entries gauge coredns_cache_entries{server="dns://:53",type="denial",zones="."} 46 coredns_cache_entries{server="dns://:53",type="success",zones="."} 9 # HELP coredns_cache_hits_total The count of cache hits. # TYPE coredns_cache_hits_total counter coredns_cache_hits_total{server="dns://:53",type="denial",zones="."} 6471 coredns_cache_hits_total{server="dns://:53",type="success",zones="."} 6596 # HELP coredns_cache_misses_total The count of cache misses. Deprecated, derive misses from cache hits/requests counters. # TYPE coredns_cache_misses_total counter coredns_cache_misses_total{server="dns://:53",zones="."} 1951 # HELP coredns_cache_requests_total The count of cache requests. # TYPE coredns_cache_requests_total counter coredns_cache_requests_total{server="dns://:53",zones="."} 15018 # HELP coredns_dns_request_duration_seconds Histogram of the time (in seconds) each request took per zone. # TYPE coredns_dns_request_duration_seconds histogram coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.00025"} 14098 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.0005"} 14836 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.001"} 14850 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.002"} 14856 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.004"} 14857 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.008"} 14870 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.016"} 14879 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.032"} 14883 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.064"} 14884 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.128"} 14884 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.256"} 14885 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="0.512"} 14886 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="1.024"} 14887 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="2.048"} 14903 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="4.096"} 14911 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="8.192"} 15018 coredns_dns_request_duration_seconds_bucket{server="dns://:53",zone=".",le="+Inf"} 15018 coredns_dns_request_duration_seconds_sum{server="dns://:53",zone="."} 698.531992215999 coredns_dns_request_duration_seconds_count{server="dns://:53",zone="."} 15018 ...(output truncated)

You can also access the metrics through the CoreDNS Kubernetes Service that is exposed by default in the cluster.

# kubectl get svc -n kube-system NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kube-dns ClusterIP 10.96.0.10 53/UDP,53/TCP,9153/TCP 129d # kubectl exec -it my-pod -n default -- /bin/bash # curl http://kube-dns.kube-system.svc:9153/metrics

Configuring Prometheus to scrape CoreDNS

Prometheus provides a set of roles to discover targets and scrape metrics from sources such as Pods, nodes, and services. To scrape CoreDNS running inside the cluster, configure prometheus.yml to use the endpoints role to discover CoreDNS targets.

Edit the prometheus ConfigMap that contains prometheus.yml, for example:

# kubectl edit cm prometheus-server -n monitoring -o yaml

Then add the following snippet under scrape_configs:

- honor_labels: true job_name: kubernetes-service-endpoints kubernetes_sd_configs: - role: endpoints relabel_configs: - action: keep regex: true source_labels: - __meta_kubernetes_service_annotation_prometheus_io_scrape - action: drop regex: true source_labels: - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow - action: replace regex: (https?) source_labels: - __meta_kubernetes_service_annotation_prometheus_io_scheme target_label: __scheme__ - action: replace regex: (.+) source_labels: - __meta_kubernetes_service_annotation_prometheus_io_path target_label: __metrics_path__ - action: replace regex: (.+?)(?::\d+)?;(\d+) replacement: $1:$2 source_labels: - __address__ - __meta_kubernetes_service_annotation_prometheus_io_port target_label: __address__ - action: labelmap regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+) replacement: __param_$1 - action: labelmap regex: __meta_kubernetes_service_label_(.+) - action: replace source_labels: - __meta_kubernetes_namespace target_label: namespace - action: replace source_labels: - __meta_kubernetes_service_name target_label: service - action: replace source_labels: - __meta_kubernetes_pod_node_name target_label: node

After redeploying the Prometheus Pod, you should see CoreDNS metric endpoints in the Prometheus UI under Status > Targets.

CoreDNS architecture diagram

Once discovered, CoreDNS metrics will be available in Prometheus.

CoreDNS metrics example in Prometheus

Which metrics should you check?

CoreDNS metrics vary by Kubernetes and CoreDNS version. The examples here use Kubernetes 1.25 and CoreDNS 1.9.3. Check the CoreDNS repository for metrics relevant to your version.

Start with availability: the number of CoreDNS replicas running in the cluster can change, so monitor replica count to detect changes that could affect availability or performance. The coredns_build_info metric provides information about the CoreDNS build running on those Pods.

count(coredns_build_info)

Use the four golden signals approach: errors, latency, traffic, and saturation. The following sections show how to monitor CoreDNS from that perspective.

Errors

Measuring the number of errors in CoreDNS helps assess cluster, application, and service health. Unexpected DNS error responses can cause serious issues. Watch for SERVFAIL and REFUSED responses when resolving names for internal components and applications.

Metric: coredns_dns_responses_total. This counter includes response codes, namespace, and instance. You may want to compute the rate per response code to monitor error rates.

sum(rate(coredns_dns_responses_total{instance=~".*"}[2m])) by (rcode, instance)

CoreDNS response codes over time

Latency

Latency measurement is critical for DNS performance. High or increasing latency may indicate load issues. Overloaded CoreDNS instances can lead to resolution problems and downstream latency or outages.

Metric: coredns_dns_request_duration_seconds_bucket. Calculate the 99th percentile to see latency distribution across instances.

histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket{instance=~".*"}[2m])) by (server, zone, le, instance))

Traffic

Monitor the request rate handled by CoreDNS. Tracking traffic for spikes or trends is important to maintain performance.

Metric: coredns_dns_requests_total. This is a counter by zone, protocol, and series. Monitor request rate by query type, for example A for IPv4 and AAAA for IPv6 queries.

(sum(rate(coredns_dns_requests_total{instance=~".*"}[2m])) by (type, instance))

DNS request rate by type

Saturation

Monitor saturation using system resource metrics such as CPU, memory, and network usage for CoreDNS Pods to detect resource exhaustion.

Cache metrics

CoreDNS implements a caching mechanism that can cache records for up to 3600 seconds. The cache reduces load and improves performance.

Metric: coredns_cache_hits_total. Monitor cache hit rates, including success and denial rates, with a query such as:

sum(rate(coredns_cache_hits_total{instance=~".*"}[2m])) by (type, instance)

CoreDNS cache hits by type

Conclusion

CoreDNS is a common choice for implementing DNS in Kubernetes because of its flexibility and the issues it addresses compared with kube-dns. DNS is essential to cluster operation, so continuous monitoring of DNS services is necessary to ensure a healthy Kubernetes infrastructure.

This article explained how to scrape CoreDNS metrics and configure a Prometheus instance to scrape CoreDNS endpoints. With the key CoreDNS metrics presented here, you can start monitoring CoreDNS in any Kubernetes environment.