Google Distributed Cloud nettest identifies connectivity issues in the Kubernetes objects in your clusters, such as Pods, Nodes, Services, and some external targets. nettest doesn't check connections from external targets to Pods, Nodes, or Services. This document describes how to deploy and run nettest with one of the manifests, nettest.yaml or nettest_rhel.yaml, in the anthos-samples GitHub repository. Use nettest_rhel.yaml if you run Google Distributed Cloud on Red Hat Enterprise Linux (RHEL). Use nettest.yaml if you run Google Distributed Cloud on Ubuntu.
This document also describes how to interpret the logs generated by nettest to identify connectivity problems with your clusters.
About nettest
The nettest diagnostic tool consists of the following Kubernetes objects. Each object is specified in the nettest YAML manifest files.
- cloudprober: a DaemonSet and a Service responsible for collecting network connection status, such as error rate and latency.
- echoserver: a DaemonSet and a Service responsible for responding to cloudprober, providing it the metrics for network connectivity.
- nettest: a Pod containing the prometheus and nettest containers. prometheus collects metrics from cloudprober. nettest queries prometheus and displays the network test results in the log.
- nettest-engine: a ConfigMap to configure the nettest container in the nettest Pod.
The manifest also specifies the nettest namespace and a dedicated ServiceAccount (along with ClusterRole and ClusterRoleBinding) to isolate nettest from other cluster resources.
Run nettest
Deploy nettest by running the following command for your operating system. When the nettest Pod starts, the test runs automatically. The test takes about five minutes to complete.
For Ubuntu:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest.yaml
For RHEL:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest_rhel.yaml
Get the test results
After the test completes, about five minutes after you deploy the nettest manifest, run the following command to see the nettest results:
kubectl -n nettest logs nettest -c nettest
While nettest is running, it sends messages like the following to stdout:
I0413 03:33:04.879141 1 collectorui.go:130] Listening on ":8999"
I0413 03:33:04.879258 1 prometheus.go:172] Running prometheus controller
E0413 03:33:04.879628 1 prometheus.go:178] Prometheus controller: failed to retries probers: Get "http://127.0.0.1:9090/api/v1/targets": dial tcp 127.0.0.1:9090: connect: connection refused
If nettest runs successfully without identifying any connectivity failures, you see the following log entry:
I0211 21:58:34.689290 1 validate_metrics.go:78] Metric validation passed!
If nettest found connection issues, it writes log entries like the following:
E0211 06:40:11.948634 1 collector.go:65] Engine error: step validateMetrics failed:
"Error rate in percentage": probe from "10.200.0.3" to "172.26.115.210:80" has value 100.000000, threshold is 1.000000
"Error rate in percentage": probe from "10.200.0.3" to "172.26.27.229:80" has value 100.000000, threshold is 1.000000
"Error rate in percentage": probe from "192.168.3.248" to "echoserver-hostnetwork_10.200.0.2_8080" has value 2.007046, threshold is 1.000000
Although the default threshold is one percent (1.000000), error rates up to five percent can be ignored safely. For example, the error rate for connectivity from IP address 192.168.3.248 to echoserver-hostnetwork_10.200.0.2_8080 in the preceding example is approximately two percent (2.007046). This is an example of a reported connectivity issue that you can ignore.
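To apply the five percent rule of thumb without reading every entry by hand, you can filter the log with a short shell helper. The following is a minimal sketch and is not part of nettest: the function names nettest_status and nettest_failures are hypothetical, and the parsing assumes that log entries keep the exact wording shown in the preceding examples.

```shell
# Hypothetical helpers, not part of nettest. Both read nettest container logs
# on stdin and assume the log wording shown in this document.

# Prints "failed", "passed", or "pending" (no verdict in the log yet).
nettest_status() {
  logs=$(cat)
  case $logs in
    *'step validateMetrics failed'*) echo failed ;;
    *'Metric validation passed!'*) echo passed ;;
    *) echo pending ;;
  esac
}

# Prints "source<TAB>destination<TAB>error rate" for every probe whose error
# rate is above the limit given as $1 (default: 5 percent).
nettest_failures() {
  awk -v limit="${1:-5}" '
    {
      line = $0
      # Consume each error-rate entry on the line, left to right.
      while (match(line, /"Error rate in percentage": probe from "[^"]*" to "[^"]*" has value [0-9.]+/)) {
        entry = substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART + RLENGTH)
        split(entry, part, "\"")   # part[4] = source, part[6] = destination
        value = part[7]
        sub(/^ has value /, "", value)
        if (value + 0 > limit + 0) printf "%s\t%s\t%s\n", part[4], part[6], value
      }
    }'
}

# Example usage against a live cluster:
#   kubectl -n nettest logs nettest -c nettest | nettest_failures
```

Passing a different limit as the first argument, for example nettest_failures 1, reports every probe above the default one percent threshold instead.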
Interpret the test results
When nettest finishes and finds a connectivity issue, you see the following entry in the nettest Pod logs:
"Error rate in percentage": probe from {src} to {dst} has value 100.000000, threshold is 1.000000
Here, {src} and {dst} can be any of the following:
- echoserver Pod IP: the connection to or from a Pod on the node.
- Node IP: the connection to or from the node.
- Service IP (see the following text for details)
In addition, {dst} can also be:
- google.com: an external connection.
- dns: the connection to a non-hostNetwork Service through DNS, that is echoserver-non-hostnetwork.nettest.svc.cluster.local.
The details for Service IP are in JSON-formatted probe entries in the log, like the following example. The following probe example shows that 172.26.27.229:80 is the address for service-clusterip. There are two probes with this targets value, one for the Pod (pod-service-clusterip) and one for the Node (node-service-clusterip).
probe {
  name: "node-service-clusterip"
  …
  targets {
    host_names: "172.26.27.229:80"
  }
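When a failing probe lists a Service IP as its destination, it helps to see which probe names own that address. The following sketch, which is not part of nettest, prints each targets address next to the name of the probe that declares it. The function name probe_targets is hypothetical, and the parsing assumes that probe entries appear in the log with name and host_names fields formatted as in the preceding example.

```shell
# Hypothetical helper, not part of nettest. Reads nettest container logs on
# stdin and prints "target<TAB>probe name" for every host_names field, using
# the most recent name field seen before it.
probe_targets() {
  awk '
    {
      line = $0
      # Walk through every key: "value" pair on the line, in order.
      while (match(line, /[A-Za-z_]+: "[^"]*"/)) {
        pair = substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART + RLENGTH)
        key = pair
        sub(/: ".*/, "", key)      # keep only the field name
        val = pair
        sub(/^[^"]*"/, "", val)    # drop everything up to the opening quote
        sub(/"$/, "", val)         # drop the closing quote
        if (key == "name") name = val
        else if (key == "host_names") print val "\t" name
      }
    }'
}

# Example usage against a live cluster:
#   kubectl -n nettest logs nettest -c nettest | probe_targets | grep '172.26.27.229:80'
```

For the address in the preceding example, the filtered output would be expected to list both node-service-clusterip and pod-service-clusterip, provided both probe entries are present in the log.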
Validate your fixes
When you have addressed all reported connectivity issues, remove the nettest Pod and reapply the nettest manifest to rerun the tests for connectivity.
For example, to rerun nettest for Ubuntu, run the following commands:
kubectl -n nettest delete pod nettest
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest.yaml
Clean up nettest
When you're done testing, run the following commands to remove all nettest resources:
kubectl delete namespace nettest
kubectl delete clusterroles nettest:nettest
kubectl delete clusterrolebindings nettest:nettest
What's next
If you need additional assistance, reach out to Cloud Customer Care. You can also see Getting support for more information about support resources, including the following:
- Requirements for opening a support case.
- Tools to help you troubleshoot, such as your environment configuration, logs, and metrics.
- Supported components.