
Commit da32623

Merge pull request openshift#42 from bmeng/ovn-kube
add runbook for NodeWithoutOVNKubeNodePodRunning and V4SubnetAllocati…
2 parents b79afa9 + 3e80c17 commit da32623


2 files changed (+89 -0 lines changed)

Lines changed: 48 additions & 0 deletions
# NodeWithoutOVNKubeNodePodRunning

## Meaning

The `NodeWithoutOVNKubeNodePodRunning` alert is triggered when one or more Linux
nodes do not have a running ovnkube-node pod for a period of time.

## Impact

This is a warning alert. Existing workloads on the node may continue to have
network connectivity, but no new workloads can be provisioned on the node.
Network policy changes will not be applied to existing workloads on the node.

## Diagnosis

List the nodes that should be running an ovnkube-node pod (all non-Windows
nodes):

    oc get node -l 'kubernetes.io/os!=windows'

Check the desired and ready pod counts of the ovnkube-node daemonset:

    oc get daemonset ovnkube-node -n openshift-ovn-kubernetes

Check the status of the ovnkube-node pods and the node each one runs on:

    oc get po -n openshift-ovn-kubernetes -l app=ovnkube-node -o wide

If an ovnkube-node pod is not running, describe it:

    oc describe po -n openshift-ovn-kubernetes <ovnkube-node-name>

Check the logs of the failing ovnkube-node pod:

    oc logs <ovnkube-node-name> -n openshift-ovn-kubernetes --all-containers
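
The first three checks above can be combined into a small script that lists the Linux nodes that have no ovnkube-node pod in the `Running` phase. This is a minimal sketch and not part of the runbook's tooling. It assumes `bash` and a logged-in `oc` client, and the function names `nodes_without_pod` and `list_nodes_without_ovnkube_node` are hypothetical. A pod in the `Running` phase can still have containers that are not ready, so also check the READY column of `oc get po`.

```shell
#!/usr/bin/env bash
# Sketch: report Linux nodes that have no Running ovnkube-node pod.
# Function names are hypothetical; requires bash and, for the live
# check, a logged-in `oc` client.

# nodes_without_pod ALL_NODES POD_NODES
# Both arguments are newline-separated node names. Prints every name in
# ALL_NODES that does not appear as a whole line in POD_NODES.
nodes_without_pod() {
  local all_nodes=$1 pod_nodes=$2 node
  while IFS= read -r node; do
    [ -n "$node" ] || continue   # skip blank lines
    if ! printf '%s\n' "$pod_nodes" | grep -qxF -- "$node"; then
      printf '%s\n' "$node"
    fi
  done <<< "$all_nodes"
}

# Queries the cluster and prints the nodes that need attention.
list_nodes_without_ovnkube_node() {
  local all_nodes pod_nodes
  # All non-Windows nodes, one name per line.
  all_nodes=$(oc get node -l 'kubernetes.io/os!=windows' \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}') || return
  # Nodes hosting an ovnkube-node pod whose phase is Running.
  pod_nodes=$(oc get po -n openshift-ovn-kubernetes -l app=ovnkube-node \
    --field-selector=status.phase=Running \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}') || return
  nodes_without_pod "$all_nodes" "$pod_nodes"
}
```

After sourcing the script, run `list_nodes_without_ovnkube_node`. Every node it prints is a candidate for the `oc describe` and `oc logs` steps above. The set comparison lives in `nodes_without_pod`, separate from the `oc` calls, so that part can be tested without a cluster.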

## Mitigation

The right mitigation depends on the root cause, so it cannot be determined in
advance.

If any ovnkube-node pod is not in the `Running` state, try deleting it so that
the daemonset controller recreates it:

    oc delete po <ovnkube-node-name> -n openshift-ovn-kubernetes

If recreating the pod does not fix the issue, rebooting the affected node may
help, because a reboot refreshes the full stack on the node, including OVS.
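
The delete step can be scoped to the node named in the alert. The following is a hedged sketch rather than an official tool. The function name `recreate_ovnkube_node_pod` is hypothetical, and the sketch relies on the `spec.nodeName` field selector for pods and on `oc rollout status` for daemonsets. Daemonset status can lag briefly behind a deletion, so `oc rollout status` may return before the replacement pod is ready. Re-run `oc get po ... -o wide` to confirm.

```shell
#!/usr/bin/env bash
# Sketch: delete the ovnkube-node pod on one node so the daemonset
# controller recreates it, then wait for the daemonset rollout.
# The function name is hypothetical; requires bash 4+ and a logged-in `oc`.

recreate_ovnkube_node_pod() {
  local node=$1 out
  local -a pods
  if [ -z "$node" ]; then
    echo "usage: recreate_ovnkube_node_pod NODE_NAME" >&2
    return 2
  fi
  # Find the ovnkube-node pod(s) scheduled on this node ("pod/<name>").
  out=$(oc get po -n openshift-ovn-kubernetes -l app=ovnkube-node \
    --field-selector="spec.nodeName=${node}" -o name) || return
  if [ -z "$out" ]; then
    echo "no ovnkube-node pod found on node ${node}" >&2
    return 1
  fi
  mapfile -t pods <<< "$out"   # one array element per pod name
  # Delete the pod(s); the daemonset controller schedules replacements.
  oc delete -n openshift-ovn-kubernetes "${pods[@]}" || return
  # Wait until the daemonset reports all desired pods updated and available.
  oc rollout status daemonset/ovnkube-node -n openshift-ovn-kubernetes --timeout=10m
}
```

For example, `recreate_ovnkube_node_pod worker-0` deletes the ovnkube-node pod on the hypothetical node `worker-0` and waits up to 10 minutes for the rollout to finish.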

Lines changed: 41 additions & 0 deletions
# V4SubnetAllocationThresholdExceeded

## Meaning

The `V4SubnetAllocationThresholdExceeded` alert is triggered when more than
80% of the IPv4 subnets available for nodes are allocated.

## Impact

This is a warning alert. There is no immediate impact on the cluster when this
alert fires; it is a reminder to keep track of your remaining node subnets. Once
the remaining subnets are exhausted, no further nodes can be added to the
cluster.

## Diagnosis

Check the network configuration of the cluster:

    oc get networks.config.openshift.io/cluster -o jsonpath='{.spec.clusterNetwork}'

Example output:

    [{"cidr":"10.128.0.0/14","hostPrefix":23}]
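
Calculate how many node subnets the cluster network can hold:

    subnet_capacity = 2^((32 - clusterNetwork_prefix_length) - (32 - hostPrefix))
                    = 2^(hostPrefix - clusterNetwork_prefix_length)

For example, with a `/14` cluster network CIDR and a `hostPrefix` of `23`, the
capacity is 2^(23 - 14) = 2^9 = 512 subnets. Each node is allocated one subnet,
so the cluster can have at most 512 nodes.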
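
The capacity formula above can be evaluated with shell arithmetic. This is a minimal sketch that assumes `bash`. `subnet_capacity` is a hypothetical helper that takes the `cidr` and `hostPrefix` values printed by the first diagnosis command.

```shell
#!/usr/bin/env bash
# Sketch: number of node subnets an IPv4 cluster network can hold,
#   subnet_capacity = 2^((32 - prefix_length) - (32 - hostPrefix))
#                   = 2^(hostPrefix - prefix_length)
# The function name is hypothetical.

# subnet_capacity CIDR HOST_PREFIX   e.g. subnet_capacity 10.128.0.0/14 23
subnet_capacity() {
  local cidr=$1 host_prefix=$2 prefix_len
  prefix_len=${cidr##*/}   # the part after the last "/"
  # Reject a missing or non-numeric prefix, or a hostPrefix outside
  # the range prefix_len..32.
  if ! [[ $prefix_len =~ ^[0-9]+$ && $host_prefix =~ ^[0-9]+$ ]] \
     || (( host_prefix < prefix_len || host_prefix > 32 )); then
    echo "invalid CIDR or hostPrefix: $cidr $host_prefix" >&2
    return 1
  fi
  echo $(( 1 << (host_prefix - prefix_len) ))
}
```

The JSONPath expressions `{.spec.clusterNetwork[0].cidr}` and `{.spec.clusterNetwork[0].hostPrefix}` select the values of the first `clusterNetwork` entry. You can pass them to `subnet_capacity` in place of typing the values by hand.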

Count the nodes in the cluster and compare the result with the capacity:

    oc get node --no-headers | wc -l
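
The comparison can be expressed as a percentage and checked against the 80% threshold from the Meaning section. This is a sketch under stated assumptions. `subnet_usage` is a hypothetical helper, the sketch assumes one subnet per node, and the actual alert rule may measure allocation differently.

```shell
#!/usr/bin/env bash
# Sketch: compare the node count with the node subnet capacity.
# Assumes one subnet per node and the 80% threshold quoted in this
# runbook; the real alert rule may compute usage differently.
# The function name is hypothetical.

# subnet_usage NODE_COUNT CAPACITY
# Prints the usage and returns 1 if more than 80% of the subnets are used,
# 0 otherwise (2 on invalid input).
subnet_usage() {
  local nodes=$1 capacity=$2 pct
  if ! [[ $nodes =~ ^[0-9]+$ && $capacity =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: subnet_usage NODE_COUNT CAPACITY" >&2
    return 2
  fi
  pct=$(( nodes * 100 / capacity ))   # whole percent, rounded down
  echo "${nodes}/${capacity} node subnets used (${pct}%)"
  # Integer comparison avoids rounding: 410/512 (80.08%) counts as above 80%.
  if (( nodes * 100 > capacity * 80 )); then
    return 1
  fi
  return 0
}
```

For example, `subnet_usage "$(oc get node --no-headers | wc -l | tr -d ' ')" 512` combines the node count with the 512-subnet capacity from the example above. The `tr` call strips the padding that some `wc` implementations add. The threshold check multiplies instead of dividing, so a value such as 410 of 512 is reported as over the threshold even though the rounded-down percentage prints as 80%.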

## Mitigation

Adding additional cluster networks is not supported for OVN-Kubernetes.

To run more worker nodes than the cluster network can hold, you will have to
create a new cluster.

Choosing a larger cluster network CIDR, which can hold more node subnets, when
creating the cluster can prevent this from happening.
