# NodeWithoutOVNKubeNodePodRunning

## Meaning

The `NodeWithoutOVNKubeNodePodRunning` alert is triggered when one or more
Linux nodes do not have a running ovnkube-node pod for a period of time.

## Impact

This is a warning alert. Existing workloads on the node may continue to have
connectivity, but any additional workloads will not be provisioned on the
node. Any network policy changes will not be implemented on existing
workloads on the node.

## Diagnosis

Check the nodes that should have an ovnkube-node pod running:

    oc get node -l kubernetes.io/os!=windows

Check the expected number of running ovnkube-node replicas:

    oc get daemonset ovnkube-node -n openshift-ovn-kubernetes
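
The daemonset status can also be read directly to compare the desired and
ready pod counts; for example (a sketch using standard daemonset status
fields):

```console
# Print desired vs. ready ovnkube-node pods reported by the daemonset.
oc get daemonset ovnkube-node -n openshift-ovn-kubernetes \
  -o jsonpath='desired={.status.desiredNumberScheduled} ready={.status.numberReady}{"\n"}'
```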

Check the status of the ovnkube-node pods on the nodes:

    oc get po -n openshift-ovn-kubernetes -l app=ovnkube-node -o wide
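
To narrow down which nodes the alert is firing for, the node list can be
compared with the list of nodes that currently have a `Running` ovnkube-node
pod; a minimal bash sketch, assuming the same label selectors as above:

```console
# Print Linux nodes that have no Running ovnkube-node pod (bash).
comm -23 \
  <(oc get node -l 'kubernetes.io/os!=windows' \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | sort) \
  <(oc get po -n openshift-ovn-kubernetes -l app=ovnkube-node \
      --field-selector=status.phase=Running \
      -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort)
```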

Describe the pod if there is a non-running ovnkube-node pod:

    oc describe po -n openshift-ovn-kubernetes <ovnkube-node-name>

Check the pod logs for the failing ovnkube-node pods:

    oc logs <ovnkube-node-name> -n openshift-ovn-kubernetes --all-containers
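
If the pod has not started at all, there may be no logs to inspect; in that
case recent events in the namespace can point at scheduling, image-pull, or
probe failures. One possible check (a generic sketch):

```console
# Show the most recent events in the OVN-Kubernetes namespace.
oc get events -n openshift-ovn-kubernetes --sort-by=.lastTimestamp | tail -n 20
```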

## Mitigation

There is no single mitigation for this alert; the appropriate action depends
on why the ovnkube-node pod is not running.

If any of the ovnkube-node pods is not in `Running` status, you can try to
delete the pod and let it be recreated by the daemonset controller:

    oc delete po <ovnkube-node-name> -n openshift-ovn-kubernetes

If the issue cannot be fixed by recreating the pod, rebooting the affected
node might be an option to refresh the full networking stack on the node
(including OVS).
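
One way to reboot the node, assuming it can be drained first, is via
`oc debug`; a sketch of the usual cordon/drain/reboot sequence (adjust the
drain flags to your workloads):

```console
# Cordon and drain the affected node, then reboot it from a debug pod.
oc adm cordon <node-name>
oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data
oc debug node/<node-name> -- chroot /host systemctl reboot
# After the node is Ready again, allow scheduling on it.
oc adm uncordon <node-name>
```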

# V4SubnetAllocationThresholdExceeded

## Meaning

The `V4SubnetAllocationThresholdExceeded` alert is triggered when more than
80% of the subnets available for nodes are allocated.

## Impact

This is a warning alert. There is no immediate impact to the cluster when this
alert fires; it is a warning to be mindful of your remaining node subnet
allocation. If the remaining subnets are exhausted, no further nodes can be
added to your cluster.

## Diagnosis

Check the network configuration on the cluster:

    oc get networks.config.openshift.io/cluster -o jsonpath='{.spec.clusterNetwork}'

Example output:

    [{"cidr":"10.128.0.0/14","hostPrefix":23}]

Calculate the IPv4 subnet capacity:

    subnet_capacity = 2^((32 - clusterNetwork_netmask) - (32 - hostPrefix))

With a cluster network CIDR netmask of `/14` and a `hostPrefix` of `23`, this
gives 2^((32 - 14) - (32 - 23)) = 2^9 = 512, which means the cluster can have
at most 512 nodes.
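
The same calculation can be done against the live configuration; a bash
sketch, assuming a single `clusterNetwork` entry:

```console
# Compute the node subnet capacity from the cluster network configuration (bash).
read -r cidr_prefix host_prefix < <(oc get networks.config.openshift.io/cluster \
  -o jsonpath='{.spec.clusterNetwork[0].cidr} {.spec.clusterNetwork[0].hostPrefix}' \
  | awk -F'[/ ]' '{print $2, $3}')
echo $(( 2 ** ((32 - cidr_prefix) - (32 - host_prefix)) ))   # 512 for /14 and 23
```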

Count the number of nodes and compare it with the subnet capacity:

    oc get node --no-headers | wc -l

## Mitigation

We do not support adding additional cluster networks for OVN-Kubernetes.

You will have to create a new cluster to run more worker nodes.

Choosing a larger cluster network CIDR that can hold more subnets could
prevent this from happening.
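
When sizing a new cluster, the capacity formula from the Diagnosis section can
be used to pick a large enough CIDR up front; for example (bash, illustrative
values):

```console
# Node capacity for a /12 cluster network with hostPrefix 23 (illustrative).
echo $(( 2 ** ((32 - 12) - (32 - 23)) ))   # prints 2048
```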