It looks like the Volcano Descheduler has suitable features for the above problem. From https://volcano.sh/en/docs/descheduler/:
The scheduling within a cluster is the process of assigning pending Pods to nodes for execution, and Pod scheduling relies on the cluster’s scheduler. The scheduler calculates the optimal node for a Pod’s execution through a series of algorithms. However, the Kubernetes cluster environment is dynamic: a node may require maintenance, for example, which results in all Pods on that node being evicted to other nodes. Once maintenance is complete, the previously evicted Pods do not automatically return to the node, because a Pod that has already been bound to a node does not trigger rescheduling. Due to changes like these, the cluster may become unbalanced over time.
To address these issues, the Volcano descheduler can evict Pods that do not comply with the configured strategies, allowing them to be rescheduled, thereby balancing cluster load and reducing resource fragmentation. See the repo: https://github.com/volcano-sh/descheduler.
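To make "configured strategies" concrete, here is a minimal policy sketch using the LowNodeUtilization strategy, which evicts pods from over-utilized nodes when under-utilized nodes exist. The field layout follows the upstream descheduler's v1alpha2 DeschedulerPolicy format, and I am assuming the Volcano fork accepts the same schema; the threshold numbers are placeholders I picked, not recommendations.

```yaml
# Minimal DeschedulerPolicy sketch (assumes the Volcano descheduler reuses the
# upstream v1alpha2 schema; threshold values below are illustrative placeholders).
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: rebalance
    plugins:
      balance:
        enabled:
          - "LowNodeUtilization"        # evict from busy nodes when idle capacity exists
    pluginConfig:
      - name: "LowNodeUtilization"
        args:
          thresholds:                   # nodes below all of these are "under-utilized"
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:             # nodes above any of these are "over-utilized"
            cpu: 60
            memory: 60
            pods: 60
```

Pods evicted this way are recreated by their controllers and go back through kube-scheduler, so the normal scheduling constraints still apply when they land on their new nodes.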
And now that I have found this, I also see that it is based on the Descheduler for Kubernetes. From https://github.com/kubernetes-sigs/descheduler:
Scheduling in Kubernetes is the process of binding pending pods to nodes, and is performed by a component of Kubernetes called kube-scheduler. The scheduler's decisions about whether or where a pod can be scheduled are guided by its configurable policy, which comprises a set of rules called predicates and priorities. The scheduler's decisions are influenced by its view of the Kubernetes cluster at the point in time when a new pod appears for scheduling. As Kubernetes clusters are very dynamic and their state changes over time, there may be a desire to move already running pods to other nodes for various reasons:
- Some nodes are under or over utilized.
- The original scheduling decision no longer holds true, as taints or labels have been added to or removed from nodes, so pod/node affinity requirements are no longer satisfied.
- Some nodes failed and their pods moved to other nodes.
- New nodes are added to clusters.
It is that last use case that I described above: I want pods to be rescheduled as the big node comes up and increases the cluster's capacity.
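In practice that would mean running the descheduler on a schedule, so that whenever the big node rejoins, the next run notices the imbalance, evicts pods from the over-utilized smaller nodes, and lets kube-scheduler place them on the fresh capacity. Below is a sketch of a CronJob doing that with the upstream image; the image tag, ServiceAccount, and ConfigMap names are placeholders of mine (the repo ships its own manifests and a Helm chart, which I would use in practice).

```yaml
# Sketch only: runs the upstream descheduler every 10 minutes against a policy
# stored in a ConfigMap (e.g. the LowNodeUtilization policy sketched earlier).
# Image tag, ServiceAccount, and ConfigMap names are placeholders, not the
# project's official manifests.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: descheduler
  namespace: kube-system
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: descheduler-sa   # needs RBAC to list nodes/pods and create evictions
          restartPolicy: Never
          containers:
            - name: descheduler
              image: registry.k8s.io/descheduler/descheduler:v0.31.0   # placeholder tag
              command:
                - /bin/descheduler
                - --policy-config-file=/policy-dir/policy.yaml
              volumeMounts:
                - name: policy-volume
                  mountPath: /policy-dir
          volumes:
            - name: policy-volume
              configMap:
                name: descheduler-policy       # holds the DeschedulerPolicy file as policy.yaml
```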