
I have a Kubernetes cluster that runs services which are only rarely used. For efficiency, it runs all pods on a small node that is always up. Every now and then a big node becomes available and joins the cluster, but this big node does not stay forever.

But while the big node is available, I'd like some of the pods to be rescheduled onto it so processing is faster. That way the cluster would take advantage of the big node whenever it is available. How can I achieve that?

I understand labels and pod affinity are involved, but all the examples I find have a requiredDuringSchedulingIgnoredDuringExecution clause, which would likely have no effect when the big node becomes available, since pods that are already running are not re-evaluated.
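For illustration, the kind of rule I mean looks roughly like this; node-size: big is just a hypothetical label for the big node, and even the preferred variant shown here only influences placement at scheduling time, it does not move pods that are already running:

```yaml
# Hypothetical node affinity that prefers (not requires) a node labelled
# node-size=big. IgnoredDuringExecution: running pods are never moved.
apiVersion: v1
kind: Pod
metadata:
  name: example-worker
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: node-size
                operator: In
                values: ["big"]
  containers:
    - name: worker
      image: example.com/worker:latest   # placeholder image
```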

2 Answers


Experiment with your workload size.

Add the big node. Manually terminate and recreate more and bigger pods to fit the extra resources. Create parallel jobs for batch workloads.

Avoid affinity rules and advanced node selection when the only constraint is the quantity of CPU and memory resources.
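As a sketch of what that looks like: a plain Deployment with only resource requests and limits and no affinity, so the scheduler places pods wherever the CPU and memory fit, which naturally includes the big node once it has joined. The image name and the sizes are placeholders:

```yaml
# Sketch: workload constrained only by resource requests/limits,
# with no affinity or node selection rules.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
        - name: worker
          image: example.com/worker:latest   # placeholder image
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi
```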

To experience the opposite, resize the pods smaller and drain the node. Define a PodDisruptionBudget to formalize what fraction of the pods can be voluntarily down.
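A minimal PodDisruptionBudget sketch; the app label and the allowed disruption fraction are placeholders:

```yaml
# Hypothetical budget: at most half of the matching pods may be down
# due to voluntary disruptions such as a node drain.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  maxUnavailable: 50%
  selector:
    matchLabels:
      app: worker   # assumed label on the workload's pods
```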

If the scaling proof of concept was effective, automate it. The Kubernetes docs' list of workload autoscalers includes a couple that scale proportionally to cluster size, in both horizontal and vertical flavours. These would need to be installed and deployed alongside your pods.
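For the horizontal flavour (the cluster-proportional-autoscaler), the scaling policy is a ConfigMap roughly like the following; the names and ratios here are placeholders, not a recommendation:

```yaml
# Rough sketch of a cluster-proportional-autoscaler policy in linear
# mode: the replica count grows with the cores/nodes in the cluster,
# so adding the big node adds worker replicas.
apiVersion: v1
kind: ConfigMap
metadata:
  name: worker-autoscaler   # the autoscaler is pointed at this ConfigMap
  namespace: default
data:
  linear: |-
    {
      "coresPerReplica": 4,
      "nodesPerReplica": 1,
      "min": 1,
      "preventSinglePointFailure": false
    }
```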

Most autoscalers respond to shifts in demand: users request more from the application, the worker pods become more utilized so more of them are started, and if a node is full, a node autoscaler brings up more hosts. Pods that are already running do not need to be moved; the new node is for new pods.

You are attempting the opposite: a node happens to become available and suddenly the resource supply is no longer constrained, but you probably do not have a large number of pods waiting unscheduled. So some creativity may be required in scripting the start of more workload to meet the extra supply.
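One possible shape for that scripting, purely as a sketch: a CronJob that checks for a big-node label and scales a worker Deployment up or down. The label, names, and replica counts are assumptions, and the ServiceAccount would need RBAC permissions to list nodes and scale deployments:

```yaml
# Hypothetical CronJob that scales the worker Deployment up while a node
# labelled node-size=big is present, and back down when it is gone.
# RBAC for the ServiceAccount (list nodes, scale deployments) is omitted.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-on-big-node
spec:
  schedule: "*/5 * * * *"   # check every five minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scale-on-big-node
          restartPolicy: OnFailure
          containers:
            - name: scaler
              image: bitnami/kubectl:latest   # any image with kubectl works
              command:
                - /bin/sh
                - -c
                - |
                  if kubectl get nodes -l node-size=big --no-headers | grep -q .; then
                    kubectl scale deployment worker --replicas=6
                  else
                    kubectl scale deployment worker --replicas=2
                  fi
```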


It looks like the Volcano Descheduler has suitable features for the above problem. From https://volcano.sh/en/docs/descheduler/:

Scheduling within a cluster is the process of assigning pending Pods to nodes for execution, with Pod scheduling relying on the cluster's scheduler. The scheduler calculates the optimal node for a Pod's execution through a series of algorithms. However, the Kubernetes cluster environment is dynamic: a node may require maintenance, for example, which results in all Pods on that node being evicted to other nodes. Once maintenance is complete, the previously evicted Pods do not automatically return to the node, because once a Pod is bound to a node, no descheduling is triggered. Due to such changes, the cluster may become unbalanced over time.

To address the above issues, the Volcano descheduler can evict Pods that do not comply with the configured strategies, allowing them to be rescheduled, thereby balancing the cluster load and reducing resource fragmentation. See the repo: https://github.com/volcano-sh/descheduler.

And now that I found this, I also see it is based on the Descheduler for Kubernetes. From https://github.com/kubernetes-sigs/descheduler:

Scheduling in Kubernetes is the process of binding pending pods to nodes, and is performed by a component of Kubernetes called kube-scheduler. The scheduler's decisions, whether or where a pod can or cannot be scheduled, are guided by its configurable policy, which comprises a set of rules called predicates and priorities. The scheduler's decisions are influenced by its view of the Kubernetes cluster at the point in time when a new pod appears for scheduling. As Kubernetes clusters are very dynamic and their state changes over time, there may be a desire to move already-running pods to other nodes for various reasons:

  • Some nodes are under or over utilized.
  • The original scheduling decision no longer holds true, as taints or labels are added to or removed from nodes and pod/node affinity requirements are no longer satisfied.
  • Some nodes failed and their pods moved to other nodes.
  • New nodes are added to clusters.

That last use case is exactly what I described above: I want pods to be rescheduled when the big node comes up and increases the cluster's capacity.
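For completeness, a LowNodeUtilization policy of roughly this shape should cover it; this is a sketch in the descheduler's older v1alpha1 policy format, and the thresholds are placeholders. Pods are evicted from nodes above the target thresholds and can then be rescheduled onto the newly added, underutilized big node:

```yaml
# Sketch of a descheduler policy (v1alpha1 format): evict pods from
# overutilized nodes so the scheduler can place them on underutilized
# ones, such as the big node right after it joins.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:          # nodes below this are "underutilized"
          "cpu": 20
          "memory": 20
          "pods": 20
        targetThresholds:    # nodes above this may have pods evicted
          "cpu": 50
          "memory": 50
          "pods": 50
```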
