DEV Community

Cover image for Writing a kubernetes controller in Go with kubebuilder
Ishan Khare
Ishan Khare

Posted on

Writing a kubernetes controller in Go with kubebuilder

In this post we'll implement a simple kubernetes controller using the kubebuilder

Installing kubebuilder

The installation instructions can be found on kubebuilder installation docs

Install kustomize

Since kubebuilder internally relies on kustomize, hence we need to install that from the instructions here

Now we're all set up to start building our controller


Start the project

I have initialized an empty go project

go mod init cnat 
Enter fullscreen mode Exit fullscreen mode

Next we'll initialize the controller project

kubebuilder init --domain ishankhare.dev 
Enter fullscreen mode Exit fullscreen mode

Now we'll ask kubebuilder to setup all the necessary scaffolding for our project

kubebuilder create api --group cnat --version v1alpha1 --kind At 
Enter fullscreen mode Exit fullscreen mode

Kubebuilder will ask us whether to create Resource and Controller directories, to which we can answer y each, since we want these to be created

Create Resource [y/n] y Create Controller [y/n] y Writing scaffold for you to edit... 
Enter fullscreen mode Exit fullscreen mode

Test install scaffold

If we now run make install, kubebuilder should generate the base CRDs under config/crd/bases and a few other files for us.

Running make run should now allow us to launch the operator locally. This is really helpful for local testing while we will be implementing and debugging the business logic of our operator code. The live logs will give us insights about how our codebase changes reflect to what's happening with the operator.

Enable status subresource

In the file api/v1alpha1/at_types.go add the following comment above At struct.

// +kubebuilder:object:root=true // +kubebuilder:subresource:status // At is the Schema for the ats API type At struct { . . . 
Enter fullscreen mode Exit fullscreen mode

This will enable us to use privilege separation between spec and status fields.

Background

A little background on what we're going to build here. Our controller is basically a 'Cloud Native At' (cnat) for short – a cloud native version of the Unix's at command. We provide a CRD manifest containing 2 main things:

  1. A command to execute
  2. Time to execute the command

Our controller will basically wait and watch till the appropriate time, then would spawn a Pod and run the given command in that Pod. All this will it will also keep updating the Status of that CRD which we can view with kubectl


So let's begin

CRD outline

This section basically describes how we want our CRD to look like. Given the two points we listed above, we can basically represent them as:

apiVersion: cnat.ishankhare.dev/v1alpha1 kind: At metadata: name: at-sample spec: schedule: "2020-11-16T10:12:00Z" command: "echo hello world!" 
Enter fullscreen mode Exit fullscreen mode

This is enough information for our controller to decide 'when' and 'what' to execute.

Defining the CRD

Now that we know what we expect out of the CRD let's implement that. Goto the file api/v1alpha1/at_types.go and add the following fields to the code:

const ( PhasePending = "PENDING" PhaseRunning = "RUNNING" PhaseDone = "DONE" ) 
Enter fullscreen mode Exit fullscreen mode

Also we edit the existing AtSpec and AtStatus structs as follows:

type AtSpec struct { Schedule string `json:"schedule,omitempty"` Command string `json:"command,omitempty"` } type AtStatus struct { Phase string `json:"phase,omitempty"` } 
Enter fullscreen mode Exit fullscreen mode

We save this as we run

$ make manifests go: creating new go.mod: module tmp go: found sigs.k8s.io/controller-tools/cmd/controller-gen in sigs.k8s.io/controller-tools v0.2.5 /Users/ishankhare/godev/bin/controller-gen "crd:trivialVersions=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases $ make install # this will install the generated CRD into k8s 
Enter fullscreen mode Exit fullscreen mode

We should now be able to see an exhaustive CustomResourceDefinition generated for us at config/crd/bases/cnat.ishankhare.dev_ats.yaml. For people new to CRD's, they will basically help kubernetes identify what our kind: At means. Unlike builtin resource types like Deployments, Pods etc. we need to define custom types for kubernetes through a CustomResourceDefinition. Let's try to create an At resource:

cat >sample-at.yaml<<EOF apiVersion: cnat.ishankhare.dev/v1alpha1 kind: At metadata: name: at-sample spec: schedule: "2020-11-16T10:12:00Z" command: "echo hello world!" EOF 
Enter fullscreen mode Exit fullscreen mode

If we now do a kubectl apply -f to the previously shown CRD:

kubectl apply -f sample-at.yaml kubectl get at NAME AGE at-sample 124m 
Enter fullscreen mode Exit fullscreen mode

So now kubernetes recognizes our custom type, but it still does NOT know what to do with it. The logic part of the controller is still missing, but we have the skeleton ready to start writing our logic around it.


Implementing the Controller logic

Now we're coming to the fun part – implementing the logic for our operator. This will basically tell kubernetes what to do with our generated CRD. As few say its like "making kubernetes do tricks!".

Most of the scaffolding as been already generated for us by kubebuilder in main.go and controllers/. Inside the controllers/at_controller.go file we have the function Reconcile. Consider this is the reconcile loop's body.

A small refresher – a controller basically runs a periodic loop over the said objects and watches for any changes to those objects. When changes happen the loop body is triggered and the relevant logic can be executed. This function Reconcile is that logical piece, the entry-point for all our controller logic.

Let's start adding a basic structuring around it:

func (r *AtReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) { reqLogger := r.Log.WithValues("at", req.NamespacedName) reqLogger.Info("=== Reconciling At") instance := &cnatv1alpha1.At{} err := r.Get(context.TODO(), req.NamespacedName, instance) if err != nil { if errors.IsNotFound(err) { // object not found, could have been deleted after // reconcile request, hence don't requeue return ctrl.Result{}, nil } // error reading the object, requeue the request return ctrl.Result{}, err } // if no phase set, default to Pending if instance.Status.Phase == "" { instance.Status.Phase = cnatv1alpha1.PhasePending } // state transition PENDING -> RUNNING -> DONE return ctrl.Result{}, nil } 
Enter fullscreen mode Exit fullscreen mode

Here we basically create a logger and an empty instance of the At object, which will allow use to query kubernetes for existing At objects and read/write to their status etc. The function is supposed to return the result of the reconciliation logic and an error. The ctrl.Result object can optionally tell the reconciler to either Requeue immediately (bool) or RequeueAfter with a time.Duration – again these are optional.

Now lets come to the important part, the state diagram. The image below explains what basically happens when we create a new At:

State diagram

Now, lets start implementing this logic in the function:

... // state transition PENDING -> RUNNING -> DONE switch instance.Status.Phase { case cnatv1alpha1.PhasePending: reqLogger.Info("Phase: PENDING") diff, err := schedule.TimeUntilSchedule(instance.Spec.Schedule) if err != nil { reqLogger.Error(err, "Schedule parsing failure") return ctrl.Result{}, err } reqLogger.Info("Schedule parsing done", "Result", fmt.Sprintf("%v", diff)) if diff > 0 { // not yet time to execute, wait until scheduled time return ctrl.Result{RequeueAfter: diff * time.Second}, nil } reqLogger.Info("It's time!", "Ready to execute", instance.Spec.Command) // change state instance.Status.Phase = cnatv1alpha1.PhaseRunning 
Enter fullscreen mode Exit fullscreen mode

The schedule.TimeUntilSchedule function is simple to implement like so:

package schedule import "time" func TimeUntilSchedule(schedule string) (time.Duration, error) { now := time.Now().UTC() layout := "2006-01-02T15:04:05Z" scheduledTime, err := time.Parse(layout, schedule) if err != nil { return time.Duration(0), err } return scheduledTime.Sub(now), nil } 
Enter fullscreen mode Exit fullscreen mode

The next case in our switch body is for RUNNING phase. This is the most complicated one and I've added comments wherever I can to explain it better.

case cnatv1alpha1.PhaseRunning: reqLogger.Info("Phase: RUNNING") pod := spawn.NewPodForCR(instance) err := ctrl.SetControllerReference(instance, pod, r.Scheme) if err != nil { // requeue with error return ctrl.Result{}, err } query := &corev1.Pod{} // try to see if the pod already exists err = r.Get(context.TODO(), req.NamespacedName, query) if err != nil && errors.IsNotFound(err) { // does not exist, create a pod err = r.Create(context.TODO(), pod) if err != nil { return ctrl.Result{}, err } // Successfully created a Pod reqLogger.Info("Pod Created successfully", "name", pod.Name) return ctrl.Result{}, nil } else if err != nil { // requeue with err reqLogger.Error(err, "cannot create pod") return ctrl.Result{}, err } else if query.Status.Phase == corev1.PodFailed || query.Status.Phase == corev1.PodSucceeded { // pod already finished or errored out` reqLogger.Info("Container terminated", "reason", query.Status.Reason, "message", query.Status.Message) instance.Status.Phase = cnatv1alpha1.PhaseDone } else { // don't requeue, it will happen automatically when // pod status changes return ctrl.Result{}, nil } 
Enter fullscreen mode Exit fullscreen mode

We set the ctrl.SetControllerReference that basically tells kubernetes runtime that the created Pod is "owned" by this At instance. This is later going to come handy for us when watching on resources created by our Controller.
We also use another external function here spawn.NewPodForCR which is basically responsible for spawning the new Pod for our At custom resource spec. This is implemented as follows:

package spawn import ( cnatv1alpha1 "cnat/api/v1alpha1" "strings" corev1 "k8s.io/api/core/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" ) func NewPodForCR(cr *cnatv1alpha1.At) *corev1.Pod { labels := map[string]string{ "app": cr.Name, } return &corev1.Pod{ ObjectMeta: metav1.ObjectMeta{ Name: cr.Name, Namespace: cr.Namespace, Labels: labels, }, Spec: corev1.PodSpec{ Containers: []corev1.Container{ { Name: "busybox", Image: "busybox", Command: strings.Split(cr.Spec.Command, " "), }, }, RestartPolicy: corev1.RestartPolicyOnFailure, }, } } 
Enter fullscreen mode Exit fullscreen mode

The last bits of code deal with the DONE phase and the default case of switch. We also do the updation of our status subresource outside the switch as a common part of code:

case cnatv1alpha1.PhaseDone: reqLogger.Info("Phase: DONE") // reconcile without requeuing return ctrl.Result{}, nil default: reqLogger.Info("NOP") return ctrl.Result{}, nil } // update status err = r.Status().Update(context.TODO(), instance) if err != nil { return ctrl.Result{}, err } return ctrl.Result{}, nil 
Enter fullscreen mode Exit fullscreen mode

Lastly we utilize the SetControllerReference by adding the following to the SetupWithManager function in the same file:

err := ctrl.NewControllerManagedBy(mgr). For(&cnatv1alpha1.At{}). Owns(&corev1.Pod{}). Complete(r) if err != nil { return err } return nil 
Enter fullscreen mode Exit fullscreen mode

The part specifying Owns(&corev1.Pod{}) is important and tells the controller manager that pods created by this controller also needs to be watched for changes.

If you want to have a look at the entire final code implementation, head over the this github repo – ishankhare07/kubebuilder-controller.
With this in place, we are now done with implementation of our controller, we can run and test it on a cluster in our current kube context using make run

go: creating new go.mod: module tmp go: found sigs.k8s.io/controller-tools/cmd/controller-gen in sigs.k8s.io/controller-tools v0.2.5 /Users/ishankhare/godev/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..." go fmt ./... go vet ./... /Users/ishankhare/godev/bin/controller-gen "crd:trivialVersions=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases go run ./main.go 2020-11-18T05:05:45.531+0530 INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8080"} 2020-11-18T05:05:45.531+0530 INFO setup starting manager 2020-11-18T05:05:45.531+0530 INFO controller-runtime.manager starting metrics server {"path": "/metrics"} 2020-11-18T05:05:45.531+0530 INFO controller-runtime.controller Starting EventSource {"controller": "at", "source": "kind source: /, Kind="} 2020-11-18T05:05:45.732+0530 INFO controller-runtime.controller Starting EventSource {"controller": "at", "source": "kind source: /, Kind="} 2020-11-18T05:05:46.232+0530 INFO controller-runtime.controller Starting Controller {"controller": "at"} 2020-11-18T05:05:46.232+0530 INFO controller-runtime.controller Starting workers {"controller": "at", "worker count": 1} 2020-11-18T05:05:46.232+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"} 2020-11-18T05:05:46.232+0530 INFO controllers.At Phase: PENDING {"at": "cnat/at-sample"} 2020-11-18T05:05:46.232+0530 INFO controllers.At Schedule parsing done {"at": "cnat/at-sample", "Result": "-14053h23m46.232658s"} 2020-11-18T05:05:46.232+0530 INFO controllers.At It's time! {"at": "cnat/at-sample", "Ready to execute": "echo YAY!"} 2020-11-18T05:05:46.436+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"} 2020-11-18T05:05:46.443+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"} 2020-11-18T05:05:46.443+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"} 2020-11-18T05:05:46.718+0530 INFO controllers.At Pod Created successfully {"at": "cnat/at-sample", "name": "at-sample"} 2020-11-18T05:05:46.718+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"} 2020-11-18T05:05:46.722+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"} 2020-11-18T05:05:46.722+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"} 2020-11-18T05:05:46.722+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"} 2020-11-18T05:05:46.778+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"} 2020-11-18T05:05:46.778+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"} 2020-11-18T05:05:46.778+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"} 2020-11-18T05:05:46.846+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"} 2020-11-18T05:05:46.847+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"} 2020-11-18T05:05:46.847+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"} 2020-11-18T05:05:49.831+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"} 2020-11-18T05:05:49.831+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"} 2020-11-18T05:05:49.831+0530 INFO controllers.At Container terminated {"at": "cnat/at-sample", "reason": "", "message": ""} 2020-11-18T05:05:50.054+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"} 2020-11-18T05:05:50.056+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"} 2020-11-18T05:05:50.056+0530 INFO controllers.At Phase: DONE {"at": "cnat/at-sample"} 2020-11-18T05:05:50.056+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"} 
Enter fullscreen mode Exit fullscreen mode

If you're interested to see the output command we can run:

$ kubectl logs -f at-sample hello world! 
Enter fullscreen mode Exit fullscreen mode

This post was first published on my personal blog ishankhare.dev

Top comments (0)