alessskeno

Setting Up a Production-Ready Kubernetes Cluster with RKE2 in vSphere Using Terraform

Setting up a robust Kubernetes cluster in a production environment is no small feat. In this article, I’ll walk you through my journey of deploying an RKE2 (Rancher Kubernetes Engine 2) cluster in a vSphere environment using a custom Terraform module. This guide will include configurations, explanations, and a peek into the setup process with screenshots.

Why RKE2 on vSphere?

RKE2 provides a lightweight yet powerful Kubernetes distribution ideal for secure production workloads. When combined with vSphere’s virtualization and Terraform’s Infrastructure-as-Code capabilities, you can achieve a flexible, scalable, and automated deployment process.

Tools and Technologies

Terraform: Automates the provisioning of infrastructure.
RKE2: Kubernetes distribution optimized for production environments.
vSphere: Virtualization platform for deploying VMs.
Ansible: Used for post-deployment configuration.

Prerequisites (install sketch below):
Python 3.x
Ansible, Ansible-Core
sshpass and whois (for password management)
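
These prerequisites live on the machine that runs Terraform and Ansible. A minimal install sketch for a Debian/Ubuntu control host (package names assumed; adjust for your distribution):

# Control-host dependencies
sudo apt-get update
sudo apt-get install -y python3 python3-pip ansible ansible-core sshpass whois

# mkpasswd (shipped with the whois package on Debian/Ubuntu) can generate
# the hashed password that the module expects in hashed_pass
mkpasswd -m sha-512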

Terraform Module Overview

I built a reusable Terraform module to standardize and automate the Kubernetes cluster provisioning. Below is an overview of the module's core features, followed by a breakdown of the main.tf configuration that calls it.

Core Module Features

Multi-AZ Clusters: Enables highly available clusters with master and worker nodes spread across multiple availability zones.
Customizable Resources: Easily configure CPU, memory, and storage for master, worker, and storage nodes.
Built-in RKE2 Installation: Installs RKE2 with a choice of CNI plugins (canal, flannel, etc.).
Networking Configuration: Define Kubernetes service and cluster CIDRs.
Storage Options: Support for local storage and optional NFS integration.
Secure Communication: TLS certificates for domain and API access.


main.tf Breakdown

Here are the main aspects of the main.tf file:

Module Invocation

module "rke2_prod_cluster" { source = "./modules/rke2-provisioner" env = "prod" # Environment name domain = var.domain # Domain name multi_az = true # If you want to create multi-az cluster install_rke2 = true # Install RKE2 lh_storage = true # Local storage for worker nodes hashed_pass = var.hashed_pass # Hashed password for user creation cluster_cidr = var.cluster_cidr # Kubernetes cluster CIDR service_cidr = var.service_cidr # Kubernetes service CIDR nfs_enabled = false # Change to true if you want to enable nfs server update_apt = false # Update apt packages by changing to true rke2_token = var.rke2_token rke2_version = "v1.30.5+rke2r1" rke2_cni = "canal" # Alternatives: flannel, calico, cilium kubevip_range_global = join("-", [cidrhost(var.vm_cidr_az1, 50)], [cidrhost(var.vm_cidr_az1, 60)]) # Global IP range for LoadBalancer IPs kubevip_alb_cidr = "${cidrhost(var.vm_cidr_az1, 20)}/32" # IP for Nginx Ingress Controller Service rke2_api_endpoint = cidrhost(var.vm_cidr_az1, 10) # API Server IP ansible_password = var.ansible_password # Ansible user password domain_crt = var.domain_crt # Domain certificate domain_key = var.domain_key # Domain key domain_root_crt = var.domain_root_crt # Root certificate master_node_count = var.master_node_count_prod worker_node_count = var.worker_node_count_prod storage_node_count = var.storage_node_count_prod # Resources worker_node_cpus = 8 worker_node_memory = 8192 worker_node_disk_size = 100 master_node_cpus = 8 master_node_memory = 8192 master_node_disk_size = 50 storage_node_disk_size = 100 nfs_node_disk_size = 50 # AZ1 master_ip_range_az1 = [for i in range(61, 69) : cidrhost(local.vm_cidr_az1, i)] # Master node IP range worker_ip_range_az1 = [for i in range(71, 79) : cidrhost(local.vm_cidr_az1, i)] # Worker node IP range vsphere_datacenter_az1 = var.vsphere_datacenter_az1 # vSphere datacenter name vsphere_host_az1 = var.vsphere_host_az1 # vSphere host name vsphere_resource_pool_az1 = var.vsphere_resource_pool_az1 # vSphere resource pool name vsphere_datastore_az1 = var.vsphere_datastore_az1 # vSphere datastore name vsphere_network_name_az1 = var.vsphere_network_name_az1 # vSphere network name vm_gw_ip_az1 = local.vm_gw_ip_az1 # Gateway IP nfs_ip_az1 = cidrhost(local.vm_cidr_az1, 70) # NFS server IP # AZ3 master_ip_range_az3 = [for i in range(81, 89) : cidrhost(local.vm_cidr_az3, i)] worker_ip_range_az3 = [for i in range(91, 99) : cidrhost(local.vm_cidr_az3, i)] vsphere_datacenter_az3 = var.vsphere_datacenter_az3 vsphere_host_az3 = var.vsphere_host_az3 vsphere_resource_pool_az3 = var.vsphere_resource_pool_az3 vsphere_datastore_az3 = var.vsphere_datastore_az3 vsphere_network_name_az3 = var.vsphere_network_name_az3 vm_gw_ip_az3 = local.vm_gw_ip_az3 } 
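
A quick note on the IP math: the module leans on Terraform's built-in cidrhost(prefix, hostnum) function, which returns the hostnum-th address inside a CIDR prefix, and kubevip_range_global simply joins two such addresses with a dash to form a LoadBalancer IP range. You can sanity-check what your prefixes resolve to with terraform console (the 10.10.1.0/24 prefix here is only an illustrative value, not the real vm_cidr_az1):

$ terraform console
> cidrhost("10.10.1.0/24", 10)
"10.10.1.10"
> cidrhost("10.10.1.0/24", 50)
"10.10.1.50"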

Deployment Walkthrough

Step 1: Initialize Terraform

Run the following commands to initialize Terraform and apply the configuration:

terraform init
terraform plan
terraform apply
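
For production runs I'd suggest saving the plan to a file and applying exactly that file, so the changes you reviewed are the changes that get executed. This is standard Terraform workflow rather than anything specific to this module:

# Review the plan, save it, then apply exactly what was reviewed
terraform plan -out=prod.tfplan
terraform apply prod.tfplan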

Step 2: Verify Resources in vSphere

Confirm the VMs are provisioned in vSphere.
Ensure the network configurations (IP, gateway) match the Terraform parameters.
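
The same check can be done from the command line with govc (the vSphere CLI from the govmomi project); a sketch, assuming govc is installed and the environment variables point at your vCenter (the VM name is a placeholder):

# Point govc at vCenter
export GOVC_URL='https://vcenter.example.local'
export GOVC_USERNAME='terraform@vsphere.local'
export GOVC_PASSWORD='...'
export GOVC_INSECURE=1   # only if vCenter uses a self-signed certificate

# List provisioned VMs, then inspect one of them (CPU, memory, IP, host)
govc find / -type m
govc vm.info -r prod-master-az1-1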

Step 3: Validate Kubernetes Cluster

After deployment:
SSH into one of the master nodes.
Run kubectl get nodes to ensure all nodes are registered and Ready (a quick sketch follows).
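
On an RKE2 server node, the kubectl binary and the admin kubeconfig live under RKE2's own paths, so a quick validation looks roughly like this (the SSH user and node IP are placeholders for your environment):

# From your workstation, jump onto one of the master nodes
ssh ansible@<master-node-ip>

# RKE2 ships its own kubectl and admin kubeconfig
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml

# All master and worker nodes should report Ready
kubectl get nodes -o wide
kubectl get pods -A   # CNI, ingress, and kube-vip pods should be Running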


Screenshots of the Process

Terraform Apply Output

[Screenshot: terraform apply output]

vSphere Dashboard

[Screenshots: vSphere dashboard]

Kubernetes Terminal

[Screenshot: Kubernetes terminal]


Lessons Learned

Key Challenges

Configuring multi-AZ setups required precise IP allocation and resource planning.
Ensuring compatibility between Terraform, vSphere, and RKE2 versions was another recurring concern.

Tips for Success

Automate Certificate Management: Pre-generate and verify certificates for secure communication (a quick verification sketch follows this list).
Test Locally: Run initial setups in a test environment to validate module behavior.
Optimize Resource Allocation: Tailor resource parameters to your workload needs.
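
As an example of that certificate pre-flight check, the standard openssl commands below verify the chain, show the expiry date, and confirm the certificate matches its private key. The file names mirror the module inputs and an RSA key is assumed:

# Verify the domain certificate against the root CA
openssl verify -CAfile domain_root.crt domain.crt

# Check the expiry date
openssl x509 -noout -enddate -in domain.crt

# Confirm certificate and key belong together (the two hashes must match)
openssl x509 -noout -modulus -in domain.crt | openssl md5
openssl rsa  -noout -modulus -in domain.key | openssl md5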


Conclusion

Using Terraform and vSphere to deploy an RKE2 Kubernetes cluster offers a highly customizable and scalable solution for production environments. By modularizing the Terraform configuration, this setup can be reused and extended for other environments with minimal changes.
If you've followed along or have feedback, share your experience in the comments below. Check out the code repository on my GitHub profile. Let's discuss Kubernetes automation at scale!
