Create a GKE Cluster with Pathways

You can use the Accelerated Processing Kit (XPK) to create pre-configured Google Kubernetes Engine (GKE) clusters for Pathways-based workloads. Alternatively, you can use gcloud to create GKE clusters for Pathways-based workloads manually.

Before you begin

Make sure you have:

Set up your local environment

Log in with your Google Cloud credentials.

gcloud auth application-default login 

Define the following environment variables with values appropriate to your workload.

Required variables

Create a GKE cluster

In the following example, you create a cluster with two v5e 2x4 node pools. You can create a cluster using XPK or the gcloud command.

XPK

  1. Set the following environment variables:

    CLUSTER_NODEPOOL_COUNT=CLUSTER_NODEPOOL_COUNT
    PROJECT=PROJECT_ID
    ZONE=ZONE
    CLUSTER=GKE_CLUSTER_NAME
    TPU_TYPE="v5litepod-8"
    PW_CPU_MACHINE_TYPE="n2-standard-64"
    NETWORK=NETWORK
    SUBNETWORK=SUB_NETWORK

    Replace the following:

    • CLUSTER_NODEPOOL_COUNT: the maximum number of node pools a workload can use
    • PROJECT_ID: your Google Cloud project ID
    • ZONE: the zone where you are creating resources
    • GKE_CLUSTER_NAME: the GKE cluster name
    • TPU_TYPE: the TPU type. For more information, see supported types in XPK
    • PW_CPU_MACHINE_TYPE: the CPU node type for the Pathways controller
    • NETWORK: [Optional] the name of a Virtual Private Cloud network. The network must exist before you create the cluster
    • SUB_NETWORK: [Optional] the name of a subnetwork. The subnetwork must exist before you create the cluster
  2. Use XPK to create a GKE Pathways cluster. This command can take several minutes to provision the capacity. Once completed, your capacity is allocated and you will start incurring charges.

    xpk cluster create-pathways \
      --num-slices=${CLUSTER_NODEPOOL_COUNT} \
      --tpu-type=${TPU_TYPE} \
      --pathways-gce-machine-type=${PW_CPU_MACHINE_TYPE} \
      --on-demand \
      --project=${PROJECT} \
      --zone=${ZONE} \
      --cluster=${CLUSTER} \
      --custom-cluster-arguments="--network=${NETWORK} --subnetwork=${SUBNETWORK} --enable-ip-alias"

Once the cluster is created, you can create and delete workloads as needed. You don't need to re-provision the TPU capacity.
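Because the create command allocates billable capacity, it can help to confirm that every required variable is set before you run it. The following is a minimal sketch, not part of XPK or gcloud: `require_vars` is a hypothetical helper, and the variable names are the ones defined in step 1.

```shell
# Hypothetical pre-flight helper (an assumption for illustration, not an XPK feature).
# Prints the name of each unset or empty variable and returns non-zero if any is missing.
# The body runs in a subshell, so its working variables do not leak into your session.
require_vars() (
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  exit "${missing}"
)

# Example call before running xpk. NETWORK and SUBNETWORK are listed as optional,
# so they are not checked here:
# require_vars CLUSTER_NODEPOOL_COUNT PROJECT ZONE CLUSTER TPU_TYPE PW_CPU_MACHINE_TYPE || exit 1
```

If any listed variable is empty, the helper names it on standard error and returns a non-zero status, so the call can gate the create command with `||` or `&&`.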

gcloud

  1. Set the following environment variables:

    CLUSTER=GKE_CLUSTER_NAME
    PROJECT=PROJECT_ID
    ZONE=ZONE
    REGION=REGION
    CLUSTER_VERSION=GKE_CLUSTER_VERSION
    PW_CPU_MACHINE_TYPE="n2-standard-64"
    NETWORK=NETWORK
    SUBNETWORK=SUB_NETWORK
    CLUSTER_NODEPOOL_COUNT=3
    TPU_MACHINE_TYPE="ct5lp-hightpu-4t"
    WORKERS_PER_SLICE=2
    TOPOLOGY="2x4"
    NUM_CPU_NODES=1

    Replace the following:

    • GKE_CLUSTER_NAME: the GKE cluster name
    • PROJECT_ID: your Google Cloud project ID
    • ZONE: the zone where you are creating resources
    • REGION: the region where you are creating resources
    • GKE_CLUSTER_VERSION: [Optional] the GKE cluster version. Use 1.32.2-gke.1475000 or later
    • PW_CPU_MACHINE_TYPE: the CPU node type for the Pathways controller
    • NETWORK: [Optional] the name of a Virtual Private Cloud network. The network must exist before you create the cluster
    • SUB_NETWORK: [Optional] the name of a subnetwork. The subnetwork must exist before you create the cluster
    • CLUSTER_NODEPOOL_COUNT: the maximum number of node pools a workload can use
    • TPU_MACHINE_TYPE: the TPU machine type you want to use
    • WORKERS_PER_SLICE: the number of nodes per node pool
    • GKE_ACCELERATOR_TYPE: the Google Kubernetes Engine accelerator type, see Choose a TPU version
    • TOPOLOGY: the TPU topology
    • NUM_CPU_NODES: the Pathways CPU node pool size
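The values of TOPOLOGY, TPU_MACHINE_TYPE, and WORKERS_PER_SLICE depend on each other: the number of nodes in a slice is the number of chips in the topology divided by the number of chips per VM. The sketch below works through that arithmetic for the example values. `nodes_per_slice` and `ZONE_EXAMPLE` are hypothetical names used only for illustration, and the zone shown is an example value.

```shell
# Hypothetical helper: nodes per slice = (product of topology dimensions) / (TPU chips per VM).
# The body runs in a subshell, so the IFS change does not affect your session.
nodes_per_slice() (
  topology="$1"
  chips_per_vm="$2"
  chips=1
  IFS=x
  # With IFS set to "x", the unquoted expansion splits "2x4" into "2" and "4".
  for dim in ${topology}; do
    chips=$((chips * dim))
  done
  echo "$((chips / chips_per_vm))"
)

# ct5lp-hightpu-4t has 4 TPU chips per VM, so a 2x4 slice (8 chips) needs 2 nodes,
# which matches WORKERS_PER_SLICE=2 in the example values.
nodes_per_slice "2x4" 4   # prints 2

# A zone name is its region plus a "-<letter>" suffix, so REGION can be derived from ZONE.
ZONE_EXAMPLE="us-central1-a"   # example value only
echo "${ZONE_EXAMPLE%-*}"      # prints us-central1
```

If you change TOPOLOGY or TPU_MACHINE_TYPE, recompute WORKERS_PER_SLICE the same way so that the node count matches the slice shape.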

The following steps explain how to create a GKE cluster and set it up for running Pathways workloads.

  1. Create a GKE cluster:

    gcloud beta container clusters create ${CLUSTER} \
      --project=${PROJECT} \
      --zone=${ZONE} \
      --cluster-version=${CLUSTER_VERSION} \
      --scopes=storage-full,gke-default,cloud-platform \
      --machine-type=${PW_CPU_MACHINE_TYPE} \
      --network=${NETWORK} \
      --subnetwork=${SUBNETWORK}
  2. Create TPU node pools:

    for i in $(seq 1 ${CLUSTER_NODEPOOL_COUNT}); do
      gcloud container node-pools create "tpu-np-${i}" \
        --project=${PROJECT} \
        --zone=${ZONE} \
        --cluster=${CLUSTER} \
        --machine-type=${TPU_MACHINE_TYPE} \
        --num-nodes=${WORKERS_PER_SLICE} \
        --placement-type=COMPACT \
        --tpu-topology=${TOPOLOGY} \
        --scopes=storage-full,gke-default,cloud-platform \
        --workload-metadata=GCE_METADATA
    done
  3. Create a CPU node pool:

    gcloud container node-pools create "cpu-pathways-np" \
      --project=${PROJECT} \
      --zone=${ZONE} \
      --cluster=${CLUSTER} \
      --machine-type=${PW_CPU_MACHINE_TYPE} \
      --num-nodes=${NUM_CPU_NODES} \
      --scopes=storage-full,gke-default,cloud-platform \
      --workload-metadata=GCE_METADATA
  4. Install the JobSet and PathwaysJob APIs:

    Get credentials for the cluster and add them to your local kubectl context. The cluster in this example is zonal, so the command passes --zone; for a regional cluster, pass --region=${REGION} instead.

    gcloud container clusters get-credentials ${CLUSTER} \
      --zone=${ZONE} \
      --project=${PROJECT} \
      && kubectl config set-context --current --namespace=default

    To use the Pathways architecture on your GKE cluster, you need to install the JobSet API and the PathwaysJob API.

    kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.8.0/manifests.yaml
    kubectl apply --server-side -f https://github.com/google/pathways-job/releases/download/v0.1.2/install.yaml
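The TPU node-pool loop in step 2 creates one pool per iteration, named tpu-np-1 through tpu-np-N. Because each iteration provisions billable capacity, you might want to preview the expanded commands first. The following is a sketch under stated assumptions: `run` and `DRY_RUN` are hypothetical names that are not part of gcloud, and the fallback value of 3 is only the example value from step 1.

```shell
# Hypothetical wrapper (not part of gcloud): with DRY_RUN=1, the default here,
# it prints the command instead of running it. Set DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Same loop and flags as step 2, routed through run().
# Unset variables expand to empty strings in the preview, which makes gaps easy to spot.
for i in $(seq 1 "${CLUSTER_NODEPOOL_COUNT:-3}"); do
  run gcloud container node-pools create "tpu-np-${i}" \
    --project="${PROJECT:-}" \
    --zone="${ZONE:-}" \
    --cluster="${CLUSTER:-}" \
    --machine-type="${TPU_MACHINE_TYPE:-}" \
    --num-nodes="${WORKERS_PER_SLICE:-}" \
    --placement-type=COMPACT \
    --tpu-topology="${TOPOLOGY:-}" \
    --scopes=storage-full,gke-default,cloud-platform \
    --workload-metadata=GCE_METADATA
done
```

In preview mode, each printed line starts with `+` and shows one fully expanded command, so you can check the pool names and flag values before anything is created.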

What's next