Dynamically allocate devices to workloads with DRA

Standard

This page explains how to deploy dynamic resource allocation (DRA) workloads on your Google Kubernetes Engine clusters. On this page, you'll create a ResourceClaimTemplate to request hardware with DRA and then deploy a basic workload to demonstrate how Kubernetes flexibly allocates hardware on your Pods.

This page is intended for Application operators and Data engineers who run workloads like AI/ML or high performance computing (HPC).

About dynamic resource allocation

DRA is a built-in Kubernetes feature that lets you flexibly request, allocate, and share hardware in your cluster among Pods and containers. For more information, see About dynamic resource allocation.

About requesting devices with DRA

When you set up your GKE infrastructure for DRA, the DRA drivers on your nodes create DeviceClass objects in the cluster. A DeviceClass defines a category of devices, such as GPUs, that are available to request for workloads. A platform administrator can optionally deploy additional DeviceClasses that limit which devices you can request in specific workloads.

To request devices within a DeviceClass, you create one of the following objects:

ResourceClaim: A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
ResourceClaimTemplate: A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims.

For more information about ResourceClaim and ResourceClaimTemplate objects, see When to use ResourceClaims and ResourceClaimTemplates.

The examples on this page use a basic ResourceClaimTemplate to request the specified device configuration. For more detailed information, see the ResourceClaimTemplateSpec Kubernetes documentation.

Limitations

Node auto-provisioning isn't supported.
Autopilot clusters don't support DRA.
You can't use the following GPU sharing features:
- Time-sharing GPUs
- Multi-instance GPUs
- Multi-process Service (MPS)

Requirements

To use DRA, your GKE version must be version 1.32.1-gke.1489001 or later.

You should also be familiar with the following requirements and limitations:

Before you begin

Before you start, make sure you have performed the following tasks:

Enable the Google Kubernetes Engine API.

Enable Google Kubernetes Engine API

If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
Note: For existing gcloud CLI installations, make sure to set the compute/region and compute/zone properties. By setting default locations, you can avoid errors in gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location.

Ensure that your GKE clusters are configured for DRA workloads.

Use DRA to deploy workloads

To request per-Pod device allocation, you first create a ResourceClaimTemplate that produces a ResourceClaim to describe your request for GPUs or TPUs, which Kubernetes uses as a template to create new ResourceClaim objects for each Pod in a workload. When you specify the ResourceClaimTemplate in a workload, Kubernetes allocates the requested resources and schedules the Pods on corresponding nodes.

GPU

Save the following manifest as claim-template.yaml:

apiVersion: resource.k8s.io/v1beta1 kind: ResourceClaimTemplate metadata:  name: gpu-claim-template spec:  spec:  devices:  requests:  - name: single-gpu  deviceClassName: gpu.nvidia.com  allocationMode: ExactCount  count: 1

Create the ResourceClaimTemplate:
```
kubectl create -f claim-template.yaml 
```

To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-gpu-example.yaml:

apiVersion: apps/v1 kind: Deployment metadata:  name: dra-gpu-example spec:  replicas: 1  selector:  matchLabels:  app: dra-gpu-example  template:  metadata:  labels:  app: dra-gpu-example  spec:  containers:  - name: ctr  image: ubuntu:22.04  command: ["bash", "-c"]  args: ["while [ 1 ]; do date; echo $(nvidia-smi -L || echo Waiting...); sleep 60; done"]  resources:  claims:  - name: single-gpu  resourceClaims:  - name: single-gpu  resourceClaimTemplateName: gpu-claim-template  tolerations:  - key: "nvidia.com/gpu"  operator: "Exists"  effect: "NoSchedule"

Deploy the workload:
```
kubectl create -f dra-gpu-example.yaml 
```

TPU

Save the following manifest as claim-template.yaml:

apiVersion: resource.k8s.io/v1beta1 kind: ResourceClaimTemplate metadata:  name: tpu-claim-template spec:  spec:  devices:  requests:  - name: all-tpus  deviceClassName: tpu.google.com  allocationMode: All

This ResourceClaimTemplate requests that GKE allocate an entire TPU node pool to every ResourceClaim.

Create the ResourceClaimTemplate:
```
kubectl create -f claim-template.yaml 
```

To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-tpu-example.yaml:

apiVersion: apps/v1 kind: Deployment metadata:  name: dra-tpu-example spec:  replicas: 1  selector:  matchLabels:  app: dra-tpu-example  template:  metadata:  labels:  app: dra-tpu-example  spec:  containers:  - name: ctr  image: ubuntu:22.04  command:  - /bin/sh  - -c  - |  echo "Environment Variables:"  env  echo "Sleeping indefinitely..."  sleep infinity  resources:  claims:  - name: all-tpus  resourceClaims:  - name: all-tpus  resourceClaimTemplateName: tpu-claim-template  tolerations:  - key: "google.com/tpu"  operator: "Exists"  effect: "NoSchedule"

Deploy the workload:
```
kubectl create -f dra-tpu-example.yaml 
```

Verify the hardware allocation

You can verify that your workloads have been allocated hardware by checking the ResourceClaim or by looking at the logs for your Pod.