Commit 89d3c5a

Merge pull request intel#1149 from eero-t/gpu-reqs
Add GPU plugin README prerequisites section
2 parents d491f46 + 9b3ee06 commit 89d3c5a

1 file changed: +78 -5 lines changed

cmd/gpu_plugin/README.md

Lines changed: 78 additions & 5 deletions
@@ -5,6 +5,11 @@ Table of Contents
 * [Introduction](#introduction)
 * [Modes and Configuration Options](#modes-and-configuration-options)
 * [Installation](#installation)
+* [Prerequisites](#prerequisites)
+* [Drivers for discrete GPUs](#drivers-for-discrete-gpus)
+* [Kernel driver](#kernel-driver)
+* [User-space drivers](#user-space-drivers)
+* [Drivers for older (integrated) GPUs](#drivers-for-older-integrated-gpus)
 * [Pre-built Images](#pre-built-images)
 * [Install to all nodes](#install-to-all-nodes)
 * [Install to nodes with Intel GPUs with NFD](#install-to-nodes-with-intel-gpus-with-nfd)
@@ -19,7 +24,8 @@ Table of Contents
 ## Introduction
 
 Intel GPU plugin facilitates Kubernetes workload offloading by providing access to
-discrete (including Intel® Data Center GPU Flex Series) and integrated Intel GPU device files.
+discrete (including Intel® Data Center GPU Flex Series) and integrated Intel GPU devices
+supported by the host kernel.
 
 Use cases include, but are not limited to:
 - Media transcode
@@ -50,6 +56,73 @@ The following sections detail how to obtain, build, deploy and test the GPU devi
 
 Examples are provided showing how to deploy the plugin either using a DaemonSet or by hand on a per-node basis.
 
+### Prerequisites
+
+Access to a GPU device requires firmware, kernel and user-space
+drivers supporting it. Firmware and kernel driver need to be on the
+host, user-space drivers in the GPU workload containers.
+
+Intel GPU devices supported by the current kernel can be listed with:
+```
+$ grep i915 /sys/class/drm/card?/device/uevent
+/sys/class/drm/card0/device/uevent:DRIVER=i915
+/sys/class/drm/card1/device/uevent:DRIVER=i915
+```
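
Note on the listing above: the same check can be wrapped in a small shell function for node setup scripts. This is a minimal sketch, assuming the sysfs layout shown in the listing. The function name `list_i915_cards` and its optional root-directory argument are hypothetical and not part of the plugin or this commit.

```shell
#!/bin/sh
# Print the DRM card nodes (e.g. "card0") whose device is bound to the
# i915 kernel driver, one per line.
# The optional first argument (default: /sys/class/drm) is a hypothetical
# convenience so the function can be pointed at a test directory.
list_i915_cards() {
    drm_root="${1:-/sys/class/drm}"
    for uevent in "$drm_root"/card?/device/uevent; do
        # Skip the unexpanded glob pattern when no card nodes exist.
        [ -f "$uevent" ] || continue
        if grep -q '^DRIVER=i915$' "$uevent"; then
            card_dir="${uevent%/device/uevent}"
            echo "${card_dir##*/}"
        fi
    done
}
```

Calling `list_i915_cards` without arguments reads the real sysfs tree. Empty output suggests that no GPU is currently bound to the i915 driver. Like the README command, the `card?` glob only covers single-digit card numbers.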
+
+#### Drivers for discrete GPUs
+
+##### Kernel driver
+
+For now, kernel needs to be built from sources. Later on there will
+also be pre-built kernels and/or DKMS GPU module distro packages for
+the enterprise / long-term-support kernels.
+
+While last 5.x upstream Linux kernel releases already had preliminary
+discrete Intel GPU support, one should really use kernel v6.x.
+
+In upstream kernels, discrete GPU support needs to be enabled with kernel
+`i915.force_probe=<PCI_ID>` command line option until relevant kernel
+driver features have been completed in upstream:
+https://www.kernel.org/doc/html/latest/gpu/rfc/index.html
+
+PCI IDs for the Intel GPUs on given host can be listed with:
+```
+$ lspci | grep -e VGA -e Display | grep Intel
+88:00.0 Display controller: Intel Corporation Device 56c1 (rev 05)
+8d:00.0 Display controller: Intel Corporation Device 56c1 (rev 05)
+```
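
Note on the listing above: the PCI IDs can be turned into the `i915.force_probe=<PCI_ID>` option mechanically. The following is a rough sketch under two assumptions that are not in the README. It reads `lspci -nn` output (which appends numeric `[vendor:device]` IDs) rather than plain `lspci`, and it joins multiple IDs with commas. The function name `force_probe_option` is hypothetical.

```shell
#!/bin/sh
# Read `lspci -nn` output on stdin and print a kernel command line option
# of the form i915.force_probe=<id>[,<id>...] for Intel (vendor 8086)
# VGA / Display controllers. Prints nothing if no such device is found.
force_probe_option() {
    ids=$(grep -e 'VGA' -e 'Display' \
        | sed -n 's/.*\[8086:\([0-9a-fA-F]\{4\}\)\].*/\1/p' \
        | sort -u \
        | paste -sd, -)
    if [ -n "$ids" ]; then
        echo "i915.force_probe=$ids"
    fi
}
```

A possible invocation is `lspci -nn | force_probe_option`. The output still has to be added to the kernel command line through the boot loader configuration of the host.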
+
+(`lspci` lists GPUs with display support as "VGA compatible controller",
+and server GPUs without display support, as "Display controller".)
+
+Mesa "Iris" 3D driver header provides a mapping between GPU PCI IDs and their Intel brand names:
+https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/include/pci_ids/iris_pci_ids.h
+
+If your kernel build does not find the correct firmware version for
+a given GPU from the host (see `dmesg | grep i915` output), latest
+firmware versions are available in upstream:
+https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
+
+##### User-space drivers
+
+Until new enough user-space drivers (supporting also discrete GPUs)
+are available directly from distribution package repositories, they
+can be installed to containers from Intel package repositories. See:
+https://dgpu-docs.intel.com/installation-guides/index.html
+
+Example container is listed in [Testing and demos](#testing-and-demos).
+
+Validation status against *upstream* kernel is listed in the user-space drivers release notes:
+* Media driver: https://github.com/intel/media-driver/releases
+* Compute driver: https://github.com/intel/compute-runtime/releases
+
+#### Drivers for older (integrated) GPUs
+
+For the older (integrated) GPUs, new enough firmware and kernel driver
+are typically included already with the host OS, and new enough
+user-space drivers (for the GPU containers) are in the host OS
+repositories.
+
 ### Pre-built Images
 
 [Pre-built images](https://hub.docker.com/r/intel/intel-gpu-plugin)
@@ -155,8 +228,8 @@ master
 ## Testing and Demos
 
 We can test the plugin is working by deploying an OpenCL image and running `clinfo`.
-The sample OpenCL image can be built using `make intel-opencl-icd` and must be made
-available in the cluster.
+[intel-opencl-icd](../../demo/intel-opencl-icd/) sample OpenCL image, built using
+`make intel-opencl-icd` and available from DockerHub, is used for this.
 
 1. Create a job:
 
@@ -174,8 +247,8 @@ available in the cluster.
 <log output>
 ```
 
-If the pod did not successfully launch, possibly because it could not obtain the gpu
-resource, it will be stuck in the `Pending` status:
+If the pod did not successfully launch, possibly because it could not obtain
+the requested GPU resource, it will be stuck in the `Pending` status:
 
 ```bash
 $ kubectl get pods
