Getting Started with Kubernetes
The Network Operator Release Notes chapter is available here.
NVIDIA Network Operator leverages Kubernetes CRDs and the Operator SDK to manage networking-related components in order to enable fast networking, RDMA, and GPUDirect for workloads in a Kubernetes cluster. The Network Operator works in conjunction with the GPU Operator to enable GPUDirect RDMA on compatible systems.
The goal of the Network Operator is to manage the networking-related components, while enabling execution of RDMA and GPUDirect RDMA workloads in a Kubernetes cluster. This includes:
NVIDIA Networking drivers to enable advanced features
Kubernetes device plugins to provide hardware resources required for an accelerated network
Kubernetes secondary network components for network intensive workloads
You have the kubectl and helm CLIs available on a client machine. You can run the following commands to install the Helm CLI:
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
    && chmod 700 get_helm.sh \
    && ./get_helm.sh
Nodes must be configured with a container engine such as CRI-O or containerd.
If your cluster uses Pod Security Admission (PSA) to restrict the behavior of pods, label the namespace for the Operator to set the enforcement policy to privileged:
$ kubectl create ns nvidia-network-operator
$ kubectl label --overwrite ns nvidia-network-operator pod-security.kubernetes.io/enforce=privileged
Node Feature Discovery (NFD) is a dependency of the Operator on each node. By default, the NFD master and worker are automatically deployed by the Operator. If NFD is already running in the cluster, you must disable deploying NFD when you install the Operator by setting the nfd.enabled=false Helm value. One way to determine whether NFD is already running in the cluster is to check for an NFD label on your nodes:
$ kubectl get nodes -o json | jq '.items[].metadata.labels | keys | any(startswith("feature.node.kubernetes.io"))'
If the command output is true, then NFD is already running in the cluster.
Note: NFD needs to support the NodeFeatureRules API, or it should be configured to expose the needed NIC labels. NFD deployed from either the NVIDIA Network Operator or the NVIDIA GPU Operator has the correct configuration for both Operators.
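If NFD is already present, a minimal values.yaml override such as the following (a sketch of the Helm value mentioned above) disables the Operator-managed NFD deployment:

nfd:
  enabled: false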
It is recommended to have dedicated control plane nodes for Vanilla Kubernetes deployments with NVIDIA Network Operator.
The default installation via Helm as described below will deploy the Network Operator and related CRDs, after which an additional step is required to create a NicClusterPolicy custom resource with the configuration that is desired for the cluster.
For more information on NicClusterPolicy custom resource, please refer to the Network-Operator Project Sources.
The provided Helm chart contains various parameters to facilitate the creation of a NicClusterPolicy custom resource upon deployment.
Each Network Operator release has a set of default version values for the various components it deploys. It is recommended not to change these values. Testing and validation were performed with them, and there is no guarantee of interoperability or correctness when different versions are used.
Add NVIDIA NGC Helm repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
Update Helm repositories
helm repo update
Install Network Operator from the NVIDIA NGC chart using the default values:
helm install network-operator nvidia/network-operator \
  -n nvidia-network-operator \
  --create-namespace \
  --version v25.4.0 \
  --wait
View deployed resources
kubectl -n nvidia-network-operator get pods
OR install the Network Operator from the NVIDIA NGC chart using custom values:
Since several parameters should be provided when creating custom resources during operator deployment, it is recommended to use a configuration file. While it is possible to override the parameters via the CLI, we recommend avoiding CLI arguments in favor of a configuration file.
helm show values nvidia/network-operator --version v25.4.0 > values.yaml
Install the chart, specifying the custom values.yaml file:
helm install network-operator nvidia/network-operator \
  -n nvidia-network-operator \
  --create-namespace \
  --version v25.4.0 \
  -f ./values.yaml \
  --wait
Below are deployment examples, in which the values.yaml file is provided to Helm during the installation of the Network Operator. This is achieved by running:
helm install -f ./values.yaml -n nvidia-network-operator --create-namespace --wait \
  network-operator nvidia/network-operator
Network Operator Deployment with RDMA Shared Device Plugin
First install the Network Operator with NFD enabled:
values.yaml:

nfd:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with:
DOCA driver
RDMA Shared Device Plugin configured with a netdev named ens1f0.
Note: You may need to change the interface names in the NicClusterPolicy to those used by your target nodes.
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 25.04-0.6.1.0-2 forcePrecompiled: false imagePullSecrets: [] terminationGracePeriodSeconds: 300 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 upgradePolicy: autoUpgrade: true maxParallelUpgrades: 1 safeLoad: false drain: enable: true force: true podSelector: "" timeoutSeconds: 300 deleteEmptyDir: true rdmaSharedDevicePlugin: # [map[ifNames:[ens1f0] name:rdma_shared_device_a]] image: k8s-rdma-shared-dev-plugin repository: ghcr.io/mellanox version: v1.5.3 imagePullSecrets: [] # The config below directly propagates to k8s-rdma-shared-device-plugin configuration. # Replace 'devices' with your (RDMA capable) netdevice name. config: | { "configList": [ { "resourceName": "rdma_shared_device_a", "rdmaHcaMax": 63, "selectors": { "vendors": [], "deviceIDs": [], "drivers": [], "ifNames": ["ens1f0"], "linkTypes": [] } } ] }
Network Operator Deployment with Multiple Resources in RDMA Shared Device Plugin
First install the Network Operator with NFD enabled:
values.yaml:

nfd:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with:
DOCA driver
RDMA Shared Device Plugin with two RDMA resources: the first mapped to ens1f0 and ens1f1, and the second mapped to ens2f0 and ens2f1.
Note: You may need to change the interface names in the NicClusterPolicy to those used by your target nodes.
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 25.04-0.6.1.0-2 forcePrecompiled: false imagePullSecrets: [] terminationGracePeriodSeconds: 300 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 upgradePolicy: autoUpgrade: true maxParallelUpgrades: 1 safeLoad: false drain: enable: true force: true podSelector: "" timeoutSeconds: 300 deleteEmptyDir: true rdmaSharedDevicePlugin: # [map[ifNames:[ens1f0 ens1f1] name:rdma_shared_device_a] map[ifNames:[ens2f0 ens2f1] name:rdma_shared_device_b]] image: k8s-rdma-shared-dev-plugin repository: ghcr.io/mellanox version: v1.5.3 imagePullSecrets: [] # The config below directly propagates to k8s-rdma-shared-device-plugin configuration. # Replace 'devices' with your (RDMA capable) netdevice name. config: | { "configList": [ { "resourceName": "rdma_shared_device_a", "rdmaHcaMax": 63, "selectors": { "vendors": [], "deviceIDs": [], "drivers": [], "ifNames": ["ens1f0","ens1f1"], "linkTypes": [] } }, { "resourceName": "rdma_shared_device_b", "rdmaHcaMax": 63, "selectors": { "vendors": [], "deviceIDs": [], "drivers": [], "ifNames": ["ens2f0","ens2f1"], "linkTypes": [] } } ] }
Network Operator Deployment with a Secondary Network
First install the Network Operator with NFD enabled:
values.yaml:

nfd:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with the following enabled:
Secondary network
Multus CNI
Container-networking-plugins CNI plugins
IPAM Plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: [] ipamPlugin: image: whereabouts repository: ghcr.io/k8snetworkplumbingwg version: v0.7.0 imagePullSecrets: []
Network Operator Deployment with NVIDIA-IPAM
First install the Network Operator with NFD enabled: values.yaml:

nfd:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with the following enabled:
Secondary network
Multus CNI
Container Networking plugins
NVIDIA-IPAM plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: [] nvIpam: image: nvidia-k8s-ipam repository: ghcr.io/mellanox version: v0.3.7 imagePullSecrets: [] enableWebhook: false
To create an NV-IPAM IPPool, apply:
apiVersion: nv-ipam.nvidia.com/v1alpha1
kind: IPPool
metadata:
  name: my-pool
  namespace: nvidia-network-operator
spec:
  subnet: 192.168.0.0/24
  perNodeBlockSize: 100
  gateway: 192.168.0.1
Example of a MacvlanNetwork that uses NVIDIA-IPAM:
apiVersion: mellanox.com/v1alpha1
kind: MacvlanNetwork
metadata:
  name: example-macvlannetwork
spec:
  networkNamespace: "default"
  master: "ens2f0"
  mode: "bridge"
  mtu: 1500
  ipam: |
    {
      "type": "nv-ipam",
      "poolName": "my-pool"
    }
Network Operator Deployment with a Host Device Network
In this mode, the Network Operator can also be deployed in virtualized environments. It supports both Ethernet and InfiniBand modes. From the Network Operator perspective, there is no difference between the deployment procedures. To work in a VM (virtual machine), PCI passthrough must be configured for SR-IOV devices. The Network Operator works with both VFs (Virtual Functions) and PFs (Physical Functions) inside the VMs.
If the Host Device Network is used without the DOCA Driver, the following packages should be installed:
the linux-generic package on Ubuntu hosts
the kernel-modules-extra package on Red Hat-based hosts
First install the Network Operator with NFD enabled: values.yaml:

nfd:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with:
SR-IOV device plugin configured with a single SR-IOV resource pool
Secondary network
Multus CNI
Container Networking plugins
IPAM plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: sriovDevicePlugin: image: sriov-network-device-plugin repository: ghcr.io/k8snetworkplumbingwg version: v3.9.0 imagePullSecrets: [] config: | { "resourceList": [ { "resourcePrefix": "nvidia.com", "resourceName": "hostdev", "selectors": { "vendors": ["15b3"], "devices": [], "drivers": [], "pfNames": [], "pciAddresses": [], "rootDevices": [], "linkTypes": [], "isRdma": true } } ] } secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: [] ipamPlugin: image: whereabouts repository: ghcr.io/k8snetworkplumbingwg version: v0.7.0 imagePullSecrets: []
Following the deployment, the Network Operator should be configured, and K8s networking should be deployed to use it in the pod configuration.
The host-device-net.yaml
configuration file for such a deployment:
apiVersion: mellanox.com/v1alpha1 kind: HostDeviceNetwork metadata: name: hostdev-net spec: networkNamespace: "default" resourceName: "hostdev" ipam: | { "type": "whereabouts", "datastore": "kubernetes", "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" }, "range": "192.168.3.225/28", "exclude": [ "192.168.3.229/30", "192.168.3.236/32" ], "log_file": "/var/log/whereabouts.log", "log_level": "info" }
The host-device-net-ocp.yaml
configuration file for such a deployment in the OpenShift Platform:
apiVersion: mellanox.com/v1alpha1 kind: HostDeviceNetwork metadata: name: hostdev-net spec: networkNamespace: "default" resourceName: "hostdev" ipam: | { "type": "whereabouts", "range": "192.168.3.225/28", "exclude": [ "192.168.3.229/30", "192.168.3.236/32" ] }
The pod.yaml
configuration file for such a deployment:
apiVersion: v1 kind: Pod metadata: name: hostdev-test-pod annotations: k8s.v1.cni.cncf.io/networks: hostdev-net spec: restartPolicy: OnFailure containers: - image: name: doca-test-ctr securityContext: capabilities: add: [ "IPC_LOCK" ] resources: requests: nvidia.com/hostdev: 1 limits: nvidia.com/hostdev: 1 command: - sh - -c - sleep inf
Network Operator Deployment with a Host Device Network and Macvlan Network
In this combined deployment, different NVIDIA NICs are used for the RDMA Shared Device Plugin and the SR-IOV Network Device Plugin in order to work with a Host Device Network or a Macvlan Network on different NICs. It is not possible to combine different networking types on the same NIC. The same principle applies to other networking combinations.
First install the Network Operator with NFD enabled: values.yaml:

nfd:
  enabled: true
Once the Network Operator is installed, deploy a NicClusterPolicy with:
RDMA Shared Device Plugin
SR-IOV Device Plugin with a single SR-IOV resource pool
Secondary network
Multus CNI
Container-networking-plugins CNI plugins
Whereabouts IPAM CNI plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: rdmaSharedDevicePlugin: # [map[linkTypes:[ether] name:rdma_shared_device_a]] image: k8s-rdma-shared-dev-plugin repository: ghcr.io/mellanox version: v1.5.3 imagePullSecrets: [] # The config below directly propagates to k8s-rdma-shared-device-plugin configuration. # Replace 'devices' with your (RDMA capable) netdevice name. config: | { "configList": [ { "resourceName": "rdma_shared_device_a", "rdmaHcaMax": 63, "selectors": { "vendors": [], "deviceIDs": [], "drivers": [], "ifNames": [], "linkTypes": ["ether"] } } ] } sriovDevicePlugin: image: sriov-network-device-plugin repository: ghcr.io/k8snetworkplumbingwg version: v3.9.0 imagePullSecrets: [] config: | { "resourceList": [ { "resourcePrefix": "nvidia.com", "resourceName": "hostdev", "selectors": { "vendors": [], "devices": [], "drivers": [], "pfNames": [], "pciAddresses": [], "rootDevices": [], "linkTypes": ["IB"], "isRdma": true } } ] } secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: [] ipamPlugin: image: whereabouts repository: ghcr.io/k8snetworkplumbingwg version: v0.7.0 imagePullSecrets: []
For pods and network configuration examples please refer to the corresponding sections: Network Operator Deployment with the RDMA Shared Device Plugin and Network Operator Deployment with a Host Device Network.
Network Operator Deployment with an IP over InfiniBand (IPoIB) Network
In this mode, the Network Operator can also be deployed in virtualized environments. It supports both Ethernet and InfiniBand modes. From the Network Operator perspective, there is no difference between the deployment procedures. To work in a VM (virtual machine), PCI passthrough must be configured for SR-IOV devices. The Network Operator works with both VFs (Virtual Functions) and PFs (Physical Functions) inside the VMs.
First install the Network Operator with NFD enabled: values.yaml:

nfd:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with:
DOCA driver
RDMA shared device plugin
Secondary network
Multus CNI
IPoIB CNI
Whereabouts IPAM CNI plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 25.04-0.6.1.0-2 forcePrecompiled: false imagePullSecrets: [] terminationGracePeriodSeconds: 300 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 upgradePolicy: autoUpgrade: true maxParallelUpgrades: 1 safeLoad: false drain: enable: true force: true podSelector: "" timeoutSeconds: 300 deleteEmptyDir: true rdmaSharedDevicePlugin: # [map[ifNames:[ibs1f0] name:rdma_shared_device_a]] image: k8s-rdma-shared-dev-plugin repository: ghcr.io/mellanox version: v1.5.3 imagePullSecrets: [] # The config below directly propagates to k8s-rdma-shared-device-plugin configuration. # Replace 'devices' with your (RDMA capable) netdevice name. config: | { "configList": [ { "resourceName": "rdma_shared_device_a", "rdmaHcaMax": 63, "selectors": { "vendors": [], "deviceIDs": [], "drivers": [], "ifNames": ["ibs1f0"], "linkTypes": [] } } ] } secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: [] ipoib: image: ipoib-cni repository: ghcr.io/mellanox version: 428715a57c0b633e48ec7620f6e3af6863149ccf ipamPlugin: image: whereabouts repository: ghcr.io/k8snetworkplumbingwg version: v0.7.0 imagePullSecrets: []
Following the deployment, the network operator should be configured, and K8s networking deployed to use it in the pod configuration.
The ipoib-net.yaml
configuration file for such a deployment:
apiVersion: mellanox.com/v1alpha1 kind: IPoIBNetwork metadata: name: example-ipoibnetwork spec: networkNamespace: "default" master: "ibs1f0" ipam: | { "type": "whereabouts", "datastore": "kubernetes", "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" }, "range": "192.168.5.225/28", "exclude": [ "192.168.6.229/30", "192.168.6.236/32" ], "log_file" : "/var/log/whereabouts.log", "log_level" : "info", "gateway": "192.168.6.1" }
The ipoib-net-ocp.yaml
configuration file for such a deployment in the OpenShift Platform:
apiVersion: mellanox.com/v1alpha1 kind: IPoIBNetwork metadata: name: example-ipoibnetwork spec: networkNamespace: "default" master: "ibs1f0" ipam: | { "type": "whereabouts", "range": "192.168.5.225/28", "exclude": [ "192.168.6.229/30", "192.168.6.236/32" ] }
The pod.yaml
configuration file for such a deployment:
apiVersion: v1
kind: Pod
metadata:
  name: iboip-test-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: example-ipoibnetwork
spec:
  restartPolicy: OnFailure
  containers:
  - image:
    name: doca-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      requests:
        rdma/rdma_shared_device_a: 1
      limits:
        rdma/rdma_shared_device_a: 1
    command:
    - sh
    - -c
    - sleep inf
Network Operator Deployment for GPUDirect Workloads
GPUDirect requires the following:
NVIDIA DOCA Driver v5.5-1.0.3.2 or newer
GPU Operator v1.9.0 or newer
NVIDIA GPU and driver supporting GPUDirect, e.g., Quadro RTX 6000/8000 or NVIDIA T4/NVIDIA V100/NVIDIA A100
First install the Network Operator with NFD enabled: values.yaml:

nfd:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with:
DOCA driver
SR-IOV Device Plugin
Secondary network
Multus CNI
Container Networking plugins
IPAM plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 25.04-0.6.1.0-2 forcePrecompiled: false imagePullSecrets: [] terminationGracePeriodSeconds: 300 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 upgradePolicy: autoUpgrade: true maxParallelUpgrades: 1 safeLoad: false drain: enable: true force: true podSelector: "" timeoutSeconds: 300 deleteEmptyDir: true sriovDevicePlugin: image: sriov-network-device-plugin repository: ghcr.io/k8snetworkplumbingwg version: v3.9.0 imagePullSecrets: [] config: | { "resourceList": [ { "resourcePrefix": "nvidia.com", "resourceName": "hostdev", "selectors": { "vendors": ["15b3"], "devices": [], "drivers": [], "pfNames": [], "pciAddresses": [], "rootDevices": [], "linkTypes": [], "isRdma": true } } ] } secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: [] ipamPlugin: image: whereabouts repository: ghcr.io/k8snetworkplumbingwg version: v0.7.0 imagePullSecrets: []
host-device-net.yaml:
apiVersion: mellanox.com/v1alpha1 kind: HostDeviceNetwork metadata: name: hostdevice-net spec: networkNamespace: "default" resourceName: "hostdev" ipam: | { "type": "whereabouts", "datastore": "kubernetes", "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" }, "range": "192.168.3.225/28", "exclude": [ "192.168.3.229/30", "192.168.3.236/32" ], "log_file" : "/var/log/whereabouts.log", "log_level" : "info" }
The host-device-net-ocp.yaml
configuration file for such a deployment in the OpenShift Platform:
apiVersion: mellanox.com/v1alpha1 kind: HostDeviceNetwork metadata: name: hostdevice-net spec: networkNamespace: "default" resourceName: "hostdev" ipam: | { "type": "whereabouts", "range": "192.168.3.225/28", "exclude": [ "192.168.3.229/30", "192.168.3.236/32" ] }
host-net-gpudirect-pod.yaml
apiVersion: v1 kind: Pod metadata: name: testpod1 annotations: k8s.v1.cni.cncf.io/networks: hostdevice-net spec: containers: - name: appcntr1 image: <image> imagePullPolicy: IfNotPresent securityContext: capabilities: add: ["IPC_LOCK"] command: - sh - -c - sleep inf resources: requests: nvidia.com/hostdev: '1' nvidia.com/gpu: '1' limits: nvidia.com/hostdev: '1' nvidia.com/gpu: '1'
Network Operator Deployment in SR-IOV Legacy Mode
The SR-IOV Network Operator will be deployed with the default configuration. You can override these settings using a CLI argument, or the ‘sriov-network-operator’ section in the values.yaml file. For more information, refer to the Project Documentation.
This deployment mode supports SR-IOV in legacy mode.
First install the Network Operator with NFD and the SR-IOV Network Operator enabled: values.yaml:

nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with:
DOCA driver
Secondary network
Multus CNI
Container-networking-plugins CNI plugins
IPAM CNI plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 25.04-0.6.1.0-2 forcePrecompiled: false imagePullSecrets: [] terminationGracePeriodSeconds: 300 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 upgradePolicy: autoUpgrade: true maxParallelUpgrades: 1 safeLoad: false drain: enable: true force: true podSelector: "" timeoutSeconds: 300 deleteEmptyDir: true secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: [] ipamPlugin: image: whereabouts repository: ghcr.io/k8snetworkplumbingwg version: v0.7.0 imagePullSecrets: []
Following the deployment, the Network Operator should be configured, and sriovnetwork node policy and K8s networking should be deployed.
The sriovnetwork-node-policy.yaml
configuration file for such a deployment:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policy-1 namespace: nvidia-network-operator spec: deviceType: netdevice mtu: 1500 nicSelector: vendor: "15b3" pfNames: ["ens2f0"] nodeSelector: feature.node.kubernetes.io/pci-15b3.present: "true" numVfs: 8 priority: 90 isRdma: true resourceName: sriov_resource
The sriovnetwork.yaml
configuration file for such a deployment:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: "example-sriov-network" namespace: nvidia-network-operator spec: vlan: 0 networkNamespace: "default" resourceName: "sriov_resource" ipam: |- { "datastore": "kubernetes", "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" }, "log_file": "/tmp/whereabouts.log", "log_level": "debug", "type": "whereabouts", "range": "192.168.101.0/24" }
The ens2f0 network interface name has been chosen from the following command output: kubectl -n nvidia-network-operator get sriovnetworknodestates.sriovnetwork.openshift.io -o yaml.
... status: interfaces: - deviceID: 101d driver: mlx5_core linkSpeed: 100000 Mb/s linkType: ETH mac: 0c:42:a1:2b:74:ae mtu: 1500 name: ens2f0 pciAddress: "0000:07:00.0" totalvfs: 8 vendor: 15b3 - deviceID: 101d driver: mlx5_core linkType: ETH mac: 0c:42:a1:2b:74:af mtu: 1500 name: ens2f1 pciAddress: "0000:07:00.1" totalvfs: 8 vendor: 15b3 ...
Wait for all required pods to be spawned:
# kubectl get pod -n nvidia-network-operator | grep sriov network-operator-sriov-network-operator-544c8dbbb9-vzkmc 1/1 Running 0 5d sriov-device-plugin-vwpzn 1/1 Running 0 2d6h sriov-network-config-daemon-qv467 3/3 Running 0 5d # kubectl get pod -n nvidia-network-operator NAME READY STATUS RESTARTS AGE cni-plugins-ds-kbvnm 1/1 Running 0 5d cni-plugins-ds-pcllg 1/1 Running 0 5d kube-multus-ds-5j6ns 1/1 Running 0 5d kube-multus-ds-mxgvl 1/1 Running 0 5d mofed-ubuntu20.04-ds-2zzf4 1/1 Running 0 5d mofed-ubuntu20.04-ds-rfnsw 1/1 Running 0 5d whereabouts-nw7hn 1/1 Running 0 5d whereabouts-zvhrv 1/1 Running 0 5d ...
The pod.yaml
configuration file for such a deployment:
apiVersion: v1 kind: Pod metadata: name: testpod1 annotations: k8s.v1.cni.cncf.io/networks: example-sriov-network spec: containers: - name: appcntr1 image: <image> imagePullPolicy: IfNotPresent securityContext: capabilities: add: ["IPC_LOCK"] resources: requests: nvidia.com/sriov_resource: '1' limits: nvidia.com/sriov_resource: '1' command: - sh - -c - sleep inf
SR-IOV Network Operator Deployment – Parallel Node Configuration for SR-IOV
This feature is supported only for Vanilla Kubernetes deployments with SR-IOV Network Operator.
To apply SR-IOV configuration on several nodes in parallel, create a SriovNetworkPoolConfig
CR and specify the maximum number or percentage of nodes that can be unavailable at the same time:
sriov-network-pool-config-number.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkPoolConfig metadata: name: pool-1 namespace: nvidia-network-operator spec: maxUnavailable: "20" nodeSelector: - matchExpressions: - key: some-label operator: In values: - val-2 - matchExpressions: - key: other-label operator: "Exists"
sriov-network-pool-config-percent.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkPoolConfig metadata: name: pool-1 namespace: nvidia-network-operator spec: maxUnavailable: "10%" nodeSelector: - matchExpressions: - key: some-label operator: In values: - val-2 - matchExpressions: - key: other-label operator: "Exists"
SR-IOV Network Operator Deployment – Parallel NIC Configuration for SR-IOV
This feature is supported only for Vanilla Kubernetes deployments with SR-IOV Network Operator.
To apply SriovNetworkNodePolicy on several nodes in parallel, specify the featureGates
option in the SriovOperatorConfig CRD:
kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default --patch '{ "spec": { "featureGates": { "parallelNicConfig": true } } }' --type='merge'
SR-IOV Network Operator Deployment – SR-IOV Using the systemd Service
To enable systemd SR-IOV configuration mode, specify the configurationMode option in the SriovOperatorConfig CRD:
kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default --patch '{ "spec": { "configurationMode": "systemd"} }' --type='merge'
Network Operator Deployment with an SR-IOV InfiniBand Network
Network Operator deployment with InfiniBand network requires the following:
NVIDIA DOCA Driver and OpenSM running. OpenSM runs on top of the NVIDIA DOCA Driver stack, so both the driver and the subnet manager should come from the same installation. Note that partitions configured by OpenSM should specify defmember=full to enable SR-IOV functionality over InfiniBand (see the example partition entry after this list). For more details, please refer to this article.
InfiniBand device – Both the host device and the switch ports must be enabled in InfiniBand mode.
The rdma-core package should be installed when an inbox driver is used.
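For illustration only, a hypothetical OpenSM partition entry (for example, in /etc/opensm/partitions.conf) that sets defmember=full might look like the following; the partition name, PKey, and membership list are assumptions and must match your fabric configuration:

Default=0x7fff, ipoib, defmember=full : ALL, ALL_SWITCHES=full;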
First install the Network Operator with NFD and SR-IOV Network Operator enabled: values.yaml:

nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
Once the Network Operator is installed, create a NicClusterPolicy with:
DOCA driver
Secondary network
Multus CNI
Container Networking Plugins
IPAM plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 25.04-0.6.1.0-2 forcePrecompiled: false imagePullSecrets: [] terminationGracePeriodSeconds: 300 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 upgradePolicy: autoUpgrade: true maxParallelUpgrades: 1 safeLoad: false drain: enable: true force: true podSelector: "" timeoutSeconds: 300 deleteEmptyDir: true secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: [] ipamPlugin: image: whereabouts repository: ghcr.io/k8snetworkplumbingwg version: v0.7.0 imagePullSecrets: []
sriov-ib-network-node-policy.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: infiniband-sriov namespace: nvidia-network-operator spec: deviceType: netdevice mtu: 1500 nodeSelector: feature.node.kubernetes.io/pci-15b3.present: "true" nicSelector: vendor: "15b3" linkType: IB isRdma: true numVfs: 8 priority: 90 resourceName: mlnxnics
sriov-ib-network.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovIBNetwork metadata: name: example-sriov-ib-network namespace: nvidia-network-operator spec: ipam: | { "type": "whereabouts", "datastore": "kubernetes", "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" }, "range": "192.168.5.225/28", "exclude": [ "192.168.5.229/30", "192.168.5.236/32" ], "log_file": "/var/log/whereabouts.log", "log_level": "info" } resourceName: mlnxnics linkState: enable networkNamespace: default
sriov-ib-network-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-sriov-ib-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: example-sriov-ib-network
spec:
  containers:
  - name: test-sriov-ib-pod
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command:
    - sh
    - -c
    - sleep inf
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      requests:
        nvidia.com/mlnxnics: "1"
      limits:
        nvidia.com/mlnxnics: "1"
Network Operator Deployment with an SR-IOV InfiniBand Network with PKey Management
Network Operator deployment with InfiniBand network requires the following:
NVIDIA DOCA Driver and OpenSM running. OpenSM runs on top of the NVIDIA DOCA Driver stack, so both the driver and the subnet manager should come from the same installation. Note that partitions that are configured by OpenSM should specify defmember=full to enable the SR-IOV functionality over InfiniBand. For more details, please refer to this article.
NVIDIA UFM running on top of OpenSM. For more details, please refer to the project documentation.
InfiniBand device – Both the host device and the switch ports must be enabled in InfiniBand mode.
rdma-core package should be installed when an inbox driver is used.
Current limitations:
Only a single PKey can be configured per workload pod.
When a single instance of NVIDIA UFM is used with several K8s clusters, different PKey GUID pools should be configured for each cluster.
ib-kubernetes provides a daemon that works in conjunction with the SR-IOV Network Device Plugin. It acts on Kubernetes pod object changes (Create/Update/Delete), reading the pod's network annotation, fetching its corresponding network CRD, and reading the PKey. This is done in order to add the newly generated GUID, or the predefined GUID in the GUID field of the CRD cni-args, to that PKey for pods with the mellanox.infiniband.app annotation.
The ib-kubernetes-ufm-secret should be created before the NicClusterPolicy. IB Kubernetes must access NVIDIA UFM in order to manage pods' GUIDs. To provide its credentials, a secret of the following format should be deployed in advance:
ufm-secret.yaml
apiVersion: v1 kind: Secret metadata: name: ib-kubernetes-ufm-secret namespace: nvidia-network-operator stringData: UFM_USERNAME: "admin" UFM_PASSWORD: "123456" UFM_ADDRESS: "ufm-host" UFM_HTTP_SCHEMA: "" UFM_PORT: "" data: UFM_CERTIFICATE: ""
First install the Network Operator with NFD and the SR-IOV Network Operator enabled: values.yaml:

nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
  resourcePrefix: "nvidia.com"
Once the Network Operator is installed, create a NicClusterPolicy with:
DOCA driver
ibKubernetes
Secondary network
Multus CNI
Container Networking plugins
IPAM Plugin
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 25.04-0.6.1.0-2 forcePrecompiled: false imagePullSecrets: [] terminationGracePeriodSeconds: 300 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 upgradePolicy: autoUpgrade: true maxParallelUpgrades: 1 safeLoad: false drain: enable: true force: true podSelector: "" timeoutSeconds: 300 deleteEmptyDir: true ibKubernetes: image: ib-kubernetes repository: ghcr.io/mellanox version: v1.1.1 imagePullSecrets: [] pKeyGUIDPoolRangeStart: 02:00:00:00:00:00:00:00 pKeyGUIDPoolRangeEnd: 02:FF:FF:FF:FF:FF:FF:FF ufmSecret: "ib-kubernetes-ufm-secret" nvIpam: image: nvidia-k8s-ipam repository: ghcr.io/mellanox version: v0.3.7 imagePullSecrets: [] enableWebhook: false secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: []
Create IPPool object for nv-ipam
apiVersion: nv-ipam.nvidia.com/v1alpha1 kind: IPPool metadata: name: pool1 namespace: nvidia-network-operator spec: subnet: 192.168.0.0/16 perNodeBlockSize: 100 gateway: 192.168.0.1 nodeSelector: nodeSelectorTerms: - matchExpressions: - key: node-role.kubernetes.io/worker operator: Exists
Wait for the NVIDIA DOCA Driver to install, then apply the following CRs:
sriov-ib-network-node-policy.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: infiniband-sriov namespace: nvidia-network-operator spec: deviceType: netdevice mtu: 1500 nodeSelector: feature.node.kubernetes.io/pci-15b3.present: "true" nicSelector: vendor: "15b3" linkType: IB isRdma: true numVfs: 8 priority: 90 resourceName: mlnxnics
sriov-ib-network.yaml
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: ib-sriov-network annotations: k8s.v1.cni.cncf.io/resourceName: nvidia.com/mlnxnics spec: config: '{ "type":"ib-sriov", "cniVersion":"0.3.1", "name":"ib-sriov-network", "pkey":"0x6", "link_state":"enable", "ibKubernetesEnabled":true, "ipam":{ "type":"nv-ipam", "poolName":"pool1" } }'
To use the IB network with the PKey management feature and RDMA isolation, use the following sriov-ib-network.yaml:
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: ib-sriov-network annotations: k8s.v1.cni.cncf.io/resourceName: nvidia.com/mlnxnics spec: config: '{ "cniVersion":"0.3.1", "name":"ib-sriov-network", "plugins":[ { "type":"ib-sriov", "pkey":"0x6", "link_state":"enable", "ibKubernetesEnabled":true, "ipam":{ "type":"nv-ipam", "poolName":"pool1" } }, { "type":"rdma" } ] }'
sriov-ib-network-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-sriov-ib-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: ib-sriov-network
spec:
  containers:
  - name: test-sriov-ib-pod
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command:
    - sh
    - -c
    - sleep inf
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      requests:
        nvidia.com/mlnxnics: "1"
      limits:
        nvidia.com/mlnxnics: "1"
Network Operator Deployment for DPDK Workloads with NicClusterPolicy
This deployment mode supports DPDK applications. In order to run DPDK applications, hugepages must be configured on the required K8s worker nodes (see the sketch below). By default, the inbox operating system driver is used. For cases with specific requirements, the DOCA Driver container should be deployed.
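As an illustration, hugepages can be reserved on a worker node in the same way as in the OVS-DOCA examples later on this page; the page size and counts below are assumptions and should be sized for your DPDK application (the pod example below requests 1Gi hugepages, which are typically reserved via kernel boot parameters instead):

# reserve 2 MiB hugepages at runtime (repeat per NUMA node)
echo 4096 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# make a reservation persistent across reboots
echo "vm.nr_hugepages=8192" > /etc/sysctl.d/99-hugepages.conf
# mount the hugetlbfs filesystem
mkdir -p /hugepages
mount -t hugetlbfs hugetlbfs /hugepages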
Network Operator deployment with:
Host Device Network
DPDK pod
nicclusterpolicy.yaml
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 25.04-0.6.1.0-2 sriovDevicePlugin: image: sriov-network-device-plugin repository: ghcr.io/k8snetworkplumbingwg version: v3.9.0 config: | { "resourceList": [ { "resourcePrefix": "nvidia.com", "resourceName": "rdma_host_dev", "selectors": { "vendors": ["15b3"], "devices": ["1018"], "drivers": ["mlx5_core"] } } ] } secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1-amd64 ipamPlugin: image: whereabouts repository: ghcr.io/k8snetworkplumbingwg version: v0.7.0-amd64 multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0
host-device-net.yaml
apiVersion: mellanox.com/v1alpha1 kind: HostDeviceNetwork metadata: name: example-hostdev-net spec: networkNamespace: "default" resourceName: "rdma_host_dev" ipam: | { "type": "whereabouts", "datastore": "kubernetes", "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" }, "range": "192.168.3.225/28", "exclude": [ "192.168.3.229/30", "192.168.3.236/32" ], "log_file" : "/var/log/whereabouts.log", "log_level" : "info" }
pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: example-hostdev-net
spec:
  containers:
  - name: appcntr1
    image: <dpdk image>
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      requests:
        memory: 1Gi
        hugepages-1Gi: 2Gi
        nvidia.com/rdma_host_dev: '1'
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
Network Operator Deployment and OpenvSwitch offload - managed OpenvSwitch
This feature is supported only for Vanilla Kubernetes deployments with SR-IOV Network Operator.
To use DOCA Driver container with this mode of operation, set the RESTORE_DRIVER_ON_POD_TERMINATION environment variable to false in the driver configuration section in the NicClusterPolicy. Restoration to the inbox driver is not supported for this feature.
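A minimal sketch of how this could look in the ofedDriver section of the NicClusterPolicy, assuming the env list of the ofedDriver spec is used to pass the variable (image and version taken from the other examples on this page):

ofedDriver:
  image: doca-driver
  repository: nvcr.io/nvidia/mellanox
  version: 25.04-0.6.1.0-2
  env:
  # do not restore the inbox driver when the driver pod terminates
  - name: RESTORE_DRIVER_ON_POD_TERMINATION
    value: "false"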
Tech Preview feature.
In this mode, the sriov-network-operator automatically creates and configures OpenvSwitch bridges. For more complex scenarios, such as VF lag, you must use the “externally managed OpenvSwitch” feature of the sriov-network-operator, which is detailed in a separate section of the documentation.
Network Operator Configuration
Deploy network-operator by Helm with sriov-network-operator and nv-ipam.
First install the Network Operator with the SR-IOV Network Operator enabled (NFD is deployed by default): values.yaml:

sriovNetworkOperator:
  enabled: true
Once the Network Operator has been installed, create a NicClusterPolicy with nv-ipam:
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: nvIpam: image: nvidia-k8s-ipam repository: ghcr.io/mellanox version: v0.3.7 imagePullSecrets: [] enableWebhook: false secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: []
Enable the manageSoftwareBridges feature gate for the sriov-network-operator:
kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default --patch '{ "spec": { "featureGates": { "manageSoftwareBridges": true } } }' --type='merge'
Create IPPool object for nv-ipam
apiVersion: nv-ipam.nvidia.com/v1alpha1 kind: IPPool metadata: name: pool1 namespace: nvidia-network-operator spec: subnet: 192.168.0.0/16 perNodeBlockSize: 100 gateway: 192.168.0.1 nodeSelector: nodeSelectorTerms: - matchExpressions: - key: node-role.kubernetes.io/worker operator: Exists
Prerequisites for Worker Nodes
Supported operating systems:
Ubuntu 22.04
OpenvSwitch from the NVIDIA DOCA for Host package with the doca-all or doca-networking profile should be installed on each worker node.
Check NVIDIA DOCA Official installation guide for details.
Supported OpenvSwitch dataplanes:
OVS-kernel
OVS-doca
Check the OpenvSwitch Offload document to learn about the differences.
OVS-kernel
These steps are for the OVS-kernel data plane; to use OVS-doca, follow the instructions in the relevant section.
Prepare Worker Nodes
Configure Open_vSwitch
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
Restart Open_vSwitch
systemctl restart openvswitch-switch.service
Sriov Network Operator Configuration
Create SriovNetworkNodePolicy for selected NIC
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: ovs-switchdev namespace: nvidia-network-operator spec: eSwitchMode: switchdev mtu: 1500 nicSelector: deviceID: 101d vendor: 15b3 nodeSelector: node-role.kubernetes.io/worker: "" numVfs: 4 isRdma: true linkType: ETH resourceName: switchdev bridge: ovs: {}
Create OVSNetwork CR
apiVersion: sriovnetwork.openshift.io/v1 kind: OVSNetwork metadata: name: ovs namespace: nvidia-network-operator spec: networkNamespace: default ipam: | { "type": "nv-ipam", "poolName": "pool1" } resourceName: switchdev
OVS-doca
These steps are for the OVS-doca data plane; to use OVS-kernel, follow the instructions in the relevant section.
Prepare Worker Nodes
Configure hugepages
mkdir -p /hugepages
mount -t hugetlbfs hugetlbfs /hugepages
echo 4096 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
Note: On a multi-CPU system, hugepages should be created for each NUMA node: node0, node1, and so on.
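For example, on a hypothetical two-socket server, the same allocation would also be made for the second NUMA node:

echo 4096 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages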
Configure system to create hugepages on boot
echo "vm.nr_hugepages=8192" > /etc/sysctl.d/99-hugepages.conf
Note: This example is for a server with two CPUs.
Configure Open_vSwitch
ovs-vsctl --no-wait set Open_vSwitch . other_config:doca-init=true
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
Restart Open_vSwitch
systemctl restart openvswitch-switch.service
Sriov Network Operator Configuration
Create SriovNetworkNodePolicy for selected NIC
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: ovs-switchdev namespace: nvidia-network-operator spec: eSwitchMode: switchdev mtu: 1500 nicSelector: deviceID: 101d vendor: 15b3 nodeSelector: node-role.kubernetes.io/worker: "" numVfs: 4 isRdma: true linkType: ETH resourceName: switchdev bridge: ovs: bridge: datapathType: netdev uplink: interface: type: dpdk
Create OVSNetwork CR
apiVersion: sriovnetwork.openshift.io/v1 kind: OVSNetwork metadata: name: ovs namespace: nvidia-network-operator spec: networkNamespace: default ipam: | { "type": "nv-ipam", "poolName": "pool1" } resourceName: switchdev interfaceType: dpdk
Test Workload
apiVersion: apps/v1 kind: Deployment metadata: name: ovs-offload labels: app: ovs-offload spec: replicas: 2 selector: matchLabels: app: ovs-offload template: metadata: labels: app: ovs-offload annotations: k8s.v1.cni.cncf.io/networks: ovs spec: containers: - name: ovs-offload-container command: ["/bin/bash", "-c"] args: - | while true; do sleep 1000; done image: mellanox/rping-test securityContext: capabilities: add: ["IPC_LOCK"] resources: requests: nvidia.com/switchdev: 1 limits: nvidia.com/switchdev: 1
Troubleshooting OVS
Please see the DOCA documentation for OVS hardware offload verification and troubleshooting steps.
Network Operator Deployment and OpenvSwitch offload - externally managed OpenvSwitch with VF lag
This feature is not compatible with the DOCA Driver container.
This feature is supported only for Vanilla Kubernetes deployments with SR-IOV Network Operator.
Tech Preview feature.
In this mode, the sriov-network-operator is responsible for configuring the physical and virtual functions but will not manage the configuration of the software bridge. The VF LAG and Open vSwitch should be preconfigured on the host.
Network Operator Configuration
Deploy network-operator by Helm with sriov-network-operator and nv-ipam.
First install the Network Operator with the SR-IOV Network Operator enabled (NFD is deployed by default): values.yaml:

sriovNetworkOperator:
  enabled: true
Once the Network Operator has been installed, create a NicClusterPolicy with nv-ipam:
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: nvIpam: image: nvidia-k8s-ipam repository: ghcr.io/mellanox version: v0.3.7 imagePullSecrets: [] enableWebhook: false secondaryNetwork: cniPlugins: image: plugins repository: ghcr.io/k8snetworkplumbingwg version: v1.6.2-update.1 imagePullSecrets: [] multus: image: multus-cni repository: ghcr.io/k8snetworkplumbingwg version: v4.1.0 imagePullSecrets: []
Switch sriov-network-operator to systemd configuration mode.
kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default --patch '{ "spec": { "configurationMode": "systemd"} }' --type='merge'
Create IPPool object for nv-ipam
apiVersion: nv-ipam.nvidia.com/v1alpha1 kind: IPPool metadata: name: pool1 namespace: nvidia-network-operator spec: subnet: 192.168.0.0/16 perNodeBlockSize: 100 gateway: 192.168.0.1 nodeSelector: nodeSelectorTerms: - matchExpressions: - key: node-role.kubernetes.io/worker operator: Exists
Prerequisites for Worker Nodes
Supported operating systems:
Ubuntu 22.04
OpenvSwitch from the NVIDIA DOCA for Host package with the doca-all or doca-networking profile should be installed on each worker node.
Check NVIDIA DOCA Official installation guide for details.
Supported OpenvSwitch dataplanes:
OVS-kernel
OVS-doca
Check the OpenvSwitch Offload document to learn about the differences.
Configure Bond interface with netplan
# content of /etc/netplan/01-uplink-bond.yaml network: version: 2 renderer: networkd ethernets: enp4s0f0np0: dhcp4: no dhcp6: no enp4s0f1np1: dhcp4: no dhcp6: no bonds: bond0: dhcp4: no dhcp6: no interfaces: - enp4s0f0np0 - enp4s0f1np1 parameters: mode: 802.3ad
Replace `enp4s0f0np0` and `enp4s0f1np1` with the correct PF names for your node.
OVS-kernel
These steps are for the OVS-kernel data plane; to use OVS-doca, follow the instructions in the relevant section.
Prepare Worker Nodes
Configure Open_vSwitch
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
Restart Open_vSwitch
systemctl restart openvswitch-switch.service
Create bridge
Create OVS bridge
ovs-vsctl add-br mybr
# this command may fail with a "No such device" error
ovs-vsctl add-port mybr bond0
Note: The second command may fail with a "No such device" error because the bond0 interface does not exist yet.
Sriov Network Operator Configuration
Create SriovNetworkNodePolicy for selected NIC
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: ovs-switchdev namespace: nvidia-network-operator spec: eSwitchMode: switchdev mtu: 1500 nicSelector: deviceID: 101d vendor: 15b3 nodeSelector: node-role.kubernetes.io/worker: "" numVfs: 4 isRdma: true linkType: ETH resourceName: switchdev
Create OVSNetwork CR
apiVersion: sriovnetwork.openshift.io/v1 kind: OVSNetwork metadata: name: ovs namespace: nvidia-network-operator spec: networkNamespace: default ipam: | { "type": "nv-ipam", "poolName": "pool1" } resourceName: switchdev
OVS-doca
These steps are for the OVS-doca data plane; to use OVS-kernel, follow the instructions in the relevant section.
Prepare Worker Nodes
Configure hugepages
mkdir -p /hugepages
mount -t hugetlbfs hugetlbfs /hugepages
echo 4096 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
Note: On a multi-CPU system, hugepages should be created for each NUMA node: node0, node1, and so on.
Configure system to create hugepages on boot
echo "vm.nr_hugepages=8192" > /etc/sysctl.d/99-hugepages.conf
Note: This example is for a server with two CPUs.
Configure Open_vSwitch
ovs-vsctl --no-wait set Open_vSwitch . other_config:doca-init=true
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
Restart Open_vSwitch
systemctl restart openvswitch-switch.service
Create OVS bridge
ovs-vsctl --no-wait add-br mybr -- set bridge mybr datapath_type=netdev
# this command may fail with a "No such device" error
ovs-vsctl add-port mybr bond0 -- set Interface bond0 type=dpdk options:dpdk-lsc-interrupt=true mtu_request=1450
Note: The second command may fail with a "No such device" error because the bond0 interface does not exist yet.
Sriov Network Operator Configuration
Create SriovNetworkNodePolicy for selected NIC
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: ovs-switchdev namespace: nvidia-network-operator spec: eSwitchMode: switchdev mtu: 1500 nicSelector: deviceID: 101d vendor: 15b3 nodeSelector: node-role.kubernetes.io/worker: "" numVfs: 4 isRdma: true linkType: ETH resourceName: switchdev
Create OVSNetwork CR
apiVersion: sriovnetwork.openshift.io/v1 kind: OVSNetwork metadata: name: ovs namespace: nvidia-network-operator spec: networkNamespace: default ipam: | { "type": "nv-ipam", "poolName": "pool1" } resourceName: switchdev interfaceType: dpdk
Test Workload
apiVersion: apps/v1 kind: Deployment metadata: name: ovs-offload labels: app: ovs-offload spec: replicas: 2 selector: matchLabels: app: ovs-offload template: metadata: labels: app: ovs-offload annotations: k8s.v1.cni.cncf.io/networks: ovs spec: containers: - name: ovs-offload-container command: ["/bin/bash", "-c"] args: - | while true; do sleep 1000; done image: mellanox/rping-test securityContext: capabilities: add: ["IPC_LOCK"] resources: requests: nvidia.com/switchdev: 1 limits: nvidia.com/switchdev: 1
Troubleshooting OVS
Please see the DOCA documentation for OVS hardware offload verification and troubleshooting steps.
Network Operator Deployment and RDMA exclusive subsystem mode
When the RDMA subsystem is in shared mode, RDMA devices are accessible in all network namespaces. Shared mode can be used when RDMA device isolation among multiple network namespaces is not needed. This mode is enabled by default.
To use RDMA shared mode with a MacvlanNetwork, please check the Network Operator Deployment with RDMA Shared Device Plugin section.
When a user wants to assign a dedicated RDMA device to a particular network namespace, exclusive mode should be configured.
SR-IOV Network Operator Configuration
First install the Network Operator with NFD and SR-IOV Operator enabled:
values.yaml:

nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
To configure RDMA exclusive mode, apply a SriovNetworkPoolConfig CR and specify rdmaMode:
sriov-network-pool-config-number.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkPoolConfig metadata: name: rdma-exclusive-pool namespace: nvidia-network-operator spec: nodeSelector: - matchExpressions: - key: feature.node.kubernetes.io/pci-15b3.present operator: "Exists" rdmaMode: exclusive
The sriovnetwork-node-policy.yaml
configuration should be applied to configure SR-IOV and deploy RDMA CNI:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policy-1 namespace: nvidia-network-operator spec: deviceType: netdevice mtu: 1500 nicSelector: vendor: "15b3" pfNames: ["ens2f0"] nodeSelector: feature.node.kubernetes.io/pci-15b3.present: "true" numVfs: 8 priority: 90 isRdma: true resourceName: sriov_resource
RDMA CNI plugin is intended to be run as a chained CNI plugin:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: "example-sriov-network" namespace: nvidia-network-operator spec: vlan: 0 networkNamespace: "default" resourceName: "sriov_resource" ipam: |- { "type": "nv-ipam", "poolName": "pool1" } metaPlugins: | { "type": "rdma" }
Test Workload
The pod.yaml
configuration file for such a deployment:
apiVersion: v1 kind: Pod metadata: name: testpod1 annotations: k8s.v1.cni.cncf.io/networks: example-sriov-network spec: containers: - name: appcntr1 image: <image> imagePullPolicy: IfNotPresent securityContext: capabilities: add: ["IPC_LOCK"] resources: requests: nvidia.com/sriov_resource: '1' limits: nvidia.com/sriov_resource: '1' command: - sh - -c - sleep inf
NIC Configuration Operator
The NVIDIA NIC Configuration Operator provides a Kubernetes API (Custom Resource Definition) to allow firmware update and configuration of NVIDIA NICs in a coordinated manner. It deploys a configuration daemon on each of the desired nodes to configure the NVIDIA NICs there. The NVIDIA NIC Configuration Operator uses the Maintenance Operator to prepare a node for maintenance before the actual configuration.
The NVIDIA NIC Configuration Operator does not support the FW reset flow for DPU mode. Check the limitations.
To perform firmware validation and update on NIC devices, the NIC Configuration Operator requires persistent storage set up in the cluster. To set up persistent NFS storage in the cluster, the example from the CSI NFS Driver repository can be used. After deploying the NFS server and the NFS CSI driver, the storage class should become available in the cluster. The name of the storage class should then be passed when configuring the NIC Configuration Operator.
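For illustration, a StorageClass for the NFS CSI driver might look like the following sketch; the provisioner name follows the csi-driver-nfs project, while the server address and share path are placeholders for your environment:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  # placeholder NFS server address and exported path
  server: nfs-server.example.com
  share: /exports
reclaimPolicy: Retain
volumeBindingMode: Immediate

The resulting storage class name (nfs-csi in this sketch) is what is passed as storageClassName in the NicClusterPolicy below.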
First install the Network Operator helm chart with the Maintenance Operator enabled and deploy a NIC Cluster Policy CRD with NIC Configuration Operator enabled:
values.yaml:

maintenanceOperator:
  enabled: true
nicclusterpolicy.yaml:
apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: nicConfigurationOperator: operator: image: nic-configuration-operator repository: ghcr.io/mellanox version: v1.0.3 configurationDaemon: image: nic-configuration-operator-daemon repository: ghcr.io/mellanox version: v1.0.3 nicFirmwareStorage: create: true pvcName: nic-fw-storage-pvc # Name of the storage class is provided by the user storageClassName: nfs-csi availableStorageSize: 1Gi
Observe the NicDevice CRs detected in the cluster. The name of each CR is composed of the node name, the NIC type, and its serial number:
> kubectl get nicdevices -n nvidia-network-operator
NAME                      AGE
node1-1015-mt1627x08307   1m
node1-101d-mt1952x03330   1m
node2-1015-mt1627x08305   1m
node2-101d-mt1952x03327   1m
Discover more information about a specific device:
kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o yaml
apiVersion: configuration.net.nvidia.com/v1alpha1 kind: NicDevice metadata: creationTimestamp: "2024-09-21T08:43:08Z" generation: 1 name: node1-101d-mt1952x03327 namespace: nvidia-network-operator ownerReferences: - apiVersion: v1 kind: Node name: node1 uid: 25c4f4e2-f7ba-4ba9-9a87-8056313ffc79 resourceVersion: "1177095" uid: ac6763bf-67c6-4af5-81f8-1aad5da929bf spec: {} status: conditions: - type: FirmwareUpdateInProgress status: "False" reason: DeviceFirmwareSpecEmpty message: Device firmware spec is empty, cannot update or validate firmware lastTransitionTime: "2024-09-21T08:43:04Z" - type: ConfigUpdateInProgress status: "False" reason: DeviceConfigSpecEmpty message: Device configuration spec is empty, cannot update configuration lastTransitionTime: "2024-09-21T08:43:08Z" firmwareVersion: 22.39.1015 node: cloud-dev-41 partNumber: mcx623106ac-cdat ports: - networkInterface: enp3s0f0np0 pci: "0000:03:00.0" rdmaInterface: mlx5_0 - networkInterface: enp3s0f1np1 pci: "0000:03:00.1" rdmaInterface: mlx5_1 psid: mt_0000000436 serialNumber: mt1952x03327 type: 101d
Configure and apply the NICFirmwareSource CR:
apiVersion: configuration.net.nvidia.com/v1alpha1 kind: NicFirmwareSource metadata: name: connectx6-dx-firmware-22-44-1036 namespace: nvidia-network-operator finalizers: - configuration.net.nvidia.com/nic-configuration-operator spec: # a list of firmware binaries zip archives from the Mellanox website, can point to any url accessible from the cluster binUrlSources: - https://www.mellanox.com/downloads/firmware/fw-ConnectX6Dx-rel-22_44_1036-MCX623106AC-CDA_Ax-UEFI-14.37.14-FlexBoot-3.7.500.signed.bin.zip
Observe the NICFirmwareSource status:
> kubectl get nicfirmwaresource -n nvidia-network-operator connectx6-dx-firmware-22-44-1036 -o yaml ... status: state: Success versions: 22.44.1036: - mt_0000000436
Configure and apply the NicFirmwareTemplate CR:
apiVersion: configuration.net.nvidia.com/v1alpha1 kind: NicFirmwareTemplate metadata: name: connectx6dx-config namespace: nvidia-network-operator spec: nodeSelector: kubernetes.io/hostname: node1 nicSelector: nicType: "101d" template: nicFirmwareSourceRef: connectx6dx-firmware-22-44-1036 updatePolicy: Update
Configure and apply the NicConfigurationTemplate CR:
apiVersion: configuration.net.nvidia.com/v1alpha1 kind: NicConfigurationTemplate metadata: name: connectx6-config namespace: nvidia-network-operator spec: nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" nicSelector: # nicType selector is mandatory the rest are optional. Only a single type can be specified. nicType: 101d pciAddresses: - "0000:03:00.0" - “0000:04:00.0” serialNumbers: - "mt1952x03327" resetToDefault: false # if set, template is ignored, device configuration should reset template: # numVfs and linkType fields are mandatory, the rest are optional numVfs: 2 linkType: Ethernet pciPerformanceOptimized: enabled: true maxReadRequest: 4096 roceOptimized: enabled: true qos: trust: dscp pfc: "0,0,0,1,0,0,0,0" gpuDirectOptimized: enabled: true env: Baremetal
It is not possible to apply more than one template of each kind (NicFirmwareTemplate or NicConfigurationTemplate) to a single device. If more than one template matches, no template is applied and an error event is emitted for the corresponding NicDevice CR.
To use the NIC Configuration Operator functionality together with the SR-IOV Network Operator, the "mellanox" plugin should be disabled in the SR-IOV Network Operator (see the sketch below).
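One possible way to do this, assuming the disablePlugins field is available in the SriovOperatorConfig CR of the deployed SR-IOV Network Operator version, is a merge patch in the style of the other examples on this page:

kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default \
  --type='merge' --patch '{ "spec": { "disablePlugins": ["mellanox"] } }'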
For more information about the CRD API, refer to the API documentation. For information about which FW parameter each setting corresponds to, refer to the Configuration details section of the documentation.
The spec of the NicDevice CR is updated in accordance with the NicFirmwareTemplate and NicConfigurationTemplate CRs matching the device:
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.spec}' | yq -P template: firmware: nicFirmwareSourceRef: connectx6dx-firmware-22-44-1036 updatePolicy: Update configuration: numVfs: 2 linkType: Ethernet pciPerformanceOptimized: enabled: true roceOptimized: enabled: true qos: trust: dscp pfc: "0,0,0,1,0,0,0,0" gpuDirectOptimized: enabled: true env: Baremetal
The status conditions of the NicDevice CR reflect the status of the configuration update and indicate any errors that might occur during the process:
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.status.conditions}' | yq -P - type: FirmwareUpdateInProgress status: "False" reason: DeviceFirmwareConfigMatch message: Firmware matches the requested version observedGeneration: 4 lastTransitionTime: "2024-09-21T08:42:23Z" - type: ConfigUpdateInProgress status: "True" reason: UpdateStarted message: "" lastTransitionTime: "2024-09-21T08:43:08Z"
NIC Firmware Mismatch Notification
The NIC Configuration Operator updates the status conditions of the NicDevice CR to set the FirmwareConfigMatch condition based on the current NIC firmware:
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.status.conditions}' | yq -P - type: FirmwareConfigMatch status: "True" reason: DeviceFirmwareConfigMatch message: Device firmware '20.42.1000' matches to recommended version '20.42.1000' lastTransitionTime: "2024-09-21T08:43:10Z"
The FirmwareConfigMatch condition status is set to Unknown if the DOCA Driver is not installed; otherwise, it indicates whether the current NIC firmware is recommended by the DOCA Driver. For example:
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.status.conditions}' | yq -P - type: FirmwareConfigMatch status: "True" reason: DeviceFirmwareConfigMatch message: Device firmware '20.42.1000' matches to recommended version '20.42.1000' lastTransitionTime: "2024-11-08T09:19:41Z"