KServe Support on NIM Operator#

About KServe Support#

The NIM Operator supports both raw deployment and serverless deployment of NIM through KServe on Kubernetes clusters, including Red Hat OpenShift Container Platform.

NIM Operator with KServe provides two additional benefits:

  1. Intelligent Caching with NIMCache to reduce initial inference time and autoscaling latency, resulting in faster and more responsive deployments.

  2. NeMo Microservices support to evaluate, guardrail, and enhance AI systems across key metrics such as latency, accuracy, cost, and compliance.

The Operator configures KServe deployments through the InferenceService resource, which manages deployment, upgrades, ingress, and autoscaling of NIM.

Diagram 1. NIM Operator and KServe interaction

KServe Deployment Modes Comparison#

| Category | KServe Serverless (with Knative) | Raw Deployment |
|---|---|---|
| Autoscaling | Scales pods automatically based on request load (speed and traffic); can scale all the way to zero when unused. Configured by providing Knative annotations in the NIMService spec.annotations field. | Uses Horizontal/Vertical Pod Autoscalers with custom NIM metrics; cannot scale to zero by default. Configured by providing the spec.scale.hpa config in the NIMService. |
| Upgrades | Every new model version creates a new revision; a portion of traffic can be routed to new versions for testing, and rollback is as simple as shifting traffic back. Managed automatically by KServe, with no special parameters required in the NIMService API. Refer to the Knative documentation on gradual rollouts and traffic management for configuring canary rollouts. | Uses rolling updates; rollback is manual if something goes wrong. Not configurable: only the RollingUpdate strategy of the native Kubernetes Deployment is supported. |
| Ingress | Uses the Knative gateway (such as Istio); gives each model revision a stable domain/URL, with built-in secure connections and a queue to handle overload. Managed automatically by KServe, with no special parameters added. The domain name must be configured during KServe installation (default example.com). | Exposed using a Kubernetes Service plus Ingress/LoadBalancer (NGINX, Istio, and so on); security (mTLS, certificates) must be set up manually. Managed automatically by KServe, with no special parameters added. The domain name must be configured during KServe installation (default example.com). |
| NIM Metrics | Metrics and tracing are built in through Knative, KServe, and the service mesh. No extra configuration in the NIMService is required. | Requires your own monitoring stack, for example Prometheus, Grafana, or OpenTelemetry. Enable a ServiceMonitor by setting the NIMService spec.metrics.enabled field to true and providing the spec.metrics.serviceMonitor config. |
| GPU Resources | Passed using the spec.resources field. | Passed using the spec.resources field. |
| Multi-Node | Not supported. | Not supported. |
| Dynamic Resource Allocation (DRA) | Not supported. | Not supported. |
| Tolerations | Passed through spec.tolerations in NIMService; feature gates must be enabled in the Knative config. | Passed through spec.tolerations in NIMService. |
| RuntimeClassName | Passed through spec.runtimeClassName in NIMService; feature gates must be enabled in the Knative config. | Passed through spec.runtimeClassName in NIMService. |
| NodeSelectors | Passed through spec.nodeSelectors in NIMService; feature gates must be enabled in the Knative config. | Passed through spec.nodeSelectors in NIMService. |
| Custom Scheduler Name | Passed through spec.schedulerName in NIMService; feature gates must be enabled in the Knative config. | Passed through spec.schedulerName in NIMService. |
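As a concrete illustration of the two autoscaling models in the table, the following abridged NIMService fragments sketch each configuration. The metadata name is hypothetical and the field values are illustrative; the full manifests appear in the deployment examples later on this page.

# Serverless: Knative autoscaling, expressed as annotations.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: my-nim          # hypothetical name
spec:
  inferencePlatform: kserve
  annotations:
    autoscaling.knative.dev/target: "10"     # target in-flight requests per pod
    autoscaling.knative.dev/min-scale: "1"   # keep one pod warm (no scale to zero)
---
# Raw deployment: HPA-based autoscaling, expressed as spec.scale.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: my-nim          # hypothetical name
spec:
  inferencePlatform: kserve
  annotations:
    serving.kserve.io/deploymentMode: 'RawDeployment'
  scale:
    enabled: true
    hpa:
      minReplicas: 1
      maxReplicas: 3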

Select Your Deployment Environment and Type#

  • Raw Deployment on Standard Kubernetes: #raw-deployment-example-on-standard-kubernetes

  • Raw Deployment on Red Hat OpenShift: #raw-deployment-example-on-red-hat-openshift

  • Serverless Deployment on Standard Kubernetes: #serverless-deployment-example-on-standard-kubernetes

Note

This documentation uses kubectl for Kubernetes examples and oc for OpenShift examples. Both tools provide similar functionality for their respective platforms.

Raw Deployment Example on Standard Kubernetes#

Summary#

For raw deployment of NIM through KServe on a standard Kubernetes installation, follow these steps:

  1. Install KServe.

  2. Optional: Create NIM Cache.

  3. Deploy NIM through KServe.

1. Install KServe in Raw Deployment Mode#

Note

For security, consider downloading and reviewing the script before execution in production environments.

Run the following command to execute the KServe quick install script:

$ curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.15/hack/quick_install.sh" | bash -s - -r 

For more information, refer to Getting Started with KServe.
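In line with the security note above, one way to review the script before running it is to download it first and then execute it locally; this is equivalent to the piped command (the -r flag selects raw deployment mode):

$ curl -sO "https://raw.githubusercontent.com/kserve/kserve/release-0.15/hack/quick_install.sh"
$ less quick_install.sh   # review the script contents
$ bash quick_install.sh -r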

The following components are deployed by the KServe quick install script:


KServe

  • Installs KServe CRDs, such as InferenceService.

  • Installs KServe controller in the kserve namespace.

  • Provides model serving, autoscaling, and inference management on Kubernetes.

Gateway API CRDs

  • Installs the Gateway API CustomResourceDefinitions: GatewayClass, Gateway, GRPCRoute, HTTPRoute, ReferenceGrant.

  • Provides modern networking primitives for routing traffic into services.

Istio (Service Mesh)

Deployed into the istio-system namespace with Helm charts:

  • istio-base: core Istio CRDs and cluster-scoped resources.

  • istiod: Istio control plane (pilot, configuration, service discovery).

  • istio-ingressgateway: data plane ingress gateway for external traffic.

Cert-Manager

  • Installed in the cert-manager namespace.

  • Handles TLS certificate provisioning and management.

  • Required for automatic HTTPS and secure communication.

Verify the installation by checking each component:

  1. KServe

    $ kubectl get pods -n kserve 
    Example output
    NAME                                         READY   STATUS    RESTARTS   AGE
    kserve-controller-manager-85768d7b78-lrmss   2/2     Running   0          2m56s
  2. Istio System

    $ kubectl get pods -n istio-system 
    Example output
    NAME                                    READY   STATUS    RESTARTS   AGE
    istio-ingressgateway-74949b4866-hk5nf   1/1     Running   0          3m50s
    istiod-77bbc7c8bb-qkwdt                 1/1     Running   0          4m1s
  3. Cert Manager

    $ kubectl get pods -n cert-manager 
    Example output
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-74b7f6cbbc-7mrrw              1/1     Running   0          4m18s
    cert-manager-cainjector-58c9d76cb8-t8bxb   1/1     Running   0          4m18s
    cert-manager-webhook-5875b545cf-dhhfb      1/1     Running   0          4m18s

Note

To uninstall KServe, follow the instructions in Uninstalling KServe.

2. Optional: Create NIM Cache#

Note

Refer to prerequisites for more information on using NIM Cache.

NGC model source:

  1. Create a file, such as nimcache.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMCache
    metadata:
      name: meta-llama-3-2-1b-instruct
      namespace: nim-service
    spec:
      source:
        ngc:
          modelPuller: nvcr.io/nim/meta/llama-3.2-1b-instruct:1.12.0
          pullSecret: ngc-secret
          authSecret: ngc-api-secret
          model:
            engine: "tensorrt"
            tensorParallelism: "1"
      storage:
        pvc:
          create: true
          storageClass:
          size: "50Gi"
          volumeAccessMode: ReadWriteOnce
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f nimcache.yaml 
Hugging Face model source:

  1. Create a file, such as nimcache.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMCache
    metadata:
      name: nim-cache-multi-llm
      namespace: nim-service
    spec:
      source:
        hf:
          endpoint: "https://huggingface.co"
          namespace: "nvidia"
          authSecret: hf-api-secret
          modelPuller: nvcr.io/nim/nvidia/llm-nim:1.12
          pullSecret: ngc-secret
          modelName: "Llama-3.1-Nemotron-Nano-8B-v1"
      storage:
        pvc:
          create: true
          storageClass: ''
          size: "50Gi"
          volumeAccessMode: ReadWriteOnce
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f nimcache.yaml 
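After applying a NIMCache manifest, you can optionally wait for caching to complete before deploying the NIMService. A quick check, assuming the NGC example name above (the NIMCache status should report a state of Ready when caching completes):

$ kubectl get nimcache -n nim-service
$ kubectl get nimcache -n nim-service meta-llama-3-2-1b-instruct -o jsonpath='{.status.state}'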

3. Deploy NIM through KServe as a Raw Deployment#

Single-model NIM, using the NIM cache created above:

  1. Create a file, such as nimservice.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama-3-2-1b-instruct
      namespace: nim-service
    spec:
      inferencePlatform: kserve
      annotations:
        serving.kserve.io/deploymentMode: 'RawDeployment'
      scale:
        enabled: true
        hpa:
          minReplicas: 1
          maxReplicas: 3
          metrics:
          - type: "Resource"
            resource:
              name: "cpu"
              target:
                type: "Utilization"
                averageUtilization: 80
      image:
        repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
        tag: "1.12.0"
        pullPolicy: IfNotPresent
        pullSecrets:
        - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: meta-llama-3-2-1b-instruct
      replicas: 1
      resources:
        limits:
          nvidia.com/gpu: 1
          cpu: "12"
          memory: 32Gi
        requests:
          nvidia.com/gpu: 1
          cpu: "4"
          memory: 6Gi
      expose:
        service:
          type: ClusterIP
          port: 8000
  2. Apply the manifest for raw deployment:

    $ kubectl create -f nimservice.yaml -n nim-service 
  3. Verify that the inference service has been created:

    1. View inference service details:

      $ kubectl get inferenceservice -n nim-service -o yaml 
      Example output
      apiVersion: v1
      items:
      - apiVersion: serving.kserve.io/v1beta1
        kind: InferenceService
        metadata:
          annotations:
            nvidia.com/last-applied-hash: 757475558b
            nvidia.com/parent-spec-hash: b978f49f7
            openshift.io/required-scc: nonroot
            serving.kserve.io/autoscalerClass: hpa
            serving.kserve.io/deploymentMode: RawDeployment
            serving.kserve.io/enable-metric-aggregation: "true"
            serving.kserve.io/enable-prometheus-scraping: "true"
            temp: temp-1
          creationTimestamp: "2025-07-25T14:58:06Z"
          finalizers:
          - inferenceservice.finalizers
          generation: 1
          labels:
            app.kubernetes.io/instance: meta-llama-3-2-1b-instruct
            app.kubernetes.io/managed-by: k8s-nim-operator
            app.kubernetes.io/name: meta-llama-3-2-1b-instruct
            app.kubernetes.io/operator-version: ""
            app.kubernetes.io/part-of: nim-service
            networking.kserve.io/visibility: cluster-local
            temp2: temp-2
          name: meta-llama-3-2-1b-instruct
          namespace: nim-service
          ownerReferences:
          - apiVersion: apps.nvidia.com/v1alpha1
            blockOwnerDeletion: true
            controller: true
            kind: NIMService
            name: meta-llama-3-2-1b-instruct
            uid: aaa7e95a-81e4-404a-a1de-6dce898f9937
          resourceVersion: "43983337"
          uid: c92a7aa9-4954-4959-86b3-ad8aa6c39ca8
        spec:
          predictor:
            annotations:
              openshift.io/required-scc: nonroot
              serving.kserve.io/deploymentMode: RawDeployment
              temp: temp-1
            containers:
            - env:
              - name: MY_ENV
                value: my-value
              - name: NIM_CACHE_PATH
                value: /model-store
              - name: NGC_API_KEY
                valueFrom:
                  secretKeyRef:
                    key: NGC_API_KEY
                    name: ngc-api-secret
              - name: OUTLINES_CACHE_DIR
                value: /tmp/outlines
              - name: NIM_SERVER_PORT
                value: "8000"
              - name: NIM_HTTP_API_PORT
                value: "8000"
              - name: NIM_JSONL_LOGGING
                value: "1"
              - name: NIM_LOG_LEVEL
                value: INFO
              image: nvcr.io/nim/meta/llama-3.2-1b-instruct:1.8
              imagePullPolicy: IfNotPresent
              livenessProbe:
                failureThreshold: 3
                httpGet:
                  path: /v1/health/live
                  port: api
                initialDelaySeconds: 15
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 1
              name: kserve-container
              ports:
              - containerPort: 8000
                name: api
                protocol: TCP
              readinessProbe:
                failureThreshold: 3
                httpGet:
                  path: /v1/health/ready
                  port: api
                initialDelaySeconds: 15
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 1
              resources:
                limits:
                  cpu: "12"
                  memory: 32Gi
                  nvidia.com/gpu: "1"
                requests:
                  cpu: "12"
                  memory: 32Gi
                  nvidia.com/gpu: "1"
              startupProbe:
                failureThreshold: 30
                httpGet:
                  path: /v1/health/ready
                  port: api
                initialDelaySeconds: 30
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 1
              volumeMounts:
              - mountPath: /model-store
                name: model-store
              - mountPath: /dev/shm
                name: dshm
            deploymentStrategy:
              rollingUpdate:
                maxSurge: 0
                maxUnavailable: 25%
              type: RollingUpdate
            imagePullSecrets:
            - name: ngc-secret
            labels:
              app: meta-llama-3-2-1b-instruct
            maxReplicas: 3
            minReplicas: 1
            scaleMetric: cpu
            scaleMetricType: Utilization
            scaleTarget: 80
            securityContext:
              fsGroup: 2500
              runAsGroup: 2500
              runAsUser: 2500
            serviceAccountName: meta-llama-3-2-1b-instruct
            volumes:
            - emptyDir:
                medium: Memory
              name: dshm
            - name: model-store
              persistentVolumeClaim:
                claimName: meta-llama-3-2-1b-instruct-pvc
        status:
          address:
            url: http://meta-llama-3-2-1b-instruct-predictor.nim-service.svc.cluster.local
          components:
            predictor:
              url: http://meta-llama-3-2-1b-instruct-predictor-nim-service.example.com
          conditions:
          - lastTransitionTime: "2025-07-25T14:58:06Z"
            status: "True"
            type: IngressReady
          - lastTransitionTime: "2025-07-25T14:58:06Z"
            status: "True"
            type: PredictorReady
          - lastTransitionTime: "2025-07-25T14:58:06Z"
            status: "True"
            type: Ready
          deploymentMode: RawDeployment
          modelStatus:
            copies:
              failedCopies: 0
              totalCopies: 1
            states:
              activeModelState: Loaded
              targetModelState: Loaded
              transitionStatus: UpToDate
          observedGeneration: 1
          url: http://meta-llama-3-2-1b-instruct-nim-service.example.com
      kind: List
      metadata:
        resourceVersion: ""
    2. View inference service status:

      $ kubectl get inferenceservice -n nim-service 
      Example output
      NAME                         URL                                                          READY   AGE
      meta-llama-3-2-1b-instruct   http://meta-llama-3-2-1b-instruct-nim-service.example.com   True    100s
    3. View NIM Service status:

      $ kubectl get nimservice -n nim-service -o json | jq .items[0].status 
      Example output
      {  "conditions": [  {  "lastTransitionTime": "2025-07-25T15:02:09Z",  "message": "",  "reason": "Ready",  "status": "True",  "type": "Ready"  },  {  "lastTransitionTime": "2025-07-25T14:58:06Z",  "message": "",  "reason": "Ready",  "status": "False",  "type": "Failed"  }  ],  "model": {  "clusterEndpoint": "http://meta-llama-3-2-1b-instruct-predictor.nim-service.svc.cluster.local",  "externalEndpoint": "http://meta-llama-3-2-1b-instruct-nim-service.example.com",  "name": "meta/llama-3.2-1b-instruct"  },  "state": "Ready" } 
  4. Verify that the HPA has been created:

    $ kubectl get hpa -n nim-service meta-llama-3-2-1b-instruct-predictor -o yaml 
Multi-LLM NIM:

  1. Create a file, such as nimservice.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: nim-service-multi-llm
      namespace: nim-service
    spec:
      inferencePlatform: kserve
      annotations:
        serving.kserve.io/deploymentMode: 'RawDeployment'
      image:
        repository: nvcr.io/nim/nvidia/llm-nim
        tag: "1.12"
        pullPolicy: IfNotPresent
        pullSecrets:
        - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: nim-cache-multi-llm
          profile: 'tensorrt_llm'
      resources:
        limits:
          nvidia.com/gpu: 1
          cpu: "12"
          memory: 32Gi
        requests:
          nvidia.com/gpu: 1
          cpu: "4"
          memory: 6Gi
      expose:
        service:
          type: ClusterIP
          port: 8000
  2. Apply the manifest for raw deployment:

    $ kubectl create -f nimservice.yaml -n nim-service 
  3. Verify that the inference service has been created:

    $ kubectl get inferenceservice -n nim-service nim-service-multi-llm -o yaml 
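To send a test request from inside the cluster, you can use the cluster-local predictor endpoint reported in the NIMService status. This sketch mirrors the OpenShift inference example later on this page and assumes the single-model example above:

$ kubectl -n nim-service run curltest --rm -i --image=curlimages/curl --restart=Never -- \
    curl -s http://meta-llama-3-2-1b-instruct-predictor.nim-service.svc.cluster.local/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"meta/llama-3.2-1b-instruct","messages":[{"role":"user","content":"Hello!"}]}'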

Raw Deployment Example on Red Hat OpenShift#

Summary#

For raw deployment of NIM through KServe using Red Hat OpenShift, follow these steps:

  1. Install KServe using OpenShift interface.

  2. Create NIM Cache using OpenShift.

  3. Deploy NIM through KServe using OpenShift.

1. Install KServe in Raw Deployment Mode Using OpenShift#

  1. Install the OpenShift AI Operator by following the instructions on the Red Hat website for installing the single-model serving platform.

    Figure 1. OpenShift web console

    Figure 2. Interface to install OpenShift AI Operator

    Follow the steps for standard deployment (OpenShift’s term for raw deployment mode). Select the following settings:

    • Update channel: stable

    • Version: 2.22.1

    • Operator recommended namespace: redhat-ods-operator

    Note

    There is also an advanced mode, which is equivalent to serverless; however, the NIM Operator does not support it in this release.

  2. Create an instance of Data Science Cluster (DSC):

    Figure 3. Interface to create instance of DSC

    Use the following YAML to create the DSC:

    apiVersion: datasciencecluster.opendatahub.io/v1
    kind: DataScienceCluster
    metadata:
      labels:
        app.kubernetes.io/created-by: rhods-operator
        app.kubernetes.io/instance: default-dsc
        app.kubernetes.io/managed-by: kustomize
        app.kubernetes.io/name: datasciencecluster
        app.kubernetes.io/part-of: rhods-operator
      name: default-dsc
    spec:
      components:
        codeflare:
          managementState: Managed
        kserve:
          defaultDeploymentMode: RawDeployment
          managementState: Managed
          nim:
            managementState: Managed
          rawDeploymentServiceConfig: Headed
          serving:
            ingressGateway:
              certificate:
                type: OpenshiftDefaultIngress
            managementState: Removed
            name: knative-serving
        modelregistry:
          registriesNamespace: rhoai-model-registries
        feastoperator: {}
        trustyai: {}
        ray: {}
        kueue: {}
        workbenches:
          workbenchNamespace: rhods-notebooks
        dashboard: {}
        modelmeshserving: {}
        llamastackoperator: {}
        datasciencepipelines: {}
        trainingoperator: {}
  3. Create an instance of DSCInitialization (DSI):

    Figure 4. Interface to create instance of DSI

    Use the following YAML to create the DSI:

    apiVersion: dscinitialization.opendatahub.io/v1
    kind: DSCInitialization
    metadata:
      name: default-dsci
    spec:
      applicationsNamespace: redhat-ods-applications
      serviceMesh:
        controlPlane:
          metricsCollection: Istio
          name: data-science-smcp
          namespace: istio-system
        managementState: Removed
  4. Verify that the KServe controller is running: in the OpenShift web console, go to Workloads > Pods and select the redhat-ods-applications project.
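Alternatively, a sketch of verifying from the CLI, assuming the default resource names used above:

$ oc get pods -n redhat-ods-applications | grep kserve
$ oc get datasciencecluster default-dsc
$ oc get dscinitialization default-dsci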

2. Create NIM Cache Using OpenShift#

Note

Refer to prerequisites for more information on using NIM Cache.

  1. Create a file, such as nimcache.yaml, with contents like the following example:

    # NIM Cache for OpenShift
    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMCache
    metadata:
      name: meta-llama3-1b-instruct
      namespace: nim-service
    spec:
      source:
        ngc:
          authSecret: ngc-api-secret
          model:
            engine: tensorrt_llm
            tensorParallelism: '1'
          modelPuller: 'nvcr.io/nim/meta/llama-3.2-1b-instruct:1.8.3'
          pullSecret: ngc-secret
      storage:
        pvc:
          create: true
          size: 50Gi
          volumeAccessMode: ReadWriteOnce
  2. Apply the manifest:

    $ oc apply -n nim-service -f nimcache.yaml 
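As in the standard Kubernetes example, you can optionally wait for the cache to become ready before deploying; this check assumes the manifest name above and that the NIMCache status reports a state of Ready when caching completes:

$ oc get nimcache -n nim-service meta-llama3-1b-instruct -o jsonpath='{.status.state}'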

3. Deploy NIM Through KServe as a Raw Deployment Using OpenShift#

  1. Create a file, such as nimservice.yaml, with contents like the following example:

    # NIM Service Raw Deployment Using OpenShift
    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama-3-2-1b-instruct
      namespace: nim-service
    spec:
      annotations:
        serving.kserve.io/deploymentMode: RawDeployment
      expose:
        service:
          port: 8000
          type: ClusterIP
      scale:
        enabled: true
        hpa:
          maxReplicas: 3
          metrics:
          - resource:
              name: cpu
              target:
                averageUtilization: 80
                type: Utilization
            type: Resource
          minReplicas: 1
      inferencePlatform: kserve
      authSecret: ngc-api-secret
      image:
        pullPolicy: IfNotPresent
        pullSecrets:
        - ngc-secret
        repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
        tag: 1.8.3
      storage:
        nimCache:
          name: meta-llama3-1b-instruct
      resources:
        limits:
          nvidia.com/gpu: 1
          cpu: "12"
          memory: 32Gi
        requests:
          nvidia.com/gpu: 1
          cpu: "4"
          memory: 6Gi
      replicas: 1
  2. Apply the manifest for raw deployment:

    $ oc create -f nimservice.yaml -n nim-service 
  3. Verify that the inference service has been created:

    1. View inference service details:

      $ oc get inferenceservice -n nim-service -o json | jq .items[0].status 
      Example output
      {  "address": {  "url": "http://meta-llama-3-2-1b-instruct-predictor.nim-service.svc.cluster.local"  },  "components": {  "predictor": {}  },  "conditions": [  {  "lastTransitionTime": "2025-08-15T05:59:00Z",  "status": "True",  "type": "IngressReady"  },  {  "lastTransitionTime": "2025-08-15T05:59:00Z",  "status": "True",  "type": "PredictorReady"  },  {  "lastTransitionTime": "2025-08-15T05:59:00Z",  "status": "True",  "type": "Ready"  },  {  "lastTransitionTime": "2025-08-15T05:58:59Z",  "severity": "Info",  "status": "False",  "type": "Stopped"  }  ],  "deploymentMode": "RawDeployment",  "modelStatus": {  "copies": {  "failedCopies": 0,  "totalCopies": 1  },  "states": {  "activeModelState": "Loaded",  "targetModelState": "Loaded"  },  "transitionStatus": "UpToDate"  },  "observedGeneration": 1,  "url": "http://meta-llama-3-2-1b-instruct-predictor.nim-service.svc.cluster.local" } 
    2. View NIM Service status:

      $ oc get nimservice -n nim-service meta-llama-3-2-1b-instruct -o json | jq .status 
      Example output
      {  "conditions": [  {  "lastTransitionTime": "2025-08-15T05:59:50Z",  "message": "",  "reason": "Ready",  "status": "True",  "type": "Ready"  },  {  "lastTransitionTime": "2025-08-15T05:58:59Z",  "message": "",  "reason": "Ready",  "status": "False",  "type": "Failed"  }  ],  "model": {  "clusterEndpoint": "http://meta-llama-3-2-1b-instruct-predictor.nim-service.svc.cluster.local",  "externalEndpoint": "http://meta-llama-3-2-1b-instruct-predictor.nim-service.svc.cluster.local",  "name": "meta/llama-3.2-1b-instruct"  },  "state": "Ready" } 
  4. Run inference:

    $ oc -n nim-service run curltest --rm -i --image=curlimages/curl --restart=Never -- \
        curl -s http://meta-llama-3-2-1b-instruct-predictor.nim-service.svc.cluster.local/v1/chat/completions \
        -H 'Content-Type: application/json' \
        -d '{"model":"meta/llama-3.2-1b-instruct","messages":[{"role":"user","content":"Hello!"}]}'
    Example output
    {"id":"chat-2d3c821536514598b931783033a7e7e7","object":"chat.completion","created":1755239534,"model":"meta/llama-3.2-1b-instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I help you today?"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":12,"total_tokens":21,"completion_tokens":9},"prompt_logprobs":null} 

Serverless Deployment Example on Standard Kubernetes#

Summary#

For serverless deployment of NIM through KServe on a standard Kubernetes installation, follow these steps:

  1. Install KServe in Serverless Mode.

  2. Enable PVC support.

  3. Optional: Create NIM Cache with Serverless Deployment.

  4. Deploy NIM through KServe as a Serverless Deployment.

1. Install KServe in Serverless Mode#

Note

For security, consider downloading and reviewing the script before execution in production environments.

Run the following command to execute the KServe quick install script:

$ curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.15/hack/quick_install.sh" | bash 

For more information, refer to Getting Started with KServe.

The following components are deployed by the KServe quick install script:


KServe

  • Installs KServe CRDs, such as InferenceService.

  • Installs KServe controller in the kserve namespace.

  • Provides model serving, autoscaling, and inference management on Kubernetes.

Gateway API CRDs

  • Installs the Gateway API CustomResourceDefinitions: GatewayClass, Gateway, GRPCRoute, HTTPRoute, ReferenceGrant.

  • Provides modern networking primitives for routing traffic into services.

Istio (Service Mesh)

Deployed into the istio-system namespace with Helm charts:

  • istio-base: core Istio CRDs and cluster-scoped resources.

  • istiod: Istio control plane (pilot, configuration, service discovery).

  • istio-ingressgateway: data plane ingress gateway for external traffic.

Cert-Manager

  • Installed in the cert-manager namespace.

  • Handles TLS certificate provisioning and management.

  • Required for automatic HTTPS and secure communication.

Knative

  • Installs the Knative Operator in knative-serving namespace.

  • Deploys a KnativeServing custom resource.

  • Provides serverless deployment and scaling primitives used by KServe.
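As with the raw-deployment install, you can spot-check each component after the script completes; in serverless mode, also confirm that the pods in the knative-serving namespace are running (namespaces as listed above):

$ kubectl get pods -n kserve
$ kubectl get pods -n istio-system
$ kubectl get pods -n cert-manager
$ kubectl get pods -n knative-serving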

Note

To uninstall KServe, follow the instructions in Uninstalling KServe.

2. Enable PVC Support#

Run the following command to enable persistent volume support:

$ kubectl patch --namespace knative-serving configmap/config-features \
    --type merge \
    --patch '{"data":{"kubernetes.podspec-persistent-volume-claim": "enabled", "kubernetes.podspec-persistent-volume-write": "enabled"}}'

These extensions enable persistent volume support:

  • kubernetes.podspec-persistent-volume-claim: Enables persistent volumes (PVs) with Knative Serving

  • kubernetes.podspec-persistent-volume-write: Provides write access to those PVs
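To confirm that the patch took effect, one way is to read the two flags back from the ConfigMap; both should print enabled:

$ kubectl get configmap config-features -n knative-serving \
    -o jsonpath='{.data.kubernetes\.podspec-persistent-volume-claim}{"\n"}{.data.kubernetes\.podspec-persistent-volume-write}{"\n"}'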

3. Optional: Create NIM Cache With Serverless Deployment#

Note

Refer to prerequisites for more information on using NIM Cache.

  1. Create a file, such as nimcache.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMCache
    metadata:
      labels:
        app.kubernetes.io/name: k8s-nim-operator
      name: meta-llama-3-2-1b-instruct
      namespace: nim-service
    spec:
      source:
        ngc:
          modelPuller: nvcr.io/nim/meta/llama-3.2-1b-instruct:1.12.0
          pullSecret: ngc-secret
          authSecret: ngc-api-secret
          model:
            engine: "tensorrt"
            tensorParallelism: "1"
      storage:
        pvc:
          create: true
          size: "50Gi"
          volumeAccessMode: ReadWriteOnce
  2. Apply the manifest:

    $ kubectl apply -n nim-service -f nimcache.yaml 

4. Deploy NIM through KServe as a Serverless Deployment#

Note

Autoscaling and ingress are handled directly by KServe, not through NIMService configuration.

  1. Create a file, such as nimservice.yaml, with contents like the following sample manifest:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: meta-llama-3-2-1b-instruct
      namespace: nim-service
    spec:
      inferencePlatform: kserve
      annotations:
        # Knative concurrency-based autoscaling (default).
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: concurrency
        # Target 10 requests in-flight per pod.
        autoscaling.knative.dev/target: "10"
        # Disable scale to zero with a min scale of 1.
        autoscaling.knative.dev/min-scale: "1"
        # Limit scaling to 10 pods.
        autoscaling.knative.dev/max-scale: "10"
      image:
        repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
        tag: "1.12.0"
        pullPolicy: IfNotPresent
        pullSecrets:
        - ngc-secret
      authSecret: ngc-api-secret
      storage:
        nimCache:
          name: meta-llama-3-2-1b-instruct
          profile: ''
      resources:
        limits:
          nvidia.com/gpu: 1
          cpu: "12"
          memory: 32Gi
        requests:
          nvidia.com/gpu: 1
          cpu: "4"
          memory: 6Gi
      replicas: 1
      expose:
        service:
          type: ClusterIP
          port: 8000
  2. Apply the manifest for serverless deployment:

    $ kubectl create -f nimservice.yaml -n nim-service 
  3. Verify that the inference service has been created:

    $ kubectl get inferenceservice -n nim-service meta-llama-3-2-1b-instruct -o yaml
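Because serverless mode runs on Knative, you can also inspect the underlying Knative Service and its revisions; the exact resource names that KServe generates may differ from this sketch:

$ kubectl get ksvc -n nim-service
$ kubectl get revisions -n nim-service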

Uninstalling KServe#

If you installed KServe using the quick install script (https://raw.githubusercontent.com/kserve/kserve/release-0.15/hack/quick_install.sh), use the following commands to uninstall it:

helm uninstall --ignore-not-found istio-ingressgateway -n istio-system
helm uninstall --ignore-not-found istiod -n istio-system
helm uninstall --ignore-not-found istio-base -n istio-system
echo "😀 Successfully uninstalled Istio"
helm uninstall --ignore-not-found cert-manager -n cert-manager
echo "😀 Successfully uninstalled Cert Manager"
helm uninstall --ignore-not-found keda -n keda
echo "😀 Successfully uninstalled KEDA"
kubectl delete --ignore-not-found=true KnativeServing knative-serving -n knative-serving --wait=True --timeout=300s || true
helm uninstall --ignore-not-found knative-operator -n knative-serving
echo "😀 Successfully uninstalled Knative"
helm uninstall --ignore-not-found kserve -n kserve
helm uninstall --ignore-not-found kserve-crd -n kserve
echo "😀 Successfully uninstalled KServe"
kubectl delete --ignore-not-found=true namespace istio-system
kubectl delete --ignore-not-found=true namespace cert-manager
kubectl delete --ignore-not-found=true namespace kserve