
Conversation

@Spycsh Spycsh commented Feb 21, 2024

Type of Change

Task

Description

Support serving and deploying NeuralChat models with Triton Inference Server on CUDA devices (single or multi-card).

Wrapped a new Docker image: spycsh/triton_neuralchat_gpu:v2

The v2 tag also enables multi-card instance group initialization (a configuration sketch follows below).
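For context, multi-card placement in Triton is controlled by the `instance_group` stanza of a model's `config.pbtxt`. Below is a minimal sketch that writes such a config; the model name `neuralchat` and the repository layout are illustrative assumptions, not taken from this PR:

```python
# Minimal sketch: generate a Triton model config that places one model
# instance on each of two CUDA devices. The model name "neuralchat" and
# the model_repository layout are assumptions for illustration only.
from pathlib import Path

CONFIG = '''
name: "neuralchat"
backend: "python"

# With kind KIND_GPU and an explicit gpus list, Triton creates `count`
# instances of the model on each listed CUDA device (here, one on
# GPU 0 and one on GPU 1), enabling multi-card serving.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
'''

model_dir = Path("model_repository/neuralchat")
model_dir.mkdir(parents=True, exist_ok=True)
(model_dir / "config.pbtxt").write_text(CONFIG.lstrip())
```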

Expected Behavior & Potential Risk

Serving and deploying NeuralChat models with Triton Inference Server on CUDA devices (single or multi-card).

How has this PR been tested?

example
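For anyone reproducing the test, a smoke test against the running server can be issued with Triton's Python HTTP client. This is a hedged sketch: the endpoint, model name, and tensor names (`INPUT0`/`OUTPUT0`) are assumptions to be adapted to the actual example:

```python
# Hypothetical smoke test for the served model. The port, model name,
# and tensor names below are assumptions; check the model's
# config.pbtxt for the real values.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton exposes string tensors as BYTES; wrap the prompt accordingly.
prompt = np.array([b"Tell me about Intel Xeon Scalable processors."], dtype=object)
text_input = httpclient.InferInput("INPUT0", list(prompt.shape), "BYTES")
text_input.set_data_from_numpy(prompt)

response = client.infer(
    model_name="neuralchat",
    inputs=[text_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
)
print(response.as_numpy("OUTPUT0"))
```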

Dependency Change?

None. Serving requires numba, but that dependency is already included in the Docker image; there is no change to itrex itself.

@VincyZhang (Contributor) commented

@Spycsh @lvliang-intel ready for merge?

@VincyZhang VincyZhang merged commit 4657036 into main Feb 23, 2024
@VincyZhang VincyZhang deleted the spycsh/triton_cuda branch February 23, 2024 05:24