
Commit 7f0090e

[NeuralChat] Support Gaudi model parallelism serving (#802)

1 parent: fd74a9a
18 files changed: +534 −15 lines

intel_extension_for_transformers/neural_chat/config.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -402,6 +402,7 @@ class LoadingModelConfig:
     use_hpu_graphs: bool = False
     use_cache: bool = True
     use_deepspeed: bool = False
+    world_size: int = 1
     ipex_int8: bool = False
     use_llm_runtime: bool = False
```
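The new `world_size` field (default 1, i.e. single-card serving) sets the number of devices used for model-parallel serving when DeepSpeed is enabled. A minimal sketch of how it might be set, assuming the remaining `LoadingModelConfig` fields keep their defaults:

```python
from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig

# Request DeepSpeed inference sharded across 8 Gaudi cards;
# world_size defaults to 1 (no model parallelism).
loading_config = LoadingModelConfig(use_deepspeed=True, world_size=8)
```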

Lines changed: 47 additions & 0 deletions (new file)

This README is intended to guide you through setting up the backend for a text chatbot using the NeuralChat framework. You can deploy this text chatbot on various platforms, including Intel XEON Scalable Processors, Habana's Gaudi processors (HPU), Intel Data Center and Client GPUs, and Nvidia Data Center and Client GPUs.

This example shows how to deploy the chatbot backend on Habana's Gaudi processors (HPU).

# Setup Conda

First, you need to install and configure the Conda environment:

```shell
# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda*.sh
source ~/.bashrc
# Create and activate a dedicated environment (the name is illustrative)
conda create -n neuralchat python=3.9
conda activate neuralchat
```

# Install Python dependencies

Install the dependencies using pip:

>**Note**: Please make sure the transformers version is 4.34.1.

```bash
pip install -r ../../../../../requirements_hpu.txt
pip install transformers==4.34.1
```
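Optionally, you can confirm that the pinned transformers version is the one actually active in your environment:

```python
# Verify the transformers version required above is installed.
import transformers

assert transformers.__version__ == "4.34.1", transformers.__version__
```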

# Configure the textbot.yaml

You can customize the configuration file `textbot.yaml` to match your environment setup. Here's a table to help you understand the configurable options:

| Item               | Value                           |
| ------------------ | ------------------------------- |
| host               | 127.0.0.1                       |
| port               | 8000                            |
| model_name_or_path | "meta-llama/Llama-2-7b-chat-hf" |
| device             | "hpu"                           |
| use_deepspeed      | true                            |
| world_size         | 8                               |
| tasks_list         | ['textchat']                    |
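Putting the values from the table together, the resulting `textbot.yaml` might look like the sketch below; every value comes from the table above, so adjust them to your own environment:

```yaml
host: 127.0.0.1
port: 8000

model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"
device: "hpu"
use_deepspeed: true
world_size: 8

# task choices = ['textchat', 'voicechat', 'retrieval', 'text2image', 'finetune']
tasks_list: ['textchat']
```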
# Run the TextChat server

To start the TextChat server, use the following command:

```shell
nohup python run_text_chat.py &
```
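Once the server is up, you can send it a quick request to confirm the textchat task is live. The sketch below is a hypothetical smoke test: the endpoint path and payload shape are assumptions based on NeuralChat's OpenAI-style textchat API, so check them against the version you installed:

```python
import requests

# Hypothetical smoke test for the server started above; the
# /v1/chat/completions path and request body are assumptions.
response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "messages": [{"role": "user", "content": "Tell me about Intel Gaudi."}],
    },
)
print(response.status_code, response.json())
```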
Lines changed: 32 additions & 0 deletions (new file)

```yaml
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is the parameter configuration file for NeuralChat Serving.

#################################################################################
#                               SERVER SETTING                                  #
#################################################################################
host: 0.0.0.0
port: 8000

model_name_or_path: "Phind/Phind-CodeLlama-34B-v2"
device: "hpu"
use_deepspeed: true
world_size: 8

# task choices = ['textchat', 'voicechat', 'retrieval', 'text2image', 'finetune']
tasks_list: ['textchat']
```

intel_extension_for_transformers/neural_chat/examples/deployment/textbot/backend/README.md renamed to intel_extension_for_transformers/neural_chat/examples/deployment/textbot/backend/xeon/README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,5 +1,6 @@
 This README is intended to guide you through setting up the backend for a text chatbot using the NeuralChat framework. You can deploy this text chatbot on various platforms, including Intel XEON Scalable Processors, Habana's Gaudi processors (HPU), Intel Data Center GPU and Client GPU, Nvidia Data Center GPU and Client GPU.
 
+This example shows how to deploy the chatbot backend on Intel XEON Scalable Processors.
 
 # Setup Conda
 
```
Lines changed: 26 additions & 0 deletions (new file)

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor


def main():
    # Build the serving executor and launch it with the settings from
    # textbot.yaml; server output is written to textbot.log.
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="./textbot.yaml", log_file="./textbot.log")


if __name__ == "__main__":
    main()
```
