This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Commit 936c2d2

add ppo rl_training part (#634)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

1 parent 4101a80

File tree

12 files changed, +4244 −4 lines


intel_extension_for_transformers/neural_chat/examples/finetuning/ppo_pipeline/README.md

Lines changed: 22 additions & 0 deletions
Lines 43–45 shown for context; lines 46–67 added:

````diff
@@ -43,3 +43,25 @@ multi card finetunes
 ```
 python ../instruction/gaudi_spawn.py --world_size 8 --use_mpi reward_modeling.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir <output> --log_level info --num_train_epochs 1 --use_habana --use_lazy_mode --hf_access_token xxxxxx --ddp_find_unused_parameters True
 ```
+
+## 5. Reinforcement Fine-tuning
+
+### Training on CUDA
+
+```
+accelerate launch --multi_gpu --num_machines 1 --num_processes 8 rl_training.py --log_with=wandb --model_name=meta-llama/Llama-2-7b-hf --reward_model_name=output_se --adafactor=False --tokenizer_name=meta-llama/Llama-2-7b-hf --save_freq=100 --output_max_length=128 --batch_size=8 --gradient_accumulation_steps=8 --batched_gen=True --ppo_epochs=4 --seed=0 --learning_rate=1.4e-5 --early_stopping=True --output_dir=llama-se-rl-finetune-128-8-8-1.4e-5_adam --hf_access_token xxxxxx
+```
+
+### Training on Habana
+
+Follow the install guidance in [optimum-habana](https://github.com/huggingface/optimum-habana)
+
+single card finetune
+
+```
+python3 rl_training.py --model_name=meta-llama/Llama-2-7b-hf --reward_model_name=<output_rm> --adafactor=False --tokenizer_name=meta-llama/Llama-2-7b-hf --save_freq=100 --output_max_length=128 --batch_size=8 --mini_batch_size=1 --gradient_accumulation_steps=8 --batched_gen=True --ppo_epochs=4 --seed=0 --learning_rate=1.4e-5 --early_stopping=True --output_dir=llama-se-rl-finetune-128-8-8-1.4e-5_adam --hf_access_token xxxxxx --use_habana
+```
+
+multi card finetune
+
+```
+python3 ../instruction/gaudi_spawn.py --world_size 8 --use_mpi rl_training.py --model_name=meta-llama/Llama-2-7b-hf --reward_model_name=<output_rm> --adafactor=False --tokenizer_name=meta-llama/Llama-2-7b-hf --save_freq=100 --output_max_length=128 --batch_size=8 --mini_batch_size=1 --gradient_accumulation_steps=8 --batched_gen=True --ppo_epochs=4 --seed=0 --learning_rate=1.4e-5 --early_stopping=True --output_dir=llama-se-rl-finetune-128-8-8-1.4e-5_adam --hf_access_token xxxxxx --use_habana
+```
````
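The `rl_training.py` commands above configure a PPO run (`--ppo_epochs`, `--learning_rate`, batch and generation sizes). As a rough illustration of the objective PPO optimizes — a minimal sketch of the clipped surrogate loss for one action, not the script's actual implementation; the function name and the default clip value `eps=0.2` are illustrative:

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate loss for a single action (PPO).

    logp_new / logp_old: log-probabilities of the action under the
    current policy and the rollout policy; advantage: estimated
    advantage of the action. Returns the negated objective to minimize.
    """
    ratio = math.exp(logp_new - logp_old)            # importance weight
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))  # ratio clipped to [1-eps, 1+eps]
    # PPO keeps the pessimistic (smaller) of the two surrogate terms.
    return -min(ratio * advantage, clipped * advantage)

# When the new policy matches the old one, the ratio is 1 and the
# loss reduces to the negated advantage.
assert ppo_clip_loss(-1.0, -1.0, 0.5) == -0.5

# A large ratio is clipped, so a positive advantage cannot be
# over-exploited: ratio = e ≈ 2.72, but the surrogate is capped at 1.2 * A.
print(ppo_clip_loss(0.0, -1.0, 1.0))  # → -1.2
```

The clipping is what lets the script safely take several optimizer passes (`--ppo_epochs=4`) over each batch of rollouts.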

intel_extension_for_transformers/neural_chat/examples/finetuning/ppo_pipeline/requirements.txt

Lines changed: 2 additions & 0 deletions
```diff
@@ -5,3 +5,5 @@ datasets
 bitsandbytes
 evaluate
 scikit-learn
+intel-extension-for-transformers
+tyro
```

intel_extension_for_transformers/neural_chat/examples/finetuning/ppo_pipeline/reward_modeling.py

Lines changed: 4 additions & 4 deletions
```diff
@@ -199,14 +199,14 @@ def preprocess_function(examples):
         "input_ids_k": [],
         "attention_mask_k": [],
     }
-    for question, response_j, response_k in zip(
-        examples["question"], examples["chatgpt"], examples["llama2-13b-chat"]
+    for system, question, response_j, response_k in zip(
+        examples["system"], examples["question"], examples["chatgpt"], examples["llama2-13b-chat"]
     ):
         tokenized_j = tokenizer(
-            "Question: " + question + "\n\nAnswer: " + response_j, truncation=True
+            system + question + response_j, truncation=True
         )
         tokenized_k = tokenizer(
-            "Question: " + question + "\n\nAnswer: " + response_k, truncation=True
+            system + question + response_k, truncation=True
         )

         new_examples["input_ids_j"].append(tokenized_j["input_ids"])
```
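The change above threads the dataset's `system` prompt into both sides of each preference pair, so the reward model scores the preferred (`j`) and rejected (`k`) responses under identical context. A self-contained sketch of this pairwise preprocessing, using a stand-in whitespace tokenizer instead of the real Hugging Face tokenizer (everything except the column keys is illustrative):

```python
def toy_tokenize(text):
    # Stand-in for the Hugging Face tokenizer used in reward_modeling.py:
    # "token ids" here are just whitespace-split word positions.
    tokens = text.split()
    return {"input_ids": list(range(len(tokens))),
            "attention_mask": [1] * len(tokens)}

def preprocess_pairs(examples):
    """Build (chosen, rejected) pairs: j = preferred, k = rejected response."""
    new_examples = {"input_ids_j": [], "attention_mask_j": [],
                    "input_ids_k": [], "attention_mask_k": []}
    for system, question, response_j, response_k in zip(
        examples["system"], examples["question"],
        examples["chatgpt"], examples["llama2-13b-chat"],
    ):
        # The system prompt is prepended to BOTH candidates so the reward
        # model compares them under the same context.
        tokenized_j = toy_tokenize(system + question + response_j)
        tokenized_k = toy_tokenize(system + question + response_k)
        new_examples["input_ids_j"].append(tokenized_j["input_ids"])
        new_examples["attention_mask_j"].append(tokenized_j["attention_mask"])
        new_examples["input_ids_k"].append(tokenized_k["input_ids"])
        new_examples["attention_mask_k"].append(tokenized_k["attention_mask"])
    return new_examples

batch = {
    "system": ["You are helpful. "],
    "question": ["What is 2+2? "],
    "chatgpt": ["4"],            # preferred response
    "llama2-13b-chat": ["Potato"],  # rejected response
}
out = preprocess_pairs(batch)
print(len(out["input_ids_j"][0]))  # → 7 (6 prompt words + "4")
```

A function of this shape is typically passed to `datasets.Dataset.map(..., batched=True)`, which is why it consumes and returns column-keyed batches rather than single rows.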
