Pytorch code for paper QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

eltociear/qa-lora

QA-LoRA

This repository provides the PyTorch implementation of QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models.

QA-LoRA is easily implemented with a few lines of code, and it equips the original LoRA with two-fold abilities: (i) during fine-tuning, the LLM's weights are quantized (e.g., into INT4) to reduce time and memory usage; (ii) after fine-tuning, the LLM and auxiliary weights are naturally integrated into a quantized model without loss of accuracy.

Installation

conda create -n qalora python=3.8
conda activate qalora
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
git clone -b peft_integration https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install .[triton]
cd ..
git clone https://github.com/timdettmers/bitsandbytes.git
cd bitsandbytes
# CUDA_VERSIONS in {110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 120}
# make argument in {cuda110, cuda11x, cuda12x}
# if you do not know what CUDA you have, try looking at the output of: python -m bitsandbytes
CUDA_VERSION=117 make cuda11x
python setup.py install
cd ..
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/huggingface/peft.git
pip install git+https://github.com/huggingface/accelerate.git
pip install -r requirements.txt
pip install protobuf==3.20.*

Replace the peft_utils.py in your own auto-gptq installation (python path/auto_gptq/utils/peft_utils.py) with the one provided in this repo. Users of GPTQLORA only need to replace this peft_utils.py file.
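If you are unsure where auto-gptq is installed, the following sketch prints the location of the file to replace. It relies only on the standard library; the helper name find_package_file is hypothetical and not part of this repo or of auto-gptq.

```python
import importlib.util
from pathlib import Path


def find_package_file(package: str, relative_path: str) -> Path:
    """Return the path of a file inside an installed package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"package {package!r} is not installed")
    # For a regular package this is the directory that holds its __init__.py.
    package_dir = Path(next(iter(spec.submodule_search_locations)))
    return package_dir / relative_path


if __name__ == "__main__":
    try:
        # Location of the file that has to be replaced.
        print(find_package_file("auto_gptq", "utils/peft_utils.py"))
    except ModuleNotFoundError as err:
        print(err)
```

Back up the original file before overwriting it, so that the stock auto-gptq behavior can be restored later.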

Quantization

We use GPTQ for quantization, with bits=4, group-size=32, and act-order=False. If you change the group-size, you must also change the group_size in peft_utils.py and merge.py accordingly.
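Before training, it can help to confirm that a quantized checkpoint was actually produced with these settings. The sketch below reads quantize_config.json from the checkpoint directory and reports any mismatch. The key names bits, group_size, and desc_act (the act-order flag) are assumed to follow the format written by Auto-GPTQ, and the helper check_quantize_config is hypothetical.

```python
import json
from pathlib import Path

# Settings stated in this README; the JSON key names are assumptions.
EXPECTED = {"bits": 4, "group_size": 32, "desc_act": False}


def check_quantize_config(checkpoint_dir, expected=EXPECTED):
    """Return {key: (found, wanted)} for every setting that differs from `expected`."""
    config_path = Path(checkpoint_dir) / "quantize_config.json"
    config = json.loads(config_path.read_text())
    return {
        key: (config.get(key), wanted)
        for key, wanted in expected.items()
        if config.get(key) != wanted
    }
```

An empty result means the checkpoint matches. If group_size is reported as different, the group_size in peft_utils.py and merge.py has to be changed to the same value.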

Training

python qalora.py --model_path <path>

The file structure of the model checkpoint is as follows:

config.json
llama7b-4bit-32g.bin
special_tokens_map.json
tokenizer_config.json
generation_config.json
quantize_config.json
tokenizer.model
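A quick way to check a checkpoint directory against the listing above before passing it as --model_path is sketched below. The helper missing_checkpoint_files is hypothetical. Because the weight file name depends on the model, the sketch assumes it only needs to match by the .bin suffix.

```python
from pathlib import Path

# File names taken from the listing above, except the model-specific weight file.
EXPECTED_FILES = (
    "config.json",
    "generation_config.json",
    "quantize_config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.model",
)


def missing_checkpoint_files(model_path, weights_suffix=".bin"):
    """Return the expected entries that are absent from `model_path`."""
    root = Path(model_path)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # The quantized weights (e.g. llama7b-4bit-32g.bin) are matched by suffix only.
    if not any(root.glob(f"*{weights_suffix}")):
        missing.append(f"*{weights_suffix}")
    return missing
```

An empty list means every expected file is present.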

Merge

Note that our trained LoRA modules can be merged into the quantized model without loss of accuracy. We provide a simple merge script (merge.py) in this repo.
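The toy example below illustrates why such a merge can be exact. It is not the merge script from this repo. It assumes group-wise quantization of the form scale * (q - zero), an adapter that sees the input summed within each quantization group, and zero points stored as floats. Under these assumptions the adapter adds the same offset to every weight row in a group, so the offset can be folded into that group's zero point. All names and values are illustrative.

```python
import math

GROUP = 2  # toy group size (the setting in this README is 32)


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dequantize(q, scale, zero):
    """W[i][j] = scale[g][j] * (q[i][j] - zero[g][j]), with g = i // GROUP."""
    return [[scale[i // GROUP][j] * (q[i][j] - zero[i // GROUP][j])
             for j in range(len(q[0]))] for i in range(len(q))]


def forward(x, weight):
    """Row vector times matrix."""
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(len(weight[0]))]


def forward_with_adapter(x, weight, delta, s):
    """Base output plus s * (group-summed x) @ delta, where delta = A @ B."""
    pooled = [sum(x[g * GROUP:(g + 1) * GROUP]) for g in range(len(x) // GROUP)]
    return [b + s * e for b, e in zip(forward(x, weight), forward(pooled, delta))]


def merge_zero_points(scale, zero, delta, s):
    """Fold the adapter into the zero points: zero' = zero - s * delta / scale."""
    return [[zero[g][j] - s * delta[g][j] / scale[g][j] for j in range(len(zero[0]))]
            for g in range(len(zero))]


def demo():
    q = [[3, 12], [7, 0], [15, 4], [1, 9]]  # INT4 codes, 4 inputs x 2 outputs
    scale = [[0.5, 0.25], [0.1, 0.2]]       # one row per group
    zero = [[8.0, 8.0], [7.0, 6.0]]
    lora_a = [[0.2], [-0.4]]                # (number of groups) x rank
    lora_b = [[0.5, -1.0]]                  # rank x outputs
    s = 2.0                                 # adapter scaling factor
    x = [1.0, -2.0, 0.5, 3.0]

    delta = matmul(lora_a, lora_b)
    y_adapter = forward_with_adapter(x, dequantize(q, scale, zero), delta, s)
    y_merged = forward(x, dequantize(q, scale, merge_zero_points(scale, zero, delta, s)))
    return y_adapter, y_merged


if __name__ == "__main__":
    y_adapter, y_merged = demo()
    print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(y_adapter, y_merged)))
```

The integer codes q and the scales are untouched by the merge; only the zero points change, which is why the merged model stays quantized.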

Acknowledgements

Our code is based on QLoRA, GPTQLORA, and Auto-GPTQ.
