SpQR
The SpQR quantization algorithm applies bi-level group quantization over 16x16 tiles, storing weights as 3-bit values while keeping salient outlier weights in a separate sparse, higher-precision format.
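
Conceptually, dequantization reconstructs each weight group from its low-bit codes and per-group statistics, then writes the outliers back on top. The sketch below is a toy illustration of that idea only; the names, values, and layout are hypothetical and do not reflect the actual SpQR kernels or storage format.

```py
import torch

# Toy sketch of the SpQR idea (not the real kernels or data layout):
# dense weights are stored as 3-bit codes with per-group scale/zero
# (which SpQR itself quantizes again, hence "bi-level"), while a few
# outlier weights are kept separately in higher precision.
def dequantize_group(codes, scale, zero):
    # map 3-bit codes in [0, 7] back to approximate float values
    return (codes.float() - zero) * scale

codes = torch.randint(0, 8, (16,))  # one 16-element group of 3-bit codes
dense = dequantize_group(codes, scale=0.05, zero=3.5)

# sparse outliers stored as (index, value) pairs and written back on top
outlier_idx = torch.tensor([5])
outlier_val = torch.tensor([1.25])
weights = dense.clone()
weights[outlier_idx] = outlier_val
```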

To quantize a model with SpQR, refer to the Vahe1994/SpQR repository.
Load a SpQR-quantized model with from_pretrained().
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the SpQR-quantized checkpoint in half precision and let Accelerate
# place the weights across available devices automatically.
quantized_model = AutoModelForCausalLM.from_pretrained(
    "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf",
    dtype=torch.half,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf")
```
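
The loaded model can then be used for generation like any other causal language model. A minimal sketch (the prompt text is illustrative):

```py
# Run a quick generation with the quantized model loaded above.
inputs = tokenizer("SpQR is a quantization method that", return_tensors="pt").to(quantized_model.device)
outputs = quantized_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```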