Feat: Introduce QuantizerFactory API and Refactor Quantization Workflow #4

codewithdark-git · 2025-05-21T15:34:58Z

This commit refactors the model quantization API around a single entry point to simplify the quantization workflow.

Key Changes:

New QuantizerFactory API:
- Introduced QuantizerFactory.quantize_from_pretrained in quantllm/api.py. This method serves as the new primary entry point for model quantization.
- It accepts a model name/path, quantization method ('awq', 'gptq', 'gguf'), quantization configuration dictionary, calibration data, and device, returning the quantized model and its tokenizer.
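The dispatch described above can be sketched roughly as follows. This is an illustrative stand-in, not the code in quantllm/api.py: the stub classes, the registry, the function name `quantize_from_pretrained_sketch`, and every parameter name are assumptions made for the example, and no real model is loaded or quantized.

```python
from typing import Any, Dict, Optional, Tuple


class StubQuantizer:
    """Hypothetical stand-in for a quantizer class; it only records its inputs."""

    method = "stub"

    def __init__(self, model_name: str, config: Dict[str, Any], device: str = "cpu"):
        self.model_name = model_name
        self.config = dict(config)
        self.device = device

    def quantize(self, calibration_data: Optional[Any] = None) -> Dict[str, Any]:
        # A real quantizer would rewrite the model's weights here.
        return {
            "model": self.model_name,
            "method": self.method,
            "device": self.device,
            "config": self.config,
        }


class StubAWQ(StubQuantizer):
    method = "awq"


class StubGPTQ(StubQuantizer):
    method = "gptq"


class StubGGUF(StubQuantizer):
    method = "gguf"


# Map each method string to the class that handles it.
_REGISTRY = {cls.method: cls for cls in (StubAWQ, StubGPTQ, StubGGUF)}


def quantize_from_pretrained_sketch(
    model_name: str,
    method: str,
    quant_config: Optional[Dict[str, Any]] = None,
    calibration_data: Optional[Any] = None,
    device: str = "cpu",
) -> Tuple[Dict[str, Any], str]:
    """Pick a quantizer by method name and return (model, tokenizer) placeholders."""
    key = method.lower()
    if key not in _REGISTRY:
        raise ValueError(f"Unsupported method {method!r}; expected one of {sorted(_REGISTRY)}")
    quantizer = _REGISTRY[key](model_name, quant_config or {}, device=device)
    model = quantizer.quantize(calibration_data)
    tokenizer = f"tokenizer-for:{model_name}"  # placeholder for a loaded tokenizer
    return model, tokenizer
```

Keeping the method-to-class mapping in one registry means an unsupported method string fails early with a clear error instead of deep inside a quantizer.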

Core Quantizer Updates:
- BaseQuantizer and its subclasses (AWQ, GPTQ, GGUF) now support initialization with either a model name/path string or a pre-loaded PreTrainedModel instance.
- BaseQuantizer handles the loading of the model and its associated tokenizer when a name/path is provided.
- Fixed a NameError for move_to_device in gguf.py, awq.py, and gptq.py by ensuring correct imports.
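Accepting either a name/path string or a pre-loaded model usually comes down to a type check at construction time. The sketch below shows that pattern under stated assumptions: `FakeModel`, `resolve_model`, and the injected `loader` are hypothetical names, and in a real implementation the loader would presumably wrap calls such as `AutoModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained` from transformers.

```python
from typing import Callable, Optional, Tuple, Union


class FakeModel:
    """Hypothetical placeholder for a pre-loaded PreTrainedModel."""

    def __init__(self, name: str):
        self.name = name


def resolve_model(
    model_or_name: Union[str, FakeModel],
    loader: Callable[[str], Tuple[FakeModel, str]],
    tokenizer: Optional[str] = None,
) -> Tuple[FakeModel, Optional[str]]:
    """Return (model, tokenizer), loading both only when given a name/path."""
    if isinstance(model_or_name, str):
        # Name or path: delegate loading of the model and its tokenizer.
        return loader(model_or_name)
    if isinstance(model_or_name, FakeModel):
        # Already loaded: use it as-is and keep whatever tokenizer was passed.
        return model_or_name, tokenizer
    raise TypeError(
        f"Expected a model name/path or a model instance, got {type(model_or_name).__name__}"
    )


def fake_loader(name: str) -> Tuple[FakeModel, str]:
    # Stand-in for the real loading step; no files or network are touched.
    return FakeModel(name), f"tokenizer-for:{name}"
```

Injecting the loader keeps the branch logic testable without downloading a checkpoint.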

Quantization Configuration in Model Config:
- Quantizers now automatically save their key parameters (method, bits, group_size, and method-specific details) into model.config.quantization_config. This ensures that this critical information is stored in the model's config.json upon saving, aiding reproducibility and Hugging Face Hub integration.
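To make the config round trip concrete, here is a minimal stdlib-only sketch. It uses a plain dict in place of the model's config object, and the key names (`method`, `bits`, `group_size`) simply mirror the list above; the exact keys the library writes are not confirmed by this PR description, and both helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def attach_quantization_config(
    config: Dict[str, Any], method: str, bits: int, group_size: int, **extra: Any
) -> Dict[str, Any]:
    """Return a copy of config with a quantization_config entry added."""
    updated = dict(config)
    updated["quantization_config"] = {
        "method": method,
        "bits": bits,
        "group_size": group_size,
        **extra,  # method-specific details
    }
    return updated


def save_config(config: Dict[str, Any], directory: Path) -> Path:
    """Write the config to <directory>/config.json and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "config.json"
    path.write_text(json.dumps(config, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    base = {"model_type": "example"}
    cfg = attach_quantization_config(base, "awq", bits=4, group_size=128, zero_point=True)
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_config(cfg, Path(tmp))
        # The quantization parameters survive the save/load round trip.
        print(json.loads(saved.read_text())["quantization_config"])
```

Because the parameters live in config.json, anyone who downloads the saved model can see how it was quantized without access to the original script.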

Documentation Overhaul:
- The primary quantization documentation (docs/api_reference/quantization.rst) has been rewritten to focus on the new QuantizerFactory API, providing clear examples and parameter explanations.
- Details of the underlying quantizer classes are now in an "Advanced" section.
- Docstrings for new and updated components have been added/improved.

New Example Script:
- Added examples/01_quantize_and_push_to_hub.py demonstrating an end-to-end workflow: quantizing a model using the new factory, saving it locally, and pushing it to the Hugging Face Hub.

Unit Test Refactoring:
- Unit tests have been refocused to primarily validate the QuantizerFactory API.
- A new test suite, quantllm/quant/tests/test_api.py, covers various methods, parameters, CPU/GPU execution, and GGUF-specific features.
- Outdated test files for individual quantizers have been removed.

The goal of this refactoring is a more intuitive and robust workflow for quantizing models with this library.

codewithdark-git merged commit 437cb87 into main May 21, 2025
1 check passed

codewithdark-git deleted the quant-optim-docs-tests branch May 27, 2025 20:16
