Feat: Introduce QuantizerFactory API and Refactor Quantization Workflow #4
This commit refactors the model quantization API to simplify and streamline the quantization workflow.
Key Changes:
- New `QuantizerFactory` API:
  - Introduced `QuantizerFactory.quantize_from_pretrained` in `quantllm/api.py`. This method serves as the new primary entry point for model quantization.
- Core Quantizer Updates:
  - `BaseQuantizer` and its subclasses (AWQ, GPTQ, GGUF) now support initialization with either a model name/path string or a pre-loaded `PreTrainedModel` instance.
  - `BaseQuantizer` handles the loading of the model and its associated tokenizer when a name/path is provided.
  - Fixed a `NameError` for `move_to_device` in `gguf.py`, `awq.py`, and `gptq.py` by ensuring correct imports.
- Quantization Configuration in Model Config:
  - The quantization configuration is now stored in `model.config.quantization_config`. This ensures that this critical information is written to the model's `config.json` upon saving, aiding reproducibility and Hugging Face Hub integration.
- Documentation Overhaul:
  - The API reference (`docs/api_reference/quantization.rst`) has been rewritten to focus on the new `QuantizerFactory` API, providing clear examples and parameter explanations.
- New Example Script:
  - Added `examples/01_quantize_and_push_to_hub.py`, demonstrating an end-to-end workflow: quantizing a model using the new factory, saving it locally, and pushing it to the Hugging Face Hub.
- Unit Test Refactoring:
  - Unit tests were refactored to target the `QuantizerFactory` API. The test file, `quantllm/quant/tests/test_api.py`, covers various methods, parameters, CPU/GPU execution, and GGUF-specific features.

This refactoring aims to provide a more intuitive, robust, and streamlined experience for you when quantizing models with this library.
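
As a rough illustration of two of the changes described in this PR (accepting either a name/path string or a pre-loaded model, and recording the quantization settings on the model config so they end up in `config.json`), here is a minimal, self-contained sketch. It deliberately does not import quantllm or transformers: `StubModel`, `StubConfig`, `resolve_model`, `quantize`, and the `quant_method`/`bits` keys are hypothetical stand-ins chosen for illustration, not the library's actual classes, signatures, or config schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Union


class StubConfig:
    """Stand-in for a model config object (not the real transformers class)."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def to_dict(self) -> dict:
        return dict(self.__dict__)


class StubModel:
    """Stand-in for a pre-loaded model that carries a config."""

    def __init__(self, name: str):
        self.config = StubConfig(name_or_path=name)

    def save_pretrained(self, directory: str) -> None:
        # Serialize the config to config.json inside the target directory.
        path = Path(directory) / "config.json"
        path.write_text(json.dumps(self.config.to_dict(), indent=2))


def resolve_model(
    model_or_name: Union[str, StubModel],
    loader: Callable[[str], StubModel] = StubModel,
) -> StubModel:
    """Accept either a name/path string or an already-loaded model."""
    if isinstance(model_or_name, str):
        return loader(model_or_name)  # load only when given a name/path
    return model_or_name  # reuse the caller's instance unchanged


def quantize(model_or_name: Union[str, StubModel], method: str, bits: int) -> StubModel:
    """Hypothetical flow: resolve the model, then record settings on its config."""
    model = resolve_model(model_or_name)
    # Real weight quantization would happen here; this sketch only tracks metadata.
    model.config.quantization_config = {"quant_method": method, "bits": bits}
    return model


with tempfile.TemporaryDirectory() as tmp:
    # Path 1: pass a name/path string and let the helper load the model.
    from_name = quantize("some-org/some-model", method="gptq", bits=4)
    # Path 2: pass an instance that was loaded elsewhere.
    preloaded = StubModel("local-model")
    from_instance = quantize(preloaded, method="awq", bits=4)
    # Saving writes the recorded settings into config.json.
    from_name.save_pretrained(tmp)
    saved = json.loads((Path(tmp) / "config.json").read_text())
```

Because the settings live on the config object itself, anything that serializes the config picks them up without a separate sidecar file. That is the reproducibility benefit this PR describes for `model.config.quantization_config`.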