GPTQ Quantization, Low-level API
GPTQ Quantization
You can now fine-tune GPTQ-quantized models using PEFT. See the companion Colab notebook and finetuning script for examples of how to use PEFT with a GPTQ model.
Low-level API
The low-level API enables users and developers to use PEFT as a utility library, at least for injectable adapters (LoRA, IA3, AdaLoRA). It exposes an API that modifies the model in place to inject the new adapter layers.
- [core] PEFT refactor + introducing inject_adapter_in_model public method by @younesbelkada in https://github.com/huggingface/peft/pull/749
- [Low-level-API] Add docs about LLAPI by @younesbelkada in https://github.com/huggingface/peft/pull/836

Support for more devices
Leverage the support for more devices for loading and fine-tuning PEFT adapters.
Merging multiple LoRAs
Stable support and new ways of merging multiple LoRAs. Three combination types are currently supported: linear, svd, and cat.
What's Changed
- [Llama2] Add disabling TP behavior by @younesbelkada in https://github.com/huggingface/peft/pull/728
- [Patch] patch trainable params for 4bit layers by @younesbelkada in https://github.com/huggingface/peft/pull/733
- [AdaLora] Fix adalora inference issue by @younesbelkada in https://github.com/huggingface/peft/pull/745
- [ModulesToSave] add correct hook management for modules to save by @younesbelkada in https://github.com/huggingface/peft/pull/755
- [core] PEFT refactor + introducing inject_adapter_in_model public method by @younesbelkada in https://github.com/huggingface/peft/pull/749
- [Docker] Fix gptq dockerfile by @younesbelkada in https://github.com/huggingface/peft/pull/835
- [Tests] Add 4bit slow training tests by @younesbelkada in https://github.com/huggingface/peft/pull/834
- [Low-level-API] Add docs about LLAPI by @younesbelkada in https://github.com/huggingface/peft/pull/836

Full Changelog: https://github.com/huggingface/peft/compare/v0.4.0...v0.5.0