GPTQ Quantization, Low-level API
GPTQ Quantization
You can now fine-tune GPTQ-quantized models using PEFT. See the companion Colab notebook and finetuning script for examples of how to use PEFT with a GPTQ model.
Low-level API
The low-level API enables users and developers to use PEFT as a utility library, at least for injectable adapters (LoRA, IA3, AdaLoRA). It exposes an API that modifies the model in place to inject the new adapter layers.
- [core] PEFT refactor + introducing inject_adapter_in_model public method by @younesbelkada in https://github.com/huggingface/peft/pull/749
- [Low-level-API] Add docs about LLAPI by @younesbelkada in https://github.com/huggingface/peft/pull/836

Support for more devices
Leverage the support for more devices for loading and fine-tuning PEFT adapters.
Merging multiple LoRAs
Stable support and new ways of merging multiple LoRAs. Three combination types are currently supported: linear, svd, and cat.
What's Changed
- [Llama2] Add disabling TP behavior by @younesbelkada in https://github.com/huggingface/peft/pull/728
- [Patch] patch trainable params for 4bit layers by @younesbelkada in https://github.com/huggingface/peft/pull/733
- [AdaLora] Fix adalora inference issue by @younesbelkada in https://github.com/huggingface/peft/pull/745
- [ModulesToSave] add correct hook management for modules to save by @younesbelkada in https://github.com/huggingface/peft/pull/755
- [core] PEFT refactor + introducing inject_adapter_in_model public method by @younesbelkada in https://github.com/huggingface/peft/pull/749
- [Docker] Fix gptq dockerfile by @younesbelkada in https://github.com/huggingface/peft/pull/835
- [Tests] Add 4bit slow training tests by @younesbelkada in https://github.com/huggingface/peft/pull/834
- [Low-level-API] Add docs about LLAPI by @younesbelkada in https://github.com/huggingface/peft/pull/836

Full Changelog: https://github.com/huggingface/peft/compare/v0.4.0...v0.5.0