Releases: huggingface/peft

v0.11.1

17 May 12:55

Patch release v0.11.1

Fix a bug that could lead to C++ compilation errors after importing PEFT (#1738 #1739).

Full Changelog: v0.11.0...v0.11.1

v0.11.0: New PEFT methods BOFT, VeRA, PiSSA, quantization with HQQ and EETQ, and more

16 May 09:53

Highlights

New methods

BOFT

Thanks to @yfeng95, @Zeju1997, and @YuliangXiu, PEFT was extended with BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (#1326, BOFT paper link). In PEFT v0.7.0, we already added OFT, but BOFT is even more parameter efficient. Check out the included BOFT controlnet and BOFT dreambooth examples.
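
As a rough sketch of how to apply it (the model checkpoint, block size, and target module names below are illustrative assumptions, not requirements):

```python
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # illustrative model
config = BOFTConfig(
    boft_block_size=4,                    # size of the orthogonal butterfly blocks
    target_modules=["q_proj", "v_proj"],  # depends on the architecture
    boft_dropout=0.1,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```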

VeRA

If the parameter reduction of LoRA is not enough for your use case, take a close look at VeRA: Vector-based Random Matrix Adaptation (#1564, VeRA paper link). This method resembles LoRA but adds two learnable scaling vectors to the two LoRA weight matrices; the weight matrices themselves are frozen, random, and shared across all layers, considerably reducing the number of trainable parameters.

The bulk of this PR was implemented by contributor @vvvm23 with the help of @dkopi.
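
A minimal sketch of using it (model and target module names are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import VeraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# VeRA shares frozen random projections across layers; only small per-layer
# scaling vectors are trained, so a comparatively high rank is affordable.
config = VeraConfig(r=256, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```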

PiSSA

PiSSA, Principal Singular values and Singular vectors Adaptation, is a new initialization method for LoRA, which was added by @fxmeng (#1626, PiSSA paper link). The improved initialization promises to speed up convergence and improve the final performance of LoRA models. When using models quantized with bitsandbytes, PiSSA initialization should reduce the quantization error, similar to LoftQ.
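
PiSSA is selected via the init_lora_weights option of LoraConfig; a minimal sketch (model and target modules are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# "pissa" performs a full SVD for initialization; variants such as
# "pissa_niter_4" trade some precision for a much faster start.
config = LoraConfig(init_lora_weights="pissa", target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
```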

Quantization

HQQ

Thanks to @fahadh4ilyas, PEFT LoRA linear layers now support Half-Quadratic Quantization, HQQ (#1618, HQQ repo). HQQ is fast and efficient (down to 2 bits), while not requiring calibration data.
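
A sketch of combining HQQ with LoRA, assuming a transformers version that ships HqqConfig and the hqq package installed (model and settings are illustrative):

```python
from transformers import AutoModelForCausalLM, HqqConfig
from peft import LoraConfig, get_peft_model

quant_config = HqqConfig(nbits=4, group_size=64)  # HQQ needs no calibration data
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=quant_config,
    device_map="auto",
)
peft_model = get_peft_model(model, LoraConfig(target_modules=["q_proj", "v_proj"]))
```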

EETQ

Another new quantization method supported in PEFT is Easy & Efficient Quantization for Transformers, EETQ (#1675, EETQ repo). This 8-bit quantization method works for LoRA linear layers and should be faster than bitsandbytes.

Show adapter layer and model status

We added a feature to show the adapter layer and model status of PEFT models in #1663. With the newly added methods, you can easily check which adapters exist on your model, whether their gradients are active, whether they are enabled, and which ones are currently active or merged. You will also be informed if irregularities have been detected.

To use this new feature, call model.get_layer_status() for layer-level information, and model.get_model_status() for model-level information. For more details, check out our docs on layer and model status.
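
For example (building on any PEFT model; the model below is illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
peft_model = get_peft_model(model, LoraConfig(target_modules=["q_proj", "v_proj"]))

print(peft_model.get_layer_status())  # one status entry per adapted layer
print(peft_model.get_model_status())  # aggregated status for the whole model
```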

Changes

Edge case of how we deal with modules_to_save

Previously, when using classes such as PeftModelForSequenceClassification, we implicitly added the classifier layers to model.modules_to_save. However, this would only add a new ModulesToSaveWrapper instance for the first adapter being initialized; when initializing a second adapter via model.add_adapter, this information was ignored. Now, peft_config.modules_to_save is updated explicitly to add the classifier layers (#1615). This is a departure from how this worked previously, but it better reflects the intended behavior.

Furthermore, when merging together multiple LoRA adapters using model.add_weighted_adapter, if these adapters had modules_to_save, the original parameters of these modules would be used. This is unexpected and will most likely result in bad outputs. As there is no clear way to merge these modules, we decided to raise an error in this case (#1615).

What's Changed

Read more

v0.10.0: Fine-tune larger QLoRA models with DeepSpeed and FSDP, layer replication, enhance DoRA

21 Mar 10:20

Highlights

Support for QLoRA with DeepSpeed ZeRO3 and FSDP

We added a couple of changes to allow QLoRA to work with DeepSpeed ZeRO3 and Fully Sharded Data Parallel (FSDP). For instance, this allows you to fine-tune a 70B Llama model on two GPUs with 24GB memory each. Besides the latest version of PEFT, this requires bitsandbytes>=0.43.0, accelerate>=0.28.0, transformers>4.38.2, trl>0.7.11. Check out our docs on DeepSpeed and FSDP with PEFT, as well as this blogpost from answer.ai, for more details.

Layer replication

First-time contributor @siddartha-RE added support for layer replication with LoRA. This allows you to duplicate layers of a model and apply LoRA adapters to them. Since the base weights are shared, this costs very little extra memory but can lead to a nice improvement in model performance. Find out more in our docs.
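
A rough sketch, assuming the layer_replication argument of LoraConfig takes a list of layer ranges (model and values are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# Stack base layers [0, 4) followed by [2, 5): replicated layers share their
# base weights, but each replica receives its own LoRA adapter weights.
config = LoraConfig(
    layer_replication=[[0, 4], [2, 5]],
    target_modules=["q_proj", "v_proj"],
)
peft_model = get_peft_model(model, config)
```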

Improving DoRA

Last release, we added the option to enable DoRA in PEFT by simply adding use_dora=True to your LoraConfig. However, this only worked for non-quantized linear layers. With this PEFT release, we now also support Conv2d layers, as well as linear layers quantized with bitsandbytes.
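
Enabling it remains a one-line change in the config; a minimal sketch (model and target modules are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# use_dora=True decomposes each weight update into magnitude and direction parts.
config = LoraConfig(r=8, use_dora=True, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
```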

Mixed LoRA adapter batches

If you have a PEFT model with multiple LoRA adapters attached to it, it's now possible to apply different adapters (or, in fact, no adapter) on different samples in the same batch. To do this, pass a list of adapter names as an additional argument. For example, if you have a batch of three samples:

output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])

Here, "adapter1" and "adapter2" should be the same name as your corresponding LoRA adapters and "__base__" is a special name that refers to the base model without any adapter. Find more details in our docs.

Without this feature, if you wanted to run inference with different LoRA adapters, you'd have to use single samples or try to group batches with the same adapter, then switch between adapters using set_adapter -- this is inefficient and inconvenient. Therefore, it is recommended to use this new, faster method from now on when encountering this scenario.

New LoftQ initialization function

We added an alternative way to initialize LoRA weights for a quantized model using the LoftQ method, which can be more convenient than the existing method. Right now, using LoftQ requires you to go through multiple steps as shown here. Furthermore, it's necessary to keep a separate copy of the quantized weights, as those are not identical to the quantized weights from the default model.

Using the new replace_lora_weights_loftq function, it's now possible to apply LoftQ initialization in a single step and without the need for extra copies of the weights. Check out the docs and this example notebook to see how it works. Right now, this method only supports 4-bit quantization with bitsandbytes, and the model has to be stored in the safetensors format.
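
A sketch of the one-step flow; the model id is a placeholder for any causal LM whose checkpoint is stored as safetensors:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq

model_id = "some-org/some-model"  # placeholder: must be stored as safetensors
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

peft_model = get_peft_model(model, LoraConfig(target_modules=["q_proj", "v_proj"]))
replace_lora_weights_loftq(peft_model)  # replaces the LoRA weights in place
```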

Deprecations

The function prepare_model_for_int8_training was deprecated for quite some time and is now removed completely. Use prepare_model_for_kbit_training instead.

What's Changed

Besides these highlights, we added many small improvements and fixed a couple of bugs. All these changes are listed below. As always, we thank all the awesome contributors who helped us improve PEFT.

New Contributors

Full Changelog: v0.9.0...v0.10.0

v0.9.0: Merging LoRA weights, new quantization options, DoRA support, and more

28 Feb 10:37

Highlights

New methods for merging LoRA weights together

With PR #1364, we added new methods for merging LoRA weights together. This is not about merging LoRA weights into the base model. Instead, this is about merging the weights from different LoRA adapters into a single adapter by calling add_weighted_adapter. This allows you to combine the strengths of multiple LoRA adapters into a single adapter, while being faster than activating each of these adapters individually.

Although this feature has existed in PEFT for some time, we have added new merging methods that promise much better results. The first is based on TIES, the second on DARE, and a new one inspired by both is called Magnitude Prune. If you haven't tried these new methods, or haven't touched the LoRA weight merging feature at all, you can find more information in our docs; a sketch of the API follows below.
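
This sketch assumes two adapters already saved to disk; adapter names, paths, weights, and density are illustrative:

```python
from peft import PeftModel

# base_model is the underlying transformers model; the paths are placeholders
model = PeftModel.from_pretrained(base_model, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[1.0, 1.0],
    adapter_name="merged",
    combination_type="ties",
    density=0.2,  # fraction of weights kept when pruning
)
model.set_adapter("merged")
```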

AWQ and AQLM support for LoRA

Via #1394, we now support AutoAWQ in PEFT. This is a new method for 4-bit quantization of model weights.

Similarly, we now support AQLM via #1476. This method allows quantizing weights to as low as 2 bits. Both methods support quantizing nn.Linear layers. To find out more about all the quantization options that work with PEFT, check out our docs here.

Note that these integrations do not support merge_and_unload() yet, meaning that for inference, you always need to keep the adapter weights attached to the base model.

DoRA support

We now support Weight-Decomposed Low-Rank Adaptation, aka DoRA, via #1474. This new method builds on top of LoRA and has shown very promising results. Especially at lower ranks (e.g. r=8), it should perform much better than LoRA. Right now, only non-quantized nn.Linear layers are supported. If you'd like to give it a try, just pass use_dora=True to your LoraConfig and you're good to go.

Documentation

Thanks to @stevhliu and many other contributors, there have been big improvements to the documentation. You should find it more organized and more up-to-date. Our DeepSpeed and FSDP guides have also been much improved.

Check out our improved docs if you haven't already!

Development

If you're implementing custom adapter layers, for instance a custom LoraLayer, note that all subclasses should now implement update_layer -- unless they want to use the default method of the parent class. In particular, this means you should no longer use different method names for the subclass, like update_layer_embedding. Also, we generally no longer permit ranks (r) of 0. For more, see this PR.

Developers should have an easier time now since we fully embrace ruff. If you're the type of person who forgets to call make style before pushing to a PR, consider adding a pre-commit hook. Tests are now a bit less verbose by using plain asserts and generally embracing pytest features more fully. All of this comes thanks to @akx.

What's Changed

On top of these changes, we have added a lot of small improvements since the last release; check out the full changes below. As always, we had a lot of support from many contributors, you're awesome!

Read more

Release v0.8.2

01 Feb 14:16

What's Changed

New Contributors

Full Changelog: v0.8.1...v0.8.2

Patch Release v0.8.1

30 Jan 10:48

This is a small patch release of PEFT that should:

  • Fix breaking change related to support for saving resized embedding layers and Diffusers models. Contributed by @younesbelkada in #1414

What's Changed

Full Changelog: v0.8.0...v0.8.1

v0.8.0: Poly PEFT method, LoRA improvements, Documentation improvements and more

30 Jan 06:59

Highlights

Poly PEFT method

Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists of pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (Poly) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. Put simply, you can think of it as a Mixture of Expert Adapters.
MHR (Multi-Head Routing) combines subsets of adapter parameters and outperforms Poly under a comparable parameter budget; by fine-tuning only the routing function and not the adapters (MHR-z), it achieves competitive performance with extreme parameter efficiency.
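
A rough configuration sketch, assuming PolyConfig's documented fields (the model and all values are illustrative, not recommendations):

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PolyConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
config = PolyConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    poly_type="poly",
    r=8,          # rank of each LoRA expert in the inventory
    n_tasks=4,    # number of training tasks
    n_skills=2,   # number of experts (adapters) in the inventory
    n_splits=1,   # > 1 enables Multi-Head Routing (MHR)
)
peft_model = get_peft_model(model, config)
# At train and inference time, Poly expects task_ids so the router can pick experts.
```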

LoRA improvements

Now, you can pass all-linear to the target_modules parameter of LoraConfig to target all linear layers, which the QLoRA paper showed to perform better than targeting only the query and value attention layers (see the sketch after the PR reference below).

  • Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
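
A minimal sketch of the shortcut (any other settings are up to you):

```python
from peft import LoraConfig

# "all-linear" targets every linear layer except the output head, so no
# architecture-specific module list is needed.
config = LoraConfig(target_modules="all-linear")
```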

Embedding layers of base models are now automatically saved when they are resized during fine-tuning with PEFT approaches like LoRA. This enables extending the tokenizer's vocabulary with special tokens, which is a common use case when doing the following:

  1. Instruction finetuning with new tokens being added, such as <|user|>, <|assistant|>, <|system|>, <|im_end|>, <|im_start|>, </s>, <s>, to properly format the conversations
  2. Finetuning on a specific language wherein language-specific tokens are added, e.g., Korean tokens being added to the vocabulary for finetuning an LLM on Korean datasets.
  3. Instruction finetuning to return outputs in a certain format to enable agent behaviour, with new tokens such as <|FUNCTIONS|>, <|BROWSE|>, <|TEXT2IMAGE|>, <|ASR|>, <|TTS|>, <|GENERATECODE|>, <|RAG|>.
    A good blogpost to learn more about this is https://www.philschmid.de/fine-tune-llms-in-2024-with-trl.
  • save the embeddings even when they aren't targetted but resized by @pacman100 in #1383

New option use_rslora in LoraConfig. Use it for ranks greater than 32 to see an increase in fine-tuning performance (with the same or better performance for ranks below 32 as well).
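
A one-line change in the config; a sketch with illustrative values. With use_rslora=True, the adapter scaling becomes lora_alpha / sqrt(r) instead of lora_alpha / r, which stabilizes higher ranks:

```python
from peft import LoraConfig

config = LoraConfig(r=64, lora_alpha=16, use_rslora=True)
```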

Documentation improvements

  • Refactoring and updating of the concept guides. [docs] Concept guides by @stevhliu in #1269
  • Improving task guides to focus more on how to use different PEFT methods and related nuances instead of focusing more on different type of tasks. It condenses the individual guides into a single one to highlight the commonalities and differences, and to refer to existing docs to avoid duplication. [docs] Task guides by @stevhliu in #1332
  • DOC: Update docstring for the config classes by @BenjaminBossan in #1343
  • LoftQ: edit README.md and example files by @yxli2123 in #1276
  • [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
  • DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
  • [docs] Docstring link by @stevhliu in #1356
  • QOL improvements and doc updates by @pacman100 in #1318
  • Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
  • DOC: Improve target modules description by @BenjaminBossan in #1290
  • DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
  • DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
  • Improve documentation for the all-linear flag by @SumanthRH in #1357
  • Fix various typos in LoftQ docs. by @arnavgarg1 in #1408

What's Changed

Read more

v0.7.1 patch release

12 Dec 17:22

This is a small patch release of PEFT that should handle:

  • Issues with loading multiple adapters when using quantized models (#1243)
  • Issues with transformers v4.36 and some prompt learning methods (#1252)

What's Changed

New Contributors

Full Changelog: v0.7.0...v0.7.1

v0.7.0: Orthogonal Fine-Tuning, Megatron support, better initialization, safetensors, and more

06 Dec 16:13

Highlights

  • Orthogonal Fine-Tuning (OFT): A new adapter that is similar to LoRA and shows a lot of promise for Stable Diffusion, especially with regard to controllability and compositionality. Give it a try! By @okotaku in #1160
  • Support for parallel linear LoRA layers using Megatron. This should lead to a speed up when using LoRA with Megatron. By @zhangsheng377 in #1092
  • LoftQ provides a new method to initialize LoRA layers of quantized models. The big advantage is that the LoRA layer weights are chosen in a way that minimizes the quantization error, as described here: https://arxiv.org/abs/2310.08659. By @yxli2123 in #1150.

Other notable additions

  • It is now possible to choose which adapters are merged when calling merge (#1132)
  • IA³ now supports adapter deletion, by @alexrs (#1153)
  • A new initialization method for LoRA has been added, "gaussian" (#1189)
  • When training PEFT models with new tokens being added to the embedding layers, the embedding layer is now saved by default (#1147)
  • It is now possible to mix certain adapters like LoRA and LoKr in the same model, see the docs (#1163)
  • We started an initiative to improve the documentation, some of which should already be reflected in the current docs. Still, help from the community is always welcome. Check out this issue to get going.

Migration to v0.7.0

  • Safetensors are now the default format for PEFT adapters. In practice, users should not have to change anything in their code; PEFT takes care of everything -- just be aware that instead of creating a file adapter_model.bin, calling save_pretrained now creates adapter_model.safetensors. Safetensors have numerous advantages over pickle files (the PyTorch default format) and are well supported on the Hugging Face Hub.
  • When merging multiple LoRA adapter weights together using add_weighted_adapter with the option combination_type="linear", the scaling of the adapter weights is now performed differently, leading to improved results.
  • There was a big refactor of the inner workings of some PEFT adapters. For the vast majority of users, this should not make any difference (except making some code run faster). However, if your code relies on PEFT internals, be aware that the inheritance structure of certain adapter layers has changed (e.g. peft.lora.Linear is no longer a subclass of nn.Linear, so isinstance checks may need updating). Also, to retrieve the original weight of an adapted layer, now use self.get_base_layer().weight, not self.weight (same for bias); a small migration sketch follows below.
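
A migration sketch for code that touched PEFT internals; the import path for the LoRA layer class is our assumption and may differ across versions:

```python
from torch import nn
from peft.tuners.lora import Linear as LoraLinear

def get_original_weight(module: nn.Module):
    # Before v0.7.0, the adapted layer subclassed nn.Linear and exposed .weight
    # directly; now the base layer must be fetched explicitly.
    if isinstance(module, LoraLinear):
        return module.get_base_layer().weight
    return module.weight
```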

What's Changed

As always, a bunch of small improvements, bug fixes and doc improvements were added. We thank all the external contributors, both new and recurring. Below is the list of all changes since the last release.

Read more

v0.6.2 Patch Release: Refactor of adapter deletion API and fixes to `ModulesToSaveWrapper` when using Low-level API

14 Nov 05:55

This patch release refactors the adapter deletion API and fixes ModulesToSaveWrapper when using the low-level API.

Refactor adapter deletion

Fix ModulesToSaveWrapper when using Low-level API

What's Changed

New Contributors

Full Changelog: v0.6.1...v0.6.2