Releases: huggingface/peft

v0.11.1

17 May 12:55

Patch release v0.11.1

Fix a bug that could lead to C++ compilation errors after importing PEFT (#1738 #1739).

Full Changelog: v0.11.0...v0.11.1

v0.11.0: New PEFT methods BOFT, VeRA, PiSSA, quantization with HQQ and EETQ, and more

16 May 09:53

Highlights

New methods

BOFT

Thanks to @yfeng95, @Zeju1997, and @YuliangXiu, PEFT was extended with BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (#1326, BOFT paper link). In PEFT v0.7.0, we already added OFT, but BOFT is even more parameter efficient. Check out the included BOFT controlnet and BOFT dreambooth examples.
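
As a rough sketch of how to apply it (the model checkpoint, block size, and target module names below are illustrative assumptions, not requirements):

```python
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # illustrative model
config = BOFTConfig(
    boft_block_size=4,                    # size of the orthogonal butterfly blocks
    target_modules=["q_proj", "v_proj"],  # depends on the architecture
    boft_dropout=0.1,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```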

VeRA

If the parameter reduction of LoRA is not enough for your use case, take a close look at VeRA: Vector-based Random Matrix Adaptation (#1564, VeRA paper link). This method resembles LoRA but adds two learnable scaling vectors to the two LoRA weight matrices; the weight matrices themselves are frozen, random, and shared across all layers, considerably reducing the number of trainable parameters.

The bulk of this PR was implemented by contributor @vvvm23 with the help of @dkopi.
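
A minimal sketch of using it (model and target module names are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import VeraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# VeRA shares frozen random projections across layers; only small per-layer
# scaling vectors are trained, so a comparatively high rank is affordable.
config = VeraConfig(r=256, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```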

PiSSA

PiSSA, Principal Singular values and Singular vectors Adaptation, is a new initialization method for LoRA, which was added by @fxmeng (#1626, PiSSA paper link). The improved initialization promises to speed up convergence and improve the final performance of LoRA models. When using models quantized with bitsandbytes, PiSSA initialization should reduce the quantization error, similar to LoftQ.
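
PiSSA is selected via the init_lora_weights option of LoraConfig; a minimal sketch (model and target modules are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# "pissa" performs a full SVD for initialization; variants such as
# "pissa_niter_4" trade some precision for a much faster start.
config = LoraConfig(init_lora_weights="pissa", target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
```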

Quantization

HQQ

Thanks to @fahadh4ilyas, PEFT LoRA linear layers now support Half-Quadratic Quantization, HQQ (#1618, HQQ repo). HQQ is fast and efficient (down to 2 bits), while not requiring calibration data.
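
A sketch of combining HQQ with LoRA, assuming a transformers version that ships HqqConfig and the hqq package installed (model and settings are illustrative):

```python
from transformers import AutoModelForCausalLM, HqqConfig
from peft import LoraConfig, get_peft_model

quant_config = HqqConfig(nbits=4, group_size=64)  # HQQ needs no calibration data
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=quant_config,
    device_map="auto",
)
peft_model = get_peft_model(model, LoraConfig(target_modules=["q_proj", "v_proj"]))
```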

EETQ

Another new quantization method supported in PEFT is Easy & Efficient Quantization for Transformers, EETQ (#1675, EETQ repo). This 8-bit quantization method works for LoRA linear layers and should be faster than bitsandbytes.

Show adapter layer and model status

We added a feature to show the adapter layer and model status of PEFT models in #1663. With the newly added methods, you can easily check which adapters exist on your model, whether their gradients are active, whether they are enabled, and which ones are currently active or merged. You will also be informed if irregularities have been detected.

To use this new feature, call model.get_layer_status() for layer-level information, and model.get_model_status() for model-level information. For more details, check out our docs on layer and model status.
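
For example (building on any PEFT model; the model below is illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
peft_model = get_peft_model(model, LoraConfig(target_modules=["q_proj", "v_proj"]))

print(peft_model.get_layer_status())  # one status entry per adapted layer
print(peft_model.get_model_status())  # aggregated status for the whole model
```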

Changes

Edge case of how we deal with modules_to_save

Previously, when using classes such as PeftModelForSequenceClassification, we implicitly added the classifier layers to model.modules_to_save. However, this would only add a new ModulesToSaveWrapper instance for the first adapter being initialized; when initializing a second adapter via model.add_adapter, this information was ignored. Now, peft_config.modules_to_save is updated explicitly to add the classifier layers (#1615). This is a departure from how this worked previously, but it better reflects the intended behavior.

Furthermore, when merging together multiple LoRA adapters using model.add_weighted_adapter, if these adapters had modules_to_save, the original parameters of these modules would be used. This is unexpected and will most likely result in bad outputs. As there is no clear way to merge these modules, we decided to raise an error in this case (#1615).

What's Changed

Read more

v0.10.0: Fine-tune larger QLoRA models with DeepSpeed and FSDP, layer replication, enhance DoRA

21 Mar 10:20

Highlights

Support for QLoRA with DeepSpeed ZeRO3 and FSDP

We added a couple of changes to allow QLoRA to work with DeepSpeed ZeRO3 and Fully Sharded Data Parallel (FSDP). For instance, this allows you to fine-tune a 70B Llama model on two GPUs with 24GB memory each. Besides the latest version of PEFT, this requires bitsandbytes>=0.43.0, accelerate>=0.28.0, transformers>4.38.2, trl>0.7.11. Check out our docs on DeepSpeed and FSDP with PEFT, as well as this blogpost from answer.ai, for more details.

Layer replication

First-time contributor @siddartha-RE added support for layer replication with LoRA. This allows you to duplicate layers of a model and apply LoRA adapters to them. Since the base weights are shared, this costs very little extra memory but can lead to a nice improvement in model performance. Find out more in our docs.
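
A rough sketch, assuming the layer_replication argument of LoraConfig takes a list of layer ranges (model and values are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# Stack base layers [0, 4) followed by [2, 5): replicated layers share their
# base weights, but each replica receives its own LoRA adapter weights.
config = LoraConfig(
    layer_replication=[[0, 4], [2, 5]],
    target_modules=["q_proj", "v_proj"],
)
peft_model = get_peft_model(model, config)
```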

Improving DoRA

Last release, we added the option to enable DoRA in PEFT by simply adding use_dora=True to your LoraConfig. However, this only worked for non-quantized linear layers. With this PEFT release, we now also support Conv2d layers, as well as linear layers quantized with bitsandbytes.
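
Enabling it remains a one-line change in the config; a minimal sketch (model and target modules are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# use_dora=True decomposes each weight update into magnitude and direction parts.
config = LoraConfig(r=8, use_dora=True, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
```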

Mixed LoRA adapter batches

If you have a PEFT model with multiple LoRA adapters attached to it, it's now possible to apply different adapters (or, in fact, no adapter) on different samples in the same batch. To do this, pass a list of adapter names as an additional argument. For example, if you have a batch of three samples:

output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])

Here, "adapter1" and "adapter2" should be the same name as your corresponding LoRA adapters and "__base__" is a special name that refers to the base model without any adapter. Find more details in our docs.

Without this feature, if you wanted to run inference with different LoRA adapters, you'd have to use single samples or try to group batches with the same adapter, then switch between adapters using set_adapter -- this is inefficient and inconvenient. Therefore, it is recommended to use this new, faster method from now on when encountering this scenario.

New LoftQ initialization function

We added an alternative way to initialize LoRA weights for a quantized model using the LoftQ method, which can be more convenient than the existing method. Right now, using LoftQ requires you to go through multiple steps as shown here. Furthermore, it's necessary to keep a separate copy of the quantized weights, as those are not identical to the quantized weights from the default model.

Using the new replace_lora_weights_loftq function, it's now possible to apply LoftQ initialization in a single step and without the need for extra copies of the weights. Check out the docs and this example notebook to see how it works. Right now, this method only supports 4-bit quantization with bitsandbytes, and the model has to be stored in the safetensors format.
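
A sketch of the one-step flow; the model id is a placeholder for any causal LM whose checkpoint is stored as safetensors:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq

model_id = "some-org/some-model"  # placeholder: must be stored as safetensors
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

peft_model = get_peft_model(model, LoraConfig(target_modules=["q_proj", "v_proj"]))
replace_lora_weights_loftq(peft_model)  # replaces the LoRA weights in place
```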

Deprecations

The function prepare_model_for_int8_training was deprecated for quite some time and is now removed completely. Use prepare_model_for_kbit_training instead.

What's Changed

Besides these highlights, we added many small improvements and fixed a couple of bugs. All these changes are listed below. As always, we thank all the awesome contributors who helped us improve PEFT.

New Contributors

Full Changelog: v0.9.0...v0.10.0

v0.9.0: Merging LoRA weights, new quantization options, DoRA support, and more

28 Feb 10:37

Highlights

New methods for merging LoRA weights together

With PR #1364, we added new methods for merging LoRA weights together. This is not about merging LoRA weights into the base model. Instead, this is about merging the weights from different LoRA adapters into a single adapter by calling add_weighted_adapter. This allows you to combine the strengths of multiple LoRA adapters into a single adapter, while being faster than activating each of these adapters individually.

Although this feature has existed in PEFT for some time, we have added new merging methods that promise much better results. The first is based on TIES, the second on DARE, and a new one inspired by both is called Magnitude Prune. If you haven't tried these new methods, or haven't touched the LoRA weight merging feature at all, you can find more information in our docs; a sketch of the API follows below.
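
This sketch assumes two adapters already saved to disk; adapter names, paths, weights, and density are illustrative:

```python
from peft import PeftModel

# base_model is the underlying transformers model; the paths are placeholders
model = PeftModel.from_pretrained(base_model, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[1.0, 1.0],
    adapter_name="merged",
    combination_type="ties",
    density=0.2,  # fraction of weights kept when pruning
)
model.set_adapter("merged")
```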

AWQ and AQLM support for LoRA

Via #1394, we now support AutoAWQ in PEFT. This is a new method for 4-bit quantization of model weights.

Similarly, we now support AQLM via #1476. This method allows quantizing weights to as low as 2 bits. Both methods support quantizing nn.Linear layers. To find out more about all the quantization options that work with PEFT, check out our docs here.

Note that these integrations do not support merge_and_unload() yet, meaning that for inference, you always need to keep the adapter weights attached to the base model.

DoRA support

We now support Weight-Decomposed Low-Rank Adaptation, aka DoRA, via #1474. This new method builds on top of LoRA and has shown very promising results. Especially at lower ranks (e.g. r=8), it should perform much better than LoRA. Right now, only non-quantized nn.Linear layers are supported. If you'd like to give it a try, just pass use_dora=True to your LoraConfig and you're good to go.

Documentation

Thanks to @stevhliu and many other contributors, there have been big improvements to the documentation. You should find it more organized and more up-to-date. Our DeepSpeed and FSDP guides have also been much improved.

Check out our improved docs if you haven't already!

Development

If you're implementing custom adapter layers, for instance a custom LoraLayer, note that all subclasses should now implement update_layer -- unless they want to use the default method of the parent class. In particular, this means you should no longer use different method names for the subclass, like update_layer_embedding. Also, we generally no longer permit ranks (r) of 0. For more, see this PR.

Developers should have an easier time now since we fully embrace ruff. If you're the type of person who forgets to call make style before pushing to a PR, consider adding a pre-commit hook. Tests are now a bit less verbose by using plain asserts and generally embracing pytest features more fully. All of this comes thanks to @akx.

What's Changed

On top of these changes, we have added a lot of small improvements since the last release; check out the full changes below. As always, we had a lot of support from many contributors, you're awesome!

Read more

Release v0.8.2

01 Feb 14:16

What's Changed

New Contributors

Full Changelog: v0.8.1...v0.8.2

Patch Release v0.8.1

30 Jan 10:48

This is a small patch release of PEFT that should:

  • Fix breaking change related to support for saving resized embedding layers and Diffusers models. Contributed by @younesbelkada in #1414

What's Changed

Full Changelog: v0.8.0...v0.8.1

v0.8.0: Poly PEFT method, LoRA improvements, Documentation improvements and more

30 Jan 06:59

Highlights

Poly PEFT method

Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists of pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (Poly) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. Put simply, you can think of it as a Mixture of Expert Adapters.
MHR (Multi-Head Routing) combines subsets of adapter parameters and outperforms Poly under a comparable parameter budget; by fine-tuning only the routing function and not the adapters (MHR-z), it achieves competitive performance with extreme parameter efficiency.
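
A rough configuration sketch, assuming PolyConfig's documented fields (the model and all values are illustrative, not recommendations):

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PolyConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
config = PolyConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    poly_type="poly",
    r=8,          # rank of each LoRA expert in the inventory
    n_tasks=4,    # number of training tasks
    n_skills=2,   # number of experts (adapters) in the inventory
    n_splits=1,   # > 1 enables Multi-Head Routing (MHR)
)
peft_model = get_peft_model(model, config)
# At train and inference time, Poly expects task_ids so the router can pick experts.
```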

LoRA improvements

Now, you can pass all-linear to the target_modules parameter of LoraConfig to target all linear layers, which the QLoRA paper showed to perform better than targeting only the query and value attention layers (see the sketch after the PR reference below).

  • Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
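
A minimal sketch of the shortcut (any other settings are up to you):

```python
from peft import LoraConfig

# "all-linear" targets every linear layer except the output head, so no
# architecture-specific module list is needed.
config = LoraConfig(target_modules="all-linear")
```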

Embedding layers of base models are now automatically saved when they are resized during fine-tuning with PEFT approaches like LoRA. This enables extending the tokenizer's vocabulary with special tokens, which is a common use case when doing the following:

  1. Instruction finetuning with new tokens being added, such as <|user|>, <|assistant|>, <|system|>, <|im_end|>, <|im_start|>, </s>, <s>, to properly format the conversations
  2. Finetuning on a specific language wherein language-specific tokens are added, e.g., Korean tokens being added to the vocabulary for finetuning an LLM on Korean datasets.
  3. Instruction finetuning to return outputs in a certain format to enable agent behaviour, with new tokens such as <|FUNCTIONS|>, <|BROWSE|>, <|TEXT2IMAGE|>, <|ASR|>, <|TTS|>, <|GENERATECODE|>, <|RAG|>.
    A good blogpost to learn more about this is https://www.philschmid.de/fine-tune-llms-in-2024-with-trl.
  • save the embeddings even when they aren't targetted but resized by @pacman100 in #1383

New option use_rslora in LoraConfig. Use it for ranks greater than 32 to see an increase in fine-tuning performance (with the same or better performance for ranks below 32 as well).
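
A one-line change in the config; a sketch with illustrative values. With use_rslora=True, the adapter scaling becomes lora_alpha / sqrt(r) instead of lora_alpha / r, which stabilizes higher ranks:

```python
from peft import LoraConfig

config = LoraConfig(r=64, lora_alpha=16, use_rslora=True)
```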

Documentation improvements

  • Refactoring and updating of the concept guides. [docs] Concept guides by @stevhliu in #1269
  • Improving task guides to focus more on how to use different PEFT methods and related nuances instead of focusing more on different type of tasks. It condenses the individual guides into a single one to highlight the commonalities and differences, and to refer to existing docs to avoid duplication. [docs] Task guides by @stevhliu in #1332
  • DOC: Update docstring for the config classes by @BenjaminBossan in #1343
  • LoftQ: edit README.md and example files by @yxli2123 in #1276
  • [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
  • DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
  • [docs] Docstring link by @stevhliu in #1356
  • QOL improvements and doc updates by @pacman100 in #1318
  • Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
  • DOC: Improve target modules description by @BenjaminBossan in #1290
  • DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
  • DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
  • Improve documentation for the all-linear flag by @SumanthRH in #1357
  • Fix various typos in LoftQ docs. by @arnavgarg1 in #1408

What's Changed

Read more

v0.7.1 patch release

12 Dec 17:22

This is a small patch release of PEFT that should handle:

  • Issues with loading multiple adapters when using quantized models (#1243)
  • Issues with transformers v4.36 and some prompt learning methods (#1252)

What's Changed

New Contributors

Full Changelog: v0.7.0...v0.7.1

v0.7.0: Orthogonal Fine-Tuning, Megatron support, better initialization, safetensors, and more

06 Dec 16:13

Highlights

  • Orthogonal Fine-Tuning (OFT): A new adapter that is similar to LoRA and shows a lot of promise for Stable Diffusion, especially with regard to controllability and compositionality. Give it a try! By @okotaku in #1160
  • Support for parallel linear LoRA layers using Megatron. This should lead to a speed up when using LoRA with Megatron. By @zhangsheng377 in #1092
  • LoftQ provides a new method to initialize LoRA layers of quantized models. The big advantage is that the LoRA layer weights are chosen in a way that minimizes the quantization error, as described here: https://arxiv.org/abs/2310.08659. By @yxli2123 in #1150.

Other notable additions

  • It is now possible to choose which adapters are merged when calling merge (#1132)
  • IA³ now supports adapter deletion, by @alexrs (#1153)
  • A new initialization method for LoRA has been added, "gaussian" (#1189)
  • When training PEFT models with new tokens being added to the embedding layers, the embedding layer is now saved by default (#1147)
  • It is now possible to mix certain adapters like LoRA and LoKr in the same model, see the docs (#1163)
  • We started an initiative to improve the documentation, some of which should already be reflected in the current docs. Still, help from the community is always welcome. Check out this issue to get going.

Migration to v0.7.0

  • Safetensors are now the default format for PEFT adapters. In practice, users should not have to change anything in their code; PEFT takes care of everything -- just be aware that instead of creating a file adapter_model.bin, calling save_pretrained now creates adapter_model.safetensors. Safetensors have numerous advantages over pickle files (the PyTorch default format) and are well supported on the Hugging Face Hub.
  • When merging multiple LoRA adapter weights together using add_weighted_adapter with the option combination_type="linear", the scaling of the adapter weights is now performed differently, leading to improved results.
  • There was a big refactor of the inner workings of some PEFT adapters. For the vast majority of users, this should not make any difference (except making some code run faster). However, if your code relies on PEFT internals, be aware that the inheritance structure of certain adapter layers has changed (e.g. peft.lora.Linear is no longer a subclass of nn.Linear, so isinstance checks may need updating). Also, to retrieve the original weight of an adapted layer, now use self.get_base_layer().weight, not self.weight (same for bias); a small migration sketch follows below.
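
A migration sketch for code that touched PEFT internals; the import path for the LoRA layer class is our assumption and may differ across versions:

```python
from torch import nn
from peft.tuners.lora import Linear as LoraLinear

def get_original_weight(module: nn.Module):
    # Before v0.7.0, the adapted layer subclassed nn.Linear and exposed .weight
    # directly; now the base layer must be fetched explicitly.
    if isinstance(module, LoraLinear):
        return module.get_base_layer().weight
    return module.weight
```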

What's Changed

As always, a bunch of small improvements, bug fixes and doc improvements were added. We thank all the external contributors, both new and recurring. Below is the list of all changes since the last release.

Read more

v0.6.2 Patch Release: Refactor of adapter deletion API and fixes to `ModulesToSaveWrapper` when using Low-level API

14 Nov 05:55

This patch release refactors the adapter deletion API and fixes ModulesToSaveWrapper when using the low-level API.

Refactor adapter deletion

Fix ModulesToSaveWrapper when using Low-level API

What's Changed

New Contributors

Full Changelog: v0.6.1...v0.6.2