Releases: huggingface/peft
v0.11.1
v0.11.0: New PEFT methods BOFT, VeRA, PiSSA, quantization with HQQ and EETQ, and more
Highlights
New methods
BOFT
Thanks to @yfeng95, @Zeju1997, and @YuliangXiu, PEFT was extended with BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (#1326, BOFT paper link). In PEFT v0.7.0, we already added OFT, but BOFT is even more parameter efficient. Check out the included BOFT controlnet and BOFT dreambooth examples.
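If you want to give BOFT a quick try, a minimal sketch could look like the following (the base model name is a placeholder, and the exact BOFTConfig arguments such as boft_block_size are assumptions worth double-checking against the docs):

```python
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# BOFT factorizes the orthogonal transform into butterfly blocks;
# boft_block_size controls the block size of that factorization.
config = BOFTConfig(
    boft_block_size=4,
    target_modules=["q_proj", "v_proj"],
    boft_dropout=0.1,
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```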
VeRA
If the parameter reduction of LoRA is not enough for your use case, you should take a close look at VeRA: Vector-based Random Matrix Adaptation (#1564, VeRA paper link). This method resembles LoRA but adds two learnable scaling vectors to the two LoRA weight matrices. However, the LoRA weights themselves are shared across all layers, considerably reducing the number of trainable parameters.
The bulk of this PR was implemented by contributor @vvvm23 with the help of @dkopi.
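As a rough sketch of how VeRA is configured (placeholder model name; the VeraConfig arguments shown are assumptions, see the docs for the authoritative list):

```python
from transformers import AutoModelForCausalLM
from peft import VeraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# VeRA shares frozen random LoRA-style matrices across layers and only
# trains small per-layer scaling vectors, hence the low parameter count.
config = VeraConfig(r=256, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```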
PiSSA
PiSSA, Principal Singular values and Singular vectors Adaptation, is a new initialization method for LoRA, which was added by @fxmeng (#1626, PiSSA paper link). The improved initialization promises to speed up convergence and improve the final performance of LoRA models. When using models quantized with bitsandbytes, PiSSA initialization should reduce the quantization error, similar to LoftQ.
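PiSSA is selected through the LoRA initialization option; a minimal sketch, assuming a placeholder base model, might look like this:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# PiSSA is chosen via the LoRA weight initialization option; a variant such
# as "pissa_niter_16" would use a faster, approximate SVD instead.
config = LoraConfig(init_lora_weights="pissa", target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, config)
```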
Quantization
HQQ
Thanks to @fahadh4ilyas, PEFT LoRA linear layers now support Half-Quadratic Quantization, HQQ (#1618, HQQ repo). HQQ is fast and efficient (down to 2 bits), while not requiring calibration data.
EETQ
Another new quantization method supported in PEFT is Easy & Efficient Quantization for Transformers, EETQ (#1675, EETQ repo). This 8 bit quantization method works for LoRA linear layers and should be faster than bitsandbytes.
Show adapter layer and model status
We added a feature to show adapter layer and model status of PEFT models in #1663. With the newly added methods, you can easily check what adapters exist on your model, whether gradients are active, whether they are enabled, which ones are active or merged. You will also be informed if irregularities have been detected.
To use this new feature, call model.get_layer_status() for layer-level information and model.get_model_status() for model-level information. For more details, check out our docs on layer and model status.
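A minimal sketch of how this could be used (the attribute names on the returned status objects are assumptions, check the docs for the exact fields):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("facebook/opt-125m"),
    LoraConfig(target_modules=["q_proj", "v_proj"]),
)

# Per-layer view: which adapters exist on each layer, whether they are
# enabled, active, or merged, and whether their parameters require grad.
for layer_status in model.get_layer_status():
    print(layer_status.name, layer_status.available_adapters)

# Aggregated, model-level view of the same information.
print(model.get_model_status())
```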
Changes
Edge case of how we deal with modules_to_save
We had the issue that when using classes such as PeftModelForSequenceClassification, we implicitly added the classifier layers to model.modules_to_save. However, this would only add a new ModulesToSaveWrapper instance for the first adapter being initialized. When initializing a second adapter via model.add_adapter, this information was ignored. Now, peft_config.modules_to_save is updated explicitly to add the classifier layers (#1615). This is a departure from how this worked previously, but it better reflects the intended behavior.
Furthermore, when merging together multiple LoRA adapters using model.add_weighted_adapter, if these adapters had modules_to_save, the original parameters of these modules would be used. This is unexpected and will most likely result in bad outputs. As there is no clear way to merge these modules, we decided to raise an error in this case (#1615).
What's Changed
- Bump version to 0.10.1.dev0 by @BenjaminBossan in #1578
- FIX Minor issues in docs, re-raising exception by @BenjaminBossan in #1581
- FIX / Docs: Fix doc link for layer replication by @younesbelkada in #1582
- DOC: Short section on using transformers pipeline by @BenjaminBossan in #1587
- Extend PeftModel.from_pretrained() to models with disk-offloaded modules by @blbadger in #1431
- [feat] Add lru_cache to import_utils calls that did not previously have it by @tisles in #1584
- fix deepspeed zero3+prompt tuning bug. word_embeddings.weight shape i… by @sywangyi in #1591
- MNT: Update GH bug report template by @BenjaminBossan in #1600
- fix the torch_dtype and quant_storage_dtype by @pacman100 in #1614
- FIX In the image classification example, Change the model to the LoRA… by @changhwa in #1624
- Remove duplicated import by @nzw0301 in #1622
- FIX: bnb config wrong argument names by @BenjaminBossan in #1603
- FIX Make DoRA work with Conv1D layers by @BenjaminBossan in #1588
- FIX: Send results to correct channel by @younesbelkada in #1628
- FEAT: Allow ignoring mismatched sizes when loading by @BenjaminBossan in #1620
- itemsize is torch>=2.1, use element_size() by @winglian in #1630
- FIX Multiple adapters and modules_to_save by @BenjaminBossan in #1615
- FIX Correctly call element_size by @BenjaminBossan in #1635
- fix: allow load_adapter to use different device by @yhZhai in #1631
- Adalora deepspeed by @sywangyi in #1625
- Adding BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization by @yfeng95 in #1326
- Don't use deprecated Repository anymore by @Wauplin in #1641
- FIX Errors in the transformers integration docs by @BenjaminBossan in #1629
- update figure assets of BOFT by @YuliangXiu in #1642
- print_trainable_parameters - format % to be sensible by @stas00 in #1648
- FIX: Bug with handling of active adapters by @BenjaminBossan in #1659
- Remove dreambooth Git link by @charliermarsh in #1660
- add safetensor load in multitask_prompt_tuning by @sywangyi in #1662
- Adds Vera (Vector Based Random Matrix Adaption) #2 by @BenjaminBossan in #1564
- Update deepspeed.md by @sanghyuk-choi in #1679
- ENH: Add multi-backend tests for bnb by @younesbelkada in #1667
- FIX / Workflow: Fix Mac-OS CI issues by @younesbelkada in #1680
- FIX Use trl version of tiny random llama by @BenjaminBossan in #1681
- FIX: Don't eagerly import bnb for LoftQ by @BenjaminBossan in #1683
- FEAT: Add EETQ support in PEFT by @younesbelkada in #1675
- FIX / Workflow: Always notify on slack for docker image workflows by @younesbelkada in #1682
- FIX: upgrade autoawq to latest version by @younesbelkada in #1684
- FIX: Initialize DoRA weights in float32 if float16 is being used by @BenjaminBossan in #1653
- fix bf16 model type issue for ia3 by @sywangyi in #1634
- FIX Issues with AdaLora initialization by @BenjaminBossan in #1652
- FEAT Show adapter layer and model status by @BenjaminBossan in #1663
- Fixing the example by providing correct tokenized seq length by @jpodivin in #1686
- TST: Skiping AWQ tests for now .. by @younesbelkada in #1690
- Add LayerNorm tuning model by @DTennant in #1301
- FIX Use different doc builder docker image by @BenjaminBossan in #1697
- Set experimental dynamo config for compile tests by @BenjaminBossan in #1698
- fix the fsdp peft autowrap policy by @pacman100 in #1694
- Add LoRA support to HQQ Quantization by @fahadh4ilyas in #1618
- FEAT Helper to check if a model is a PEFT model by @BenjaminBossan in #1713
- support Cambricon MLUs device by @huismiling in #1687
- Some small cleanups in docstrings, copyright note by @BenjaminBossan in #1714
- Fix docs typo by @NielsRogge in #1719
- revise run_peft_multigpu.sh by @abzb1 in #1722
- Workflow: Add slack messages workflow by @younesbelkada in #1723
- DOC Document the PEFT checkpoint for...
v0.10.0: Fine-tune larger QLoRA models with DeepSpeed and FSDP, layer replication, enhance DoRA
Highlights
Support for QLoRA with DeepSpeed ZeRO3 and FSDP
We added a couple of changes to allow QLoRA to work with DeepSpeed ZeRO3 and Fully Sharded Data Parallel (FSDP). For instance, this allows you to fine-tune a 70B Llama model on two GPUs with 24GB memory each. Besides the latest version of PEFT, this requires bitsandbytes>=0.43.0, accelerate>=0.28.0, transformers>4.38.2, and trl>0.7.11. Check out our docs on DeepSpeed and FSDP with PEFT, as well as this blogpost from answer.ai, for more details.
Layer replication
First time contributor @siddartha-RE added support for layer replication with LoRA. This allows you to duplicate layers of a model and apply LoRA adapters to them. Since the base weights are shared, this adds only very little extra memory but can lead to a nice improvement in model performance. Find out more in our docs.
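A rough sketch of what this might look like, assuming a Llama-style placeholder model with at least 12 layers and that the option is exposed via the layer_replication field of LoraConfig:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder: a Llama-style decoder with at least 12 layers.
base_model = AutoModelForCausalLM.from_pretrained("path/to/llama-like-model")

# Build a deeper model by stacking layer ranges [0, 8) and [4, 12) of the
# base model; the duplicated layers share the base weights, but each copy
# gets its own LoRA adapter, keeping the memory overhead small.
config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    layer_replication=[(0, 8), (4, 12)],
)
peft_model = get_peft_model(base_model, config)
```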
Improving DoRA
Last release, we added the option to enable DoRA in PEFT by simply adding use_dora=True to your LoraConfig. However, this only worked for non-quantized linear layers. With this PEFT release, we now also support Conv2d layers, as well as linear layers quantized with bitsandbytes.
Mixed LoRA adapter batches
If you have a PEFT model with multiple LoRA adapters attached to it, it's now possible to apply different adapters (or, in fact, no adapter) on different samples in the same batch. To do this, pass a list of adapter names as an additional argument. For example, if you have a batch of three samples:
output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])
Here, "adapter1"
and "adapter2"
should be the same name as your corresponding LoRA adapters and "__base__"
is a special name that refers to the base model without any adapter. Find more details in our docs.
Without this feature, if you wanted to run inference with different LoRA adapters, you'd have to use single samples or try to group batches with the same adapter, then switch between adapters using set_adapter -- this is inefficient and inconvenient. Therefore, it is recommended to use this new, faster method from now on when encountering this scenario.
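Putting it together, a sketch of a full forward pass with mixed adapters could look like this (the adapter paths and names are placeholders; any two LoRA adapters trained on the same base model would do):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach two LoRA adapters under distinct names.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter1", adapter_name="adapter1")
model.load_adapter("path/to/adapter2", adapter_name="adapter2")

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
inputs = tokenizer(["sample 1", "sample 2", "sample 3"], return_tensors="pt", padding=True)

# One forward pass, a different adapter (or no adapter) per sample.
output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])
```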
New LoftQ initialization function
We added an alternative way to initialize LoRA weights for a quantized model using the LoftQ method, which can be more convenient than the existing method. Right now, using LoftQ requires you to go through multiple steps as shown here. Furthermore, it's necessary to keep a separate copy of the quantized weights, as those are not identical to the quantized weights from the default model.
Using the new replace_lora_weights_loftq function, it's now possible to apply LoftQ initialization in a single step and without the need for extra copies of the weights. Check out the docs and this example notebook to see how it works. Right now, this method only supports 4bit quantization with bitsandbytes, and the model has to be stored in the safetensors format.
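A minimal sketch of the new workflow, assuming a placeholder causal LM that is stored as safetensors and quantized to 4bit with bitsandbytes:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq

# Placeholder: a causal LM available in the safetensors format.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model", quantization_config=bnb_config
)

peft_model = get_peft_model(base_model, LoraConfig(target_modules=["q_proj", "v_proj"]))

# Swap the freshly initialized LoRA weights for LoftQ-style weights that
# compensate for the bnb 4bit quantization error, all in a single step.
replace_lora_weights_loftq(peft_model)
```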
Deprecations
The function prepare_model_for_int8_training was deprecated for quite some time and is now removed completely. Use prepare_model_for_kbit_training instead.
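For reference, a typical usage sketch with a placeholder 4bit-quantized model:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=bnb_config)

# Replaces the removed prepare_model_for_int8_training: prepares the
# quantized model for training (e.g. casting norms, enabling input grads).
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(target_modules=["q_proj", "v_proj"]))
```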
What's Changed
Besides these highlights, we added many small improvements and fixed a couple of bugs. All these changes are listed below. As always, we thank all the awesome contributors who helped us improve PEFT.
- Bump version to 0.9.1.dev0 by @BenjaminBossan in #1517
- Fix for "leaf Variable that requires grad" Error in In-Place Operation by @DopeorNope-Lee in #1372
- FIX [CI / Docker] Follow up from #1481 by @younesbelkada in #1487
- CI: temporary disable workflow by @younesbelkada in #1534
- FIX [Docs / bnb / DeepSpeed] Add clarification on bnb + PEFT + DS compatibilities by @younesbelkada in #1529
- Expose bias attribute on tuner layers by @BenjaminBossan in #1530
- docs: highlight difference between num_parameters() and get_nb_trainable_parameters() in PEFT by @kmehant in #1531
- fix: fail when required args not passed when prompt_tuning_init==TEXT by @kmehant in #1519
- Fixed minor grammatical and code bugs by @gremlin97 in #1542
- Optimize levenshtein_distance algorithm in peft_lora_seq2seq_accelera… by @SUNGOD3 in #1527
- Update prompt_based_methods.md by @insist93 in #1548
- FIX Allow AdaLoRA rank to be 0 by @BenjaminBossan in #1540
- FIX: Make adaptation prompt CI happy for transformers 4.39.0 by @younesbelkada in #1551
- MNT: Use BitsAndBytesConfig as load_in_* is deprecated by @BenjaminBossan in #1552
- Add Support for Mistral Model in Llama-Adapter Method by @PrakharSaxena24 in #1433
- Add support for layer replication in LoRA by @siddartha-RE in #1368
- QDoRA: Support DoRA with BnB quantization by @BenjaminBossan in #1518
- Feat: add support for Conv2D DoRA by @sayakpaul in #1516
- TST Report slowest tests by @BenjaminBossan in #1556
- Changes to support fsdp+qlora and dsz3+qlora by @pacman100 in #1550
- Update style with ruff 0.2.2 by @BenjaminBossan in #1565
- FEAT Mixing different LoRA adapters in same batch by @BenjaminBossan in #1558
- FIX [CI] Fix test docker CI by @younesbelkada in #1535
- Fix LoftQ docs and tests by @BenjaminBossan in #1532
- More convenient way to initialize LoftQ by @BenjaminBossan in #1543
New Contributors
- @DopeorNope-Lee made their first contribution in #1372
- @kmehant made their first contribution in #1531
- @gremlin97 made their first contribution in #1542
- @SUNGOD3 made their first contribution in #1527
- @insist93 made their first contribution in #1548
- @PrakharSaxena24 made their first contribution in #1433
- @siddartha-RE made their first contribution in #1368
Full Changelog: v0.9.0...v0.10.0
v0.9.0: Merging LoRA weights, new quantization options, DoRA support, and more
Highlights
New methods for merging LoRA weights together
With PR #1364, we added new methods for merging LoRA weights together. This is not about merging LoRA weights into the base model. Instead, this is about merging the weights from different LoRA adapters into a single adapter by calling add_weighted_adapter. This allows you to combine the strengths of multiple LoRA adapters into a single adapter, while being faster than activating each of these adapters individually.
Although this feature has already existed in PEFT for some time, we have added new merging methods that promise much better results. The first is based on TIES, the second on DARE, and a third, inspired by both, is called Magnitude Prune. If you haven't tried these new methods, or haven't touched the LoRA weight merging feature at all, you can find more information in our docs; a short usage sketch follows below.
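As a sketch of how the new merging methods can be invoked (the adapter paths are placeholders, and the exact argument names such as combination_type and density are assumptions to verify against the docs):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder adapters; both must have been trained on the same base model.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

# Combine both adapters into a new one using the TIES method; density
# controls which fraction of the largest-magnitude values is kept.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="merged",
    combination_type="ties",
    density=0.5,
)
model.set_adapter("merged")
```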
AWQ and AQLM support for LoRA
Via #1394, we now support AutoAWQ in PEFT. This is a new method for 4bit quantization of model weights.
Similarly, we now support AQLM via #1476. This method allows quantizing weights down to as low as 2 bits. Both methods support quantizing nn.Linear layers. To find out more about all the quantization options that work with PEFT, check out our docs here.
Note that these integrations do not support merge_and_unload() yet, meaning that for inference you always need to keep the adapter weights attached to the base model.
DoRA support
We now support Weight-Decomposed Low-Rank Adaptation, aka DoRA, via #1474. This new method builds on top of LoRA and has shown very promising results. Especially at lower ranks (e.g. r=8), it should perform much better than LoRA. Right now, only non-quantized nn.Linear layers are supported. If you'd like to give it a try, just pass use_dora=True to your LoraConfig and you're good to go.
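For illustration, a minimal sketch with a placeholder base model:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# DoRA decomposes the weight update into magnitude and direction; it is
# enabled with a single flag on top of an ordinary LoRA configuration.
config = LoraConfig(r=8, use_dora=True, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, config)
```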
Documentation
Thanks to @stevhliu and many other contributors, there have been big improvements to the documentation. You should find it more organized and more up-to-date. Our DeepSpeed and FSDP guides have also been much improved.
Check out our improved docs if you haven't already!
Development
If you're implementing custom adapter layers, for instance a custom LoraLayer, note that all subclasses should now implement update_layer -- unless they want to use the default method by the parent class. In particular, this means you should no longer use different method names for the subclass, like update_layer_embedding. Also, we generally don't permit ranks (r) of 0 anymore. For more, see this PR.
Developers should have an easier time now since we fully embrace ruff. If you're the type of person who forgets to call make style before pushing to a PR, consider adding a pre-commit hook. Tests are now a bit less verbose by using plain asserts and generally embracing pytest features more fully. All of this comes thanks to @akx.
What's Changed
On top of these changes, we have added a lot of small changes since the last release, check out the full changes below. As always, we had a lot of support by many contributors, you're awesome!
- Release patch version 0.8.2 by @pacman100 in #1428
- [docs] Polytropon API by @stevhliu in #1422
- Fix MatMul8bitLtBackward view issue by @younesbelkada in #1425
- Fix typos by @szepeviktor in #1435
- Fixed saving for models that don't have _name_or_path in config by @kovalexal in #1440
- [docs] README update by @stevhliu in #1411
- [docs] Doc maintenance by @stevhliu in #1394
- [core / TPLinear] Fix breaking change by @younesbelkada in #1439
- Renovate quality tools by @akx in #1421
- [Docs] call set_adapters() after add_weighted_adapter by @sayakpaul in #1444
- MNT: Check only selected directories with ruff by @BenjaminBossan in #1446
- TST: Improve test coverage by skipping fewer tests by @BenjaminBossan in #1445
- Update Dockerfile to reflect how to compile bnb from source by @younesbelkada in #1437
- [docs] Lora-like guides by @stevhliu in #1371
- [docs] IA3 by @stevhliu in #1373
- Add docstrings for set_adapter and keep frozen by @EricLBuehler in #1447
- Add new merging methods by @pacman100 in #1364
- FIX Loading with AutoPeftModel.from_pretrained by @BenjaminBossan in #1449
- Support modules_to_save config option when using DeepSpeed ZeRO-3 with ZeRO init enabled. by @pacman100 in #1450
- FIX Honor HF_HUB_OFFLINE mode if set by user by @BenjaminBossan in #1454
- [docs] Remove iframe by @stevhliu in #1456
- [docs] Docstring typo by @stevhliu in #1455
- [core / get_peft_state_dict] Ignore all exceptions to avoid unexpected errors by @younesbelkada in #1458
- [Adaptation Prompt] Fix llama rotary embedding issue with transformers main by @younesbelkada in #1459
- [CI] Add CI tests on transformers main to catch early bugs by @younesbelkada in #1461
- Use plain asserts in tests by @akx in #1448
- Add default IA3 target modules for Mixtral by @arnavgarg1 in #1376
- add magnitude_prune merging method by @pacman100 in #1466
- [docs] Model merging by @stevhliu in #1423
- Adds an example notebook for showing multi-adapter weighted inference by @sayakpaul in #1471
- Make tests succeed more on MPS by @akx in #1463
- [CI] Fix adaptation prompt CI on transformers main by @younesbelkada in #1465
- Update docstring at peft_types.py by @eduardozamudio in #1475
- FEAT: add awq suppot in PEFT by @younesbelkada in #1399
- Add pre-commit configuration by @akx in #1467
- ENH [CI] Run tests only when relevant files are modified by @younesbelkada in #1482
- FIX [CI / bnb] Fix failing bnb workflow by @younesbelkada in #1480
- FIX [PromptTuning] Simple fix for transformers >= 4.38 by @younesbelkada in #1484
- FIX: Multitask prompt tuning with other tuning init by @BenjaminBossan in #1144
- previous_dtype is now inferred from F.linear's result output type. by @MFajcik in #1010
- ENH: [CI / Docker]: Create a workflow to temporarly build docker images in case dockerfiles are modified by @younesbelkada in #1481
- Fix issue with unloading double wrapped modules by @BenjaminBossan in #1490
- FIX: [CI / Adaptation Prompt] Fix CI on transformers main by @younesbelkada in #1493
- Update peft_bnb_whisper_large_v2_training.ipynb: Fix a typo by @martin0258 in #1494
- covert SVDLinear dtype by @PHOSPHENES8 in #1495
- Raise error on wrong type for to modules_to_save by @BenjaminBossan in #1496
- AQLM support for LoRA by @BlackSamorez in #1476
- Allow trust_remote_code for tokenizers when loading AutoPeftModels by @OfficialDelta in https://...
Release v0.8.2
What's Changed
- Release v0.8.2.dev0 by @pacman100 in #1416
- Add IA3 Modules for Phi by @arnavgarg1 in #1407
- Update custom_models.md by @boyufan in #1409
- Add positional args to PeftModelForCausalLM.generate by @SumanthRH in #1393
- [Hub] fix: subfolder existence check by @sayakpaul in #1417
- FIX: Make merging of adapter weights idempotent by @BenjaminBossan in #1355
- [core] fix critical bug in diffusers by @younesbelkada in #1427
New Contributors
Full Changelog: v0.8.1...v0.8.2
Patch Release v0.8.1
This is a small patch release of PEFT that should:
- Fix breaking change related to support for saving resized embedding layers and Diffusers models. Contributed by @younesbelkada in #1414
What's Changed
- Release 0.8.1.dev0 by @pacman100 in #1412
- Fix breaking change by @younesbelkada in #1414
- Patch Release v0.8.1 by @pacman100 in #1415
Full Changelog: v0.8.0...v0.8.1
v0.8.0: Poly PEFT method, LoRA improvements, Documentation improvements and more
Highlights
Poly PEFT method
Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists of pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (𝙿𝚘𝚕𝚢) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. To put it simply, you can think of it as a Mixture of Expert Adapters.
𝙼𝙷𝚁 (Multi-Head Routing) combines subsets of adapter parameters and outperforms 𝙿𝚘𝚕𝚢 under a comparable parameter budget; by only fine-tuning the routing function and not the adapters (𝙼𝙷𝚁-z) they achieve competitive performance with extreme parameter efficiency.
- Add Poly by @TaoSunVoyage in #1129
LoRA improvements
You can now pass all-linear to the target_modules parameter of LoraConfig to target all linear layers, which the QLoRA paper showed to perform better than targeting only the query and value attention layers (see the sketch below).
- Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
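A minimal sketch of the new option, using a placeholder base model:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# "all-linear" targets every linear layer except the output head, mirroring
# the setup recommended in the QLoRA paper.
config = LoraConfig(target_modules="all-linear")
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```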
Embedding layers of base models are now automatically saved when they are resized during fine-tuning with PEFT approaches like LoRA. This makes it possible to extend the vocabulary of the tokenizer with special tokens. This is a common use case when doing the following:
- Instruction finetuning with new tokens being added such as <|user|>, <|assistant|>, <|system|>, <|im_end|>, <|im_start|>, </s>, <s> to properly format the conversations
- Finetuning on a specific language wherein language specific tokens are added, e.g., Korean tokens being added to vocabulary for finetuning LLM on Korean datasets.
- Instruction finetuning to return outputs in a certain format to enable agent behaviour, with new tokens such as <|FUNCTIONS|>, <|BROWSE|>, <|TEXT2IMAGE|>, <|ASR|>, <|TTS|>, <|GENERATECODE|>, <|RAG|>.
A good blogpost to learn more about this is https://www.philschmid.de/fine-tune-llms-in-2024-with-trl.
- save the embeddings even when they aren't targetted but resized by @pacman100 in #1383
New option use_rslora in LoraConfig. Use it for ranks greater than 32 and see the increase in fine-tuning performance (same or better performance for ranks lower than 32 as well); a short example is sketched below.
- Added the option to use the corrected scaling factor for LoRA, based on new research. by @Damjan-Kalajdzievski in #1244
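For illustration, a sketch using a placeholder base model:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Rank-stabilized LoRA scales the update by lora_alpha / sqrt(r) instead of
# lora_alpha / r, which helps especially at higher ranks.
config = LoraConfig(r=64, lora_alpha=16, use_rslora=True, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, config)
```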
Documentation improvements
- Refactoring and updating of the concept guides. [docs] Concept guides by @stevhliu in #1269
- Improving task guides to focus more on how to use different PEFT methods and related nuances instead of focusing more on different type of tasks. It condenses the individual guides into a single one to highlight the commonalities and differences, and to refer to existing docs to avoid duplication. [docs] Task guides by @stevhliu in #1332
- DOC: Update docstring for the config classes by @BenjaminBossan in #1343
- LoftQ: edit README.md and example files by @yxli2123 in #1276
- [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
- DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
- [docs] Docstring link by @stevhliu in #1356
- QOL improvements and doc updates by @pacman100 in #1318
- Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
- DOC: Improve target modules description by @BenjaminBossan in #1290
- DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
- DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
- Improve documentation for the all-linear flag by @SumanthRH in #1357
- Fix various typos in LoftQ docs. by @arnavgarg1 in #1408
What's Changed
- Bump version to 0.7.2.dev0 post release by @BenjaminBossan in #1258
- FIX Error in log_reports.py by @BenjaminBossan in #1261
- Fix ModulesToSaveWrapper getattr by @zhangsheng377 in #1238
- TST: Revert device_map for AdaLora 4bit GPU test by @BenjaminBossan in #1266
- remove a duplicated description in peft BaseTuner by @butyuhao in #1271
- Added the option to use the corrected scaling factor for LoRA, based on new research. by @Damjan-Kalajdzievski in #1244
- feat: add apple silicon GPU acceleration by @NripeshN in #1217
- LoftQ: Allow quantizing models loaded on the CPU for LoftQ initialization by @hiyouga in #1256
- LoftQ: edit README.md and example files by @yxli2123 in #1276
- TST: Extend LoftQ tests to check CPU initialization by @BenjaminBossan in #1274
- Refactor and a couple of fixes for adapter layer updates by @BenjaminBossan in #1268
- [Tests] Add bitsandbytes installed from source on new docker images by @younesbelkada in #1275
- TST: Enable LoftQ 8bit tests by @BenjaminBossan in #1279
- [bnb] Add bnb nightly workflow by @younesbelkada in #1282
- Fixed several errors in StableDiffusion adapter conversion script by @kovalexal in #1281
- [docs] Concept guides by @stevhliu in #1269
- DOC: Improve target modules description by @BenjaminBossan in #1290
- [bnb-nightly] Address final comments by @younesbelkada in #1287
- [BNB] Fix bnb dockerfile for latest version by @SunMarc in #1291
- fix fsdp auto wrap policy by @pacman100 in #1302
- [BNB] fix dockerfile for single gpu by @SunMarc in #1305
- Fix bnb lora layers not setting active adapter by @tdrussell in #1294
- Mistral IA3 config defaults by @pacman100 in #1316
- fix the embedding saving for adaption prompt by @pacman100 in #1314
- fix diffusers tests by @pacman100 in #1317
- FIX Use torch.long instead of torch.int in LoftQ for PyTorch versions <2.x by @BenjaminBossan in #1320
- Extend merge_and_unload to offloaded models by @blbadger in #1190
- Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
- Refactor dispatching logic of LoRA layers by @BenjaminBossan in #1319
- Fix bug when load the prompt tuning in inference. by @yileld in #1333
- DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
- ENH: Add attribute to show targeted module names by @BenjaminBossan in #1330
- fix some args desc by @zspo in #1338
- Fix logic in target module finding by @s-k-yx in #1263
- Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
- DOC: Update docstring for the config classes by @BenjaminBossan in #1343
- fix prepare_inputs_for_generation logic for Prompt Learning methods by @pacman100 in #1352
- QOL improvements and doc updates by @pacman100 in #1318
- New transformers caching ETA now v4.38 by @BenjaminBossan in #1348
- FIX Setting active adapter for quantized layers by @BenjaminBossan in #1347
- DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
- [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
- DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
- Add Poly by @TaoSunVoyage in #1129
- [docs] Docstring link by @stevhliu in #1356
- Added missing getattr dunder methods for mixed model by @kovalexal in #1365
- Handle resizing of embedding layers for AutoPeftModel by @pacman100 in #1367
- account for the new merged/unmerged weight to perform the quantization again by @pacman100 in #1370
- add mixtral in LoRA mapping by @younesbelkada in https://github.com/h...
v0.7.1 patch release
This is a small patch release of PEFT that should handle:
- Issues with loading multiple adapters when using quantized models (#1243)
- Issues with transformers v4.36 and some prompt learning methods (#1252)
What's Changed
- [docs] OFT by @stevhliu in #1221
- Bump version to 0.7.1.dev0 post release by @BenjaminBossan in #1227
- Don't set config attribute on custom models by @BenjaminBossan in #1200
- TST: Run regression test in nightly test runner by @BenjaminBossan in #1233
- Lazy import of bitsandbytes by @BenjaminBossan in #1230
- FIX: Pin bitsandbytes to <0.41.3 temporarily by @BenjaminBossan in #1234
- [docs] PeftConfig and PeftModel by @stevhliu in #1211
- TST: Add tolerance for regression tests by @BenjaminBossan in #1241
- Bnb integration test tweaks by @Titus-von-Koeller in #1242
- [docs] PEFT integrations by @stevhliu in #1224
- Revert "FIX Pin bitsandbytes to <0.41.3 temporarily (#1234)" by @Titus-von-Koeller in #1250
- Fix model argument issue (#1198) by @ngocbh in #1205
- TST: Add tests for 4bit LoftQ by @BenjaminBossan in #1208
- [docs] Quantization by @stevhliu in #1236
- FIX: Truncate slack message to not exceed 3000 chars by @BenjaminBossan in #1251
- Issue with transformers 4.36 by @BenjaminBossan in #1252
- Fix: Multiple adapters with bnb layers by @BenjaminBossan in #1243
- Release: 0.7.1 by @BenjaminBossan in #1257
New Contributors
- @Titus-von-Koeller made their first contribution in #1242
- @ngocbh made their first contribution in #1205
Full Changelog: v0.7.0...v0.7.1
v0.7.0: Orthogonal Fine-Tuning, Megatron support, better initialization, safetensors, and more
Highlights
- Orthogonal Fine-Tuning (OFT): A new adapter that is similar to LoRA and shows a lot of promise for Stable Diffusion, especially with regard to controllability and compositionality. Give it a try! By @okotaku in #1160
- Support for parallel linear LoRA layers using Megatron. This should lead to a speed up when using LoRA with Megatron. By @zhangsheng377 in #1092
- LoftQ provides a new method to initialize LoRA layers of quantized models. The big advantage is that the LoRA layer weights are chosen in a way to minimize the quantization error, as described here: https://arxiv.org/abs/2310.08659. By @yxli2123 in #1150.
Other notable additions
- It is now possible to choose which adapters are merged when calling merge (#1132)
- IA³ now supports adapter deletion, by @alexrs (#1153)
- A new initialization method for LoRA has been added, "gaussian" (#1189)
- When training PEFT models with new tokens being added to the embedding layers, the embedding layer is now saved by default (#1147)
- It is now possible to mix certain adapters like LoRA and LoKr in the same model, see the docs (#1163)
- We started an initiative to improve the documentation, some of which should already be reflected in the current docs. Still, help by the community is always welcome. Check out this issue to get going.
Migration to v0.7.0
- Safetensors are now the default format for PEFT adapters. In practice, users should not have to change anything in their code, PEFT takes care of everything -- just be aware that instead of creating a file adapter_model.bin, calling save_pretrained now creates adapter_model.safetensors. Safetensors have numerous advantages over pickle files (which is the PyTorch default format) and are well supported on Hugging Face Hub.
- When merging multiple LoRA adapter weights together using add_weighted_adapter with the option combination_type="linear", the scaling of the adapter weights is now performed differently, leading to improved results.
- There was a big refactor of the inner workings of some PEFT adapters. For the vast majority of users, this should not make any difference (except making some code run faster). However, if your code is relying on PEFT internals, be aware that the inheritance structure of certain adapter layers has changed (e.g. peft.lora.Linear is no longer a subclass of nn.Linear, so isinstance checks may need updating). Also, to retrieve the original weight of an adapted layer, now use self.get_base_layer().weight, not self.weight (same for bias).
What's Changed
As always, a bunch of small improvements, bug fixes and doc improvements were added. We thank all the external contributors, both new and recurring. Below is the list of all changes since the last release.
- After release: Bump version to 0.7.0.dev0 by @BenjaminBossan in #1074
- FIX: Skip adaption prompt tests with new transformers versions by @BenjaminBossan in #1077
- FIX: fix adaptation prompt CI and compatibility with latest transformers (4.35.0) by @younesbelkada in #1084
- Improve documentation for IA³ by @SumanthRH in #984
- [Docker] Update Dockerfile to force-use transformers main by @younesbelkada in #1085
- Update the release checklist by @BenjaminBossan in #1075
- fix-gptq-training by @SunMarc in #1086
- fix the failing CI tests by @pacman100 in #1094
- Fix f-string in import_utils by @KCFindstr in #1091
- Fix IA3 config for Falcon models by @SumanthRH in #1007
- FIX: Failing nightly CI tests due to IA3 config by @BenjaminBossan in #1100
- [core] Fix safetensors serialization for shared tensors by @younesbelkada in #1101
- Change to 0.6.1.dev0 by @younesbelkada in #1102
- Release: 0.6.1 by @younesbelkada in #1103
- set dev version by @younesbelkada in #1104
- avoid unnecessary import by @winglian in #1109
- Refactor adapter deletion by @BenjaminBossan in #1105
- Added num_dataloader_workers arg to fix Windows issue by @lukaskuhn-lku in #1107
- Fix import issue transformers with id_tensor_storage by @younesbelkada in #1116
- Correctly deal with ModulesToSaveWrapper when using Low-level API by @younesbelkada in #1112
- fix doc typo by @coding-famer in #1121
- Release: v0.6.2 by @pacman100 in #1125
- Release: v0.6.3.dev0 by @pacman100 in #1128
- FIX: Adding 2 adapters when target_modules is a str fails by @BenjaminBossan in #1111
- Prompt tuning: Allow to pass additional args to AutoTokenizer.from_pretrained by @BenjaminBossan in #1053
- Fix: TorchTracemalloc ruins Windows performance by @lukaskuhn-lku in #1126
- TST: Improve requires grad testing: by @BenjaminBossan in #1131
- FEAT: Make safe serialization the default one by @younesbelkada in #1088
- FEAT: Merging only specified adapter_names when calling merge by @younesbelkada in #1132
- Refactor base layer pattern by @BenjaminBossan in #1106
- [Tests] Fix daily CI by @younesbelkada in #1136
- [core / LoRA] Add adapter_names in bnb layers by @younesbelkada in #1139
- [Tests] Do not stop tests if a job failed by @younesbelkada in #1141
- CI Add Python 3.11 to test matrix by @BenjaminBossan in #1143
- FIX: A few issues with AdaLora, extending GPU tests by @BenjaminBossan in #1146
- Use huggingface_hub.file_exists instead of custom helper by @Wauplin in #1145
- Delete IA3 adapter by @alexrs in #1153
- [Docs fix] Relative path issue by @mishig25 in #1157
- Dataset was loaded twice in 4-bit finetuning script by @lukaskuhn-lku in #1164
- fix add_weighted_adapter method by @pacman100 in #1169
- (minor) correct type annotation by @vwxyzjn in #1166
- Update release checklist about release notes by @BenjaminBossan in #1170
- [docs] Migrate doc files to Markdown by @stevhliu in #1171
- Fix dockerfile build by @younesbelkada in #1177
- FIX: Wrong use of base layer by @BenjaminBossan in #1183
- [Tests] Migrate to AWS runners by @younesbelkada in #1185
- Fix code example in quicktour.md by @merveenoyan in #1181
- DOC Update a few places in the README by @BenjaminBossan in #1152
- Fix issue where you cannot call PeftModel.from_pretrained with a private adapter by @elyxlz in #1076
- Added lora support for phi by @umarbutler in #1186
- add options to save or push model by @callanwu in #1159
- ENH: Different initialization methods for LoRA by @BenjaminBossan in #1189
- Training PEFT models with new tokens being added to the embedding layers and tokenizer by @pacman100 in #1147
- LoftQ: Add LoftQ method integrated into LoRA. Add example code for LoftQ usage. by @yxli2123 in #1150
- Parallel linear Lora by @zhangsheng377 in #1092
- [Feature] Support OFT by @okotaku in #1160
- Mixed adapter models by @BenjaminBossan in #1163
- [DOCS] README.md by @Akash190104 in #1054
- Fix parallel linear lora by @zhangsheng377 in #1202
- ENH: Enable OFT adapter for mixed adapter models by @BenjaminBossan in #1204
- DOC: Update & improve docstrings and type annotations for common methods and classes by @BenjaminBossan in https://g...
v0.6.2 Patch Release: Refactor of adapter deletion API and fixes to `ModulesToSaveWrapper` when using Low-level API
This patch release refactors the adapter deletion API and fixes ModulesToSaveWrapper when using the low-level API.
Refactor adapter deletion
- Refactor adapter deletion by @BenjaminBossan in #1105
Fix ModulesToSaveWrapper when using Low-level API
- Correctly deal with ModulesToSaveWrapper when using Low-level API by @younesbelkada in #1112
What's Changed
- Release: 0.6.1 by @younesbelkada in #1103
- set dev version by @younesbelkada in #1104
- avoid unnecessary import by @winglian in #1109
- Refactor adapter deletion by @BenjaminBossan in #1105
- Added num_dataloader_workers arg to fix Windows issue by @lukaskuhn-lku in #1107
- Fix import issue transformers with id_tensor_storage by @younesbelkada in #1116
- Correctly deal with ModulesToSaveWrapper when using Low-level API by @younesbelkada in #1112
- fix doc typo by @coding-famer in #1121
New Contributors
- @winglian made their first contribution in #1109
- @lukaskuhn-lku made their first contribution in #1107
- @coding-famer made their first contribution in #1121
Full Changelog: v0.6.1...v0.6.2