FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models #30806
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for adding this new method to the quantizer! This will make fine-tuning with quantized models way easier! I left a few minor comments.
    if cls_name == "Params4bit":
        return bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
The user might want to know in which precision the model was dequantized, since they don't have the possibility to control that. I think it would be great to give that information, since there is no default value (as opposed to `from_pretrained`, which loads the model in fp32).
Two ways to get that:
- just check the dtype of the weights at the end (potentially the easiest way), as sketched below
- check what happens in `dequantize_4bit`. In that method, you can see that the output dtype comes from `weight.quant_state.dtype`.
We can potentially add a `torch_dtype` attribute in the future if it makes sense.
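For the first option, a minimal sketch (not code from the PR; the helper name is hypothetical) of how the resulting precision could be inspected:

```python
import torch
from torch import nn

def inspect_dequantized_dtypes(model: nn.Module) -> set[torch.dtype]:
    # Hypothetical helper: collect the dtypes of the dequantized weights so the
    # user knows which precision the model ended up in.
    return {param.dtype for param in model.parameters()}
```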
+1
Nice catch! The output dtype should be correctly inferred here: https://github.com/TimDettmers/bitsandbytes/blob/b891f80ba514833f41f0e9226983b02a9fb5c44b/bitsandbytes/functional.py#L1349 through the compute_dtype, so it should be accurate. I added a warning_once statement to inform users of the dequantized dtype: 1a4a906
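For reference, a rough sketch of what such a one-time warning could look like (the exact message and placement in the PR may differ; `report_dequantized_dtype` is a hypothetical helper):

```python
from torch import nn
from transformers.utils import logging

logger = logging.get_logger(__name__)

def report_dequantized_dtype(model: nn.Module) -> None:
    # Hypothetical helper: warn the user (only once) about which precision
    # the weights ended up in after dequantization.
    dtype = next(model.parameters()).dtype
    logger.warning_once(f"Model was dequantized to {dtype}.")
```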
Thanks for adding this! +1 on all of @SunMarc's comments.
    if cls_name == "Params4bit":
        return bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
+1
    Returns the converted model and a boolean that indicates if the conversion has been successful or not.
    """
    import bitsandbytes as bnb
This is already imported at the top of the module
Nice catch! Should be fixed now.
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Thanks for adding this feature and iterating!
            )
        # Remove the last key for recursion
        current_key_name.pop(-1)
    return model, has_been_replaced
One general comment: if instead you had a private method `_dequantize_and_replace` which handles the recursion, you wouldn't need to return `has_been_replaced` here. When someone calls `dequantize_and_replace`, I don't think `has_been_replaced` is ever used, and it could be confusing, e.g.:

# This is just dequantize_and_replace from before
def _dequantize_and_replace(
    model,
    modules_to_not_convert=None,
    current_key_name=None,
    quantization_config=None,
    has_been_replaced=False,
):
    ...
    return model, has_been_replaced


def dequantize_and_replace(
    model,
    modules_to_not_convert=None,
    current_key_name=None,
    quantization_config=None,
    has_been_replaced=False,
):
    model, has_been_replaced = _dequantize_and_replace(...)
    return model
Makes sense! Will do!
Done in 8b904f7!
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    )

    if not has_been_replaced:
        logger.warning(
Nice :)
Yeah, this is great, thanks!
Great, thanks @RonanKMcGovern! Let us know how it goes.
What does this PR do?
Fixes #30177
This PR adds a new feature, `dequantize`, in order to de-quantize models for interesting use cases such as the one described in #30177. The API is very simple:
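A minimal usage sketch, assuming the new method is exposed as `model.dequantize()` (the checkpoint name below is only a placeholder; the PR's own docs and tests show the canonical usage):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load a model quantized with bitsandbytes (4-bit here as an example).
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# De-quantize the weights back into regular torch tensors.
model.dequantize()
```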
Users just need to make sure they have enough GPU RAM to store the dequantized model, otherwise they might face unexpected behaviour.
Added support for 4-bit / 8-bit models, plus tests and docs to educate users on how to use this new API.
cc @amyeroberts @SunMarc