[awq] replace scale when we have GELU #30074
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for working on this!
Two comments:
- I'm worried that the current implementation will result in silent errors, as the logging is at the info level. If module names other than ["fc_in", "dense_h_to_4h", "up_proj", "c_fc"] are used, this issue will not be solved and we'll be relying on another user to flag it (a sketch of a louder failure mode follows below).
- Tests should be added to make sure the replacement happens as expected.
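As a rough illustration of the first point, a lookup that fails loudly instead of logging at the info level could look like the sketch below; the helper name and the error handling are illustrative, not the PR's actual code.

```python
import torch.nn as nn

# Illustrative only: layer names taken from the snippet discussed in this PR.
EXPECTED_LAYERS = ["fc_in", "dense_h_to_4h", "up_proj", "c_fc"]


def find_layer_before_act(mlp_module: nn.Module) -> nn.Linear:
    """Return the linear layer assumed to feed the GELU, or fail loudly."""
    for candidate in EXPECTED_LAYERS:
        layer = getattr(mlp_module, candidate, None)
        if isinstance(layer, nn.Linear):
            return layer
    # Raising (or at least logger.warning) makes unsupported module names
    # visible immediately instead of silently skipping the scale replacement.
    raise ValueError(
        f"None of {EXPECTED_LAYERS} found in {type(mlp_module).__name__}; "
        "AWQ activation scales would not be loaded for this module."
    )
```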
src/transformers/integrations/awq.py
Outdated
if isinstance(module, nn.GELU) and name not in modules_to_not_convert:
    # get the module just before applying the activation
    # the easiest way is just checking if the MLP has one of these layers
    layers = ["fc_in", "dense_h_to_4h", "up_proj", "c_fc"]
This seems super brittle - let's do this in a more generalized way. In particular, this doesn't check that these layers are actually used right before the activation is applied.
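For context, a more generic detection could look roughly like this sketch, which picks the last nn.Linear registered before the activation instead of hardcoding layer names; the function name is hypothetical, and registration order only approximates forward order, which is precisely the caveat raised above.

```python
from typing import Optional

import torch.nn as nn


def linear_before_activation(parent: nn.Module, act_name: str) -> Optional[nn.Linear]:
    """Return the last nn.Linear registered before the child named `act_name`."""
    last_linear = None
    for child_name, child in parent.named_children():
        if child_name == act_name:
            # Heuristic: assumes registration order mirrors forward order,
            # which is not guaranteed for every model.
            return last_linear
        if isinstance(child, nn.Linear):
            last_linear = child
    return None
```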
Sounds good! I decided to only modify the impacted models in the end. While this is a bit rigid, I think it is fine for AWQ since the number of model types that get quantized is pretty small.
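The per-model approach can be pictured as a small mapping from model type to the activation module and the linear that feeds it; the dictionary name and entries below are illustrative examples rather than the exact mapping added in this PR.

```python
# Example mapping: model type -> names of the activation module and the
# linear layer that precedes it inside the MLP block. Entries are illustrative.
EXAMPLE_SCALES_MAPPING = {
    "gptj": {"act": "act", "layer_before_act": "fc_in"},
    "falcon": {"act": "act", "layer_before_act": "dense_h_to_4h"},
    "gpt_bigcode": {"act": "act", "layer_before_act": "c_fc"},
}


def needs_scale_replacement(model_type: str) -> bool:
    # Only model types listed in the mapping get their activation wrapped.
    return model_type in EXAMPLE_SCALES_MAPPING
```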
src/transformers/integrations/awq.py
Outdated
@@ -145,6 +147,17 @@ def replace_with_awq_linear(

    # Force requires grad to False to avoid unexpected errors
    model._modules[name].requires_grad_(False)
    if isinstance(module, nn.GELU) and name not in modules_to_not_convert:
See #30225 (comment). May need to adapt to variants of GELU.
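One possible way to cover GELU variants is sketched below; the exact set of activation classes worth matching is an assumption and would need to be checked against transformers.activations.

```python
import torch.nn as nn
from transformers.activations import GELUActivation, NewGELUActivation

# Assumed list of GELU-like classes; extend as needed for other variants.
GELU_LIKE = (nn.GELU, GELUActivation, NewGELUActivation)


def is_gelu_like(module: nn.Module) -> bool:
    return isinstance(module, GELU_LIKE)
```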
The latest commit should include all affected models to date.
Thanks for adding this!
The only comment I'd make is to use a clearer name than replace_scales. scales isn't very descriptive - just from the method name, I don't really know what's happening, nor what a "scale" is.
What does this PR do?
This PR replaces the scales with ScaleActivation when we have a GELU activation. Before this PR, the scales calculated during quantization were not loaded properly. To have a quick fix that doesn't depend on each model, I just check whether one of the layers ["fc_in", "dense_h_to_4h", "up_proj", "c_fc"] is present in the module that contains the GELU. This is done in order to get the shape of the scales; we need that to be able to load the scales, otherwise we will get an error saying that the shapes don't match. A minimal sketch of the idea follows the linked issues below.
Fixes #29421
Fixes #30225
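To make the shape requirement concrete, here is a minimal sketch of the kind of wrapper involved, assuming an AutoAWQ-style scaled activation that divides the activation output by a scales vector sized to the preceding linear's out_features; the class and function names are illustrative.

```python
import torch
import torch.nn as nn


class ScaledActivationSketch(nn.Module):
    """Wraps an activation and exposes a `scales` parameter to be loaded."""

    def __init__(self, act: nn.Module, num_features: int):
        super().__init__()
        self.act = act
        # Placeholder values; the real scales come from the AWQ checkpoint,
        # and their shape must match the preceding linear's output dimension.
        self.scales = nn.Parameter(torch.ones(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x) / self.scales


def wrap_gelu(mlp: nn.Module, layer_before_act: str, act_name: str) -> None:
    linear = getattr(mlp, layer_before_act)  # e.g. "fc_in", "c_fc", ...
    act = getattr(mlp, act_name)
    setattr(mlp, act_name, ScaledActivationSketch(act, linear.out_features))
```

With such a wrapper in place, the checkpoint's activation scales have a parameter of the right shape to load into.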