[BUG] Following the quant_with_alpaca.py example but keep getting "You shouldn't move a model that is dispatched using accelerate hooks." and the model is never saved. #670
Labels: bug (Something isn't working)
Describe the bug
I am using the `quant_with_alpaca.py` script to quantize MaziyarPanahi/Llama-3-70B-Instruct-32k-v0.1. I am using the following command:

I have tried running the above without `--save_and_reload`, and the script quantizes the model and then runs inference, which seems fine. But the model never gets saved anywhere. With the `--save_and_reload` switch, I get this output:

```
INFO - Model packed.
2024-05-13 03:45:45 INFO [auto_gptq.modeling._utils] Model packed.
WARNING - using autotune_warmup will move model to GPU, make sure you have enough VRAM to load the whole model.
2024-05-13 03:45:45 WARNING [auto_gptq.modeling._utils] using autotune_warmup will move model to GPU, make sure you have enough VRAM to load the whole model.
2024-05-13 03:45:45 WARNING [accelerate.big_modeling] You shouldn't move a model that is dispatched using accelerate hooks.
```
After this it crashes because of a CUDA OOM error.
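If it helps: I suspect the warm-up path calls `.to("cuda")` on a model that accelerate has already dispatched across my 4 GPUs, which both triggers the warning and tries to pull the whole 70B model onto one card. A guard along these lines (my own sketch, not AutoGPTQ code; it assumes dispatched models expose an `hf_device_map` attribute, as transformers models loaded with `device_map="auto"` do) would skip the move:

```python
# My own sketch, not AutoGPTQ code: skip the GPU move when accelerate has
# already dispatched the model. Transformers models loaded with
# device_map="auto" carry an hf_device_map attribute describing the placement.
def safe_to_cuda(model):
    """Move the model to CUDA only if accelerate has not already placed it."""
    if getattr(model, "hf_device_map", None):
        # Already sharded across devices; calling .to() here triggers the
        # "You shouldn't move a model that is dispatched" warning and, for a
        # 70B model, the subsequent CUDA OOM.
        return model
    return model.to("cuda")
```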
When run without the `--save_and_reload` switch, the script tests the quant with 4 instructions and then exits without any error (although the inference speed was painfully slow).

Hardware details
I have an EPYC 7532 processor with 256GB ram and 4x 3090s.
Software version
Ubuntu 22.04.4 LTS (6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
Python 3.10.14
auto_gptq Version: 0.8.0.dev0+cu121
Torch 2.3.0+cu121
Transformers 4.40.2
Accelerate 0.30.1
To Reproduce
I made one change to one of the files to add the damp 0.1 argument for quantization.
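The gist of that one change, shown here as a plain dict just so the intent is clear (in the script it goes through AutoGPTQ's `BaseQuantizeConfig`; I believe the parameter is named `damp_percent` and defaults to 0.01; the other keys are only illustrative, the script keeps its own defaults):

```python
# Illustration of the one change I made: raising the damping factor to 0.1.
# In the actual script this is passed via AutoGPTQ's BaseQuantizeConfig;
# bits/group_size below are only placeholders for shape.
quantize_config = {
    "bits": 4,
    "group_size": 128,
    "damp_percent": 0.1,  # raised from the 0.01 default
}
```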
Expected behavior
I was hoping to get a GPTQ quant of the above model.
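Roughly, the flow I expected to work even without `--save_and_reload`: quantize, then write the result to disk. A sketch against the AutoGPTQ API as I understand it (`save_after_quant` is my own wrapper, and `out_dir` is a placeholder path):

```python
# Sketch of the save step I expected: after model.quantize(...) finishes,
# persist the quantized weights. save_quantized is the AutoGPTQ method as I
# understand it; save_after_quant is my own hypothetical wrapper.
def save_after_quant(model, out_dir):
    """Persist the quantized model and return the directory used."""
    model.save_quantized(out_dir, use_safetensors=True)
    return out_dir
```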