You are trying to offload the whole model to the disk. Please use the `disk_offload` function instead. #2726

Moazzamnamal · 2024-04-30T11:14:12Z

System Info

- `Accelerate` version: 0.29.3
- Platform: Windows-10-10.0.19045-SP0
- `accelerate` bash location: c:\Users\Sardar Moazzam\AppData\Roaming\Python\Python310\Scripts\accelerate.exe
- Python version: 3.10.4
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.0+cpu (False)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- System RAM: 15.89 GB
- `Accelerate` default config:
	Not found

Information

The official example scripts
My own modified scripts

Tasks

One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)

Reproduction

import transformers
from transformers import AutoModelForCausalLM
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.float16, low_cpu_mem_usage=True).cpu()
from accelerate import disk_offload
disk_offload(model=model, offload_dir="alpha")
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])

Expected behavior

I am running the code above and it gives the following error:
"You are trying to offload the whole model to the disk. Please use the disk_offload function instead."

To solve the error, I added the following code (so the code given above already includes this addition):
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.float16, low_cpu_mem_usage=True).cpu()
from accelerate import disk_offload
disk_offload(model=model, offload_dir="alpha")

But when I add this code to my original script and run it, the Jupyter kernel crashes after running for some time. Could you also explain what `offload_dir="alpha"` means?


muellerzr · 2024-04-30T12:11:55Z

cc @SunMarc

SunMarc · 2024-04-30T12:23:12Z

Hi @Moazzamnamal, you are loading the model twice: once with AutoModelForCausalLM.from_pretrained and again when calling pipeline. Just do:

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

and it should load your model on cpu and disk since you don't have a GPU.
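To show why loading the model twice matters on this machine, here is a rough back-of-the-envelope estimate. It is only a sketch: the parameter count is an assumed round figure for an "8B" model, the reported System RAM is read as GiB, and activations, the tokenizer, and everything else running on the machine are ignored. It is not a diagnosis of the kernel crash, only an indication that memory is the likely bottleneck.

```python
# Back-of-the-envelope memory estimate; values marked "assumed" are not from the thread.
PARAMS = 8.0e9          # assumed: roughly 8 billion parameters for an "8B" model
BYTES_PER_PARAM = 2     # float16 and bfloat16 both use 2 bytes per value
SYSTEM_RAM_GIB = 15.89  # "System RAM" from the report above, read here as GiB (assumption)

GIB = 1024 ** 3


def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB, ignoring activations and other overhead."""
    return n_params * bytes_per_param / GIB


one_copy = weights_gib(PARAMS, BYTES_PER_PARAM)
two_copies = 2 * one_copy

print(f"one copy:   {one_copy:.1f} GiB")
print(f"two copies: {two_copies:.1f} GiB")
print(f"system RAM: {SYSTEM_RAM_GIB:.2f} GiB")
print("two copies fit in RAM:", two_copies <= SYSTEM_RAM_GIB)
```

Under these assumptions a single half-precision copy already takes most of the available RAM, and two copies do not fit at all, which is consistent with part of the model being placed on disk when only the pipeline call is used.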

Moazzamnamal · 2024-04-30T12:27:53Z

@SunMarc thanks let me try it

github-actions · 2024-05-30T15:06:23Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Jun 8, 2024
