Accelerate FSDP RuntimeError: Tensors of the same index must be on the same device and the same dtype #2764

Open
yaswanthchittepu opened this issue May 10, 2024 · 0 comments


yaswanthchittepu commented May 10, 2024

Hi,

I am using Accelerate FSDP to align a Pythia-2.8B LLM with the DPOTrainer from the TRL library. The code works well in a multi-GPU setting (the config in the custom-configuration section here). For context, I am using 4 A100s on a single node of a compute cluster. But when I try to use FSDP with the same configuration provided in the Hugging Face examples and execute `accelerate launch --config_file=fsdp.yaml --num_processes=4 dpo.py`, I run into the following error: `RuntimeError: Tensors of the same index must be on the same device and the same dtype except step tensors that can be CPU and float32/64 notwithstanding`.

I have looked around a bit and came across this question on the PyTorch discussion forum. The suggested workarounds from the thread are:

  • use foreach=False with the optimizer (you’d be missing out on performance here)
  • avoid torch.set_default_dtype(torch.float64)

The first workaround involves passing in an optimizer, which is not allowed when using FSDP. What could be causing this, and how do I resolve it?
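For reference, this is roughly what the first workaround would look like in a setup where a pre-built optimizer can be passed in (just a sketch; AdamW and the hyperparameters are placeholders mirroring my TrainingArguments below, and as noted this path is not available to me under FSDP):

```python
from torch.optim import AdamW

# Sketch of workaround 1: build the optimizer with the multi-tensor (foreach)
# path disabled and hand it to the trainer explicitly via the Trainer-level
# `optimizers=(optimizer, lr_scheduler)` argument. `model` refers to the
# script further down; the learning rate mirrors the TrainingArguments there.
optimizer = AdamW(model.parameters(), lr=1e-4, foreach=False)

# e.g. DPOTrainer(..., args=training_args, optimizers=(optimizer, None))
```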

Also, some additional package information: I am using Python 3.10.14, trl 0.8.6, transformers 4.40.1, torch 2.3.0, peft 0.10.0, CUDA 12.1, and accelerate 0.29.3. I have also provided my script here for the sake of completeness. Additionally, I have observed this error even when using the SFTTrainer and RewardTrainer in TRL.

```python
import datetime

import torch
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# get_tldr / to_hf_format are local helpers that load the TL;DR data
# and convert it into Hugging Face dataset format.
train_data = get_tldr(datapath, 'train')
test_data = get_tldr(datapath, 'test')
train_data, test_data = map(to_hf_format, [train_data, test_data])

######################### ARGS ###########################
model_name = "EleutherAI/pythia-2.8b"
sft_ckpt_name = "EleutherAI/pythia-2.8b"
batch_size = 16
gradient_accumulation_steps = 16
gradient_checkpoint = True
##########################################################

# Explicitly initialize the default process group with a long NCCL timeout
torch.distributed.init_process_group(backend="nccl", timeout=datetime.timedelta(seconds=36000))

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    sft_ckpt_name,
    device_map={"": Accelerator().local_process_index},
)

if gradient_checkpoint:
    # Needed when we use gradient checkpointing;
    # otherwise it complains that there are no gradients to compute.
    if hasattr(model, "enable_input_require_grads"):
        model.enable_input_require_grads()
    else:
        def make_inputs_require_grad(module, input, output):
            output.requires_grad_(True)
        model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config, adapter_name='__train__')

training_args = TrainingArguments(
    output_dir=<out_dir>,
    report_to='none',
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=8,
    bf16=True, # Need A100s
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=gradient_checkpoint,
    gradient_checkpointing_kwargs={"use_reentrant":False},
    evaluation_strategy="steps",
    eval_steps=10,
    num_train_epochs=4,
    # logging strategies 
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="steps",
    save_steps=20,
    save_total_limit=10,
    load_best_model_at_end=True,
    max_grad_norm=1.,
    remove_unused_columns=False,
)

# Initialize the trainer, without a ref_model param.
dpo_trainer = DPOTrainer(
    model,
    ref_model=None,
    beta=0.1,
    train_dataset=train_data,
    eval_dataset=test_data,
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=256,
    args=training_args,
)

dpo_trainer.train()
```
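
For debugging, here is a small inspection sketch (not part of the script above) that prints the device and dtype of each parameter and its optimizer state; mismatched entries should be the ones the foreach optimizer kernels complain about. It assumes the trainer's optimizer already exists, e.g. `dpo_trainer.optimizer` once training has started.

```python
import torch

def inspect_optimizer_state(optimizer):
    # Print device/dtype for every parameter and its state tensors
    # (exp_avg, exp_avg_sq, step, ...). Rows that disagree are candidates
    # for the "same device and the same dtype" RuntimeError.
    for group_idx, group in enumerate(optimizer.param_groups):
        for param in group["params"]:
            state = optimizer.state.get(param, {})
            state_info = {
                name: (str(t.device), str(t.dtype))
                for name, t in state.items()
                if torch.is_tensor(t)
            }
            print(group_idx, tuple(param.shape), param.device, param.dtype, state_info)

# Usage, once the optimizer has been created:
# inspect_optimizer_state(dpo_trainer.optimizer)
```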