Potential Bug with split_between_processes #2736

Open
Vincent-Li-9701 opened this issue May 2, 2024 · 2 comments

System Info

accelerate==0.29.3

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

The main issue comes from here. If the inputs are originally on the CPU rather than on the device, a RuntimeError is raised:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu!

import pandas as pd
import torch
import torch.distributed as dist
from accelerate import Accelerator, notebook_launcher
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

def grade(inputs):
    distributed_state = Accelerator()
    model_path = "mistralai/Mistral-7B-Instruct-v0.2"

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, padding_side="left")
    tokenizer.pad_token = tokenizer.unk_token
    model = AutoModelForCausalLM.from_pretrained(model_path).eval().to(distributed_state.device)

    # The tokenized prompts are still on CPU at this point.
    prompts = tokenizer(
        list(inputs["prompt"]),
        add_special_tokens=True,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=756,
    ).data
    generation_config = GenerationConfig(
        max_new_tokens=5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    txt_outs = []
    batch_size = 10
    # apply_padding=True raises the device-mismatch RuntimeError when the
    # prompt tensors are still on CPU.
    with distributed_state.split_between_processes(prompts, apply_padding=True) as inputs:
        for i in range(0, len(inputs["input_ids"]), batch_size):
            with torch.no_grad():
                res = model.generate(
                    input_ids=inputs["input_ids"][i: i + batch_size].to(distributed_state.device),
                    attention_mask=inputs["attention_mask"][i: i + batch_size].to(distributed_state.device),
                    generation_config=generation_config,
                )
            txt_out = tokenizer.batch_decode(res, skip_special_tokens=True, clean_up_tokenization_spaces=True)
            txt_out_across_devices = [None for _ in range(distributed_state.num_processes)]

            dist.gather_object(
                txt_out,
                txt_out_across_devices if distributed_state.is_main_process else None,
                dst=0,
            )
            if distributed_state.is_main_process:
                txt_outs.extend(txt_out_across_devices)

test_samples = pd.DataFrame({"prompt": ["Who am i?"] * 9})
notebook_launcher(grade, args=[test_samples], num_processes=8)
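
A possible workaround, sketched under the assumption that the mismatch only occurs when the tensors are still on CPU (as described above): move the tokenized prompts onto distributed_state.device before the split. This snippet is meant to replace the `with` block inside grade(); it is a sketch, not a fix for the underlying behavior.

    # Workaround sketch: put the tokenized prompts on the accelerator device
    # first, so every tensor involved in the split is already on one device.
    prompts = {k: v.to(distributed_state.device) for k, v in prompts.items()}
    with distributed_state.split_between_processes(prompts, apply_padding=True) as inputs:
        ...  # same generation loop as in grade() above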

Expected behavior

No error and padding applied successfully.

@bohao-cao

Did you figure out the solution?

Vincent-Li-9701 (Author) commented May 23, 2024

> Did you figure out the solution?

There is no good solution for this other than changing the source code. What I'm doing now is pre-padding the samples myself; that is also needed when you are running multi-process/multi-node inference. Do the padding before calling split_between_processes (see the sketch below).
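
A minimal sketch of that workaround, assuming "pre-padding" means repeating the last sample on CPU until the sample count divides evenly across processes (pad_to_multiple_of_num_processes is just an illustrative helper, not an accelerate API):

import torch

def pad_to_multiple_of_num_processes(batch, num_processes):
    # Repeat the last row of each tensor until the sample count is divisible
    # by num_processes, so every process receives an equal share.
    remainder = len(batch["input_ids"]) % num_processes
    if remainder == 0:
        return batch
    pad_count = num_processes - remainder
    return {
        key: torch.cat([t, t[-1:].expand(pad_count, *t.shape[1:])])
        for key, t in batch.items()
    }

# Inside grade(), before the split:
prompts = pad_to_multiple_of_num_processes(prompts, distributed_state.num_processes)
with distributed_state.split_between_processes(prompts) as inputs:
    ...  # generation loop as before; apply_padding is no longer needed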
