
Duplicate elements in split_between_processes #2750

Closed · 4 tasks
hkunzhe opened this issue May 7, 2024 · 4 comments · Fixed by #2781

Comments

@hkunzhe
Contributor

hkunzhe commented May 7, 2024

System Info

accelerate==git+https://github.com/huggingface/accelerate.git

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

  1. Save the following test script as test_accelerate.py:
from accelerate import PartialState  # Can also be Accelerator or AcceleratorState

state = PartialState()
input_list = list(range(17))

with state.split_between_processes(input_list) as splitted_input_list:
    print(f"{state.device}, {splitted_input_list}")
  2. Run accelerate launch --num_processes 8 test_accelerate.py
  3. Observe the output:
cuda:2, [6, 7, 8]
cuda:7, [16]
cuda:1, [3, 4, 5]
cuda:4, [12, 13, 14]
cuda:3, [9, 10, 11]
cuda:6, [16]
cuda:0, [0, 1, 2]
cuda:5, [15, 16]

As we can see, the element 16 appears three times, across three different processes. This is caused by the following code in split_between_processes:

num_samples_per_process = math.ceil(length / self.num_processes)
start_index = self.process_index * num_samples_per_process
end_index = start_index + num_samples_per_process
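
For illustration, here is a minimal standalone sketch (plain Python, no GPUs or accelerate required) of that ceil-based index arithmetic for the length-17 list and 8 processes from the reproduction above; it models only the slicing math, not accelerate's full internals:

import math

length, num_processes = 17, 8
items = list(range(length))
num_samples_per_process = math.ceil(length / num_processes)  # ceil(17 / 8) = 3

for process_index in range(num_processes):
    start_index = process_index * num_samples_per_process
    end_index = start_index + num_samples_per_process
    # Processes 6 and 7 get start indices 18 and 21, past the end of the
    # 17-element list, so their raw slices come out empty; the duplicated
    # 16s in the output above suggest accelerate fills such slices from
    # the tail of the list.
    print(process_index, items[start_index:end_index])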

The code above should be modified as follows to get the expected output:

num_samples_per_process = length // self.num_processes
num_extras = length % self.num_processes
start_index = self.process_index * num_samples_per_process + min(self.process_index, num_extras)
end_index = start_index + num_samples_per_process + (1 if self.process_index < num_extras else 0)
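
As a sanity check, plugging the proposed floor-plus-remainder arithmetic into the same standalone sketch partitions all 17 elements with no gaps and no duplicates:

length, num_processes = 17, 8
items = list(range(length))
num_samples_per_process = length // num_processes  # 17 // 8 = 2
num_extras = length % num_processes                # 17 % 8 = 1 leftover element

for process_index in range(num_processes):
    # The first num_extras processes each take one extra element.
    start_index = process_index * num_samples_per_process + min(process_index, num_extras)
    end_index = start_index + num_samples_per_process + (1 if process_index < num_extras else 0)
    print(process_index, items[start_index:end_index])

This prints [0, 1, 2], [3, 4], [5, 6], ..., [15, 16]: each of the 17 elements appears exactly once, matching the expected output below.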

Expected behavior

  • Expected output:
cuda:2, [5, 6]
cuda:7, [15, 16]
cuda:1, [3, 4]
cuda:4, [9, 10]
cuda:3, [7, 8]
cuda:6, [13, 14]
cuda:0, [0, 1, 2]
cuda:5, [11, 12]
@hkunzhe
Contributor Author

hkunzhe commented May 9, 2024

@muellerzr, could you check it out?

@hammoudhasan

hammoudhasan commented May 14, 2024

I also observed the same thing on my end! Splitting a list of 100 elements across 8 GPUs would give me a total of roughly 113 elements. The fix proposed by @hkunzhe worked for me.

@muellerzr
Collaborator

Thanks for the flag! Would you like to make a PR with your fix? :)

@hkunzhe
Contributor Author

hkunzhe commented May 15, 2024

> Thanks for the flag! Would you like to make a PR with your fix? :)

Sure!
