all_gather with gloo backend does not work in inference mode #126032
Labels
module: c10d
oncall: distributed
triaged
🐛 Describe the bug
A minimal reproducible example:
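The snippet below is only a hedged sketch of the kind of script that exercises this path, not the exact code from the report. It assumes a single-rank gloo group created through a temporary file store; the helper names `init_single_process_group` and `try_all_gather` are hypothetical, and whether the failure shows up with a world size of 1 is itself an assumption.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def init_single_process_group(backend: str = "gloo") -> None:
    """Start a one-rank process group backed by a temporary file store."""
    # Assumption: a file:// store avoids picking a TCP port for this sketch.
    store_path = os.path.join(tempfile.mkdtemp(), "store")
    dist.init_process_group(
        backend=backend,
        init_method=f"file://{store_path}",
        rank=0,
        world_size=1,
    )


def try_all_gather(use_inference_mode: bool):
    """Run all_gather once; return None on success or the error message."""
    ctx = torch.inference_mode() if use_inference_mode else torch.no_grad()
    try:
        with ctx:
            x = torch.ones(4)
            out = [torch.zeros(4) for _ in range(dist.get_world_size())]
            dist.all_gather(out, x)
        return None
    except RuntimeError as exc:
        return str(exc)


if __name__ == "__main__":
    init_single_process_group("gloo")
    # Baseline without inference mode, then the case described in this issue.
    print("no_grad:       ", try_all_gather(False))
    print("inference_mode:", try_all_gather(True))
    dist.destroy_process_group()
```

Running the same helper under `torch.no_grad()` first gives a baseline, so any message returned for the inference-mode call can be attributed to inference mode rather than to the process-group setup.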
The error is:
It looks strange that the nccl backend works in this case, and broadcast works, too. Only all_gather does not work.
Versions
pytorch 2.3.0
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k