[Core][Distributed] remove graph mode function #4818

youkaichao · 2024-05-14T22:23:23Z

Users only need to wrap the capture in a with graph_capture(): block, which manages the context while the graph is being captured; the graph can be replayed afterwards.

Graph mode only needs to be on inside the capture; outside the capture there is no reason to enter graph mode.

Therefore, these two functions can be merged into one.
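A minimal sketch of the merged API follows. It assumes "graph mode" is nothing more than a module-level flag. The names _graph_mode_enabled and in_graph_mode() and the stream parameter of graph_capture() are hypothetical and not vLLM's actual code. The sketch shows one context manager that turns graph mode on for the duration of the capture and restores it on exit, so no separate graph_mode() call is needed.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Iterator

# Hypothetical module-level flag standing in for the old "graph mode" state.
_graph_mode_enabled = False


def in_graph_mode() -> bool:
    """Return True only while a graph_capture() block is active."""
    return _graph_mode_enabled


@dataclass
class GraphCaptureContext:
    # torch.cuda.Stream in the PR; typed as Any so this sketch runs without a GPU.
    stream: Any


@contextmanager
def graph_capture(stream: Any = None) -> Iterator[GraphCaptureContext]:
    """Enter graph mode for the duration of the capture, then restore the old state."""
    global _graph_mode_enabled
    previous = _graph_mode_enabled
    _graph_mode_enabled = True
    try:
        yield GraphCaptureContext(stream=stream)
    finally:
        # Restore even if the capture raises, so code after the block
        # (e.g. graph replay) runs outside graph mode.
        _graph_mode_enabled = previous


with graph_capture(stream="capture-stream") as ctx:
    assert in_graph_mode()
    # ... capture the graph on ctx.stream here ...
assert not in_graph_mode()
```

Because the flag is set and reset by the same context manager, callers cannot forget to leave graph mode. That is the motivation for merging the two functions.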

vllm/distributed/communication_op.py

WoosukKwon · 2024-05-14T23:47:16Z

tests/distributed/test_pynccl.py

@@ -103,7 +103,7 @@ def multiple_tp_with_vllm_worker_fn():
    device = torch.device(f"cuda:{torch.distributed.get_rank()}")
    ensure_model_parallel_initialized(2, 2)
    tensor = torch.ones(16, 1024, 1024, dtype=torch.float32, device=device)
-    with graph_mode():
+    with graph_capture():


Here, how do we make sure it isn't using custom all-reduce?

Actually, even before this PR we could not guarantee that it avoids custom all-reduce. It only holds in CI because custom all-reduce is not available in our CI environment.

Fixing this needs another refactor: exposing a new function that creates TP groups with different communicators. That will be my next PR!
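The underlying problem is implicit dispatch: when custom all-reduce is available, it may be chosen over pynccl. The following is a hypothetical, pure-Python sketch of the "TP groups with different communicators" idea, using simulated communicators on a single process. It assumes a default group tries custom all-reduce first and falls back to pynccl. TPGroup, custom_all_reduce and pynccl_all_reduce are illustrative names, not vLLM APIs. A test that builds a group holding only the pynccl communicator can then assert which path actually ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical communicator signature: return the reduced values, or None to decline.
AllReduceFn = Callable[[List[float]], Optional[List[float]]]

WORLD_SIZE = 2  # simulated: every rank holds identical data, so the sum is 2x


def custom_all_reduce(values: List[float]) -> Optional[List[float]]:
    """Hypothetical fast path that declines inputs larger than 4 elements."""
    if len(values) > 4:
        return None
    return [WORLD_SIZE * v for v in values]


def pynccl_all_reduce(values: List[float]) -> Optional[List[float]]:
    """Hypothetical general path that accepts any input."""
    return [WORLD_SIZE * v for v in values]


@dataclass
class TPGroup:
    """Hypothetical TP group that owns an ordered list of communicators."""
    communicators: List[AllReduceFn]
    used: List[str] = field(default_factory=list)

    def all_reduce(self, values: List[float]) -> List[float]:
        # Try communicators in order; the first one that accepts the input wins.
        for comm in self.communicators:
            out = comm(values)
            if out is not None:
                self.used.append(comm.__name__)
                return out
        raise RuntimeError("no communicator accepted the input")


default_group = TPGroup([custom_all_reduce, pynccl_all_reduce])
pynccl_only = TPGroup([pynccl_all_reduce])
```

With the communicator list fixed per group, a test that runs on pynccl_only does not depend on whether the machine supports custom all-reduce.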

vllm/worker/model_runner.py

WoosukKwon · 2024-05-16T16:32:45Z

vllm/distributed/communication_op.py

+@dataclass
+class GraphCaptureContext:
+    stream: torch.cuda.Stream


How does this work for non-CUDA backends?

For XPU, this will be torch.xpu.Stream.
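One hedged way to keep the stream annotation backend-agnostic is a structural type instead of torch.cuda.Stream. The sketch below assumes that the stream classes in question expose synchronize() and wait_stream(). torch.cuda.Stream does; that torch.xpu.Stream does is an assumption here. StreamLike and FakeStream are hypothetical names, and FakeStream is a CPU stand-in used only to exercise the sketch.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class StreamLike(Protocol):
    """Structural type for a device stream (hypothetical; not a torch class)."""

    def synchronize(self) -> None: ...

    def wait_stream(self, stream: "StreamLike") -> None: ...


@dataclass
class GraphCaptureContext:
    # Type checkers accept any class with these methods, e.g. torch.cuda.Stream
    # (and, assumed here, torch.xpu.Stream), without importing a specific backend.
    stream: StreamLike


class FakeStream:
    """CPU stand-in that records calls instead of touching a device."""

    def __init__(self) -> None:
        self.synced = False
        self.waited_on = None

    def synchronize(self) -> None:
        self.synced = True

    def wait_stream(self, stream: StreamLike) -> None:
        self.waited_on = stream
```

Note that runtime_checkable isinstance() checks only that the methods exist, not their signatures. The trade-off against the reviewer's answer (swapping in torch.xpu.Stream per backend) is weaker runtime guarantees in exchange for not hard-coding one device type.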

WoosukKwon

LGTM! Thanks for addressing my comments!

remove graph mode function

f7d4195

WoosukKwon reviewed May 14, 2024

vllm/distributed/communication_op.py (outdated, resolved)

WoosukKwon reviewed May 14, 2024

change variable name

46a60e5

youkaichao requested a review from WoosukKwon May 14, 2024 23:54

WoosukKwon reviewed May 15, 2024

vllm/worker/model_runner.py (resolved)

set stream

1a0721c

youkaichao requested a review from WoosukKwon May 15, 2024 05:51

Merge branch 'main' into graph

435ad07

WoosukKwon reviewed May 16, 2024

youkaichao added 2 commits May 16, 2024 10:27

remove old SIM117 comment

b1767f2

add type annotation for stream and pool

35b3351

WoosukKwon approved these changes May 16, 2024

youkaichao merged commit e081880 into vllm-project:main May 16, 2024
17 of 18 checks passed

youkaichao deleted the graph branch May 17, 2024 00:48

robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 19, 2024

[Core][Distributed] remove graph mode function (vllm-project#4818)

1589d50

dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024

[Core][Distributed] remove graph mode function (vllm-project#4818)

eb62283

tybalex pushed a commit to tybalex/vllm-function-call that referenced this pull request May 25, 2024

[Core][Distributed] remove graph mode function (vllm-project#4818)

3799d43

mawong-amd pushed a commit to ROCm/vllm that referenced this pull request Jun 3, 2024

[Core][Distributed] remove graph mode function (vllm-project#4818)

fc6fc86
