
[dtensor] refactor view ops to use OpStrategy #126011

Closed · wants to merge 5 commits
Conversation

@tianyu-l (Contributor) commented May 11, 2024

Stack from ghstack (oldest at bottom):

As titled. Some ops require adjustment of the output shape argument. In rule-based sharding prop, the global output shape was inferred within the rule (in `view_ops.py`). In strategy-based sharding prop, it is now obtained from the propagated `out_tensor_meta` (in `sharding_prop.py`). (A toy sketch of this flow follows below.)

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @wconstab @yf225 @chauhang @d4l3k
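
For illustration, here is a minimal, self-contained toy (not the actual DTensor code; every name below is made up for the example) of the flow described above: the global output shape of a view op is read off the propagated output tensor metadata, then localized according to the chosen output sharding before the local call is dispatched.

```python
# Toy sketch only -- not DTensor internals. It mimics the flow described
# above: the global output shape comes from the propagated output tensor
# metadata (what sharding_prop.py would use), then is localized per the
# output sharding before the local view/reshape call runs on each rank.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class FakeOutTensorMeta:
    shape: Tuple[int, ...]  # stand-in for the propagated out_tensor_meta

def localize_shape(global_shape: Tuple[int, ...], shard_dim: int, num_shards: int) -> Tuple[int, ...]:
    # Assume even sharding on shard_dim for this toy.
    return tuple(
        s // num_shards if d == shard_dim else s
        for d, s in enumerate(global_shape)
    )

out_meta = FakeOutTensorMeta(shape=(8, 4))      # global output shape of the view op
shard_dim, world_size = 0, 2
local_shape_arg = localize_shape(out_meta.shape, shard_dim, world_size)
print(local_shape_arg)                           # (4, 4): shape each rank passes to its local view
```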


pytorch-bot bot commented May 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126011

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (6 Unrelated Failures)

As of commit 2347722 with merge base da9bf77:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ciflow/inductor and oncall: distributed labels May 11, 2024
tianyu-l added a commit that referenced this pull request May 11, 2024
ghstack-source-id: 1767d9ed7dc1b77e7a83f7be3fa5caa158cfc5c2
Pull Request resolved: #126011
@tianyu-l requested review from wanchaol and XilunWu May 13, 2024 17:33
@XilunWu added the topic: not user facing label May 13, 2024
Previously, the rule-based view op sharding prop adjusted a non-tensor arg `local_out_shape` within the rule itself. This was not viable in strategy-based sharding prop.

Thus, this PR adds a new option `non_tensor_arg_suggestions` to `PlacementStrategy` to address this problem. It also benefits the new factory ops, in that we no longer need to compute their local shape and stride in `sharding_prop.py` in a customized way. Instead, we compute all such expected **tensor and non-tensor args** in `tensor_ops.py` and `view_ops.py`, which keeps `sharding_prop.py` clean. (A minimal sketch of the mechanism follows this update.)

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 wconstab yf225 chauhang d4l3k

[ghstack-poisoned]
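
A minimal sketch of the idea in the update above, using simplified stand-ins rather than the real DTensor classes; only the field name `non_tensor_arg_suggestions` is taken from the description, everything else is invented for the example. The point is the shape of the mechanism: the op's strategy suggests replacement values for non-tensor args, and the sharding propagator swaps them in generically instead of special-casing each op.

```python
# Simplified stand-ins, not the real DTensor classes. The strategy (built in
# view_ops.py / tensor_ops.py in the PR) carries suggested replacements for
# non-tensor args; sharding_prop.py can then apply them generically.
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

@dataclass
class PlacementStrategySketch:
    output_placement: str                              # e.g. "Shard(0)" (plain string in this toy)
    input_placements: Tuple[str, ...]
    # Suggested replacements for non-tensor args, keyed by argument index.
    non_tensor_arg_suggestions: Optional[Dict[int, Any]] = None

def apply_non_tensor_suggestions(args: tuple, strategy: PlacementStrategySketch) -> tuple:
    # Generic application step a sharding propagator could perform before dispatch.
    if not strategy.non_tensor_arg_suggestions:
        return args
    new_args = list(args)
    for idx, value in strategy.non_tensor_arg_suggestions.items():
        new_args[idx] = value
    return tuple(new_args)

# Usage: replace the global shape arg (index 1) with the localized shape.
strategy = PlacementStrategySketch(
    output_placement="Shard(0)",
    input_placements=("Shard(0)",),
    non_tensor_arg_suggestions={1: (4, 4)},
)
print(apply_non_tensor_suggestions(("<local input tensor>", (8, 4)), strategy))
# ('<local input tensor>', (4, 4))
```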
tianyu-l added a commit that referenced this pull request May 13, 2024
ghstack-source-id: e7b1885f5fb24b8198e6e4b565d35cbc92a54b4d
Pull Request resolved: #126011
@tianyu-l added the ciflow/trunk and release notes: distributed (dtensor) labels and removed the topic: not user facing label May 13, 2024
@wanchaol (Contributor) left a comment

First pass: this looks pretty good after an initial look. I do want to see if we can simplify `propagate_shape_and_sharding`, and why we need to call it twice.

torch/distributed/_tensor/ops/view_ops.py (outdated review thread, resolved)
torch/distributed/_tensor/ops/view_ops.py (outdated review thread, resolved)
torch/distributed/_tensor/ops/view_ops.py (outdated review thread, resolved)
@wanchaol (Contributor) commented:

Looks like CI is failing with circular deps, please fix.

tianyu-l added a commit that referenced this pull request May 15, 2024
ghstack-source-id: 94c584701a1f79c1314fb4b2a602a5cb87cf0f28
Pull Request resolved: #126011
@tianyu-l requested a review from wanchaol May 15, 2024 21:53
@wanchaol (Contributor) left a comment

This looks great! Have some minor comments

torch/distributed/_tensor/op_schema.py (review thread, resolved)
Review thread on this test snippet (the lookup changes from `ops[torch.view_as_real].dim_map(...)` to `dim_maps[torch.view_as_real](...)`):

    self.assertEqual(view_as_complex_rule, expected_view_as_complex_rule)
    expected_view_as_real_rule = (
        InputDim(0),
        Split(InputDim(1), (13, 2), 0),
        Split(InputDim(1), (13, 2), 1),
    )
    -   view_as_real_rule = ops[torch.view_as_real].dim_map(intermediate)
    +   view_as_real_rule = dim_maps[torch.view_as_real](intermediate)
Reviewer comment (Contributor):
I think we'll need to improve some of the existing test cases in this file to work with CommDebugMode, to make sure the communication that is happening is expected. This can be done in a follow-up PR.
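
A rough sketch of the kind of follow-up check being suggested here. The `CommDebugMode` import path and the `get_total_counts()` / `get_comm_counts()` methods are my recollection of the DTensor debug API around this time and may need adjusting; the snippet also assumes it runs inside the usual DTensor distributed test harness (process group already initialized, world size 2).

```python
# Sketch of a CommDebugMode-wrapped check (assumed API; verify against the
# torch version in use). Intended to run inside a DTensor distributed test,
# e.g. a DTensorTestBase test method with a 1-D mesh of size 2.
import torch
from torch.distributed._tensor import distribute_tensor, Shard
from torch.distributed._tensor.debug import CommDebugMode
from torch.distributed.device_mesh import init_device_mesh

def check_view_is_comm_free():
    mesh = init_device_mesh("cpu", (2,))
    dt = distribute_tensor(torch.randn(8, 4), mesh, [Shard(0)])
    comm_mode = CommDebugMode()
    with comm_mode:
        out = dt.view(8, 4)   # shape-preserving view should not trigger collectives
    assert comm_mode.get_total_counts() == 0, comm_mode.get_comm_counts()
    return out
```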

@tianyu-l (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
Pull Request resolved: pytorch#126011
Approved by: https://github.com/wanchaol, https://github.com/XilunWu
Labels: ciflow/inductor, ciflow/trunk, Merged, oncall: distributed, release notes: distributed (dtensor)

Projects: None yet

4 participants