New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rotate-and-reduce for linear sequences of rotations #531
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
j2kun
force-pushed
the
linear-to-log-rotates
branch
3 times, most recently
from
March 15, 2024 23:42
b1c0edc
to
cea97e1
Compare
j2kun
changed the title
WIP: rotate-and-reduce for sequences of rotations
Add rotate-and-reduce for linear sequences of rotations
Mar 15, 2024
j2kun
force-pushed
the
linear-to-log-rotates
branch
3 times, most recently
from
March 18, 2024 19:55
52f14c7
to
6efd0d4
Compare
asraa
approved these changes
Mar 23, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Just a few nits
j2kun
force-pushed
the
linear-to-log-rotates
branch
4 times, most recently
from
March 26, 2024 19:27
e42ae29
to
0ff09a0
Compare
j2kun
force-pushed
the
linear-to-log-rotates
branch
from
March 26, 2024 19:34
0ff09a0
to
a836fff
Compare
j2kun
added
the
pull_ready
Indicates whether a PR is ready to pull. The copybara worker will import for internal testing
label
Mar 26, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
pull_ready
Indicates whether a PR is ready to pull. The copybara worker will import for internal testing
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Note: rebased off #530
So this is working but... I can't say I'm particularly happy with it.
Now it recognizes sequences of rotates-and-adds that comprise an aggregation of every value in a vector (exactly once), and replaces them with the logarithmic rotate-and-reduce trick. I wrote down a few non-triggering examples, but I'm not confident it's bug-free by any means.
To the extent that this PR uses a dataflow analysis, I'm somewhat happy. It's an improvement over how I handled the version of this trick that worked on tensor.extract. However, the dataflow analysis has some limitations that left me frustrated. Specifically, I wanted to have the dataflow analysis accumulate the tensor accesses with multiplicity and across multiple tensors. The original
member variable inside RotationLattice was something like
DenseMap<Value, std::unordered_multiset<int64_t>>
. The join method then updated all accessed tensors and added the index access counts rather than just taking a union as it does in this PR.However, doing this hit assertions in the dataflow framework because the
Lattice
class asserts that repeated applications ofjoin
are idempotent (they call it monotonic) but in my case that would accumulate. I worked around it by constructing the analysis to work with set unions, and then added an extra step in the actual pass to do more specific detailed checks for validity. I hoped that with the extra index multiplicity information, we might be able to do more sophisticated transformations, like reordering some of the ops instead of just giving up. E.g., one thing that wouldn't work here is if the IR computes twice the sum of the tensor elements. I wonder if not using theLattice
framework and going one step lower toAbstractSparseForwardDataFlowAnalysis
might fix that, or just making a totally custom analysis unrelated to the dataflow engine, but I ran out of steam this week. I also suspect we could rely on some more academic ideas like e-graphs to identify more optimal transformations.Fixes #520
Fixes #511