New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataflow: Implement call context grouping to improve performance #16456
base: main
Are you sure you want to change the base?
Dataflow: Implement call context grouping to improve performance #16456
Conversation
|
||
import CallContextSensitivity<CallContextSensitivityInput> | ||
import LocalCallContext | ||
} |
Check warning
Code scanning / CodeQL
Candidate predicate not marked as `nomagic` Warning
reducedViableImplInCallContext
Candidate predicate to
reducedViableImplInCallContext
} | ||
|
||
private class CallContext = PrunedCallContextSensitivityStage5::Cc; | ||
|
Check warning
Code scanning / CodeQL
Candidate predicate not marked as `nomagic` Warning
reducedViableImplInReturn
Candidate predicate to
reducedViableImplInReturn
985b8f0
to
0520839
Compare
predicate reducedViableImplInReturnCand = | ||
CachedCallContextSensitivity::reducedViableImplInReturn/2; | ||
} | ||
predicate reducedViableImplInCallContextCand = |
Check warning
Code scanning / CodeQL
Candidate predicate not marked as `nomagic` Warning
reducedViableImplInCallContext
Candidate predicate to
reducedViableImplInCallContext
|
||
import CallContextSensitivity<CallContextSensitivityInput> | ||
predicate reducedViableImplInReturnCand = |
Check warning
Code scanning / CodeQL
Candidate predicate not marked as `nomagic` Warning
reducedViableImplInReturn
Candidate predicate to
reducedViableImplInReturn
predicate reducedViableImplInReturnCand = | ||
Stage3Param::Level1CallContextInput::reducedViableImplInReturn/2; | ||
} | ||
predicate reducedViableImplInCallContextCand = |
Check warning
Code scanning / CodeQL
Candidate predicate not marked as `nomagic` Warning
reducedViableImplInCallContext
Candidate predicate to
reducedViableImplInCallContext
|
||
import CallContextSensitivity<CallContextSensitivityInput> | ||
predicate reducedViableImplInReturnCand = Stage3Param::reducedViableImplInReturn/2; |
Check warning
Code scanning / CodeQL
Candidate predicate not marked as `nomagic` Warning
reducedViableImplInReturn
Candidate predicate to
reducedViableImplInReturn
|
||
predicate callContextNone = CachedCallContextSensitivity::ccNone/0; | ||
|
||
predicate callContextSomeCall = CachedCallContextSensitivity::ccSomeCall/0; |
Check warning
Code scanning / CodeQL
Dead code Warning
0604d21
to
ab1f10a
Compare
int totalorder() { | ||
this = | ||
rank[result](DataFlowCallable c, string file, int startline, int startcolumn | | ||
c.hasLocationInfo(file, startline, startcolumn, _, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the total order for DataFlowCallable
take the file into account, but not those for DataFlowCall
or NodeRegion
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because we're ordering calls and regions within a specific callable body, so the ones we need to compare are always in the same file. But the callables that we need to compare can be spread across multiple files, so they benefit from the additional comparison column.
My hope is that this ordering is fairly temporary, as I'd like to get the MakeSets module turned into a QL primitive - once it's implemented in the evaluator then there's no need for these arbitrary orderings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. In the cases where we are only comparing within the same file, would it make sense to limit the ordering to other instances in the same file? Or do you really want to have a total ordering over all calls, say, even if we only compare ones in the same file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A best-effort total ordering is simpler than trying to make it specifically per file - in the end it doesn't matter much, we just need some completely arbitrary ordering, and even in the cases where we fail to get this, the MakeSets module provides a fallback solution.
…llContextReducedReverse.
…Common::CallContextSensitivity.
ab1f10a
to
34b82fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall approach LGTM. A few comments/suggestions.
private predicate idOf(BasicBlock x, int y) = equivalenceRelation(id/2)(x, y) | ||
|
||
class NodeRegion instanceof BasicBlock { | ||
string toString() { result = "NodeRegion" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use final extends
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels slightly wrong to me, since a NodeRegion
isn't necessarily a basic block - it could also be defined as e.g. a set of nodes guarded by a given guard. Currently the data flow side of the implementation assumes that NodeRegion
s are disjoint, but that doesn't have to be the case. It could conceivably be useful to define them differently if we run into guards that guard a huge number of basic blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But its are purely internal implementation detail, right, so we can always change it as we see fit? I merely suggested to use final extends
to avoid the dummy toString
definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final extends
needs a dummy alias instead, so there's not much saved. And it still feels wrong for the stated reason: A basic block is certainly a region of nodes, but a region of nodes is not necessarily a basic block.
class NodeRegion instanceof BasicBlock { | ||
string toString() { result = "NodeRegion" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final extends
?
shared/dataflow/codeql/dataflow/internal/DataFlowImplCommon.qll
Outdated
Show resolved
Hide resolved
@@ -522,14 +502,155 @@ module MakeImplCommon<LocationSig Location, InputSig<Location> Lang> { | |||
) | |||
} | |||
|
|||
private module CallSetsInput implements MkSetsInp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this is needed because DispatchSets
does not handle the case when there are no dispatch targets. However, I think it would be more explicit if we had something like
private module DispatchSetsInput implements MkSetsInp {
class Key = TCallEdge;
class Value = Option<TCallEdge>::Option;
Value getAValue(TCallEdge ctxEdge) {
exists(DataFlowCall ctx, DataFlowCallable c, DataFlowCall call, DataFlowCallable tgt |
ctxEdge = TMkCallEdge(ctx, c) and
reducedViableImplInCallContext(call, c, ctx) |
result.asSome() = TMkCallEdge(call, tgt) and
viableImplInCallContextExtIn(call, ctx) = tgt
or
not exists(viableImplInCallContextExtIn(call, ctx)) and
result.isNone()
)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that could work as well, but I chose the current version to avoid wrapping all of the DispatchSet
edges in 'some'-wrappers. Also, it becomes quite tricky to get right if we stuff everything together like that - your version above e.g. fails to specify which calls have no dispatch, so an option-type doesn't carry enough information. For a given call affected by the context, there's either a non-empty set of dispatch edges or an empty set, but that empty set cannot be represented by a none value, because then we don't know what call it refers to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the Option
would have to apply to the Callable
in TCallEdge
. Let's leave it as-is then. I think this section deserves a comment then about why we need both CallSets
and DispatchSets
.
shared/dataflow/codeql/dataflow/internal/DataFlowImplCommon.qll
Outdated
Show resolved
Hide resolved
All comments should be addressed now. |
More comments addressed. (Including one change that had somehow slipped from the first commit). |
|
||
private module CallSets = MakeSets<CallSetsInput>; | ||
|
||
private module CallSetOption = Option<CallSets::ValueSet>; |
Check warning
Code scanning / CodeQL
Dead code Warning
|
||
private module DispatchSets = MakeSets<DispatchSetsInput>; | ||
|
||
private module DispatchSetsOption = Option<DispatchSets::ValueSet>; |
Check warning
Code scanning / CodeQL
Dead code Warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest DCA run for C# seems to be broken. The Ruby DCA run shows a nice speedup on gh/gh
.
Perhaps it is worth running a final batch of DCA for all languages before merging?
Sure. Running now. |
This replaces our call context representation with a set-based representation of the allowed call edges, thus unifying equivalent call contexts. This unification means that we avoid redundant computation when a callable is reachable with several different, but equivalent call contexts.
In some cases I've observed 3000 equivalent call contexts, which will now be replaced by a single entity, and in that particular case the total tuple count for the stage was cut in half.
Commit-by-commit review encouraged. The first commit introduces the
MakeSets
primitive, which is then used to collapse local call contexts. Then a sequence of refactoring commits follow. And finally, the "Dataflow: Switch call context to a set representation" commit contains the implementation of and switch to fully set-based call contexts. This commit also does some reshuffling to ensure proper caching inCachedCallContextSensitivity
.