Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataflow: Implement call context grouping to improve performance #16456

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

aschackmull
Copy link
Contributor

This replaces our call context representation with a set-based representation of the allowed call edges, thus unifying equivalent call contexts. This unification means that we avoid redundant computation when a callable is reachable with several different, but equivalent call contexts.

In some cases I've observed 3000 equivalent call contexts, which will now be replaced by a single entity, and in that particular case the total tuple count for the stage was cut in half.

Commit-by-commit review encouraged. The first commit introduces the MakeSets primitive, which is then used to collapse local call contexts. Then a sequence of refactoring commits follow. And finally, the "Dataflow: Switch call context to a set representation" commit contains the implementation of and switch to fully set-based call contexts. This commit also does some reshuffling to ensure proper caching in CachedCallContextSensitivity.


import CallContextSensitivity<CallContextSensitivityInput>
import LocalCallContext
}

Check warning

Code scanning / CodeQL

Candidate predicate not marked as `nomagic` Warning

Candidate predicate to
reducedViableImplInCallContext
is not marked as nomagic.
Candidate predicate to
reducedViableImplInCallContext
is not marked as nomagic.
}

private class CallContext = PrunedCallContextSensitivityStage5::Cc;

Check warning

Code scanning / CodeQL

Candidate predicate not marked as `nomagic` Warning

Candidate predicate to
reducedViableImplInReturn
is not marked as nomagic.
Candidate predicate to
reducedViableImplInReturn
is not marked as nomagic.
@aschackmull aschackmull force-pushed the dataflow/callcontext-grouping branch 2 times, most recently from 985b8f0 to 0520839 Compare May 14, 2024 13:06
predicate reducedViableImplInReturnCand =
CachedCallContextSensitivity::reducedViableImplInReturn/2;
}
predicate reducedViableImplInCallContextCand =

Check warning

Code scanning / CodeQL

Candidate predicate not marked as `nomagic` Warning

Candidate predicate to
reducedViableImplInCallContext
is not marked as nomagic.
Candidate predicate to
reducedViableImplInCallContext
is not marked as nomagic.

import CallContextSensitivity<CallContextSensitivityInput>
predicate reducedViableImplInReturnCand =

Check warning

Code scanning / CodeQL

Candidate predicate not marked as `nomagic` Warning

Candidate predicate to
reducedViableImplInReturn
is not marked as nomagic.
Candidate predicate to
reducedViableImplInReturn
is not marked as nomagic.
predicate reducedViableImplInReturnCand =
Stage3Param::Level1CallContextInput::reducedViableImplInReturn/2;
}
predicate reducedViableImplInCallContextCand =

Check warning

Code scanning / CodeQL

Candidate predicate not marked as `nomagic` Warning

Candidate predicate to
reducedViableImplInCallContext
is not marked as nomagic.
Candidate predicate to
reducedViableImplInCallContext
is not marked as nomagic.

import CallContextSensitivity<CallContextSensitivityInput>
predicate reducedViableImplInReturnCand = Stage3Param::reducedViableImplInReturn/2;

Check warning

Code scanning / CodeQL

Candidate predicate not marked as `nomagic` Warning

Candidate predicate to
reducedViableImplInReturn
is not marked as nomagic.
Candidate predicate to
reducedViableImplInReturn
is not marked as nomagic.

predicate callContextNone = CachedCallContextSensitivity::ccNone/0;

predicate callContextSomeCall = CachedCallContextSensitivity::ccSomeCall/0;

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.
@aschackmull aschackmull marked this pull request as ready for review May 15, 2024 14:00
@aschackmull aschackmull requested review from a team as code owners May 15, 2024 14:00
@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label May 15, 2024
@aschackmull aschackmull force-pushed the dataflow/callcontext-grouping branch from 0604d21 to ab1f10a Compare May 22, 2024 08:22
int totalorder() {
this =
rank[result](DataFlowCallable c, string file, int startline, int startcolumn |
c.hasLocationInfo(file, startline, startcolumn, _, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the total order for DataFlowCallable take the file into account, but not those for DataFlowCall or NodeRegion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because we're ordering calls and regions within a specific callable body, so the ones we need to compare are always in the same file. But the callables that we need to compare can be spread across multiple files, so they benefit from the additional comparison column.
My hope is that this ordering is fairly temporary, as I'd like to get the MakeSets module turned into a QL primitive - once it's implemented in the evaluator then there's no need for these arbitrary orderings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. In the cases where we are only comparing within the same file, would it make sense to limit the ordering to other instances in the same file? Or do you really want to have a total ordering over all calls, say, even if we only compare ones in the same file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A best-effort total ordering is simpler than trying to make it specifically per file - in the end it doesn't matter much, we just need some completely arbitrary ordering, and even in the cases where we fail to get this, the MakeSets module provides a fallback solution.

@aschackmull aschackmull force-pushed the dataflow/callcontext-grouping branch from ab1f10a to 34b82fa Compare May 23, 2024 08:24
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach LGTM. A few comments/suggestions.

shared/util/codeql/util/internal/MakeSets.qll Outdated Show resolved Hide resolved
shared/dataflow/codeql/dataflow/internal/DataFlowImpl.qll Outdated Show resolved Hide resolved
shared/dataflow/codeql/dataflow/DataFlow.qll Show resolved Hide resolved
private predicate idOf(BasicBlock x, int y) = equivalenceRelation(id/2)(x, y)

class NodeRegion instanceof BasicBlock {
string toString() { result = "NodeRegion" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use final extends instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That feels slightly wrong to me, since a NodeRegion isn't necessarily a basic block - it could also be defined as e.g. a set of nodes guarded by a given guard. Currently the data flow side of the implementation assumes that NodeRegions are disjoint, but that doesn't have to be the case. It could conceivably be useful to define them differently if we run into guards that guard a huge number of basic blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But its are purely internal implementation detail, right, so we can always change it as we see fit? I merely suggested to use final extends to avoid the dummy toString definitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final extends needs a dummy alias instead, so there's not much saved. And it still feels wrong for the stated reason: A basic block is certainly a region of nodes, but a region of nodes is not necessarily a basic block.

Comment on lines +374 to +395
class NodeRegion instanceof BasicBlock {
string toString() { result = "NodeRegion" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final extends?

@@ -522,14 +502,155 @@ module MakeImplCommon<LocationSig Location, InputSig<Location> Lang> {
)
}

private module CallSetsInput implements MkSetsInp {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this is needed because DispatchSets does not handle the case when there are no dispatch targets. However, I think it would be more explicit if we had something like

    private module DispatchSetsInput implements MkSetsInp {
      class Key = TCallEdge;

      class Value = Option<TCallEdge>::Option;

      Value getAValue(TCallEdge ctxEdge) {
        exists(DataFlowCall ctx, DataFlowCallable c, DataFlowCall call, DataFlowCallable tgt |
          ctxEdge = TMkCallEdge(ctx, c) and
          reducedViableImplInCallContext(call, c, ctx) |
          result.asSome() = TMkCallEdge(call, tgt) and
          viableImplInCallContextExtIn(call, ctx) = tgt
          or
          not exists(viableImplInCallContextExtIn(call, ctx)) and
          result.isNone()

        )
      }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that could work as well, but I chose the current version to avoid wrapping all of the DispatchSet edges in 'some'-wrappers. Also, it becomes quite tricky to get right if we stuff everything together like that - your version above e.g. fails to specify which calls have no dispatch, so an option-type doesn't carry enough information. For a given call affected by the context, there's either a non-empty set of dispatch edges or an empty set, but that empty set cannot be represented by a none value, because then we don't know what call it refers to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the Option would have to apply to the Callable in TCallEdge. Let's leave it as-is then. I think this section deserves a comment then about why we need both CallSets and DispatchSets.

@aschackmull
Copy link
Contributor Author

Overall approach LGTM. A few comments/suggestions.

All comments should be addressed now.

@aschackmull
Copy link
Contributor Author

More comments addressed. (Including one change that had somehow slipped from the first commit).


private module CallSets = MakeSets<CallSetsInput>;

private module CallSetOption = Option<CallSets::ValueSet>;

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.

private module DispatchSets = MakeSets<DispatchSetsInput>;

private module DispatchSetsOption = Option<DispatchSets::ValueSet>;

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest DCA run for C# seems to be broken. The Ruby DCA run shows a nice speedup on gh/gh.

Perhaps it is worth running a final batch of DCA for all languages before merging?

@aschackmull
Copy link
Contributor Author

Perhaps it is worth running a final batch of DCA for all languages before merging?

Sure. Running now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants