Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SampleFDO] Stale profile call-graph matching #92151

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented May 14, 2024

Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as int foo() --> long foo() ) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable.

This patch introduces stale profile call-graph level matching, targeting at identifying the trivial function renaming and reusing the old function profile. The main idea is to leverage the call-site anchor similarity to determine the renaming and limit the matching scope into profiled CFG edges.

Some noteworthy details:

  1. Determine the renaming by call-site anchor similarity check
  • The inputs are the two new function(doesn't appear in other side) lists from IR and Profile, say a) IRRenamingFuncList. b) ProfileRenamingFuncList. A new function functionMatchesProfile(IRFunc, ProfFunc) is used to check the renaming for the possible <IRFunc, ProfFunc> pair in IRRenamingFuncList X ProfileRenamingFuncList

  • InfunctionMatchesProfile(IRFunc, ProfFunc), extract call-site anchors and use the LCS(diff) matching to compute the equal set and we define: Similarity = |equalSet * 2| / (|A| + |B|). The profile name is marked as renamed if the similarity is above a threshold(--renamed-func-similarity-threshold) and update the profile using the new name.

  1. Use CFG edge info to reduce the matching scope.
  • As our goal is to salvage the profile, we don't need to run all possible renaming check. Instead, if we can salvage every edges in the profiled CFG, i,e, make sure the caller profile can find the callee profile, that should be enough. Based on this, we use the CFG edge, the caller to callee relations, to limit the matching scope, only run the matching on the same caller scope.
  • Once the renaming is found, we cache it so that it can be reused in deeper level caller.
  1. Added a new switch --salvage-renamed-profile to control this, default is false.

  2. This is also complementary to the CFG level(block/call) matching, which is after the CG matching, the new identified function can further be run by the CFG matching.

Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)

Copy link

github-actions bot commented May 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@wlei-llvm wlei-llvm force-pushed the renaming branch 4 times, most recently from e6b7077 to 89bdc6e Compare May 16, 2024 06:09
@wlei-llvm wlei-llvm requested review from WenleiHe and MatzeB May 16, 2024 17:11
@wlei-llvm wlei-llvm marked this pull request as ready for review May 16, 2024 17:11
@llvmbot
Copy link
Collaborator

llvmbot commented May 16, 2024

@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes

Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as int foo() --> long foo() ) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable.

This patch introduces stale profile function-level matching, targeting at identifying the trivial function renaming and reusing the old function profile. The main idea is to leverage the call-site anchor similarity to determine the renaming and limit the matching scope into profiled CFG edges.

Some noteworthy details:

  1. Determine the renaming by call-site anchor similarity check
  • The inputs are the two new function(doesn't appear in other side) lists from IR and Profile, say a) IRRenamingFuncList. b) ProfileRenamingFuncList. A new function checkFunctionIsRename(IRFunc, ProfFunc) is used to check the renaming for the possible <IRFunc, ProfFunc> pair in IRRenamingFuncList X ProfileRenamingFuncList

  • IncheckFunctionIsRename(IRFunc, ProfFunc), extract call-site anchors and use the LCS(diff) matching to compute the equal set and we define: Similarity = |equalSet * 2| / (|A| + |B|). The profile name is marked as renamed if the similarity is above a threshold(--func-renaming-similarity-threshold) and update the profile using the new name.

  1. Use CFG edge info to reduce the matching scope.
  • As our goal is to salvage the profile, we don't need to run all possible renaming check. Instead, if we can salvage every edges in the profiled CFG, i,e, make sure the caller profile can find the callee profile, that should be enough. Based on this, we use the CFG edge, the caller to callee relations, to limit the matching scope, only run the matching on the same caller scope.
  • Once the renaming is found, we cache it so that it can be reused in deeper level caller.
  1. Added a new switch --salvage-function-renaming to control this, default is false.

  2. This is also complementary to the block-level matching, which is after renaming matching, the new identified function can further be run by the block-level matching.

Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)


Patch is 37.92 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92151.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h (+45-9)
  • (modified) llvm/lib/Transforms/IPO/SampleProfile.cpp (+3-3)
  • (modified) llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp (+280-14)
  • (added) llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-renaming.prof (+62)
  • (added) llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-renaming.ll (+297)
diff --git a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
index b6feca5d47035..f2feb9ba8832d 100644
--- a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
+++ b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
@@ -15,12 +15,14 @@
 #define LLVM_TRANSFORMS_IPO_SAMPLEPROFILEMATCHER_H
 
 #include "llvm/ADT/StringSet.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h"
 
 namespace llvm {
 
 using AnchorList = std::vector<std::pair<LineLocation, FunctionId>>;
 using AnchorMap = std::map<LineLocation, FunctionId>;
+using FunctionMap = HashKeyMap<std::unordered_map, FunctionId, Function *>;
 
 // Sample profile matching - fuzzy match.
 class SampleProfileMatcher {
@@ -58,6 +60,20 @@ class SampleProfileMatcher {
   StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>>
       FuncCallsiteMatchStates;
 
+  struct RenameDecisionCacheHash {
+    uint64_t
+    operator()(const std::pair<const Function *, FunctionId> &P) const {
+      return hash_combine(P.first, P.second);
+    }
+  };
+  std::unordered_map<std::pair<const Function *, FunctionId>, bool,
+                     RenameDecisionCacheHash>
+      RenameDecisionCache;
+
+  FunctionMap *SymbolMap;
+
+  std::shared_ptr<ProfileSymbolList> PSL;
+
   // Profile mismatch statstics:
   uint64_t TotalProfiledFunc = 0;
   // Num of checksum-mismatched function.
@@ -80,9 +96,11 @@ class SampleProfileMatcher {
 public:
   SampleProfileMatcher(Module &M, SampleProfileReader &Reader,
                        const PseudoProbeManager *ProbeManager,
-                       ThinOrFullLTOPhase LTOPhase)
-      : M(M), Reader(Reader), ProbeManager(ProbeManager), LTOPhase(LTOPhase){};
-  void runOnModule();
+                       ThinOrFullLTOPhase LTOPhase,
+                       std::shared_ptr<ProfileSymbolList> PSL)
+      : M(M), Reader(Reader), ProbeManager(ProbeManager), LTOPhase(LTOPhase),
+        PSL(PSL) {};
+  void runOnModule(FunctionMap &SymbolMap);
   void clearMatchingData() {
     // Do not clear FuncMappings, it stores IRLoc to ProfLoc remappings which
     // will be used for sample loader.
@@ -90,16 +108,20 @@ class SampleProfileMatcher {
   }
 
 private:
-  FunctionSamples *getFlattenedSamplesFor(const Function &F) {
-    StringRef CanonFName = FunctionSamples::getCanonicalFnName(F);
-    auto It = FlattenedProfiles.find(FunctionId(CanonFName));
+  FunctionSamples *getFlattenedSamplesFor(const FunctionId &Fname) {
+    auto It = FlattenedProfiles.find(Fname);
     if (It != FlattenedProfiles.end())
       return &It->second;
     return nullptr;
   }
-  void runOnFunction(Function &F);
-  void findIRAnchors(const Function &F, AnchorMap &IRAnchors);
-  void findProfileAnchors(const FunctionSamples &FS, AnchorMap &ProfileAnchors);
+  FunctionSamples *getFlattenedSamplesFor(const Function &F) {
+    StringRef CanonFName = FunctionSamples::getCanonicalFnName(F);
+    return getFlattenedSamplesFor(FunctionId(CanonFName));
+  }
+  void runBlockLevelMatching(Function &F);
+  void findIRAnchors(const Function &F, AnchorMap &IRAnchors) const;
+  void findProfileAnchors(const FunctionSamples &FS,
+                          AnchorMap &ProfileAnchors) const;
   // Record the callsite match states for profile staleness report, the result
   // is saved in FuncCallsiteMatchStates.
   void recordCallsiteMatchStates(const Function &F, const AnchorMap &IRAnchors,
@@ -160,6 +182,20 @@ class SampleProfileMatcher {
   void runStaleProfileMatching(const Function &F, const AnchorMap &IRAnchors,
                                const AnchorMap &ProfileAnchors,
                                LocToLocMap &IRToProfileLocationMap);
+  void findIRNewCallees(Function &Caller,
+                        const StringMap<Function *> &IRNewFunctions,
+                        std::vector<Function *> &IRNewCallees);
+  float checkFunctionSimilarity(const Function &IRFunc,
+                                const FunctionId &ProfFunc);
+  bool functionIsRenamedImpl(const Function &IRFunc,
+                             const FunctionId &ProfFunc);
+  bool functionIsRenamed(const Function &IRFunc, const FunctionId &ProfFunc);
+  void
+  runFuncRenamingMatchingOnProfile(const StringMap<Function *> &IRNewFunctions,
+                                   FunctionSamples &FS,
+                                   FunctionMap &OldProfToNewSymbolMap);
+  void findIRNewFunctions(StringMap<Function *> &IRNewFunctions);
+  void runFuncLevelMatching();
   void reportOrPersistProfileStats();
 };
 } // end namespace llvm
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 0920179fb76b7..0b096668084ca 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -544,7 +544,7 @@ class SampleProfileLoader final : public SampleProfileLoaderBaseImpl<Function> {
 
   /// Profle Symbol list tells whether a function name appears in the binary
   /// used to generate the current profile.
-  std::unique_ptr<ProfileSymbolList> PSL;
+  std::shared_ptr<ProfileSymbolList> PSL;
 
   /// Total number of samples collected in this profile.
   ///
@@ -2076,7 +2076,7 @@ bool SampleProfileLoader::doInitialization(Module &M,
   if (ReportProfileStaleness || PersistProfileStaleness ||
       SalvageStaleProfile) {
     MatchingManager = std::make_unique<SampleProfileMatcher>(
-        M, *Reader, ProbeManager.get(), LTOPhase);
+        M, *Reader, ProbeManager.get(), LTOPhase, PSL);
   }
 
   return true;
@@ -2197,7 +2197,7 @@ bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
 
   if (ReportProfileStaleness || PersistProfileStaleness ||
       SalvageStaleProfile) {
-    MatchingManager->runOnModule();
+    MatchingManager->runOnModule(SymbolMap);
     MatchingManager->clearMatchingData();
   }
 
diff --git a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
index d7613bce4c52e..2b07856252ecd 100644
--- a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
@@ -20,12 +20,22 @@ using namespace sampleprof;
 
 #define DEBUG_TYPE "sample-profile-matcher"
 
+static cl::opt<bool> SalvageFunctionRenaming(
+    "salvage-function-renaming", cl::Hidden, cl::init(false),
+    cl::desc("Salvage stale profile by function renaming matching."));
+
+static cl::opt<unsigned> FuncRenamingSimilarityThreshold(
+    "func-renaming-similarity-threshold", cl::Hidden, cl::init(80),
+    cl::desc(
+        "The profile function is considered being renamed if the similarity "
+        "against IR is above the given number(percentage value)."));
+
 extern cl::opt<bool> SalvageStaleProfile;
 extern cl::opt<bool> PersistProfileStaleness;
 extern cl::opt<bool> ReportProfileStaleness;
 
 void SampleProfileMatcher::findIRAnchors(const Function &F,
-                                         AnchorMap &IRAnchors) {
+                                         AnchorMap &IRAnchors) const {
   // For inlined code, recover the original callsite and callee by finding the
   // top-level inline frame. e.g. For frame stack "main:1 @ foo:2 @ bar:3", the
   // top-level frame is "main:1", the callsite is "1" and the callee is "foo".
@@ -95,7 +105,7 @@ void SampleProfileMatcher::findIRAnchors(const Function &F,
 }
 
 void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS,
-                                              AnchorMap &ProfileAnchors) {
+                                              AnchorMap &ProfileAnchors) const {
   auto isInvalidLineOffset = [](uint32_t LineOffset) {
     return LineOffset & 0x8000;
   };
@@ -260,6 +270,22 @@ void SampleProfileMatcher::matchNonCallsiteLocs(
   }
 }
 
+// Filter the non-call locations from IRAnchors and ProfileAnchors and write
+// them into a list for random access later.
+static void getFilteredAnchorList(const AnchorMap &IRAnchors,
+                                  const AnchorMap &ProfileAnchors,
+                                  AnchorList &FilteredIRAnchorsList,
+                                  AnchorList &FilteredProfileAnchorList) {
+  for (const auto &I : IRAnchors) {
+    if (I.second.stringRef().empty())
+      continue;
+    FilteredIRAnchorsList.emplace_back(I);
+  }
+
+  for (const auto &I : ProfileAnchors)
+    FilteredProfileAnchorList.emplace_back(I);
+}
+
 // Call target name anchor based profile fuzzy matching.
 // Input:
 // For IR locations, the anchor is the callee name of direct callsite; For
@@ -286,16 +312,9 @@ void SampleProfileMatcher::runStaleProfileMatching(
          "Run stale profile matching only once per function");
 
   AnchorList FilteredProfileAnchorList;
-  for (const auto &I : ProfileAnchors)
-    FilteredProfileAnchorList.emplace_back(I);
-
   AnchorList FilteredIRAnchorsList;
-  // Filter the non-callsite from IRAnchors.
-  for (const auto &I : IRAnchors) {
-    if (I.second.stringRef().empty())
-      continue;
-    FilteredIRAnchorsList.emplace_back(I);
-  }
+  getFilteredAnchorList(IRAnchors, ProfileAnchors, FilteredIRAnchorsList,
+                        FilteredProfileAnchorList);
 
   if (FilteredIRAnchorsList.empty() || FilteredProfileAnchorList.empty())
     return;
@@ -311,7 +330,7 @@ void SampleProfileMatcher::runStaleProfileMatching(
   matchNonCallsiteLocs(MatchedAnchors, IRAnchors, IRToProfileLocationMap);
 }
 
-void SampleProfileMatcher::runOnFunction(Function &F) {
+void SampleProfileMatcher::runBlockLevelMatching(Function &F) {
   // We need to use flattened function samples for matching.
   // Unlike IR, which includes all callsites from the source code, the callsites
   // in profile only show up when they are hit by samples, i,e. the profile
@@ -590,13 +609,260 @@ void SampleProfileMatcher::computeAndReportProfileStaleness() {
   }
 }
 
-void SampleProfileMatcher::runOnModule() {
+// Find functions that don't show in the profile or profile symbol list, which
+// are supposed to be new functions. We use them as the targets for renaming
+// matching.
+void SampleProfileMatcher::findIRNewFunctions(
+    StringMap<Function *> &IRNewFunctions) {
+  // TODO: Support MD5 profile.
+  if (FunctionSamples::UseMD5)
+    return;
+  StringSet<> NamesInProfile;
+  if (auto NameTable = Reader.getNameTable()) {
+    for (auto Name : *NameTable)
+      NamesInProfile.insert(Name.stringRef());
+  }
+
+  for (auto &F : M) {
+    // Skip declarations, as even if the function can be recognized renamed, we
+    // have nothing to do with it.
+    if (F.isDeclaration())
+      continue;
+
+    StringRef CanonFName = FunctionSamples::getCanonicalFnName(F.getName());
+    const auto *FS = getFlattenedSamplesFor(F);
+    if (FS)
+      continue;
+
+    // For extended binary, the full function name symbols exits in the profile
+    // symbol list table.
+    if (NamesInProfile.count(CanonFName))
+      continue;
+
+    if (PSL && PSL->contains(CanonFName))
+      continue;
+
+    LLVM_DEBUG(dbgs() << "Function " << CanonFName
+                      << " is not in profile or symbol list table.\n");
+    IRNewFunctions[CanonFName] = &F;
+  }
+}
+
+void SampleProfileMatcher::findIRNewCallees(
+    Function &Caller, const StringMap<Function *> &IRNewFunctions,
+    std::vector<Function *> &IRNewCallees) {
+  for (auto &BB : Caller) {
+    for (auto &I : BB) {
+      const auto *CB = dyn_cast<CallBase>(&I);
+      if (!CB || isa<IntrinsicInst>(&I))
+        continue;
+      Function *Callee = CB->getCalledFunction();
+      if (!Callee || Callee->isDeclaration())
+        continue;
+      StringRef CalleeName =
+          FunctionSamples::getCanonicalFnName(Callee->getName());
+      if (IRNewFunctions.count(CalleeName))
+        IRNewCallees.push_back(Callee);
+    }
+  }
+}
+
+// Use function similarity to determine if the function is renamed. Compute a
+// similarity ratio between two sequences which are  the function callsite
+// anchors. The returned value is in the range [0, 1]. The bigger the value is,
+// the more similar two sequences are.
+float SampleProfileMatcher::checkFunctionSimilarity(
+    const Function &IRFunc, const FunctionId &ProfFName) {
+  AnchorMap IRAnchors;
+  findIRAnchors(IRFunc, IRAnchors);
+
+  AnchorMap ProfileAnchors;
+  const auto *FSFlattened = getFlattenedSamplesFor(ProfFName);
+  assert(FSFlattened && "Flattened profile sample is null");
+  findProfileAnchors(*FSFlattened, ProfileAnchors);
+
+  AnchorList FilteredProfileAnchorList;
+  AnchorList FilteredIRAnchorsList;
+  getFilteredAnchorList(IRAnchors, ProfileAnchors, FilteredIRAnchorsList,
+                        FilteredProfileAnchorList);
+
+  // If the function is probe based, we trust the checksum info to check the
+  // similarity. Otherwise, if the checksum is mismatched, continue computing
+  // the similarity.
+  if (FunctionSamples::ProfileIsProbeBased) {
+    const auto *FuncDesc = ProbeManager->getDesc(IRFunc);
+    // Make sure function is complex enough.
+    if (IRAnchors.size() - FilteredIRAnchorsList.size() > 5 && FuncDesc &&
+        !ProbeManager->profileIsHashMismatched(*FuncDesc, *FSFlattened)) {
+      return 1.0;
+    }
+  }
+
+  if (FilteredIRAnchorsList.empty() || FilteredProfileAnchorList.empty())
+    return 0.0;
+
+  // Use the diff algorithm to find the LCS between IR and profile.
+  LocToLocMap MatchedAnchors =
+      longestCommonSequence(FilteredIRAnchorsList, FilteredProfileAnchorList);
+
+  return static_cast<float>(MatchedAnchors.size()) * 2 /
+         (FilteredIRAnchorsList.size() + FilteredProfileAnchorList.size());
+}
+
+bool SampleProfileMatcher::functionIsRenamedImpl(const Function &IRFunc,
+                                                 const FunctionId &ProfFunc) {
+  float Similarity = checkFunctionSimilarity(IRFunc, ProfFunc);
+  LLVM_DEBUG(dbgs() << "The similarity between " << IRFunc.getName()
+                    << "(IR) and " << ProfFunc << "(profile) is "
+                    << format("%.2f", Similarity) << "\n");
+  return Similarity * 100 > FuncRenamingSimilarityThreshold;
+}
+
+bool SampleProfileMatcher::functionIsRenamed(const Function &IRFunc,
+                                             const FunctionId &ProfFunc) {
+  auto R = RenameDecisionCache.find({&IRFunc, ProfFunc});
+  if (R != RenameDecisionCache.end())
+    return R->second;
+
+  bool V = functionIsRenamedImpl(IRFunc, ProfFunc);
+  RenameDecisionCache[{&IRFunc, ProfFunc}] = V;
+  return V;
+}
+
+// Run function renaming matching on the profiled CFG edge to limit the matching
+// scope.
+void SampleProfileMatcher::runFuncRenamingMatchingOnProfile(
+    const StringMap<Function *> &IRNewFunctions, FunctionSamples &CallerFS,
+    FunctionMap &OldProfToNewSymbolMap) {
+  auto FindIRFunction = [&](const FunctionId &FName) {
+    // Function can be null if name has conflict, use optional to store the
+    // function pointer.
+    std::optional<Function *> F;
+
+    auto R = SymbolMap->find(FName);
+    if (R != SymbolMap->end())
+      F = R->second;
+
+    auto NewR = OldProfToNewSymbolMap.find(FName);
+    if (NewR != OldProfToNewSymbolMap.end())
+      F = NewR->second;
+
+    return F;
+  };
+
+  // Find the new callees from IR in the current caller scope.
+  std::vector<Function *> IRNewCallees;
+  auto Caller = FindIRFunction(CallerFS.getFunction());
+  if (Caller.has_value() && *Caller) {
+    // No callees for external function, skip the rename matching.
+    if ((*Caller)->isDeclaration())
+      return;
+    findIRNewCallees(**Caller, IRNewFunctions, IRNewCallees);
+  }
+
+  // Run renaming matching on CFG edge(caller-callee).
+  for (auto &CM :
+       const_cast<CallsiteSampleMap &>(CallerFS.getCallsiteSamples())) {
+    auto &CalleeMap = CM.second;
+    // Local container used to update the CallsiteSampleMap.
+    std::vector<std::pair<FunctionId, FunctionSamples *>> FSamplesToUpdate;
+    for (auto &CS : CalleeMap) {
+      auto &CalleeFS = CS.second;
+      auto ProfCallee = CalleeFS.getFunction();
+      auto ExistingIRCallee = FindIRFunction(ProfCallee);
+      // The profile callee is new, run renaming matching.
+      if (!ExistingIRCallee.has_value()) {
+        for (auto *IRCallee : IRNewCallees) {
+          if (functionIsRenamed(*IRCallee, ProfCallee)) {
+            FSamplesToUpdate.emplace_back(ProfCallee, &CalleeFS);
+            OldProfToNewSymbolMap[ProfCallee] = IRCallee;
+            // Update the profile in place so that the deeper level matching
+            // will find the IR function.
+            CalleeFS.setFunction(FunctionId(IRCallee->getName()));
+            LLVM_DEBUG(dbgs() << "Callee renaming is found in function "
+                              << CallerFS.getFunction()
+                              << ", changing profile name from " << ProfCallee
+                              << " to " << IRCallee->getName() << "\n");
+            break;
+          }
+        }
+      } else {
+        // Apply the existing renaming result.
+        auto R = OldProfToNewSymbolMap.find(CalleeFS.getFunction());
+        if (R != OldProfToNewSymbolMap.end()) {
+          FunctionId IRNewCallee(R->second->getName());
+          assert(IRNewCallee != ProfCallee &&
+                 "New callee symbol is not a new function");
+          FSamplesToUpdate.emplace_back(ProfCallee, &CalleeFS);
+          CalleeFS.setFunction(IRNewCallee);
+          LLVM_DEBUG(dbgs() << "Existing callee renaming is found in function "
+                            << CallerFS.getFunction()
+                            << ", changing profile name from " << ProfCallee
+                            << " to " << IRNewCallee << "\n");
+        }
+      }
+      // Note that even there is no renaming in the current scope, there could
+      // be renaming in deeper callee scope, we need to traverse all the callee
+      // profiles.
+      runFuncRenamingMatchingOnProfile(IRNewFunctions, CalleeFS,
+                                       OldProfToNewSymbolMap);
+    }
+
+    // Update the CalleeMap using the new name and remove the old entry.
+    for (auto &P : FSamplesToUpdate) {
+      assert((P.first != P.second->getFunction()) &&
+             "Renamed function name should be different from the old map key");
+      CalleeMap[P.second->getFunction()] = *P.second;
+      CalleeMap.erase(P.first);
+    }
+  }
+}
+
+void SampleProfileMatcher::runFuncLevelMatching() {
+  if (!SalvageFunctionRenaming)
+    return;
+  assert(SymbolMap && "SymbolMap points to null");
+
+  StringMap<Function *> IRNewFunctions;
+  findIRNewFunctions(IRNewFunctions);
+  if (IRNewFunctions.empty())
+    return;
+
+  // The new functions found by the renaming matching. Save them into a map
+  // whose key is the old(profile) function name and value is the new(renamed)
+  // function.
+  FunctionMap OldProfToNewSymbolMap;
+  for (auto &I : Reader.getProfiles())
+    runFuncRenamingMatchingOnProfile(IRNewFunctions, I.second,
+                                     OldProfToNewSymbolMap);
+
+  // Update all the data generated by the old profile.
+  if (!OldProfToNewSymbolMap.empty()) {
+    // Add the new function to the SymbolMap, which will be used in
+    // SampleLoader.
+    for (auto &I : OldProfToNewSymbolMap) {
+      assert(I.second && "New function is null");
+      SymbolMap->emplace(FunctionId(I.second->getName()), I.second);
+    }
+
+    // Re-flatten the profiles after the renaming.
+    FlattenedProfiles.clear();
+    ProfileConverter::flattenProfile(Reader.getProfiles(), FlattenedProfiles,
+                                     FunctionSamples::ProfileIsCS);
+  }
+  RenameDecisionCache.clear();
+}
+
+void SampleProfileMatcher::runOnModule(FunctionMap &SymMap) {
   ProfileConverter::flattenProfile(Reader.getProfiles(), FlattenedProfiles,
                                    FunctionSamples::ProfileIsCS);
+  SymbolMap = &SymMap;
+  runFuncLevelMatching();
+
   for (auto &F : M) {
     if (skipProfileForFunction(F))
       continue;
-    runOnFunction(F);
+    runBlockLevelMatching(F);
   }
   if (SalvageStaleProfile)
     distributeIRToProfileLocationMap();
diff --git a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-renaming.prof b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-renaming.prof
new file mode 100644
index 0000000000000..1e23ef26d1b15
--- /dev/n...
[truncated]

@WenleiHe
Copy link
Member

confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)

There is no guarantee that these wrong matching can only happen in cold functions. I'm wondering if we should skip name matching attempt for tiny functions where don't have enough signal for high confidence matching.

@@ -20,12 +20,22 @@ using namespace sampleprof;

#define DEBUG_TYPE "sample-profile-matcher"

static cl::opt<bool> SalvageFunctionRenaming(
"salvage-function-renaming", cl::Hidden, cl::init(false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: salvage-renamed-profile to be consistent with salvage-stale-profile

cl::desc("Salvage stale profile by function renaming matching."));

static cl::opt<unsigned> FuncRenamingSimilarityThreshold(
"func-renaming-similarity-threshold", cl::Hidden, cl::init(80),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: renamed-func-similarity-threshold

Comment on lines 69 to 71
std::unordered_map<std::pair<const Function *, FunctionId>, bool,
RenameDecisionCacheHash>
RenameDecisionCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the linter suggest format/style?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes :facepalm

@@ -260,6 +270,22 @@ void SampleProfileMatcher::matchNonCallsiteLocs(
}
}

// Filter the non-call locations from IRAnchors and ProfileAnchors and write
// them into a list for random access later.
static void getFilteredAnchorList(const AnchorMap &IRAnchors,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why file static function, instead of member (static) function for SampleProfileMatcher?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to member function.

StringRef CanonFName = FunctionSamples::getCanonicalFnName(F);
return getFlattenedSamplesFor(FunctionId(CanonFName));
}
void runBlockLevelMatching(Function &F);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: runCFGMatching

@@ -58,6 +60,20 @@ class SampleProfileMatcher {
StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>>
FuncCallsiteMatchStates;

struct RenameDecisionCacheHash {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: rename is not a decision. More like FunctionProfileNameMap?

Also I suggest we don't put too much emphasis on "renaming", because this is more general than handling renaming as renaming is one of the causes.

ProfileConverter::flattenProfile(Reader.getProfiles(), FlattenedProfiles,
FunctionSamples::ProfileIsCS);
}
RenameDecisionCache.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also assert the cache is empty on function entry?

void SampleProfileMatcher::runFuncLevelMatching() {
if (!SalvageFunctionRenaming)
return;
assert(SymbolMap && "SymbolMap points to null");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SymbolMap points to null -> SymbolMap is null

F = R->second;

auto NewR = OldProfToNewSymbolMap.find(FName);
if (NewR != OldProfToNewSymbolMap.end())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if SymbolMap already has the function, do we still need to look it up in OldProfToNewSymbolMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, Good catch, changed to return if entry is found in SymbolMap.

findIRNewCallees(**Caller, IRNewFunctions, IRNewCallees);
}

// Run renaming matching on CFG edge(caller-callee).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming matching -> function to profile matching?

Comment on lines 770 to 772
auto &CalleeFS = CS.second;
auto ProfCallee = CalleeFS.getFunction();
auto ExistingIRCallee = FindIRFunction(ProfCallee);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are too many auto here while the types are not too long to spell. suggest to spell out the types to help readability.

const StringMap<Function *> &IRNewFunctions, FunctionSamples &CallerFS,
FunctionMap &OldProfToNewSymbolMap) {
auto FindIRFunction = [&](const FunctionId &FName) {
// Function can be null if name has conflict, use optional to store the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we put null into either OldProfToNewSymbolMap or SymbolMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the SampleLoader, right before the MatchingManager->runOnModule(SymbolMap);, see the comment in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L2179-L2185.

// Note that even there is no renaming in the current scope, there could
// be renaming in deeper callee scope, we need to traverse all the callee
// profiles.
runFuncRenamingMatchingOnProfile(IRNewFunctions, CalleeFS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we do CFG profile matching, we use flattened profile so profile can have more anchors and it's easier to match IR. In this case, I see that checkFunctionSimilarity also use flattened profile. Should we operate this call graph level matching completely on flattened profile so we avoid the recursion here for inlinees? There is no need to repeatedly match multiple instance of an inlinee (even though there is cache).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good point. But note that this function doesn't only do the matching, but also update/write the original profile,(i,e. apply the matching results), that's why we needed to run recursively on the original profile, (see the FSamplesToUpdate is used to update the original profile.)

That said, if we want to do it on flattened profile, one option is to split it into two function, one function for the matching using flattened profile and one function to update the original profile, that might make each function clearer and simpler.(CFG matching kind of works in this way)

Any thoughts?


// Run renaming matching on CFG edge(caller-callee).
for (auto &CM :
const_cast<CallsiteSampleMap &>(CallerFS.getCallsiteSamples())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this style of matching profile to function only works for inlined functions? What if a function is not inlined anywhere, but is renamed?

Should we also consider not-inlined callsites from body samples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch!

I think we can do non-inlined callsite. I was originally planning to do non-inlined callsites, but later I was worried that it might increase build speed and it's less important than inline callites since those non-inlined callsites are likely cold functions, so I skipped them, then later I found that the build speed cost looks not a concern, but I forgot to add it back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I still have a small concern is whether we need to consider cold function matching. I was using the !PSI->isHot() to filter them but I removed it in this version. I guess in normal situation, user code won't have a lot of renaming, so build speed should not be a concern(verified on our internal service). just not very sure if we can have special cases that user code contains so many new functions, it can significantly slow down the matching. (maybe I just overthink..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's less important than inline callites since those non-inlined callsites are likely cold functions

Not inlined may not be cold. Caller/callee can have incompatible attributes that forbids inlining even though call is hot. For such cases, we still want to be able to do the CG level matching.

just not very sure if we can have special cases that user code contains so many new functions, it can significantly slow down the matching. (maybe I just overthink..)

as long as the stale profile matching algorithm scales linearly to number of new functions, it should be fine I think.

Copy link
Contributor Author

@wlei-llvm wlei-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton for the awesome code review!

There is no guarantee that these wrong matching can only happen in cold functions. I'm wondering if we should skip name matching attempt for tiny functions where don't have enough signal for high confidence matching.

Yeah, sounds good, I tried to skip using the checksum for tiny function ( the "magic" 5 num block..) we can apply the same to the general matching.

// Note that even there is no renaming in the current scope, there could
// be renaming in deeper callee scope, we need to traverse all the callee
// profiles.
runFuncRenamingMatchingOnProfile(IRNewFunctions, CalleeFS,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good point. But note that this function doesn't only do the matching, but also update/write the original profile,(i,e. apply the matching results), that's why we needed to run recursively on the original profile, (see the FSamplesToUpdate is used to update the original profile.)

That said, if we want to do it on flattened profile, one option is to split it into two function, one function for the matching using flattened profile and one function to update the original profile, that might make each function clearer and simpler.(CFG matching kind of works in this way)

Any thoughts?


// Run renaming matching on CFG edge(caller-callee).
for (auto &CM :
const_cast<CallsiteSampleMap &>(CallerFS.getCallsiteSamples())) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch!

I think we can do non-inlined callsite. I was originally planning to do non-inlined callsites, but later I was worried that it might increase build speed and it's less important than inline callites since those non-inlined callsites are likely cold functions, so I skipped them, then later I found that the build speed cost looks not a concern, but I forgot to add it back.


// Run renaming matching on CFG edge(caller-callee).
for (auto &CM :
const_cast<CallsiteSampleMap &>(CallerFS.getCallsiteSamples())) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I still have a small concern is whether we need to consider cold function matching. I was using the !PSI->isHot() to filter them but I removed it in this version. I guess in normal situation, user code won't have a lot of renaming, so build speed should not be a concern(verified on our internal service). just not very sure if we can have special cases that user code contains so many new functions, it can significantly slow down the matching. (maybe I just overthink..)

if (NamesInProfile.count(CanonFName))
continue;

if (PSL && PSL->contains(CanonFName))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could happen. IIUC, NamesInProfile/NameTable just collected all the symbols in the profile, but PSL contains the symbol names appear in the binary but not in the profile(like the function is never hit by the samples).

continue;

// For extended binary, the full function name symbols exits in the profile
// symbol list table.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment, it's both needed for extended binary.

const StringMap<Function *> &IRNewFunctions, FunctionSamples &CallerFS,
FunctionMap &OldProfToNewSymbolMap) {
auto FindIRFunction = [&](const FunctionId &FName) {
// Function can be null if name has conflict, use optional to store the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the SampleLoader, right before the MatchingManager->runOnModule(SymbolMap);, see the comment in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L2179-L2185.

F = R->second;

auto NewR = OldProfToNewSymbolMap.find(FName);
if (NewR != OldProfToNewSymbolMap.end())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, Good catch, changed to return if entry is found in SymbolMap.

if (FunctionSamples::ProfileIsProbeBased) {
const auto *FuncDesc = ProbeManager->getDesc(IRFunc);
// Make sure function is complex enough.
if (IRAnchors.size() - FilteredIRAnchorsList.size() > 5 && FuncDesc &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason here is if the function is too simple, like only containing one entry block or only one if-else, in other words, the num block is small, it's likely to get the checksum conflict and give a wrong matching result, I checked from the results from one service, there are some wrong matching because of this.

So, 5 is mostly my guess from this, basically I wanted to avoid 1 basic block or simple one or two if-else.

const auto *FuncDesc = ProbeManager->getDesc(IRFunc);
// Make sure function is complex enough.
if (IRAnchors.size() - FilteredIRAnchorsList.size() > 5 && FuncDesc &&
!ProbeManager->profileIsHashMismatched(*FuncDesc, *FSFlattened)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same reason as the above comment, here it needs the num of block(IRAnchors.size() - FilteredIRAnchorsList.size()) which needs to be after the findIR/Profile anchor to check if the function is complex enough.

Comment on lines 69 to 71
std::unordered_map<std::pair<const Function *, FunctionId>, bool,
RenameDecisionCacheHash>
RenameDecisionCache;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes :facepalm

@wlei-llvm
Copy link
Contributor Author

In the new commit:

  • Added the non-inline call target matching.
  • Sort the input profiles to make the matching deterministic.
  • Merged the matching and find function into a one function findOrMatchNewFunction.
  • Other misc issues.

matchProfileForNewFunctions(newIRFunctions, *P, OldProfToNewSymbolMap);

// Update all the data generated by the old profile.
if (!OldProfToNewSymbolMap.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible to early return when it is empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you meant to save the indentation, yes, we can move ahead the cache clear function, done.

const_cast<SampleRecord::CallTargetMap &>(BS.second.getCallTargets());
for (const auto &TS : CTM) {
const FunctionId &ProfCallee = TS.first;
auto MatchRes = findOrMatchNewFunction(ProfCallee, OldProfToNewSymbolMap,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this O(N^2) with N being the number of callees?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory yes, but here the callees sets are not all the callees but only new functions(filter out existing matched functions), so it's more like New callees from IR * New callees from Profile, I think either of them should be small in reality.

std::vector<Function *> &NewIRCallees);
bool functionMatchesProfileHelper(const Function &IRFunc,
const FunctionId &ProfFunc);
bool functionMatchesProfile(const Function &IRFunc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a documentation on this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "documentation" you meant to add a comment like Doxygen style, right? If so, added more documentations.(I realized I get used to add the comments in the cpp file..., thanks for the suggestion)


const auto *FSFlattened = getFlattenedSamplesFor(ProfFunc);
assert(FSFlattened && "Flattened profile sample is null");
// Similarity check may not be reiable if the function is tiny, we use the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/reiable/reliable/

@wlei-llvm wlei-llvm changed the title [SampleFDO] Stale profile renaming matching [SampleFDO] Stale profile call-graph matching May 29, 2024
}
}

std::pair<Function *, bool> SampleProfileMatcher::findOrMatchFunction(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to lookup IR function for caller as well, so the parameter name ProfCallee is not accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, changed to ProfFunc

return;
findNewIRCallees(*IRCaller, NewIRFunctions, NewIRCallees);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it early return if the new IR caller is not found ? In this case the NewIRCallees is empty, so the following matching seems just doing redundant work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, we want to run matching for all callers' callees, here is to traverse the nested profile to process all the caller. Even if there is no new callees in the current caller, there could be new callees in the inlined/nested callers. See line 835 there is a recursive call for matchProfileForNewFunctions which is to run matching for children callers.

std::vector<Function *> NewIRCallees;
if (auto *IRCaller =
findOrMatchFunction(CallerFS.getFunction(), OldProfToNewSymbolMap)
.first) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why limiting it to new IRcaller only? Any pair of matching ProfileCaller and IRCaller can be used to drive callee matching below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to clarify: this is not to limit to new IRCaller, it is for all callers. Note that here there is no 3rd parameter for function findOrMatchFunction, which has a default argument for the NewIRCallees(const std::vector<Function *> &NewIRCallees = std::vector<Function *>()), in this case, it's just an empty vector which means it doesn't do any new function matching, so it's only for lookup for existing functions.

The NewIRCallees in 774 is used in findNewIRCallees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It is checking '.first', not .second. Is it possible to refactor the function and split into two interfaces? One is the readonly part -- does 'find' only and make the findOrMatch to call 'find' first and if not found, try to match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, in that way we can just call the find here which makes the code clearer. Thanks for the suggestion!

@llvmbot llvmbot added the PGO Profile Guided Optimizations label May 30, 2024
}
}

std::pair<Function *, SampleProfileMatcher::MatchState>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is a bit tricky, the function can be null even if it's found in SymbolMap, so we can't relay on checking the null of the return to tell if's found. So here I use the a enum MatchState.

void SampleProfileMatcher::findNewIRCallees(
Function &Caller, const StringMap<Function *> &NewIRFunctions,
std::vector<Function *> &NewIRCallees) {
for (auto &BB : Caller) {
Copy link
Contributor Author

@wlei-llvm wlei-llvm May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that this function may be run redundantly, we traverse the nested tree for all callers, but it's possible that the same function appears multiple times in different inline context and will be run repeatedly. Then the solution could be use a cache or run on a flattened profile(but that needs a new function to update the original profile)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants