As discussed in #1442, today there are two ways we define the distribution of results that go into calculating a z-score for a contender result. In both ways, we look at only results:
- sharing the history fingerprint of the contender
- with no errors
- in the direct git history of the baseline commit (which is given by the user in the case of the compare endpoint, or the parent commit in the case of the history endpoint)
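The shared filtering step could be sketched roughly as follows. This is an illustrative sketch, not Conbench's actual code: the `Result` dataclass and its fields (`fingerprint`, `commit`, `error`, `mean`) are hypothetical stand-ins for the real benchmark-result model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    # Hypothetical, simplified stand-in for a Conbench benchmark result.
    fingerprint: str       # history fingerprint of the run context
    commit: str            # sha of the commit the result was measured on
    error: Optional[str]   # non-None if the benchmark run errored
    mean: float            # the measured value

def eligible_results(results, contender, baseline_ancestry):
    """Keep only results that share the contender's fingerprint, have no
    errors, and lie in the direct git history of the baseline commit."""
    ancestry = set(baseline_ancestry)
    return [
        r for r in results
        if r.fingerprint == contender.fingerprint
        and r.error is None
        and r.commit in ancestry
    ]
```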
These are good. But choosing the "left side" of the lookback window is where the two ways differ, and personally I think neither way is ideal:
- The compare endpoint always starts exactly <DISTRIBUTION_COMMITS> commits before the baseline commit
- The history endpoint counts <DISTRIBUTION_COMMITS> commits backwards from the baseline commit, skipping over commits that have no matching results
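To make the difference concrete, here is a rough sketch of the two left-edge choices over an ordered list of commits (not the real implementation; `has_result` and the index-based view of history are hypothetical simplifications):

```python
def left_edge_compare(baseline_index, n):
    # Compare endpoint (as described above): the window always starts
    # exactly n commits before the baseline commit, whether or not those
    # commits have any matching results.
    return max(0, baseline_index - n)

def left_edge_history(commits, baseline_index, n, has_result):
    # History endpoint: walk backwards from the baseline, counting only
    # commits that have matching results, until n of them are found.
    counted = 0
    i = baseline_index
    while i > 0 and counted < n:
        i -= 1
        if has_result(commits[i]):
            counted += 1
    return i
```

With the same `n`, the two functions can return very different left edges whenever some commits lack results, which is exactly the inconsistency described above.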
Instead of counting commits backwards, I propose that we define the window size to be a constant (possibly server-configurable) number of matching results. The left side is then that number of results backwards in commit time before the baseline result. (We could tiebreak by benchmark result timestamp.)
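The proposed windowing could be sketched like this (again hypothetical: the field names `commit_time` and `result_time` stand in for whatever the real commit timestamp and benchmark result timestamp columns are):

```python
from dataclasses import dataclass

@dataclass
class Result:
    # Hypothetical fields: commit_time is the commit's timestamp and
    # result_time is the benchmark result's own timestamp (the tiebreak).
    commit_time: int
    result_time: int

def distribution_window(results, baseline, n):
    """Proposed lookback: the last n matching results strictly before the
    baseline result, ordered by commit time, tiebroken by result time."""
    def key(r):
        return (r.commit_time, r.result_time)
    earlier = sorted((r for r in results if key(r) < key(baseline)), key=key)
    return earlier[-n:]
```

Because the window is defined in terms of results rather than commits, skipped commits and repeated measurements of one commit no longer change how much data lands in the distribution.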
We've already discussed why Conbench puts importance on history in determining regressions: #583. That discussion speaks to the importance of having enough data for statistical power, but not so much data that we risk unnoticed historical distribution changes. The trouble with using a fixed number of commits for the lookback window is that for many benchmarks we sometimes skip commits (because they failed or something else went wrong), and we sometimes measure a commit more than once (for various reasons). If we use a fixed number of data points instead, our calculations will be more resilient to these fluctuations.
I believe this change will also make the calculation code simpler, and potentially faster. It will also help solve the problem posed in #1506 (comment), and solve #1442. If we do this we can mention it in #1348.