lookback z-score algorithm: change how width of lookback window is defined #1512

Open
austin3dickey opened this issue Oct 23, 2023 · 0 comments

As discussed in #1442, today we define the distribution of results used to calculate a contender result's z-score in two different ways. Both ways consider only results:

  • sharing the history fingerprint of the contender
  • with no errors
  • in the direct git history of the baseline commit (which is given by the user in the case of the compare endpoint, or the parent commit in the case of the history endpoint)

These are good. But choosing the "left side" of the lookback window is where the two ways differ, and personally I think neither way is ideal:

  • The compare endpoint always starts exactly <DISTRIBUTION_COMMITS> commits before the baseline commit
  • The history endpoint counts <DISTRIBUTION_COMMITS> commits backwards from the baseline commit, skipping over commits that have no matching results
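
To make the difference concrete, here is a minimal sketch of the two left-edge rules. It is not Conbench's actual code: the function names, the list-of-commits representation, and the choice to start counting at the commit just before the baseline are all assumptions made for illustration.

```python
# Hypothetical helpers; `commits` is the direct git history, oldest first.

def left_edge_fixed(baseline_idx: int, n: int) -> int:
    """Compare-style rule: step back exactly n commits, with or without results."""
    return max(0, baseline_idx - n)


def left_edge_skipping(commits, has_results, baseline_idx: int, n: int) -> int:
    """History-style rule: step back until n commits with matching results are counted."""
    counted = 0
    edge = baseline_idx
    for i in range(baseline_idx - 1, -1, -1):
        if commits[i] in has_results:  # commits without matching results are skipped
            counted += 1
            edge = i
            if counted == n:
                break
    return edge


commits = list("abcdefgh")
has_results = {"a", "c", "f", "h"}
baseline_idx = commits.index("h")

fixed_edge = left_edge_fixed(baseline_idx, 2)
skipping_edge = left_edge_skipping(commits, has_results, baseline_idx, 2)
print(commits[fixed_edge], commits[skipping_edge])
```

With sparse results, the same <DISTRIBUTION_COMMITS> value yields windows of quite different widths under the two rules.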

Instead of counting commits backwards, I propose that we define the window size to be a constant (possibly server-configurable) number of matching results. The left side is then that number of results backwards in commit time before the baseline result. (We could tiebreak by benchmark result timestamp.)
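
A rough sketch of what I have in mind, not the real implementation. The `Result` type, `lookback_window`, and `z_score` are hypothetical names. The input is assumed to be already filtered by the three criteria above, the baseline result is assumed to be the right edge of the window, and the z-score is the plain textbook one rather than whatever Conbench ends up computing.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, stdev


@dataclass(frozen=True)
class Result:  # hypothetical stand-in for a benchmark result
    commit_time: datetime  # timestamp of the benchmarked commit
    result_time: datetime  # benchmark result timestamp, used only as a tiebreak
    value: float


def _order(r: Result):
    # Order by commit time, then by result timestamp to break ties.
    return (r.commit_time, r.result_time)


def lookback_window(results, baseline: Result, window_size: int):
    """Return up to window_size results ending at the baseline, oldest first."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    eligible = sorted(
        (r for r in results if _order(r) <= _order(baseline)), key=_order
    )
    return eligible[-window_size:]


def z_score(window, contender_value: float) -> float:
    """Plain z-score of the contender against the window (needs >= 2 results)."""
    values = [r.value for r in window]
    return (contender_value - mean(values)) / stdev(values)


def t(day, hour=0):
    return datetime(2023, 10, day, hour)


results = [
    Result(t(1), t(1, 1), 10.0),
    Result(t(3), t(3, 1), 11.0),  # same commit measured twice
    Result(t(3), t(3, 2), 12.0),
    Result(t(5), t(5, 1), 13.0),  # baseline result
    Result(t(6), t(6, 1), 99.0),  # after the baseline, so never in the window
]
window = lookback_window(results, results[3], window_size=3)
print([r.value for r in window], z_score(window, 14.0))
```

Skipped commits and repeated measurements no longer change how many data points feed the calculation; only the number of matching results does.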

We've already discussed in #583 why Conbench puts importance on history when determining regressions. That discussion speaks to the importance of having enough data for statistical power, but not so much data that we risk historic distribution changes going unnoticed. The trouble with using a fixed number of commits for the lookback window is that, for many benchmarks, we sometimes skip commits (because they failed or something else went wrong) and sometimes measure a commit more than once (for various reasons). If we use a fixed number of data points instead, our calculations will be more resilient to these fluctuations.

I believe this change will also make the calculation code simpler, and potentially faster. It will also help solve the problem posed in #1506 (comment), and solve #1442. If we do this we can mention it in #1348.
