api/history: introduce pagination if it makes sense #1506

Open
austin3dickey opened this issue Oct 19, 2023 · 2 comments

@austin3dickey
Member

See #871 (comment).

@austin3dickey austin3dickey self-assigned this Oct 19, 2023
@austin3dickey
Member Author

Today, GET /api/history/ is quite similar to GET /api/benchmark-results, except:

  • it's filtered to one history fingerprint
  • it only returns default-branch results without errors
  • it includes z-score analysis for each result (compared to 100 previous commits' worth of results)

The z-score analysis is where pagination gets a little tricky. Each data point needs 100 previous commits' worth of lookback data to calculate its z-score. This means in order to get a page of data, we need to query not only those benchmark results but also some history, as demonstrated in this diagram.

image
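
To make the lookback requirement concrete, here is a minimal sketch in Python. It is not Conbench's actual implementation: it assumes one result per commit, ordered oldest first, and uses a plain mean and sample standard deviation over the previous 100 values. The function name `page_with_zscores` and its parameters are hypothetical.

```python
from statistics import mean, stdev

LOOKBACK = 100  # commits of lookback per data point, per the description above


def page_with_zscores(values, start, size, lookback=LOOKBACK):
    """Return (index, value, z-score) rows for values[start:start + size].

    `values` is a hypothetical list with one result per commit, oldest first.
    """
    # To score the first result on the page, the query has to reach back
    # `lookback` commits before the page starts.
    query_start = max(0, start - lookback)
    window = values[query_start:start + size]
    offset = start - query_start

    rows = []
    for i in range(offset, len(window)):
        history = window[max(0, i - lookback):i]
        if len(history) < 2 or stdev(history) == 0:
            z = None  # not enough history, or no spread to compare against
        else:
            z = (window[i] - mean(history)) / stdev(history)
        rows.append((query_start + i, window[i], z))
    return rows
```

Under these assumptions, a page of 10 results starting at index 200 reads 110 values (indices 100 through 209), even though only 10 rows are returned.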

So querying subsequent pages of data will result in redundant queries of raw data and redundant distribution calculations. This seems fairly important to avoid, especially since this endpoint is a little expensive already. It might be better to query all the data at once and do all the calculations.

So then do we always return all the data in one page? Certain history fingerprints in Arrow already have >1000 associated results. As this scales we might start running into transfer limits. So we need to find a good way to balance the goals I stated in #871 (comment).

@austin3dickey
Member Author

austin3dickey commented Oct 23, 2023

I may implement this by returning all data in one page for now, and then revisit after #1512 is completed.

I was wrong before; no calculations would be redundant, just the data querying. But the data querying will be a LOT easier if #1512 is done.
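
For a rough sense of how much redundant data querying pagination would add, the sketch below counts the rows fetched when paging through a history oldest first. It assumes one result per commit and that each page query re-reads its full 100-commit lookback; `rows_queried` is a hypothetical helper, not part of Conbench.

```python
def rows_queried(total, page_size, lookback=100):
    """Count rows fetched when paging through `total` results, oldest first."""
    fetched = 0
    for start in range(0, total, page_size):
        end = min(start + page_size, total)
        # Each page re-reads up to `lookback` rows that precede it.
        fetched += end - max(0, start - lookback)
    return fetched


print(rows_queried(1000, 100))   # 1900
print(rows_queried(1000, 1000))  # 1000
```

With 1000 results and pages of 100, the paginated approach reads 1900 rows in total versus 1000 for a single page, so nearly half of the rows read are repeats.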
