Design solution for automated backfills #1425

Open
deniseli opened this issue May 7, 2024 · 1 comment

Comments

@deniseli
Contributor

deniseli commented May 7, 2024

A backfill is a retroactive update to historical data, i.e. modifying the existing rows of a table, or inserting rows after they're discovered to be missing. Common use cases:

  • You ran a DB migration changing the schema of a table from 1 big JSON blob column to 3 separate enum columns. Immediately after the schema change the table has the 3 new columns, but they are null for all existing rows. You can backfill the new columns in those old rows so the historical data is compatible with your latest DAL and downstream code.
  • Your system collects raw telemetry data and then processes those logs into a more consumable log format using some ETL. Your ETL pipeline goes down for a day. After the outage is over, there is a full day's worth of raw telemetry data that has yet to be processed into the consumable format. You could backfill the consumable table (i.e. re-run the ETL over the historical logs, bounded to the exact time range of the outage) to fix the gap in your consumable data.
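
To make the two use cases concrete, here is a rough sketch of what each backfill could look like. It is only an illustration, not a proposed design: it uses Python's `sqlite3`, and the table and column names (`events`, `raw_logs`, `clean_logs`, `status`, `kind`, `region`) plus the trivial "ETL" transform are all made up for the example. The first function fills the new columns in batches, the second re-runs the transform over a bounded time range and is safe to run more than once.

```python
import json
import sqlite3


def backfill_columns(conn, batch_size=1000):
    """Use case 1: fill the new columns for old rows, one batch at a time.

    Hypothetical schema: events(id, blob, status, kind, region).
    """
    last_id, total = 0, 0
    while True:
        # Keyset pagination on id, so the loop always advances and terminates.
        rows = conn.execute(
            "SELECT id, blob FROM events "
            "WHERE status IS NULL AND id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return total
        for row_id, blob in rows:
            data = json.loads(blob)
            conn.execute(
                "UPDATE events SET status = ?, kind = ?, region = ? WHERE id = ?",
                (data["status"], data["kind"], data["region"], row_id),
            )
        conn.commit()  # commit per batch to keep transactions short
        last_id = rows[-1][0]
        total += len(rows)


def backfill_range(conn, start, end):
    """Use case 2: re-run the ETL over raw rows with start <= ts < end.

    Hypothetical schema: raw_logs(id, ts, payload) and
    clean_logs(raw_id PRIMARY KEY, ts, message). Timestamps are ISO 8601
    strings, which compare correctly as text.
    """
    rows = conn.execute(
        "SELECT id, ts, payload FROM raw_logs WHERE ts >= ? AND ts < ?",
        (start, end),
    ).fetchall()
    for raw_id, ts, payload in rows:
        # INSERT OR IGNORE skips rows that were already processed,
        # so repeating the backfill does not create duplicates.
        conn.execute(
            "INSERT OR IGNORE INTO clean_logs (raw_id, ts, message) VALUES (?, ?, ?)",
            (raw_id, ts, payload.strip().lower()),  # stand-in for the real transform
        )
    conn.commit()
    return len(rows)  # number of raw rows scanned in the range
```

Both functions can be re-run after a partial failure: the first only selects rows whose new column is still null, and the second relies on the primary key of the target table to ignore rows it has already written.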
@github-actions github-actions bot added the triage Issue needs triaging label May 7, 2024
@alecthomas alecthomas mentioned this issue May 7, 2024
@alecthomas
Collaborator

The async call system would be ideal for implementing this.

@matt2e matt2e removed the triage Issue needs triaging label May 8, 2024

3 participants