Revamp how CDash removes old data #2093

Open
3 of 36 tasks
zackgalbreath opened this issue Mar 20, 2024 · 1 comment

@zackgalbreath
Contributor

zackgalbreath commented Mar 20, 2024

Feature Request

How can we make CDash better?

PRs #1655, #1656, and #1657 added foreign keys to many of CDash's tables, helping to protect our data integrity and to ensure that old data gets deleted automatically when it is no longer referenced.
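For readers unfamiliar with how a foreign key can delete data automatically, here is a minimal sketch of an `ON DELETE CASCADE` constraint. It uses Python's built-in `sqlite3` module as a stand-in for CDash's real database. The `build` and `build2test` schemas are hypothetical, heavily simplified versions of the CDash tables, and this sketch is not taken from the PRs above.

```python
import sqlite3

# Hypothetical, heavily simplified versions of two CDash tables.
SCHEMA = """
CREATE TABLE build (
    id INTEGER PRIMARY KEY
);
CREATE TABLE build2test (
    id INTEGER PRIMARY KEY,
    buildid INTEGER NOT NULL REFERENCES build(id) ON DELETE CASCADE
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def delete_build(conn: sqlite3.Connection, build_id: int) -> None:
    # ON DELETE CASCADE removes every build2test row that points at this build.
    conn.execute("DELETE FROM build WHERE id = ?", (build_id,))
    conn.commit()


if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO build (id) VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO build2test (buildid) VALUES (?)", [(1,), (1,), (2,)])
    delete_build(conn, 1)
    print(conn.execute("SELECT COUNT(*) FROM build2test").fetchone()[0])  # 1
```

In this sketch, the application only deletes the parent `build` row, and the database removes the dependent rows itself.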

I audited the rest of CDash's tables and came up with the following list of recommendations.

Unused tables we could probably drop without impacting existing functionality

  • apitoken

Tables that would benefit from foreign keys

Tables whose rows contain a timestamp that could be used for periodic deletion

  • dailyupdate
  • lockout
  • password_resets
  • subproject
  • subprojectgroup

It's worth noting here that the following tables are already cleaned up in addDailyChanges():

  • buildgroup
  • build2grouprule
  • failed_jobs
  • successful_jobs
  • usertemp

Shared data that could be deleted by periodic NOT IN (...) queries:

  • buildfailuredetails (buildfailure.detailsid)
  • buildfailureargument (buildfailure2argument.argumentid)
  • buildupdate (build2update.updateid)
  • configure (build2configure.configureid)
  • coveragefile (coverage.fileid)
  • image (test2image.imgid)
  • label -- this one seems tricky; we would have to check every label2* table.
  • note (build2note.noteid)
  • repositories (project2repositories.repositoryid)
  • site (build.siteid)
  • test (build2test.testid and/or testoutput.testid)
  • testoutput (build2test.outputid)
  • uploadfile (build2uploadfile.fileid)

Many of these tables are already being handled through clever queries in remove_builds(), but if a row somehow "slips through the cracks", it currently requires manual intervention to delete it later on.
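A periodic `NOT IN (...)` sweep over the shared tables listed above might look something like this sketch, with Python's `sqlite3` standing in for the real database. It maps each shared table to every `(table, column)` pair that can reference it, which also covers the multi-table `label` case. The `label2build`/`label2test` names are used only for illustration, and the real list would have to include every label2* table. One detail matters for correctness: `x NOT IN (subquery)` is never true if the subquery returns a NULL, so the sketch filters NULLs out.

```python
import sqlite3

# Hypothetical subset of the list above: shared table -> referencing (table, column)
# pairs. The label2* names are illustrative; a real version needs every label2* table.
REFERENCES = {
    "note": [("build2note", "noteid")],
    "label": [("label2build", "labelid"), ("label2test", "labelid")],
}


def delete_orphans(conn, table):
    """Delete rows of `table` whose id is not referenced by any mapped column."""
    clauses = [
        # Filter NULLs: `id NOT IN (...)` is never true if the subquery yields a NULL.
        f"id NOT IN (SELECT {col} FROM {ref} WHERE {col} IS NOT NULL)"
        for ref, col in REFERENCES[table]
    ]
    cur = conn.execute(f"DELETE FROM {table} WHERE " + " AND ".join(clauses))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE label (id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE label2build (labelid INTEGER, buildid INTEGER);
        CREATE TABLE label2test (labelid INTEGER, testid INTEGER);
        INSERT INTO label VALUES (1, 'used-by-build'), (2, 'used-by-test'), (3, 'orphan');
        INSERT INTO label2build VALUES (1, 10);
        INSERT INTO label2test VALUES (2, 20), (NULL, 21);
    """)
    print(delete_orphans(conn, "label"))  # 1
```

`NOT EXISTS` is a common alternative that sidesteps the NULL pitfall entirely. Which form performs better on large tables depends on the database engine.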

Functionality to more generally reconsider:

  • coveragefile2user -- this association seems tied to a particular version of a source file; is that actually useful?
  • dailyupdatefile -- we might not need this at all anymore since we dropped the "feed" a while back?
@williamjallen
Collaborator

It's worth noting that some tables, like the banner table, contain "global" rows (using the project ID 0, for example), which makes this cleanup more difficult than it initially appears.

Great work putting together this list, though! I'll gradually work through it as I have time.
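The project-ID-0 caveat can be illustrated with a small sketch, again using Python's `sqlite3` as a stand-in. The `banner`/`project` schemas are hypothetical simplifications. A naive "delete banners whose project no longer exists" query would also wipe out the global rows, because no project with ID 0 exists. For the same reason, a plain foreign key from `banner.projectid` to the project table would reject those rows unless a project with ID 0 actually existed. The sketch excludes the sentinel value explicitly.

```python
import sqlite3

# Sentinel project ID used for "global" rows, per the comment above.
GLOBAL_PROJECT_ID = 0


def delete_orphaned_banners(conn):
    """Delete banner rows for projects that no longer exist, keeping global rows."""
    cur = conn.execute(
        # Without the `projectid <> ?` guard, global rows would match NOT IN
        # as well, since there is no project with ID 0.
        "DELETE FROM banner WHERE projectid <> ? "
        "AND projectid NOT IN (SELECT id FROM project)",
        (GLOBAL_PROJECT_ID,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE banner (projectid INTEGER NOT NULL, text TEXT);
        INSERT INTO project VALUES (1, 'ProjectOne');
        INSERT INTO banner VALUES (0, 'global'), (1, 'project one'), (2, 'deleted project');
    """)
    print(delete_orphaned_banners(conn))  # 1
```

Every cleanup query for a table with global rows would need a guard like this one. Storing global rows with a NULL project ID is another option, but that would be a schema change.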
