Divide analysis_data table, create new commits table #34

Open
brianwarner opened this issue Apr 1, 2019 · 0 comments

Per discussion with @sgoggins, this issue proposes some major structural changes to the way commit data is stored. It was triggered by the discussion around #33.

For a while I have wanted to optimize analysis_data. Each row in the table describes one file that changed in a commit, and each row also carries its own copy of the author and committer info. When a commit changes a single file, that's not really a big deal. But when a commit changes a lot of files, the same metadata is duplicated once per file.

There is some benefit in breaking this info out into a separate table, called commits. It would reduce the overall size of analysis_data (I haven't run into issues with this yet, but I'm not using it at the same scale as Sean, see #31). It would also yield a graceful solution for #33 by giving us the chance to start over and store dates as a native DATETIME rather than as ISO 8601 strings in a VARCHAR.
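As a rough illustration of the date change, an ISO 8601 string can be parsed once and handed to the database driver as a `datetime` object, which MySQL drivers typically map to DATETIME. This is a minimal sketch: the helper name `iso_to_utc` and the decision to normalize to UTC are assumptions, not something settled in this issue.

```python
from datetime import datetime, timezone


def iso_to_utc(iso_string):
    """Parse an ISO 8601 timestamp and return a naive datetime in UTC.

    Hypothetical helper. The input is expected to carry a UTC offset;
    a string without one would be interpreted as local time by astimezone().
    """
    parsed = datetime.fromisoformat(iso_string)
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)


print(iso_to_utc("2019-04-01T12:34:56-07:00"))  # 2019-04-01 19:34:56
```

Storing a single normalized value keeps date comparisons and range queries in SQL rather than in string handling.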

It also gives us a new central place to store the commit message, which may be useful info.
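To make the proposed layout concrete, here is a minimal sketch of the two tables. It uses Python's built-in `sqlite3` so it runs anywhere; Facade itself uses MySQL, where the date columns would be DATETIME. All column names here (`commit_hash`, `message`, and so on) are illustrative assumptions, not the actual schema.

```python
import sqlite3

# Hypothetical schema: per-commit metadata lives in commits,
# per-file stats stay in analysis_data.
SCHEMA = """
CREATE TABLE commits (
    commit_hash     TEXT PRIMARY KEY,
    author_name     TEXT,
    author_email    TEXT,
    author_date     TEXT,  -- DATETIME in MySQL
    committer_name  TEXT,
    committer_email TEXT,
    committer_date  TEXT,  -- DATETIME in MySQL
    message         TEXT
);
CREATE TABLE analysis_data (
    commit_hash TEXT REFERENCES commits(commit_hash),
    filename    TEXT,
    added       INTEGER,
    removed     INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One commit that touches three files: the metadata is stored once.
conn.execute(
    "INSERT INTO commits VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("abc123", "Ada", "ada@example.com", "2019-04-01 19:34:56",
     "Ada", "ada@example.com", "2019-04-01 19:34:56", "Refactor parser"),
)
conn.executemany(
    "INSERT INTO analysis_data VALUES (?, ?, ?, ?)",
    [("abc123", "a.py", 10, 2), ("abc123", "b.py", 5, 0), ("abc123", "c.py", 1, 1)],
)

commit_rows = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
file_rows = conn.execute("SELECT COUNT(*) FROM analysis_data").fetchone()[0]
print(commit_rows, file_rows)  # 1 3
```

With the current single-table layout, the same commit would store the author and committer fields three times.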

The main changes required are:

  • Alter setup.py to move these columns out of analysis_data and into a new commits table
  • Add a clause to the update_db function in facade-worker.py that creates the new commits table, copies over the commit and author/committer info, removes the old columns from analysis_data and optimizes it, and then does a cursory walk through the git log of each repo to fetch full datetime info for authors/committers plus commit messages
  • Update the caching functions with the new join between analysis_data and commits
  • Add the ability to view commit messages to various UIs
  • Cut a new major release, because this is a significant database change
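The database part of the steps above could look roughly like the following. This is only a sketch under stated assumptions: it uses `sqlite3` instead of MySQL, the column names and both function names are hypothetical, only author fields are shown, and the table is rebuilt rather than altered in place (in MySQL this would more likely be `ALTER TABLE ... DROP COLUMN` followed by `OPTIMIZE TABLE`). The git log walk that fills in full datetimes and messages is left out.

```python
import sqlite3


def split_analysis_data(conn):
    """Move per-commit metadata from a wide analysis_data table into commits."""
    conn.executescript("""
        CREATE TABLE commits AS
            SELECT DISTINCT commit_hash, author_name, author_email, author_date
            FROM analysis_data;
        ALTER TABLE commits ADD COLUMN message TEXT;  -- filled later from git log
        CREATE TABLE analysis_data_slim AS
            SELECT commit_hash, filename, added, removed FROM analysis_data;
        DROP TABLE analysis_data;
        ALTER TABLE analysis_data_slim RENAME TO analysis_data;
    """)


def lines_added_per_author(conn):
    """Example of a cache-style query using the new join."""
    return conn.execute("""
        SELECT c.author_email, SUM(a.added)
        FROM analysis_data AS a
        JOIN commits AS c ON c.commit_hash = a.commit_hash
        GROUP BY c.author_email
        ORDER BY c.author_email
    """).fetchall()


# Build a small "old style" table with duplicated metadata, then migrate it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE analysis_data (
        commit_hash TEXT, author_name TEXT, author_email TEXT,
        author_date TEXT, filename TEXT, added INTEGER, removed INTEGER
    )
""")
conn.executemany(
    "INSERT INTO analysis_data VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("abc123", "Ada", "ada@example.com", "2019-04-01", "a.py", 10, 2),
        ("abc123", "Ada", "ada@example.com", "2019-04-01", "b.py", 5, 0),
        ("def456", "Bob", "bob@example.com", "2019-04-02", "a.py", 1, 1),
    ],
)
conn.commit()

split_analysis_data(conn)
print(lines_added_per_author(conn))
```

A real migration would also need to guard against running twice, for example by checking the stored database version before executing these statements.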

While this is a big change, in theory it should be possible to do all of the changes transparently to a user with an existing database. The first facade-worker.py run after pulling this code will take longer than usual, but that's likely the only impact.
