Skip to content
This repository has been archived by the owner on Jun 27, 2023. It is now read-only.

Add rake task to strip down production database dump #979

Open
carpodaster opened this issue Apr 5, 2018 · 6 comments
Open

Add rake task to strip down production database dump #979

carpodaster opened this issue Apr 5, 2018 · 6 comments

Comments

@carpodaster
Copy link
Member

Why?
We have a staging environment whose free database tier only supports up to 10,000 rows. We occasionally need to copy the current prod DB to staging in order to QA things.

We used to flush the Teams App's DB every year but since the introduction of the Season concept, we don't do that anymore. This is why the DB keeps growing.

What?
The list below shows how many records each ActiveRecord model has (as of April 2nd).

  • The easiest way to solve this would be to delete data for old seasons, but ideally we also keep some old data
  • The current season should be left untouched
  • Accepted teams shouldn't be purged
  • Bonus: sanitize the data (user data, team names, application data, comments) so that we may use the DB dump as a developer's snapshot
ApplicationRecord
  .descendants
  .map { |klass| { klass.to_s => klass.count } }
  .sort_by(&:values)

# [{"Season"=>4}, {"ConferenceAttendance"=>22}, 
# {"Note"=>31}, {"Mailing"=>40}, {"Conference"=>103}, {"ConferencePreference"=>112}, 
# {"Maintainership"=>153}, {"Source"=>161}, {"Project"=>196}, {"Mentor::Comment"=>229}, 
# {"Application"=>542}, {"ApplicationDraft"=>699}, {"Team"=>782}, {"Todo"=>1158}, 
# {"Rating"=>2067}, {"Comment"=>2507}, {"Role"=>3150}, {"Submission"=>3187}, 
# {"Activity"=>3564}, {"User"=>3661}]
@klappradla
Copy link
Member

Just a thought 🤔 : in regards of what just happened at Travis CI (and what previously also happened at Gitlab) we should probably prevent this from accidentally being run in the productionenvironment 😉

I'd therefore suggest to add a 4th environment: staging (don't know why we do not have it already)_ that is basically a clone of production and run the staging app in this environment.

@carpodaster
Copy link
Member Author

I'm not a fan of extra envs; it's also not advised by Heroku, but I'm also not totally against it :)

I've come across this Twitter thread that has some retrospective on AR meta data about the env its DB is configured for. We could test if that actually works.

Or we could abort if there is a DATABASE_URL set in dev or test envs.

Or we could just YOLO it, knowing that the potential impact isn't as bad; we have daily DB dumps, our app is not a high traffic one and restoring prod from the recent dev dump is a matter of 10 minutes, not several hours.

@klappradla
Copy link
Member

You're probably right with your YOLO-point 👍

For the Twitter thread and the check on a DATABASE_URL env: I'm sorry, I did not make my point clear enough above 🙈
I'm not really concerned about running things from within the dev / test env against the production database. I was more referring to doing something like:

heroku run rails db:shove -a production

or:

heroku run rails db:shove

and it defaults to production.

Anyways, was just something that came to my mind when reading the issue. I think your YOLO-point is valid and making other things obsolete ✔️

@emcoding
Copy link
Member

emcoding commented May 24, 2018

We have 552 users in the database without a github_id. We can probably delete those? I suppose they originate from adding a team member and using an invalid github_handle.

irb(main):003:0> User.where(github_id: nil).count => 552

For example: https://teams.railsgirlssummerofcode.org/community?utf8=%E2%9C%93&search=Vinh+Nguyen+Le&commit=Go

@carpodaster
Copy link
Member Author

@F3PiX good idea! Sounds like a good (late) spring cleaning

@emcoding
Copy link
Member

I'll add it to the User Controller spring cleaning then 😂 . Working on it - that's where I stumbled upon it.

emcoding added a commit that referenced this issue May 25, 2018
#979)

For real github accounts, the github_id is not nil. So if the id is nil, it is a fake account or an input error.
Updated seeds with github_id
@emcoding emcoding mentioned this issue May 25, 2018
2 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants