Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gcs database #12817

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Gcs database #12817

wants to merge 10 commits into from

Conversation

bkochendorfer
Copy link
Member

One-line summary

Allow uploading and downloading from GCP CS (GCS) instead of AWS S3.

Significant changes and points to review

I felt like just throwing an error during the upload if there was a failure is fine but let me know if you want me to try to figure out what to catch.

Issue / Bugzilla link

#12637

Testing

I deployed the change manually in our dev gcp environment and connected to the clock pod to manually run an upload and a download.

@@ -21,9 +21,12 @@
BUCKET_NAME = os.getenv("AWS_DB_S3_BUCKET", "bedrock-db-dev")
REGION_NAME = os.getenv("AWS_DB_REGION", "us-west-2")
S3_BASE_URL = f"https://s3-{REGION_NAME}.amazonaws.com/{BUCKET_NAME}"
GCS_BASE_URL = f"https://storage.googleapis.com/{BUCKET_NAME}"
DOWNLOAD_FROM_GCS = os.getenv("DOWNLOAD_FROM_GCS", 'False').lower() in ('true', '1', 't')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our tests don't check black formatting on these files, but it would be nice if they passed (wrt the single vs double quoting).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, this line might be a little over the top too. Open to feedback on how you all like to handle boolean values as env variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other parts of the code base we use everett. If we used it here this could be written as DOWNLOAD_FROM_GCS = config("DOWNLOAD_FROM_GCS", parser=bool, default="false"). But not sure why it isn't already used here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh cool, thanks @robhudson pushed up a change that pulls that in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welp, maybe that is why looks to fail the tests:

Traceback (most recent call last):
  File "/app/./bin/run-db-download.py", line 10, in <module>
    from bedrock.base.config_manager import config
ModuleNotFoundError: No module named 'bedrock'

@robhudson
Copy link
Member

Looks like ./bin/cron.py imports it but there's a path that is added in that one for it to work.

@bkochendorfer
Copy link
Member Author

Thanks @robhudson !


# must import after adding bedrock to path
from bedrock.base.config_manager import config # noqa

BUCKET_NAME = os.getenv("AWS_DB_S3_BUCKET", "bedrock-db-dev")
REGION_NAME = os.getenv("AWS_DB_REGION", "us-west-2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have config now, we could optionally convert these (or anywhere else in this file where we use os.getenv). An example of a string value would just be:

BUCKET_NAME = config("AWS_DB_S3_BUCKET", default="bedrock-db-dev")

Sorry to tack on so much extra.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all!

@codecov
Copy link

codecov bot commented Mar 2, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.12 ⚠️

Comparison is base (a9a2f04) 76.99% compared to head (b7ca03a) 76.88%.

❗ Current head b7ca03a differs from pull request most recent head c3b0cc5. Consider uploading reports for the commit c3b0cc5 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12817      +/-   ##
==========================================
- Coverage   76.99%   76.88%   -0.12%     
==========================================
  Files         144      144              
  Lines        7704     7669      -35     
==========================================
- Hits         5932     5896      -36     
- Misses       1772     1773       +1     

see 8 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -63,6 +85,18 @@ def upload_db_data(db_data):
except Boto3Error:
return f"ERROR: Failed to upload the new database info file: {db_data}"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want anything around the if not s3 line above, so that in the future if we remove the AWS credentials this won't short circuit and never make it this far? Or would changes around S3 come later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was to do kind of a multi step deploy:

  1. Get the code out
  2. Turn on GCS so it uploads to both GCS and S3, let that sit for a while
  3. Turn on download from GCS, let that sit
  4. Come back and remove the s3 references

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sounds good to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkochendorfer @pmac Can we get this rolled out to prod? Good to do ahead of the postgres/cms tango

Copy link
Member

@robhudson robhudson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the extra changes. 👍

@stevejalim
Copy link
Collaborator

@bkochendorfer Do you need one of us to merge this?

@bkochendorfer
Copy link
Member Author

Yes I do not have merge capabilities here

@robhudson
Copy link
Member

Chatted with Brett. We're going to sit on this until there's a bit more time to focus on it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants