Redesign Build Deployment Process (External) #961

Open · MukuFlash03 wants to merge 73 commits into master

Conversation


@MukuFlash03 commented Mar 26, 2024

This PR serves as the implementation of this issue: redesigning our build and deployment processes, primarily across e-mission-server, join, admin-dash, and public-dash.

Collaborating on this along with @nataliejschultz .

All four redesign PRs for the external repos:
E-mission-server: #961
Join: e-mission/nrel-openpath-join-page#29
Admin-dash: e-mission/op-admin-dashboard#113
Public-dash: e-mission/em-public-dashboard#125


A rough plan can be:

1st phase

  • Consolidating Differences
    • Minimize the differences between the internal and external versions, clean up unnecessary files, and so on.
  • Image_Build_Push.yml:
    • CI / CD via GitHub actions to build and push images to Dockerhub for all four repos based on the current process for e-mission-server.
    • The image build/push workflow builds only for that specific repo when a merge is made to its designated branch.
    • Additionally, for e-mission-server, image-build-push will now also need to trigger the GitHub actions in join, admin-dash, and public-dash.

2nd phase:

  • Multi-tier uniform repo structure in nrelopenpath
    • Need to change the internal repos to pull from the externally built images for each repo without duplicating the entire external code repository internally.

Mahadik, Mukul Chandrakant added 2 commits March 21, 2024 11:47
Added a try/except block, similar to how other config files are handled: use the actual non-sample file if present, else fall back to the sample file.

Can remove the file from internal repo since not needed anymore.
Changed level to DEBUG and set the formatter to detailed.

Can keep both external and internal versions same by making external also have the same logging level as internal.

Internal version given priority as detailed logging is better for detecting errors.

Also, debug() statements (~1500) far outnumber warning() statements (~50).

We would lose all of these when the external repo is used / run if WARNING (which is higher than DEBUG) were kept as the default level.
@MukuFlash03 (Contributor Author)

Checking in the first set of changes for consolidating the differences between the external and internal versions.
These are mostly related to config files based on the differences identified here

Currently included these changes:

  1. secret_list.json
  • Modified code here: ./emission/net/auth/secret.py
  • Added a try/except block to load data from the non-sample secret_list file if present, else load from the sample file.
  • Now, nrelopenpath/webapp/conf/net/auth/secret_list.json can be removed from the internal repo as well.
    • Changes made in internal repo.

  2. webapp/conf/analysis/debug.conf
  • Removed the entire analysis directory from internal repo: nrelopenpath/webapp/analysis.
    • Changes made in internal repo.

  3. Logging level changed for these files:
    webapp/conf/log/webserver.conf <-> log/webserver.conf.sample
    analysis/conf/log/intake.conf <-> log/intake.conf.sample
  • Can make both versions match by giving the external version the same logging level as the internal one.
  • Internal given priority, as detailed logging is better for detecting errors, though it does make logs verbose.
  • Also, debug() statements (~1500) far outnumber warning() statements (~50).
    • Got this count by searching the code base for logging.debug() and logging.warning(), respectively.
  • We would lose all of these when the external repo is used / run if we kept the level at WARNING.
  • Made both webserver.conf and intake.conf use logging level DEBUG and the detailed formatter (sketched below).
    • The files can now be removed internally as they match the external ones; removed the log/ directory in both webapp and analysis.
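As a concrete reference, here is roughly what the unified setting amounts to, expressed through Python's standard logging.config schema. Only the DEBUG level and the "detailed" formatter name come from this thread; the format string and handler layout are assumptions for illustration:

import logging.config

LOG_CONFIG = {
    "version": 1,
    "formatters": {
        # "detailed" is the formatter name used in the conf files; this format string is assumed
        "detailed": {"format": "%(asctime)s:%(levelname)s:%(thread)d:%(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",  # the internal level, now used externally as well
            "formatter": "detailed",
        },
    },
    "root": {"handlers": ["console"], "level": "DEBUG"},
}

logging.config.dictConfig(LOG_CONFIG)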

Mahadik, Mukul Chandrakant and others added 13 commits March 28, 2024 21:45
push.json.sample has just 4 key-value pairs, which can be changed to ENV variables internally.

Hence, we need to write code that reads from env variables.

Changing emission/net/ext_service/push/notify_interface.py to read from env variables.

Using helper class config.py, based on existing template.

First read from the .json file if it exists, else read from env variables.
If the env variables are all None, report that the values are not configured (see the sketch below).
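A minimal sketch of such a helper (emission/net/ext_service/push/config.py), assuming the four PUSH_* variable names that appear in the docker run commands later in this thread; the JSON key names are assumptions:

import json
import logging
import os

def get_config_data():
    try:
        # Prefer the actual non-sample config file if it exists
        config_file = open('conf/net/ext_service/push.json')
        ret_val = json.load(config_file)
        config_file.close()
    except:
        logging.debug("push.json not configured, reading from environment variables")
        ret_val = {
            "provider": os.getenv("PUSH_PROVIDER"),
            "server_auth_token": os.getenv("PUSH_SERVER_AUTH_TOKEN"),
            "app_package_name": os.getenv("PUSH_APP_PACKAGE_NAME"),
            "ios_token_format": os.getenv("PUSH_IOS_TOKEN_FORMAT"),
        }
        if all(value is None for value in ret_val.values()):
            logging.warning("push service values not configured")
    return ret_val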
Added new environment variable PROD_STAGE.

If this is set to TRUE, then the debug.conf.internal.json will be selected as the conf file.

Else, the existing try/except block is executed, which either checks for a debug.conf.json or uses the sample file if the former does not exist.

The try part is still valid, since some test files copy the sample file over as debug.conf.json and then reload the config via eac.reload_config(), which needs to read from debug.conf.json as well. (A sketch follows the file list below.)

Hence, now three files exist:
- debug.conf.json.sample
- debug.conf.json
- debug.conf.json.internal
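A minimal sketch of that selection order; the file names and printed messages match the code quoted later in this review, while the function shape is illustrative:

import os

def get_config_file():
    if os.getenv("PROD_STAGE") == "TRUE" and \
            os.path.exists("conf/analysis/debug.conf.internal.json"):
        print("In production environment, opening internal debug.conf")
        return open("conf/analysis/debug.conf.internal.json")
    try:
        return open("conf/analysis/debug.conf.json")
    except FileNotFoundError:
        print("analysis.debug.conf.json not configured, falling back to sample, default configuration")
        return open("conf/analysis/debug.conf.json.sample")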
This is the first part of the method to share the server image tag between other repositories. It would be shared as tag.txt, which gets overwritten every time the image build runs.
…jq from docker_start_script.sh

Summary:
Currently replaced config file values with environment variables.
Expecting developers to manually set env vars instead of conf files.
So, for instance, when running containers using `docker-compose` or `docker run`, these values need to be set, just as the push environment variables need to be set.

Key-value pairs in the webserver config file:
REMOVED: log_base_dir, python_path, log_file
ENV: static_path, 404_redirect, host, port, timeout, auth, aggregate_call_auth

———————————

Changes

1. webserver.conf.sample
- Kept the file as it is and changed how the values are read in cfc_webapp.py, TestWebserver.py and docker_start_script.sh

2. TestWebserver.py
- In setup(): storing the current original ENV variables, then replacing them with test values
- In teardown(): restoring the original ENV variables
- (See the sketch after this list.)

3. cfc_webapp.py
- Removed log_base_dir, python_path, log_file, as these were only used in cfc_webapp.py to read in values from the config file and were not used elsewhere.
- Added environment variables for the other config key-value pairs, which were used in cfc_webapp.py, to remove the dependence on the config file.

4. docker_start_script.sh
- Removed the sed / jq usage for editing webserver.conf.sample.json and copying it over as webserver.conf.json; now simply setting the WEB_SERVER_HOST environment variable.
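A sketch of the save/restore pattern described in (2); the variable name and test value here are placeholders, not the actual set used in TestWebserver.py:

import os
import unittest

class TestWebserver(unittest.TestCase):
    def setUp(self):
        # Test values to install; the real test uses the webserver ENV variables above
        self.test_env_vars = {"WEB_SERVER_HOST": "0.0.0.0"}
        # Store the original values (if any), then replace them with the test values
        self.orig_env_vars = {}
        for var, test_value in self.test_env_vars.items():
            if var in os.environ:
                self.orig_env_vars[var] = os.environ[var]
            os.environ[var] = test_value

    def tearDown(self):
        # Unset the test values, then restore the originals
        for var in self.test_env_vars:
            del os.environ[var]
        for var, orig_value in self.orig_env_vars.items():
            os.environ[var] = orig_value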

-------------------

Special notes:

1. config.json <-> webserver.conf.sample.json
- Some files still use config.json; those changes were made 7-8 (even 10) years ago. This config.json file has the same contents as the current webserver.conf.sample.
- So, webserver.conf.sample.json was actually config.json at some point!
e-mission@a028dec

2. Sed localhost replacement isn’t functionally correct
- The default sample value for server.host is "0.0.0.0".
- But the sed command replaces “localhost” with the ENV variable value.
- Since the keyword “localhost” itself isn’t in the sample file, the replacement does nothing.

----------------
Initially, the test values were being set inside the if block, which only ran when the ENV variables already existed.
They weren’t being set, as the if condition was evaluating to False.
But the if condition should mainly be there to store existing values.
In any case, the test values must always be set, hence moving the assignment outside the if block.
…d / jq use

Details of files changed

1. Start scripts
.docker/docker_start_script.sh
emission/integrationTests/start_integration_tests.sh
setup/tests/start_script.sh

- Changed the sed / jq usage to directly setting the environment variable to the desired value; no need to save the sample file as the actual conf json file.

2. Config.py files
emission/core/config.py
emission/net/api/config.py
emission/net/ext_service/push/config.py

- Based these files on emission/analysis/config.py
- They read from the conf files if present, else from environment variables, instead of from the sample files.
- The default values are taken from the sample files.
- check_unset_env_vars() can be used to check whether ALL environment variables are unset.
- (A sketch of this pattern follows at the end of this message.)

3. DB, Webapp, Push application usage files
emission/core/get_database.py
emission/net/api/cfc_webapp.py
emission/net/ext_service/push/notify_interface.py

- Changed the logic to read via the config.py files in (2), which use the actual non-sample config files if present, else the environment variables, instead of the sample files.

4. Test Files
emission/integrationTests/storageTests/TestMongodbAuth.py
emission/tests/netTests/TestWebserver.py
emission/tests/netTests/TestPush.py

- Test files that exercise the functionality of the logic in the files in (3).
- Earlier, config files were being replaced with test values and copied over for testing purposes.
- Now switching to environment variables; the call reaches the config files in (2) indirectly via the application usage files in (3).
- The following flow is used for reading and restoring the original environment variable values:
	setup()
        - Sets the ENV vars, storing the original values if set, then using the test values as the new values
	TestFunc()
        - Importing the modules named in (3) causes the values to be read in; these are now the test values, since setup() set the ENV vars
        - Only the ENV vars with test values are set; unchanged ones are left untouched, or default values are read using os.getenv(var_name, default)
	teardown()
        - Unsets the test ENV vars using del os.environ[var_name]
        - Restores the original values from the originals dict
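A minimal sketch of the config.py pattern in (2), assembled from pieces quoted elsewhere in this PR (get_config_data, get_config_data_from_env, and check_unset_env_vars for the db config); the DB_RESULT_LIMIT variable name is an assumption:

import json
import logging
import os

def get_config_data_from_env():
    # Default values are the sample-file values
    return {
        "timeseries": {
            "url": os.getenv('DB_HOST', "localhost"),
            "result_limit": int(os.getenv('DB_RESULT_LIMIT', "250000")),
        }
    }

def get_config_data():
    try:
        # Prefer the actual non-sample config file if it exists
        config_file = open('conf/storage/db.conf')
        ret_val = json.load(config_file)
        config_file.close()
    except:
        logging.debug("storage not configured, falling back to sample, default configuration")
        ret_val = get_config_data_from_env()
    return ret_val

def check_unset_env_vars():
    # True only if ALL the relevant environment variables are unset
    config_data_env = {
        "url": os.getenv('DB_HOST'),
    }
    return all(value is None for value in config_data_env.values())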
test_run_script_empty() and test_run_script_show() were failing
- This occurred because I had used logging.debug() instead of print()
- Hence the standard output stream never got the print(“storage not configured”) statement

Some more fixes:
- db.conf was incorrectly read instead of webserver.conf in config.py in emission/net/api
- TestPush had an incomplete test, that was used just for testing purposes to invoke calls on importing the pni module.
- TestWebserver.py changed print() statements to logging.debug()
Assertion error this time in another CI GitHub Actions workflow
Workflow name: test-with-docker

Succeeded earlier when the other test (ubuntu-only-test-with-manual-install) failed:
https://github.com/e-mission/e-mission-server/actions/runs/8658279606/job/23741854476

- This had succeeded earlier because “storage not configured” wasn’t being printed, which is what this test wants, since docker compose is used to set up the mongo db container.
- With docker, localhost becomes db via the DB_HOST ENV var or the “url” key-value.

SOLVED
- Added an if condition to print “storage not configured” only when the url value is still equal to the sample value “localhost” (sketched below)
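A sketch of that condition, self-contained for illustration:

import os

url = os.getenv('DB_HOST', "localhost")
if url == "localhost":
    # Only report unconfigured storage while the URL still has the sample default;
    # under docker compose, DB_HOST=db, so nothing is printed and the test passes
    print("storage not configured, falling back to sample, default configuration")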
Will first check if debug.conf exists, like other conf files.
If it does not, the except block then checks whether the PROD_STAGE environment variable is set and whether debug.conf.internal is present.
Else, fall back to sample conf.
Accidentally added this after running all tests.

def check_unset_env_vars():
    config_data_env = {
        "url": os.getenv('DB_HOST'),
Contributor

From our testing, it looks like these values are not formatted correctly to pull into get_database.py.

Contributor Author

From our testing, it looks like these values are not formatted correctly to pull into get_database.py.

More details on errors encountered are mentioned here in the public-dash PR.

To summarize:

  1. Faced this error so far with docker-compose.dev in public-dash. It works fine with the prod version, docker-compose.yml.

  2. Also, inside the dev containers spun up using docker-compose.dev, I was able to access the values of these environment variables directly via python.

Need further investigation.


nataliejschultz commented Apr 25, 2024

Server testing:

First, testing the external code as it currently is, based on the current master image. After creating the db:

$ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0

I ran the container as such:

$ docker run --name em-server-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage_program --network emission shankari/e-mission-server:master_2024-04-15--53-23

The container spun up and I ran:

$ docker exec -it em-server-1 /bin/bash

Started the e-mission environment:

source setup/activate.sh

And ran all the tests:

./runAllTests.sh

Every test was passing but one:

[Screenshot: test results with one failing test]

After downloading vim into the container and modifying the file with some prints, I found that the issue was a GET returning a 404. Many trials later, I figured out that the issue was from the docker run command setting STUDY_CONFIG=stage_program, instead of stage-program. After running:
export STUDY_CONFIG=stage-program

All tests passed!

[Screenshot: all tests passing]

This is in alignment with the success of test-with-docker and ubuntu-only-test-with-manual-install, which essentially do the same thing.

To test the changes that we’ve made, I ran the exact same process as before (without the incorrect environment variable) using the most recently created image from our consolidate-differences image_build_push.yml:

$ docker run --name em-server-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage-program --network emission mukuflash03/e-mission-server:image-push-merge_2024-04-24--44-54

./runAllTests.sh worked for this as well.

It is reassuring that the test-with-docker and ubuntu-only-test-with-manual-install runs in our consolidate-differences branch also passed.


shankari commented Apr 25, 2024

@nataliejschultz this only tests previously created images. I don't see you building the images.
By default, image_build_push.yml runs on merges to master, so I am not sure how any previously created image will have any of the changes


nataliejschultz commented Apr 25, 2024

By default, image_build_push.yml runs on merges to master, so I am not sure how any previously created image will have any of the changes

On the image-push-merge branch, we changed the workflows mentioned above to temporarily run on push to image-push-merge branch for testing. Since the actions are running in @MukuFlash03 's forked repository, shouldn't the changes be checked out?

This is also how the images with the changes have been pushed; running image_build_push.yml on push to image-push-merge on the fork.

@nataliejschultz (Contributor)

Internal repo testing for Analysis:

Started mongo container from scratch:
$ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0

Cd into copy of nrelopenpath multi-tier branch and build + run:

$ docker build -t int-analysis ./analysis/
$ docker run --name int-analysis-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage-program -e PUSH_PROVIDER="firebase" -e PUSH_SERVER_AUTH_TOKEN="Get from firebase console" -e PUSH_APP_PACKAGE_NAME="full package name from config.xml. e.g. edu.berkeley.eecs.emission or edu.berkeley.eecs.embase. Defaults to edu.berkeley.eecs.embase" -e PUSH_IOS_TOKEN_FORMAT="fcm" --network emission int-analysis

Checked to make sure that all of the cmd shell scripts made it into the correct directory, based on the new internal configuration:
[Screenshot: cmd shell scripts present in the correct directory]

Exec into container, start emission environment, and run ./runAllTests.sh:

[Screenshot: all tests passing]

Tests passed.

Mahadik, Mukul Chandrakant added 2 commits April 25, 2024 02:52
Removed test workflow execution.
Added upload artifact.
Commented out the image build and push steps for testing purposes.
Fixed indentation.
Learnt that YAML does not permit tabs for indentation; only spaces are allowed.
@nataliejschultz (Contributor)

Of note: these are currently the only differences between consolidate-differences branch and image-push-merge:
[Screenshot: diff between consolidate-differences and image-push-merge]

None of the differences are significant to the functionality of the images.

@nataliejschultz (Contributor)

Just merged the changes from Mukul's tag automation work. Trying to get all of these PRs ready (removing conflicts, changing tags etc.) so that they're 100% ready for merge.

Getting the server changes ready for merge. I modified the yml jobs to run on master again, since this was switched off for testing. This might trigger a cascade of events, and I think all the other PRs are ready, so we can see if their jobs are triggered too.
re-add run on gis-based-mode-detection branch
Trying to trigger image_build_push.yml and removing more push triggers on other branches.
@nataliejschultz (Contributor)

Okay, I think it is ready for review. All of the branches are prepped. However, I think we need to create a token to get the image build test to work, per this run I triggered saying:
"message": "Resource not accessible by personal access token",

- uses: actions/checkout@v4

- name: Trigger workflow in admin-dash, public-dash
  # TODO: Create Fine-grained token with "Actions: write" permissions
Contributor

@shankari Here is the token

Contributor

I see this, but there is no instruction on what the token should be called or where it should be configured. Fortunately, I can read code 😄

It is also not clear at what level this token needs to be set (does it need to be at the org level or can it be at the repo level)?

Contributor

Mukul outlined the tokens he used in the issue here:

    • Used classic Personal access tokens for downloading artifacts.
    • Used fine-grained tokens for GitHub REST APIs.

and in more detail in this comment

I believe they are organization-level tokens, since they're working across repositories rather than just within the server.

Contributor

Mukul just said "This required usage of a fine-grained token with the required access scope was actions: write". Not, "This required usage of a fine-grained token called XXX with the required access scope was actions: write on repo YYYY defined at XXX level"

That is what this comment is about. Instructions should be precise and we should not give permissions that we don't need. Concretely, the answer is not "org level for everything".

Regardless, as you can see from the issue, I figured it out and created the tokens. This is feedback for future PRs and instructions.

@shankari left a comment (Contributor)

@nataliejschultz almost there. Let's get this right before we move on to the repos that depend upon it.

on:
  push:
    branches: [ master, gis-based-mode-detection ]

    branches: [ master, gis-based-mode-detection, consolidate-differences ]
Contributor

@nataliejschultz why do we have the consolidate-differences branch in here?

Contributor

This has not been fixed yet.

Contributor

I wanted to leave it and continue to test in GH actions as the cleanup changes are made, but I removed it now!

sed "s_localhost_${local_host}_" conf/net/api/webserver.conf.sample > conf/net/api/webserver.conf
else
sed "s_localhost_${WEB_SERVER_HOST}_" conf/net/api/webserver.conf.sample > conf/net/api/webserver.conf
export WEB_SERVER_HOST=$local_host
echo "Setting webserver host environment variable to localhost"
Contributor

I think we can actually remove this environment variable and hardcode it to 0.0.0.0.
Especially now that we are focused on docker containers running behind a reverse proxy, I have never seen this be anything other than 0.0.0.0.

Contributor

I was trying to remove this and hardcode the environment variable inside start_script.sh and/or the dockerfile, while modifying the functionality of emission/net/api/config.py to match this change.

I built a local image after removing the use of webserver.conf. I thought the functionality would be preserved by the get_config_data_from_env function in config.py, but it actually completely broke the code.
After running the container with:

docker run --name em-server-noWH -d -e DB_HOST=db  -e STUDY_CONFIG=stage-program --network emission webserver-changes-2

I ran the tests, and they were going verrrrrrrry slowly (it took an hour and a half in total). Almost every test failed, with the resultant error message being:

pymongo.errors.ServerSelectionTimeoutError: 1234:27017: [Errno 65] No route to host, Timeout: 30s, Topology Description: <TopologyDescription id: 66466bf340a670086a00b405, topology_type: Unknown, servers: [<ServerDescription ('1234', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('1234:27017: [Errno 65] No route to host')>]>

Going to read through my changes and figure out what happened.

Contributor

Found out that webserver.conf.sample is currently necessary, and definitely wasn't replaced by the environment variables. Trying again!

Contributor

Re-built locally and ran tests. It worked with the changes in my most recent push.

Comment on lines 12 to 20
try:
    key_file = open('conf/net/auth/secret_list.json')
except:
    print("secret_list.json not configured, falling back to sample, default configuration")
    key_file = open('conf/net/auth/secret_list.json.sample')
key_data = json.load(key_file)
key_file.close()
self.client_secret_list = key_data["client_secret_list"]

Contributor

This is just a duplicate of lines 7-10. I am not sure why these need to be copied again.

Contributor

Removed lines 7-10

Comment on lines 8 to 10
    local_host=`hostname -i`
    sed "s_localhost_${local_host}_" conf/storage/db.conf.sample > conf/storage/db.conf
else
    sed "s_localhost_${DB_HOST}_" conf/storage/db.conf.sample > conf/storage/db.conf
    export DB_HOST=$local_host
    echo "Setting db host environment variable to localhost"
Contributor

I don't see how this is supposed to work - the local_host variable is set if DB_HOST is not set but then in the else, you do export DB_HOST=$local_host. How is that supposed to work? $local_host is not the same as localhost

Contributor

see here

Comment on lines 8 to 11
if [ -z ${DB_HOST} ] ; then
    local_host=`hostname -i`
    jq --arg db_host "$local_host" '.timeseries.url = $db_host' conf/storage/db.conf.sample > conf/storage/db.conf
else
    jq --arg db_host "$DB_HOST" '.timeseries.url = $db_host' conf/storage/db.conf.sample > conf/storage/db.conf
    export DB_HOST=$local_host
    echo "Setting db host environment variable to localhost"
Contributor

Ditto. @nataliejschultz have you tried testing this with DB_HOST not set? I bet right now that it doesn't work

Contributor

@shankari You're right, it does not work without DB_HOST set. I ran the server using my locally built image:

docker run --name em-server-no-host -d -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage-program --network emission consolidate-diff-server-localbuild

and before I could exec into the container, it gave me this error:

2024-05-13 15:28:05 pymongo.errors.ServerSelectionTimeoutError: 172.26.0.3:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 664285c63b13fd554aa58ca9, topology_type: Unknown, servers: [<ServerDescription ('172.26.0.3', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('172.26.0.3:27017: [Errno 111] Connection refused')>]>

Contributor

@shankari I've tried several different ways to export the DB_HOST environment variable, but kept getting the connection refused error. I decided to try running the server container as it currently is in production by building a local image with an added echo.

docker build -t localhost-5 .
docker run --name em-server-no-host -d -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage-program --network emission localhost-5

The echo I added is in docker_start_script.sh, after it's supposedly set with the sample file:

echo ${DB_HOST}
if [ -z ${DB_HOST} ] ; then
    local_host=`hostname -i`
    jq --arg db_host "$local_host" '.timeseries.url = $db_host' conf/storage/db.conf.sample > conf/storage/db.conf
else
    jq --arg db_host "$DB_HOST" '.timeseries.url = $db_host' conf/storage/db.conf.sample > conf/storage/db.conf
fi
cat conf/storage/db.conf

echo "LOCALHOST!!:" ${DB_HOST}

Container log:

2024-05-14 12:55:08 {
2024-05-14 12:55:08   "timeseries": {
2024-05-14 12:55:08     "url": "172.26.0.3",
2024-05-14 12:55:08     "result_limit": 250000
2024-05-14 12:55:08   }
2024-05-14 12:55:08 }
2024-05-14 12:55:08 LOCALHOST!!:

I got the connection refused error again. Thoughts:

  • It isn't working in the first place
  • I'm misunderstanding what's happening
  • My db container is the problem

Going to try a new db container first and see if that's the issue.

Contributor

Tried new DB container:
docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0

Ran the exact same image as before, and got the same connection refused error.

@shankari (Contributor) commented May 14, 2024

I don't understand the challenge outlined in
#961 (comment)

The previous codebase assumed that if the DB_HOST was not set, the DB and the webapp were running on the same host. Obviously, if you try to test it with containers, it won't work.

However, with the new codebase, even if you run the DB and the webapp on the same container, it won't work because of #961 (comment)

Contributor

Per the group meeting with @shankari, we clarified that:

  • The else was removed, so this code should work as expected
  • This method of setting the variable doesn't work when using containers, explaining the failures in my tests above

Contributor

If this doesn't work with containers, then do we even need this functionality in a file that only runs when the dockerfile is run?? It makes sense in start_script.sh, but maybe not here.

Contributor

Right. I was hoping you would come to that. A lot of these decisions were made a long time ago, when our deployment profile was very different than what we do now. If you recall, part of this redesign was to simplify and unify these startup scripts.

You should ask yourself these questions, answer them and then keep only the parts that are relevant.

Even in start_script.sh, why do you need this, given that there is a fallback in the config file?
And if you do need it, why does it need to be hostname -i?

Contributor

I think I finally understand.
DB_HOST will never need to be set when running with containers

  • because it should be specified in the compose for testing and
  • containers run on their own IP (unless configured otherwise), so they're not going to connect using localhost anyway

start_script.sh runs all the tests, and the tests (and every server call in general) call get_database.py which gets the environment variable from emission.core.config. This has the fallback:

def get_config_data_from_env():
    config_data_env = {
        "timeseries": {
            "url": os.getenv('DB_HOST', "localhost"),

Using hostname -i or localhost makes sense when not using containers, and the fallback to localhost should account for that. It seems like using localhost would be a rare case, for someone running the server locally without containers.

conf/log/intake.conf.sample (review thread resolved)
conf/analysis/debug.conf.internal.json (review thread resolved)
auth_method = config_data["server"]["auth"]
aggregate_call_auth = config_data["server"]["aggregate_call_auth"]
not_found_redirect = config_data["paths"].get("404_redirect", OPENPATH_URL)
enac.reload_config()
Contributor

why do we reload the config only here and nowhere else?

Contributor

The STUDY_CONFIG environment variable is pulled directly above, so changing this value and reloading allows testers/users to set the variable to different configs in the container using export and test.

Contributor

This statement doesn't answer my question. I asked why the config is reloaded here and not anywhere else.

  • If we want testers/users to "set the variable to different configs in the container", why are we not concerned about people setting DB_HOST, for example.
  • I am not sure what you mean by "using export and test". The only way to restart a webapp in the container is to restart the container. At which point, the webapp will read the config again. Why do we need to reload here?

Contributor

I tested the server with and without the reload. When I removed the reload, one of the tests failed: [Screenshot: failing test] This particular test calls cfc_webapp.py. It appears that the reload is useful in this case for refreshing the value of "not_found_redirect".

emission/net/api/config.py (review thread resolved)
emission/core/config.py (review thread resolved)

def get_config_data():
    try:
        config_file = open('conf/storage/db.conf')
Contributor

I also don't think we need this given that we are going to use the environment variable going forward and it falls back to localhost "url": os.getenv('DB_HOST', "localhost"),

Contributor

we should also remove the sample file to avoid confusion since we don't use it. Note that before this change, we used to fall back to the sample file if the db.conf file did not exist, but we have removed that now, so it is not clear what benefit the sample file gives us in terms of working out of the box or anything

Contributor

Our logic here was that we wanted to maintain the functionality of db.conf for users running the server locally, in case they have it set in a certain way.

Contributor

I would be fine with retaining this for now as a backwards compat hack but it should be removed in the next release.

Changes to set the webhost to 0.0.0.0 at all times
Removing the duplicate section
Reverting the logging level to WARNING
Restoring to WARNING logging level
Removing the functionality of check_unset_env_vars. DB_HOST should be caught in start_script.sh. When I rebuilt the image without this functionality and ran the container, all tests passed.
@shankari left a comment (Contributor)

Almost done; just need to finish cleanup


@@ -29,7 +29,7 @@ RUN bash -c "./.docker/setup_config.sh"

# #declare environment variables
ENV DB_HOST=''
Contributor

For example, you could also set the default value of DB_HOST here instead of having the if check above

Comment on lines 10 to 14
print("In production environment, opening internal debug.conf")
config_file = open('conf/analysis/debug.conf.internal.json')
else:
print("analysis.debug.conf.json not configured, falling back to sample, default configuration")
config_file = open('conf/analysis/debug.conf.json.sample')
Contributor

nit: I would rename this to debug.conf.prod.json and debug.conf.dev.json to make this more clear.

Comment on lines 9 to 10
sed "s_localhost_${local_host}_" conf/storage/db.conf.sample > conf/storage/db.conf
else
sed "s_localhost_${DB_HOST}_" conf/storage/db.conf.sample > conf/storage/db.conf
export DB_HOST=$local_host
echo "Setting db host environment variable to localhost"
Contributor

when fixing the dockerized start script, we should fix this as well

Contributor

I don't think we need this environment variable set here anyway, since start_integration_tests.sh runs via a Dockerfile whose corresponding compose has the variable hardcoded.



Comment on lines +47 to +52
- name: Upload Artifact
  uses: actions/upload-artifact@v4
  with:
    name: docker-image-tag
    path: tag_file.txt
    overwrite: true
Contributor

why do we need the tag_file.txt to be uploaded here given that we are passing it directly to the workflows on line 77?

Contributor

See Mukul's comment in the issue:

Why I chose to add artifact method as well?

The issue I was facing was with fetching the latest timestamp for the image tag in case of a push event trigger. This is because in the workflow dispatch, the server workflow itself would trigger the workflows and hence was in a way connected to these workflows. However, push events would only trigger the specific workflow in that specific dashboard repository to build and push the image and hence would not be able to retrieve the image tag directly.

So, I utilized the artifact upload and download method to:

  • upload the image timestamp as an artifact in the workflow run for future use.
  • download the uploaded artifact from the latest previously successful and completed workflow run in e-mission-server repo for a specific branch (currently set to tags-combo-approach but to be changed to master once changes are final).
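For reference, the download side can be done against GitHub's documented REST API roughly as follows; the artifact name and file path match the workflow snippet above, while the token variable name is an assumption:

import io
import os
import zipfile

import requests

API = "https://api.github.com/repos/e-mission/e-mission-server/actions"
HEADERS = {"Authorization": f"Bearer {os.environ['GH_FINE_GRAINED_TOKEN']}"}

# Latest successful workflow run on the branch (tags-combo-approach for now, master once final)
runs = requests.get(f"{API}/runs",
                    params={"branch": "tags-combo-approach", "status": "success"},
                    headers=HEADERS).json()["workflow_runs"]
run_id = runs[0]["id"]

# Find the docker-image-tag artifact on that run and download it (served as a zip)
artifacts = requests.get(f"{API}/runs/{run_id}/artifacts",
                         headers=HEADERS).json()["artifacts"]
tag_artifact = next(a for a in artifacts if a["name"] == "docker-image-tag")
archive = requests.get(tag_artifact["archive_download_url"], headers=HEADERS)
with zipfile.ZipFile(io.BytesIO(archive.content)) as zf:
    print(zf.read("tag_file.txt").decode())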

Removing consolidate differences branch
Removing DB_HOST if-else functionality, as it's not necessary for the container.
Setting DB_HOST=db as the default for running in a docker container
Removing unnecessary echo
Updating to reflect the changed names of the debug.conf.json files
My machine is acting up and not allowing me to rename this file. Going to do it manually in GitHub, but need to make an inconsequential change first.
Removing the inconsequential change from the last push and renaming manually. This was my issue: desktop/desktop#13588
Thought I would quickly change the name of debug.conf.json.sample to debug.conf.dev.json; this was not inconsequential. Seeing if this fixes it.
DB_HOST is hardcoded in the compose file that runs this script anyway.
Reverting to DB_HOST='' since setting it to db probably isn't the best idea.

Also removing the DB_HOST fallback in start_script.sh since it'll be caught in config.py
Had to revert some changes but need to test the functionality of docker compose