
Kubernetes deployment #2353

Merged — 48 commits, merged on Jan 10, 2022
Conversation

@carlobeltrame (Member) commented Dec 14, 2021

This PR adds a helm chart which allows installing ecamp3 on any Kubernetes cluster, even multiple times. It also adds the ability to do feature branch deployments.

Fixes #2283, closes #1883

How to do a feature branch deployment: To deploy a feature branch, first do a code review, because deploying gives the code in the PR access to secrets such as login credentials for the Kubernetes cluster, and consequently to all secrets and data of all environments. Once you have reviewed the code for malicious changes, simply set the deploy! label on the PR, and it will be deployed within the next hour.

About helm: Helm is the package manager of Kubernetes, similar to apt-get, composer or npm. A chart (helm's term for a package) basically contains a bunch of Kubernetes resource definitions in YAML files. All of these YAML files can be templates, i.e. parts of them can be filled dynamically with configuration (e.g. the API domain, the Sentry DSN, or the Docker image tag that should be deployed). So instead of our deploy.sh script, we now use these templates to fill in environment variables etc. All configurable values are listed in values.yaml along with their defaults.
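
As an illustration, a template can reference such configurable values roughly like this (a minimal sketch, not an excerpt from the actual chart; the value names apiDomain and sentryDsn are made up):

# Hypothetical template: a config map whose values come from values.yaml or --set
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-frontend-config
data:
  API_DOMAIN: {{ .Values.apiDomain | quote }}
  SENTRY_DSN: {{ .Values.sentryDsn | default "" | quote }}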

About Kubernetes resources: "Resources" in Kubernetes means everything that runs or lives on the cluster. All resources can be described in YAML files (a minimal example follows this list). Examples of resources:

  • a deployment describes a pod (which in most cases runs a single container) and associated metadata: which docker image should be deployed, which environment variables are defined in the pod / container, how many replicas of the pod should be running, how the liveness and readiness of the pod are checked, how new versions of the pods are rolled out, etc. This is more or less the replacement for the docker-compose.yml we had before, but a lot fancier and more automatic.
  • a service exposes ports of a pod in the cluster-internal network, so they can be referenced by other pods
  • an ingress exposes a service on the internet. This is the replacement for the nginx reverse proxy we had before, and is in fact internally realized using nginx.
  • config maps and secrets contain settings and secrets for the pods, and can be mounted into a container or set as environment variables or files inside a container. This simplifies the assembly of e.g. the environment.js or .env files that our deploy.sh previously had to create.
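
For illustration, a minimal deployment plus service could look roughly like this (a sketch with made-up names, ports and image tag, not taken from our chart):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: ecamp/ecamp3-frontend:some-tag  # hypothetical image and tag
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /
              port: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
    - port: 80
      targetPort: 3000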

I have written down some documentation locally which describes how to deploy and so on. I need to upload that somewhere at some point.

TODO:

  • Set up the Kubernetes cluster on digitalocean
  • Connect with the local kubectl client to the cluster
  • Set up an nginx ingress controller on the cluster, so we can route network traffic to the pods
  • Set up the DNS entries at Cloudflare to forward the *-dev.ecamp3.ch domains to the cluster
  • Re-add and adapt the helm chart that API platform comes with (i.e. deploy the API)
  • Add the frontend to the deployment
  • Sending registration emails doesn't work because twig complains that it can't find the templates directory for the __main__ namespace or something? -> solved, I had to re-add a line in the Dockerfile that had been removed in a clean-up during the setup of API platform
  • Switch to digitalocean managed Postgres
  • Add the rest of the services
    • print (front page cannot be printed currently because of an unrelated bug, but everything else works)
    • mail
    • files
    • rabbitmq
    • worker puppeteer
  • Add a stricter securityContext by default, and run the processes inside the containers as non-root. The end goal would be to be able to deploy directly to OpenShift [1] [2]
  • Automatically restart pods that depend on environment variables during startup when these environment variables change (see the sketch below this list). https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
  • Create a POC feature branch deployment, using bitnami/external-dns to automate the DNS change at Cloudflare
    • Automatically create the database in the managed postgres DB on install, and drop it on uninstall
  • For automated deployments, create a GitHub action based on this one from API Platform: https://github.com/api-platform/demo/blob/main/.github/workflows/cd.yml#L172
    • The GitHub Action could be triggered by adding a label on a PR, or (re-)opening a PR with the label. There is no way on GitHub to grant PRs from forks access to repository secrets, so we cannot directly trigger the deployment actions from events on the PR itself.
    • The GitHub Action is scheduled regularly, e.g. every 30 minutes. It fetches all open PRs with the deploy! label and all currently active deployments, and installs, upgrades and uninstalls any deployments that need it.
    • Dev should only be deployed if CI has passed for the commit that is deployed
    • Deployments should only be updated when there is a code change, but it should still be possible to deploy by re-adding a previously removed deploy! label
    • When removing the label or merging or closing the PR, the deployment should be uninstalled from the cluster
  • Caching of the docker builds somehow doesn't work as documented... works now!
  • Write more helm chart tests?
  • The GitHub Environments display shows mixed data from environments (used for segregating secrets) and deployments (used to indicate the status of deployments), and lists feature-branch alongside e.g. pr2353 as separate environments, even though the pr2353 deployment just happens to use the feature-branch environment. Not sure we can do anything about that...
  • On this PR, it still says "this branch has not been deployed" even though it has: https://pr2353.ecamp3.ch. But that might resolve itself once the workflow really runs on the ecamp/ecamp3 repo instead of carlobeltrame/ecamp3
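
The trick for automatically rolling deployments on config changes (referenced in the TODO above) looks roughly like this, assuming the relevant config lives in a template called configmap.yaml; this is a sketch based on the linked helm documentation, not our actual chart:

# Annotation on the deployment's pod template: when the rendered configmap.yaml
# changes, the checksum changes, and Kubernetes rolls the pods.
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}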

Loading fixtures in production would require making alice a non-dev dependency. And either way, prod fixtures are better covered using database migrations: https://stackoverflow.com/a/47902192

Finishes what was started in commit 4c7d667

At least as long as we are running the container as root. These lines were made obsolete when we started running the containers as non-root in development. Maybe once we do the same in production, we'll need another way to set the permissions correctly.
@BacLuc (Contributor) commented Dec 15, 2021

Kommt sich "Switch to digitalocean managed Postgres" nicht mit feature deployment in die Quere?
Oder würden wir bei feature deployments eine datenbank erstellen?

@carlobeltrame (Member, Author) commented:

Doesn't "Switch to digitalocean managed Postgres" get in the way of feature deployments? Or would we create a database for each feature deployment?

My plan was to create the database automatically during the deployment, since that is cheaper: we pay a flat rate for the managed Postgres, as opposed to the larger cluster size we would need if we deployed an additional Postgres container for every feature branch. But whether the managed Postgres or an extra container with its own Postgres should be used can also be configured per deployment. It is purely a configuration question in the helm chart from API Platform: https://github.com/carlobeltrame/ecamp3/blob/devel/.helm/ecamp3/values.yaml#L75..L81
So I was able to tick off this item today without making any code changes; I only had to change the arguments I pass to helm on the command line.
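
For illustration, the choice looks roughly like this in the chart values (the key names here are assumptions loosely based on the API Platform chart, not a verbatim excerpt from our values.yaml):

# Hypothetical values excerpt
postgresql:
  # false = use an external / managed database instead of deploying a postgres container
  enabled: false

# Hypothetical key: connection string for the managed database,
# e.g. overridden per deployment with `helm upgrade --install ... --set databaseUrl=...`
databaseUrl: postgresql://user:password@managed-db-host:25060/ecamp3_feature_branch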

…tically

This is a step towards feature branch deployments. Default is to create
the database automatically but not drop it automatically, just in case
someone does not know about this feature and wants to quickly re-install
ecamp3 on their cluster.

This is better than running the migrations in the entrypoint script, because when there are long-running migrations, it's possible that the pod running them exceeds its liveness probe limit and is killed before finishing the migrations.
https://itnext.io/database-migrations-on-kubernetes-using-helm-hooks-fb80c0d97805
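
Roughly, such a migration hook is a Job with helm hook annotations (a sketch only; the actual definition lives in .helm/ecamp3/templates/hook_db_migrate.yaml and may differ in image, timing and naming):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    # Run as a helm hook on install/upgrade, outside the normal pod lifecycle,
    # so liveness probes don't apply while the migrations run
    "helm.sh/hook": post-install,post-upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: ecamp/ecamp3-api:some-tag  # hypothetical image and tag
          command: ["php", "bin/console", "doctrine:migrations:migrate", "--no-interaction"]
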
@carlobeltrame added the "deploy!" label (Creates a feature branch deployment for this PR) on Jan 4, 2022
@carlobeltrame (Member, Author) commented Jan 4, 2022

@ecamp/core Kubernetes and feature branch deployments are officially ready. The open TODOs are possible future improvements. If you want to have a look before the meeting, that might help us discuss any concerns there.

I have removed the GitHub Actions for the old deployment and for separately building and pushing the docker images, in favor of a single workflow file "continuous-deployment.yml". That file checks which PRs have a deploy! label and calculates which branches need to be newly deployed, upgraded or uninstalled. An example run of this workflow can be seen here in my fork: https://github.com/carlobeltrame/ecamp3/actions/runs/1654671254

The deployment is done using helm, which I have described a little above. Anyone who wants to deploy eCamp v3 to a Kubernetes cluster can download our helm chart and use the helm CLI to deploy it, even multiple times on the same cluster. I have written up some documentation in the wiki on how to do that.

All deployments use our managed postgres instance; a database is created and migrated automatically when installing a new deployment, and deleted automatically when uninstalling one. The helm chart also supports running a postgres db in a container instead.
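
For a rough idea of the structure (a sketch only; the step commands and the reconcile script are made up, see the real continuous-deployment.yml for the actual implementation):

name: continuous-deployment
on:
  schedule:
    - cron: '*/30 * * * *'  # interval is illustrative
  workflow_dispatch: {}

jobs:
  sync-deployments:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: List open PRs with the deploy! label
        run: gh pr list --label 'deploy!' --json number,headRefName > desired.json
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Install, upgrade or uninstall helm releases as needed
        # Hypothetical helper that compares desired.json with the releases on the cluster
        run: ./reconcile-deployments.sh desired.json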

@usu (Member) left a comment


Yeah, very nice 😻 Thanks a lot!

A few comments (and mainly questions) below and in the code. But generally looks good to me.

Improvement ideas

  • Is there any possibility to indicate on the PR, after the deployment is successful, that it succeeded (incl. a URL to the frontend?), e.g. using https://github.com/marketplace/actions/comment-pull-request? (See the sketch after these two points.) Or is the information just not visible on the PR because the actions are not yet running on our repository?
  • If I understand correctly, the workaround with checking every 30min is due to the limitation in accessing secrets from PR branches. Would it be thinkable that at a later stage we switch to deploying on branch push for dev (and maybe later also for stage and prod)?
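
One way to post such a PR comment from the workflow, sketched with actions/github-script instead of the marketplace action mentioned above (the deployment URL construction is hypothetical):

- name: Comment deployment URL on the PR
  uses: actions/github-script@v5
  with:
    script: |
      // Hypothetical: assumes the PR number is available in the workflow context
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        body: 'Deployed to https://pr' + context.issue.number + '.ecamp3.ch',
      });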

Questions

  • Stealing secrets: Theoretically, someone could add commits after the "deploy!" tag has been given and therefore include malicious code. Right?

.github/workflows/continuous-deployment.yml — resolved review thread
.github/workflows/continuous-deployment.yml — resolved review thread
.github/workflows/continuous-deployment.yml — outdated, resolved review thread
fi

# Run any pending doctrine migrations if migration files exist
if ls -A migrations/*.php >/dev/null 2>&1; then
    php bin/console doctrine:migrations:migrate --no-interaction
fi
(Member) commented:
If something goes wrong during migration, would that be visible somewhere in the logs?

(Member, Author) replied:
The migrations are run in a separate pod (i.e. a php container is spun up only to run the migrations) every time we install or upgrade a release. This pod is configured in .helm/ecamp3/templates/hook_db_migrate.yaml. If that pod runs into an error, it will remain in the cluster. So in there, we'd be able to see the logs.
If the pod runs smoothly, it's removed from the cluster afterwards. We could choose to leave it there, but then our cluster would slowly fill up with these finished pods from old feature branch deployments... I think when uninstalling the feature branch deployment, Helm cannot automatically remove these old pods that were created by a hook. We'd have to add another command in continuous-deployment.yml after helm delete to accomplish that, and it could get forgotten if we ever manually delete a feature branch deployment.
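
The keep-on-failure / clean-up-on-success behavior described above corresponds to a hook delete policy along these lines (a sketch; the exact annotations in hook_db_migrate.yaml may differ):

metadata:
  annotations:
    # Failed hook pods are kept in the cluster so their logs can be inspected;
    # succeeded ones are deleted, and any leftover is replaced on the next run
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded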

api/docker/php/docker-entrypoint.sh — resolved review thread
api/docker/php/docker-entrypoint.sh — resolved review thread
.helm/ecamp3/templates/hook_db_drop.yaml — resolved review thread
@carlobeltrame (Member, Author) commented:

Is there any possibility to indicate on the PR, after the deployment is successful, that it succeeded (incl. a URL to the frontend?), e.g. using https://github.com/marketplace/actions/comment-pull-request? Or is the information just not visible on the PR because the actions are not yet running on our repository?

I think that should sort itself out once the PR is merged. Have a look at my fork, where the active GitHub deployments (Environments) are visible (even though they are kind of wrong): https://github.com/carlobeltrame/ecamp3
I expect that once the workflow runs on ecamp/ecamp3, the deployments will show up correctly, and the PRs should also lose their "This branch has not been deployed" comment. If not, I'd propose we debug that then, once we can actually test things out on the origin.

If I understand correctly, the workaround with checking every 30min is due to the limitation in accessing secrets from PR branches. Would it be thinkable that at a later stage we switch to deploying on branch push for dev (and maybe later also for stage and prod)?

It is thinkable. Having a single large workflow that looks at all open labeled PRs will still be necessary, because PRs from forks cannot trigger workflow runs with secrets (they can trigger runs, but the secrets aren't available in them). And due to the syncing mechanism, this large workflow also needs to know about dev or any other special deployments that should remain even without a labeled PR.
We could just add a trigger like

on:
  push:
    branches:
      - devel

to this large workflow, but I doubt we'd get a lot of speedup, because the whole workflow takes ~10-15mins due to the docker build cache not working correctly.
Another option would be to create a separate workflow for deploying special branches, which only runs on branch push. But still, the large feature branch syncing workflow would need to know about all these special deployments, so it doesn't uninstall them. (Or we could install dev, stage and prod into different namespaces on Kubernetes, which might be a good solution.) But for this first version, I wanted to avoid the yaml code duplication that this would imply.

Stealing secrets: Theoretically, someone could add commits after the "deploy!" tag has been given and therefore include malicious code. Right?

That is indeed a valid attack vector. I guess we need to be extra careful when deploying a PR from an unknown collaborator.

@usu (Member) commented Jan 5, 2022

That is indeed a valid attack vector. I guess we need to be extra careful when deploying a PR from an unknown collaborator.

Ok, I see. So it makes extra sense not to share any secrets between the PR/dev environments and the stage/prod environments, so that in the worst case an attacker would only have access to dev secrets.

@usu (Member) commented Jan 5, 2022

Are both https://pr2353.ecamp3.ch/ and https://dev.ecamp3.ch/ supposed to work? When trying to login with test-user I receive a 401 response (invalid credentials) on pr2353 and a 500 response on dev.

@usu (Member) commented Jan 7, 2022

  • Caching of the docker builds somehow doesn't work as documented...

I tried to dig into this one a bit. If I understood correctly, every build image (api, caddy, frontend, etc.) needs its own cache. Otherwise they invalidate each other.

It's not super well documented, but I think the scope property can be used for this. See the buildkit documentation and this example from an issue discussion in the build-push-action repo.

Another option might be the other documented solution, using the actions/cache workflow with a local cache. But there too, separate caches are needed for each build image. See a corresponding discussion and example here.
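
For illustration, per-image cache scopes in a build-push-action step might look roughly like this (a sketch; image name, tag and context are made up):

- name: Build and push api image
  uses: docker/build-push-action@v2
  with:
    context: ./api
    push: true
    tags: ecamp/ecamp3-api:latest  # hypothetical tag
    # Each image gets its own cache scope so the builds don't evict each other
    cache-from: type=gha,scope=api
    cache-to: type=gha,mode=max,scope=api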

@carlobeltrame (Member, Author) commented:

Are both https://pr2353.ecamp3.ch/ and https://dev.ecamp3.ch/ supposed to work? When trying to login with test-user I receive a 401 response (invalid credentials) on pr2353 and a 500 response on dev.

The fixtures are not loaded on either of them, so the test-user will never work (unless you register it manually). I was able to register and log in normally with a new account on pr2353.
On dev, the database was in a broken state, probably a leftover from before I made sure to only deploy the correct version. Some migrations had been applied there which weren't present in the deployed version of the code, one of which was the one from #2241. I fixed it by manually deleting the dev environment and dropping the database, and letting GitHub re-deploy it.

@carlobeltrame (Member, Author) commented:

It's not super well documented, but I think the scope property can be used for this. See the buildkit documentation and this example from an issue discussion in the build-push-action repo.

Setting the scope was it! Previously, all workflow runs that actually did something took around 15 minutes; now workflow runs with unchanged docker images take less than 2 minutes.

@BacLuc (Contributor) left a comment


Mega cool. I hope you still got some holidays in besides the Kubernetes deployment.

@manuelmeister (Member) left a comment


So cool!

@usu (Member) commented Jan 18, 2022

By the way, I found a write-up by someone who ran into the same problem with pull requests + secrets, but solved it with a somewhat different approach. In case we never warm up to the cron jobs:
https://blog.jupyter.org/how-i-automated-authorised-cloud-deployments-from-pull-requests-with-github-actions-13f890538e32
https://github.com/sgibson91/test-this-pr-action

Inspired by:
https://github.com/imjohnbo/ok-to-test

@carlobeltrame mentioned this pull request on Feb 3, 2022
Labels: deploy! (Creates a feature branch deployment for this PR)
Successfully merging this pull request may close these issues: Multiple deployments
4 participants