Kubernetes deployment #2353
Conversation
Loading fixtures in production would require making alice a non-dev dependency. And either way, prod fixtures are better covered using database migrations: https://stackoverflow.com/a/47902192
Finishes what was started in commit 4c7d667
At least as long as we are running the container as root. These lines were made obsolete when we started running the containers as non-root in development. Maybe once we do the same in production, we'll need another way to set the permissions correctly.
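If it helps for later: one Kubernetes-native way to set volume permissions without running as root is a pod-level securityContext. A hedged sketch (the UID/GID values and the image name are placeholders, not taken from this PR):

```yaml
# Illustrative only: standard Kubernetes securityContext fields;
# UID/GID and image are placeholders, not ecamp3's actual config.
apiVersion: v1
kind: Pod
metadata:
  name: api-example
spec:
  securityContext:
    runAsNonRoot: true   # kubelet refuses to start the container as UID 0
    runAsUser: 1000      # placeholder UID
    fsGroup: 1000        # mounted volumes are made group-writable for this GID
  containers:
    - name: api
      image: ecamp/ecamp3-api   # hypothetical image name
```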
Doesn't "Switch to digitalocean managed Postgres" get in the way of the feature deployments?
I was planning to create the database automatically during the deployment, since that costs less: we pay a flat rate for the managed Postgres, as opposed to the larger cluster size that would be needed if we deployed an additional Postgres container for every feature branch. But it can also be configured per deployment whether the managed Postgres or an extra container with its own Postgres should be used. It's purely a configuration question in API Platform's Helm chart: https://github.com/carlobeltrame/ecamp3/blob/devel/.helm/ecamp3/values.yaml#L75..L81
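As a hedged illustration of what such a per-deployment switch could look like as Helm value overrides (the key name postgresql.enabled follows the common Bitnami subchart convention; the actual keys are in the linked values.yaml):

```yaml
# Variant A: feature branch reuses the shared managed Postgres
# (no postgres container is deployed; key names are assumed).
postgresql:
  enabled: false

# Variant B: self-contained deployment with its own postgres container
# postgresql:
#   enabled: true
```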
…tically: This is a step towards feature branch deployments. The default is to create the database automatically but not to drop it automatically, just in case someone does not know about this feature and wants to quickly re-install ecamp3 on their cluster.
This is better than running the migrations in the entrypoint script, because when there are long-running migrations, the pod running them could exceed its liveness probe limit and be killed before finishing the migrations. https://itnext.io/database-migrations-on-kubernetes-using-helm-hooks-fb80c0d97805
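Roughly what such a hook looks like (a condensed sketch; the real manifest is .helm/ecamp3/templates/hook_db_migrate.yaml, and the hook phase, names and image below are illustrative):

```yaml
# Sketch of a Helm hook Job for migrations.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    # Run this Job as part of install/upgrade, outside the normal pods,
    # so no liveness probe can kill a long-running migration.
    "helm.sh/hook": pre-install,pre-upgrade
    # Delete the Job once it succeeds; failed Jobs stay for debugging.
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: ecamp/ecamp3-api   # placeholder image name
          command: ["php", "bin/console", "doctrine:migrations:migrate", "--no-interaction"]
```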
… container is ready
@ecamp/core Kubernetes and feature branch deployments are officially ready. The open TODOs are possible future improvements. It might be helpful to have a look before the meeting, so we can discuss any concerns. I have removed the GitHub Actions for the old deployment and for separately building and pushing the docker images, in favor of a single workflow file "continuous-deployment.yml". That file checks which PRs have a deploy! label, and installs, upgrades and uninstalls any deployments that need it. The deployment is done using helm, which I have described a little above. Anyone who wants to deploy eCamp v3 to a Kubernetes cluster can download our helm chart and use the helm CLI to deploy it, even multiple times on the same cluster. I have written up some documentation in the wiki on how to do that. All deployments use our managed postgres instance: a database is created and migrated automatically when installing a new deployment, and deleted automatically when uninstalling a deployment. The helm chart also supports running a postgres db in a container instead.
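For context, a minimal sketch of how the scheduled trigger of such a workflow can look (standard GitHub Actions syntax; the job body is a placeholder, not the actual continuous-deployment.yml):

```yaml
name: continuous-deployment
on:
  schedule:
    - cron: '*/30 * * * *'   # poll for labeled PRs every 30 minutes
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ...determine PRs with the deploy! label, build images,
      #    then helm upgrade --install / helm delete as needed...
```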
Yeah, very nice 😻 Thanks a lot!
A few comments (and mainly questions) below and in the code. But generally looks good to me.
Improvement ideas
- Any possibility to indicate on the PR that the deployment was successful (incl. the URL to the frontend)? E.g. using https://github.com/marketplace/actions/comment-pull-request (see the sketch after this list). Or is the information not visible on the PR only because the actions are not yet running on our repository?
- If I understand correctly, the workaround with checking every 30min is due to a limitation in accessing secrets from PR branches. Is it thinkable that at a later stage we switch to a branch-push trigger for dev (and maybe later also for stage and prod)?
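Regarding the first idea above, a hedged sketch of the linked marketplace action (it resolves to thollander/actions-comment-pull-request; inputs per its README, and in a scheduled run the PR number would have to be supplied by the workflow itself):

```yaml
- name: Comment the deployment URL on the PR
  uses: thollander/actions-comment-pull-request@v2
  with:
    # No pull_request context exists in a scheduled run, so the
    # PR number must be passed in explicitly.
    pr_number: 2353   # placeholder
    message: |
      :rocket: Deployed to https://pr2353.ecamp3.ch/
```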
Questions
- Stealing secrets: theoretically, someone could add commits after the deploy! label has been set, and thereby include malicious code. Right?
```sh
fi

if ls -A migrations/*.php >/dev/null 2>&1; then
    php bin/console doctrine:migrations:migrate --no-interaction
```
If something goes wrong during migration, would that be visible somewhere in the logs?
The migrations are run in a separate pod (i.e. a php container is spun up only to run the migrations) every time we install or upgrade a release. This pod is configured in .helm/ecamp3/templates/hook_db_migrate.yaml. If that pod runs into an error, it will remain in the cluster. So in there, we'd be able to see the logs.
If the pod runs smoothly, it's removed from the cluster afterwards. We could choose to leave it there, but then our cluster would slowly fill up with these finished pods from old feature branch deployments... I think when uninstalling a feature branch deployment, Helm cannot automatically remove these old pods that were created by a hook. We'd have to add another command in continuous-deployment.yml after the helm delete to accomplish that, and it could get forgotten if we ever manually delete a feature branch deployment.
I think that should sort itself out once the PR is merged. Have a look at my fork, where the active GitHub deployments (Environments) are visible (even though they are kind of wrong): https://github.com/carlobeltrame/ecamp3
It is thinkable. Having a single large workflow that looks at all open labeled PRs will still be necessary, because PRs from forks cannot trigger workflow runs with secrets (I mean they can, but the secrets aren't available). And due to the syncing mechanism, this large workflow also needs to know about pushes to devel. We could add

```yaml
on:
  push:
    branches:
      - devel
```

to this large workflow, but I doubt we'd get a lot of speedup, because the whole workflow takes ~10-15mins due to the docker build cache not working correctly.
That is indeed a valid attack vector. I guess we need to be extra careful when deploying a PR from an unknown collaborator.
Ok, I see. So it makes extra sense to not share any secrets between the PR/dev environment and the stage/prod environment, so that in the worst case an attacker would only have access to dev secrets.
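GitHub Actions environments support exactly that separation; a minimal sketch (the environment name matches the feature-branch environment mentioned further down, the secret name is a placeholder):

```yaml
jobs:
  deploy-feature-branch:
    runs-on: ubuntu-latest
    # Only secrets defined on the "feature-branch" environment are
    # resolved for this job; stage/prod secrets stay out of reach.
    environment: feature-branch
    steps:
      - run: echo "$KUBECONFIG_CONTENT" > kubeconfig   # placeholder secret name
        env:
          KUBECONFIG_CONTENT: ${{ secrets.KUBECONFIG }}
```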
Are both https://pr2353.ecamp3.ch/ and https://dev.ecamp3.ch/ supposed to work? When trying to log in with the test user, it doesn't work for me.
Force-pushed from a486592 to 4564f8c
I tried to dig into this one a bit. If I understood correctly, every built image (api, caddy, frontend, etc.) needs its own cache, otherwise they invalidate each other. It's not super well documented, but I think the scope option of the GitHub Actions cache should do the trick. Another option might be to use the other solution described there.
The fixtures are deployed on neither, so the test user will never work (unless you register it manually). I was able to register and log in normally with a new account on pr2353.
Setting the scope was it! Previously, all workflow runs that actually did something took around 15 minutes; now workflow runs with unchanged docker images take less than 2 minutes.
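For the record, this is roughly what a per-image cache scope looks like with docker/build-push-action and the gha cache backend (context path and tag are illustrative):

```yaml
# One distinct cache scope per image (api, caddy, frontend, ...),
# so the builds don't evict each other's cache entries.
- uses: docker/build-push-action@v5
  with:
    context: ./api
    push: true
    tags: ecamp/ecamp3-api:latest   # illustrative tag
    cache-from: type=gha,scope=api
    cache-to: type=gha,mode=max,scope=api
```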
Mega cool. I hope you still had a bit of holiday time left besides the Kubernetes deployment.
So cool!
By the way, I found a report by someone who ran into the same pull request + secrets problem, but solved it with a somewhat different approach. In case we don't warm up to the cron jobs:

Inspired by:
This PR adds a helm chart which makes it possible to install ecamp3 on any Kubernetes cluster, even multiple times. It also adds the ability to do feature branch deployments.
Fixes #2283, closes #1883
How to do a feature branch deployment: To deploy a feature branch, first do a code review, because deploying gives the code in the PR access to secrets such as login credentials to the Kubernetes cluster, and consequently to all secrets and data of all environments. Once you have reviewed the code for malicious changes, simply set the deploy! label on the PR, and it will be deployed within the next hour.

About helm: Helm is the package manager of Kubernetes, similar to apt-get, composer or npm. A chart (helm's word for a package) basically contains a bunch of Kubernetes resource definitions in YAML files. All these YAML files can be templates, i.e. parts of them can be dynamically filled with configuration (e.g. the API domain, the Sentry DSN, or the Docker image tag that should be deployed). So instead of our deploy.sh script, we use these templates to fill in environment variables etc. All configurable values are listed in values.yaml with their defaults.
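To make the templating idea concrete, a hedged, generic fragment (value names like apiDomain are illustrative, not necessarily the keys in our values.yaml):

```yaml
# templates/deployment.yaml (fragment): values are injected from
# values.yaml or from --set flags at install/upgrade time.
env:
  - name: API_DOMAIN
    value: {{ .Values.apiDomain | quote }}
  - name: SENTRY_DSN
    value: {{ .Values.sentryDsn | quote }}
image: "{{ .Values.api.image.repository }}:{{ .Values.api.image.tag }}"
```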
About Kubernetes resources: "Resources" in Kubernetes means everything that runs or lives on the cluster. All resources can be described in YAML files. Examples of resources: Deployments (sets of identical pods), Services (internal load balancing), Ingresses (external HTTP routing), ConfigMaps and Secrets (configuration data).
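For illustration, a minimal generic Deployment (illustrative only, not one of our actual manifests):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2                  # run two identical pods
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: ecamp/ecamp3-frontend:latest   # placeholder image
          ports:
            - containerPort: 3000
```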
I have some local documentation written down which describes how to deploy and so on. I need to upload that somewhere at some point.
TODO:
- kubectl client to the cluster
- __main__ namespace or something? -> solved, had to re-add a line in the Dockerfile that was previously removed in a clean-up during the setup of API platform
- files, rabbitmq, worker puppeteer
- The GitHub Action could be triggered by adding a label on a PR, or (re-)opening a PR with the deploy! label. -> There is no way on GitHub to grant PRs from forks access to repository secrets, so we cannot directly trigger the deployment actions from events on the PR itself. Instead, a scheduled workflow checks which PRs have the deploy! label and all currently active deployments, and installs, upgrades and uninstalls any deployments that need it.
- Caching of the docker builds somehow doesn't work as documented... -> works now!
- GitHub lists feature-branch alongside e.g. pr2343 as separate environments, even though the pr2353 deployment just happens to use the feature-branch environment. Not sure we can do anything about that...