feat(charts/authentik): add probes to worker deployment #255

Merged
BeryJu merged 2 commits into goauthentik:main on Apr 24, 2024

Conversation

channel-42
Contributor

This PR adds probes (liveness, readiness and startup) to the authentik-worker deployment.

Currently, even if a worker pod becomes unhealthy (e.g. due to a restart of redis, as described in goauthentik/authentik#6221), the kubelet does not restart it because there is no liveness probe. Certain tasks (like adding new providers) then do not work until the affected pod is restarted manually.

With the proposed changes, if a worker finds itself in an unhealthy state (as reported by ak healthcheck), the kubelet restarts the worker container and the new healthy worker reconnects to the server pod.
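A minimal sketch of what such probes could look like on the worker container, assuming all three run ak healthcheck as an exec probe. The timing and threshold values are illustrative assumptions, not the values from this PR's diff:

```yaml
# Fragment of the worker container spec (not a complete manifest).
# All numeric values below are assumptions chosen for illustration.
livenessProbe:
  exec:
    command: ["ak", "healthcheck"]
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3   # restart the container after 3 consecutive failures
readinessProbe:
  exec:
    command: ["ak", "healthcheck"]
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3   # mark the pod not ready after 3 consecutive failures
startupProbe:
  exec:
    command: ["ak", "healthcheck"]
  periodSeconds: 10
  failureThreshold: 60  # allow up to 60 * 10s for startup before liveness checks begin
```

The startup probe holds off the liveness and readiness probes until it succeeds once, so a slow first start does not trigger a restart loop.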

@channel-42 channel-42 requested a review from a team as a code owner March 31, 2024 20:38
@BeryJu
Member

BeryJu commented Mar 31, 2024

I think there was a reason why we don't do this by default anymore. IIRC, for the worker container, ak healthcheck has to load the full Django app, which uses a lot of RAM.

@channel-42
Contributor Author

I ran the ak healthcheck command continuously while monitoring memory usage and saw no significant increase. Looking through the sources, I found that the healthcheck command for the worker is just an os.Stat() on the heartbeat file.

@MayurDuduka

+1

@rissson rissson requested a review from BeryJu April 19, 2024 16:39
Member

@BeryJu BeryJu left a comment

Normally it's inadvisable to have the liveness and readiness probes be the same, but alas we don't have a better probe for celery yet.

@channel-42
Contributor Author

The pipeline fails at ct install with "client rate limiter Wait returned an error: context deadline exceeded", due to the timeout set in .github/configs/ct-install.yaml.

The chart definitely installs fine on my local cluster. Would it be viable to increase the pipeline timeout?
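For reference, a sketch of how the timeout could be raised in a chart-testing config, assuming it is passed to helm through helm-extra-args. The actual contents of .github/configs/ct-install.yaml are not shown in this thread, and the 900s value is a hypothetical choice:

```yaml
# Hypothetical excerpt of a chart-testing (ct) config file.
# The real .github/configs/ct-install.yaml may use different keys and values.
# helm-extra-args is forwarded to helm, so --timeout here raises the
# time helm waits on each Kubernetes operation during ct install.
helm-extra-args: --timeout 900s
```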

@BeryJu BeryJu merged commit 5052c4f into goauthentik:main Apr 24, 2024
1 of 2 checks passed
4 participants