All resources are deleted on etcd leader loss #143
I enabled more detailed logging on the kube-controller-manager.
(The GitTrack is in namespace faros-system, while it searches in platform-system.)
This is a fairly major architectural problem and I'm sorry we haven't worked this out sooner. Fixing it will need some fairly major changes to Faros. There are two approaches that I can see going forward. I've been thinking about them for the last couple of days and I think both would work, though I'm not sure which is likely to be best, so please do pick holes in them.
Approach 1 - Namespaced and cluster separate
With this approach, we would create a new CRD called a … We would then also change the … This approach would then mean that you could point a GitTrack and a ClusterGitTrack at the same folder, and the GitTrack would ignore the cluster-scoped resources while the ClusterGitTrack would ignore the namespaced ones. We would then also have to disable the cross-namespace mode in Faros and insist that you run one instance per namespace, plus one additional instance to manage cluster-scoped resources.
Approach 2 - Proxies
This approach would use two similar CRD … GTO and CGTO owner references would point to the … The GT controller would then need to add a finalizer to the GT so that, upon deletion, it can delete all of the … This approach would allow us to retain the one-instance-per-cluster model.
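Approach 2 relies on the standard Kubernetes finalizer pattern. Below is a minimal Python sketch of that pattern only, not of Faros itself: the hypothetical `FakeAPI` class stands in for the API server rule that an object marked for deletion is kept until its finalizers list is empty, and the hypothetical `reconcile_gittrack` adds a finalizer to a live GitTrack, then, once the GitTrack is being deleted, removes every object it is responsible for before releasing the finalizer. In Approach 2 those objects would be the proxies (their CRD name is elided above). Here they are linked through a plain `tracked_by` field rather than an ownerReference, and the finalizer name is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical finalizer name; the thread does not name one.
CLEANUP_FINALIZER = "example.com/gittrack-cleanup"

# (kind, namespace, name); namespace is None for cluster-scoped objects.
Key = Tuple[str, Optional[str], str]


@dataclass
class Obj:
    key: Key
    # Plain pointer to the responsible GitTrack, deliberately NOT an ownerReference,
    # so the Kubernetes garbage collector plays no part in this cleanup.
    tracked_by: Optional[Key] = None
    finalizers: List[str] = field(default_factory=list)
    deleting: bool = False  # stands in for metadata.deletionTimestamp being set


class FakeAPI:
    """In-memory stand-in for the API server's deletion and finalizer semantics."""

    def __init__(self) -> None:
        self.objects: Dict[Key, Obj] = {}

    def add(self, obj: Obj) -> None:
        self.objects[obj.key] = obj

    def delete(self, key: Key) -> None:
        # Mark for deletion; the object only disappears once no finalizers remain.
        obj = self.objects.get(key)
        if obj is not None:
            obj.deleting = True
            self._purge_if_released(key)

    def remove_finalizer(self, key: Key, name: str) -> None:
        obj = self.objects.get(key)
        if obj is not None and name in obj.finalizers:
            obj.finalizers.remove(name)
            self._purge_if_released(key)

    def _purge_if_released(self, key: Key) -> None:
        obj = self.objects[key]
        if obj.deleting and not obj.finalizers:
            del self.objects[key]


def reconcile_gittrack(api: FakeAPI, key: Key) -> None:
    gt = api.objects.get(key)
    if gt is None:
        return
    if not gt.deleting:
        # Live GitTrack: make sure deletion will wait for our cleanup.
        if CLEANUP_FINALIZER not in gt.finalizers:
            gt.finalizers.append(CLEANUP_FINALIZER)
        return
    # GitTrack is being deleted: remove everything it tracks, whatever its namespace or scope.
    for child in [k for k, o in api.objects.items() if o.tracked_by == key]:
        api.delete(child)
    # Only now let the GitTrack itself go away.
    api.remove_finalizer(key, CLEANUP_FINALIZER)
```

In use: after `api.delete(gt)`, the GitTrack stays in `api.objects` until the next `reconcile_gittrack(api, gt)` call has deleted its tracked objects, whether they are namespaced in another namespace or cluster-scoped. Because the link is not an ownerReference, the cross-namespace restriction on owner references never comes into play.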
Thanks for looking into this. I'll add my thoughts; @wimdec might have more.
Approach 1:
Approach 2:
What are your thoughts on making …
One problem with simply making … Currently, RBAC allows restricting users to creating and editing … If they were cluster-scoped, it would be quite easy to accidentally break payloads in namespaces owned by other teams simply by applying or editing an incorrect …
@sebastianroesch Just going to add my responses to your comments:
We currently have a lot of cluster-scoped resources in folders alongside namespaced files too, so yeah, we'll probably also need a bit of a restructure, or just live with it ignoring these files.
It was always our intention that a GitTrack should own resources in the same namespace as itself, so much so that I'd consider it a bug that I can create a GitTrack in …
I'm with @gargath on this; I think this really reduces the usability of Faros for self-service multi-tenant clusters.
I would propose the following:
This allows running a global Faros instance for the cluster operator using ClusterGitTrack, while every team can run Faros in its own namespace without security risk. The global Faros instance will no longer watch the namespaced GitTrack resources, so global and namespaced Faros instances do not conflict. Of course, it is also possible to add a flag controlling what is watched (only ClusterGitTrack, or both ClusterGitTrack and GitTrack), so that users can choose.
These changes are of course not backwards compatible, so thought needs to be given to the upgrade path. One option is to give the new, restricted namespaced GitTrack a new apiVersion: v1alpha2. v1alpha2 would then mean that the GitTrack ignores all resources not in its own namespace.
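The v1alpha2 rule and the watch flag proposed above can be stated as two small predicates. This is a minimal Python sketch, assuming each manifest's namespace has already been resolved (None for cluster-scoped objects); `Manifest`, `WatchMode`, `watched_kinds` and `gittrack_accepts` are hypothetical names, and any apiVersion other than v1alpha2 is treated as the existing, unrestricted behaviour.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class WatchMode(Enum):
    # Hypothetical values for the flag suggested above.
    CLUSTER_ONLY = "cluster-only"  # global instance watches ClusterGitTrack only
    BOTH = "both"                  # global instance watches ClusterGitTrack and GitTrack


@dataclass(frozen=True)
class Manifest:
    kind: str
    name: str
    namespace: Optional[str]  # None for cluster-scoped objects


def watched_kinds(mode: WatchMode) -> Set[str]:
    """Which tracking resources a Faros instance would watch under the flag."""
    if mode is WatchMode.CLUSTER_ONLY:
        return {"ClusterGitTrack"}
    return {"ClusterGitTrack", "GitTrack"}


def gittrack_accepts(api_version: str, gittrack_namespace: str, m: Manifest) -> bool:
    """Whether a namespaced GitTrack would manage the given manifest."""
    if api_version != "v1alpha2":
        # Existing behaviour as described in this issue: any namespace, plus cluster-scoped objects.
        return True
    # v1alpha2: ignore everything outside the GitTrack's own namespace,
    # including cluster-scoped objects (a namespace of None never matches).
    return m.namespace == gittrack_namespace
```

With v1alpha2, a team's GitTrack in `team-a` would simply skip a `ClusterRole` or a `Deployment` in `team-b`, so every object it does manage lives in its own namespace and could carry a same-namespace owner reference.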
We have a proposal for fixing this, which the Pusher team are going to start implementing this week. The garbage collector has historically been broken, which allowed this problem to go largely unnoticed and only manifest itself in certain scenarios. I believe this has recently been fixed, and when we tried to upgrade to 1.15.3, our test cluster kinda exploded, so this is now very, very urgent for us to fix.
We had a couple of incidents where, after the etcd leader node was restarted, the Kubernetes garbage collector deleted all Faros-managed resources.
After analysis, the root cause is probably the following:
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents
https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/types.go#L311
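Currently, GitTrack is namespace-scoped. Kubernetes does not allow cross-namespace owner references: a namespaced dependent may only have owners in its own namespace or cluster-scoped owners, and a cluster-scoped dependent may only have cluster-scoped owners. This means that every ClusterGitTrackObject, and every GitTrackObject in a namespace other than its GitTrack's, currently has an invalid owner reference.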
To solve this, GitTrack should become cluster-scoped.
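The owner-reference rule from the garbage-collection docs fits in a small predicate. This Python sketch only illustrates that rule (the function name is hypothetical), using None for cluster-scoped objects and the namespaces mentioned in this issue; it shows why a GitTrack in faros-system is a valid owner only for GitTrackObjects in faros-system, while a cluster-scoped GitTrack would be a valid owner for every dependent.

```python
from typing import Optional


def owner_ref_allowed(owner_ns: Optional[str], dependent_ns: Optional[str]) -> bool:
    """Whether an ownerReference from a dependent to an owner is permitted.

    A namespace of None means the object is cluster-scoped. Cross-namespace owner
    references are disallowed: a cluster-scoped owner may own anything, while a
    namespaced owner may only own dependents in its own namespace (never
    cluster-scoped ones).
    """
    if owner_ns is None:
        return True
    return dependent_ns == owner_ns


# Dependents created from a GitTrack in faros-system (namespaces taken from this issue).
dependents = {
    "GitTrackObject in faros-system": "faros-system",
    "GitTrackObject in platform-system": "platform-system",
    "ClusterGitTrackObject": None,
}

for name, ns in dependents.items():
    namespaced_ok = owner_ref_allowed("faros-system", ns)  # GitTrack as it is today
    cluster_ok = owner_ref_allowed(None, ns)               # GitTrack made cluster-scoped
    print(f"{name}: namespaced owner valid={namespaced_ok}, cluster-scoped owner valid={cluster_ok}")
```

Only the first dependent is valid with today's namespaced GitTrack, which matches the observation that only GitTrackObjects in the GitTrack's own namespace survive.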
Details:
Trigger:
After some time, the following logs will appear in the kube-controller-manager:
Only GitTrackObjects in the same namespace as the GitTrack are not deleted.