[bug]: Webhook installation is not idempotent #525

Closed
anekdoti opened this issue Feb 8, 2023 · 1 comment · Fixed by #538
Labels
bug Something isn't working

Comments

anekdoti commented Feb 8, 2023

Describe the bug

Webhooks are typically installed in the init container of the operator pod. The installation can fail, e.g., because the connection to the Kubernetes API is lost. In such cases, the init container is restarted until it succeeds.

However, if the ValidatingWebhookConfiguration or MutatingWebhookConfiguration has already been created, the init container throws an exception (see below) indicating a conflict with an existing resource in the cluster.

IMHO, this happens because the method KubernetesClient.Save (https://github.com/buehler/dotnet-operator-sdk/blob/master/src/KubeOps.KubernetesClient/KubernetesClient.cs#L116) is not implemented correctly: to decide whether to create or update a resource in the cluster, it checks whether the uid of the resource passed as its argument is null. In the webhook installation (and similar places in the framework), the resources passed to Save are always freshly created, so the uid is always null, regardless of whether the resource already exists in the cluster. A proper implementation of Save would check whether the resource exists in the cluster instead.
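
To illustrate the difference between the two decision rules, here is a minimal Python sketch against an in-memory stand-in for the cluster. It is not the KubeOps code: `FakeCluster`, `save_by_uid`, `save_by_existence` and the dict-based resource shape are all hypothetical names made up for this example, and the only assumption carried over from the description above is that a freshly built resource has no uid.

```python
class Conflict(Exception):
    """Stands in for the HTTP 409 'Conflict' returned on a duplicate create."""


class FakeCluster:
    """Hypothetical in-memory stand-in for the Kubernetes API, keyed by name."""

    def __init__(self):
        self.resources = {}
        self._next_uid = 0

    def create(self, resource):
        name = resource["name"]
        if name in self.resources:
            raise Conflict(name)
        self._next_uid += 1
        # The server, not the caller, assigns the uid on creation.
        stored = dict(resource, uid=f"uid-{self._next_uid}")
        self.resources[name] = stored
        return stored

    def update(self, resource):
        name = resource["name"]
        # Keep the uid that the server assigned when the resource was created.
        stored = dict(resource, uid=self.resources[name]["uid"])
        self.resources[name] = stored
        return stored


def save_by_uid(cluster, resource):
    """Rule described in the report: decide from the local object's uid."""
    if resource.get("uid") is None:
        return cluster.create(resource)
    return cluster.update(resource)


def save_by_existence(cluster, resource):
    """Proposed rule: decide from what the cluster already holds."""
    if resource["name"] in cluster.resources:
        return cluster.update(resource)
    return cluster.create(resource)
```

With `save_by_uid`, saving a second freshly built resource with the same name raises `Conflict`, because the local object never carries a uid. With `save_by_existence`, the second call becomes an update and the stored uid is preserved.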

Another option would be to use the same pattern for the WebhookConfigurations as for the service, i.e., delete the existing resource in the cluster before creating it. I am not sure whether there is a reason why Save was used instead.
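
The delete-before-create pattern can be sketched the same way. This is again a hypothetical Python model rather than the installer's actual code: the cluster is a plain dict, and `create`, `delete_if_present`, `reinstall` and the `ca_bundle` field are names invented for the example.

```python
class Conflict(Exception):
    """Stands in for the HTTP 409 'Conflict' returned on a duplicate create."""


def create(cluster, resource):
    """Create a resource; fails if the name is already taken."""
    name = resource["name"]
    if name in cluster:
        raise Conflict(name)
    cluster[name] = resource
    return resource


def delete_if_present(cluster, name):
    """Delete a resource, treating 'not found' as success."""
    cluster.pop(name, None)


def reinstall(cluster, resource):
    """Delete any existing resource of the same name, then create it anew."""
    delete_if_present(cluster, resource["name"])
    return create(cluster, resource)


# Running the installation twice no longer conflicts; the second run wins.
cluster = {}
reinstall(cluster, {"name": "validators", "ca_bundle": "first"})
reinstall(cluster, {"name": "validators", "ca_bundle": "second"})
```

One trade-off of this pattern, compared with create-or-update, is that the resource is briefly absent between the delete and the create.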

To reproduce

  1. Deploy a KubeOps operator and ensure that the webhook installation fails (e.g., by creating a MutatingWebhookConfiguration with the right name beforehand).
  2. Remove the obstacle.
  3. Observe that further webhook installation attempts fail (since the ValidatingWebhookConfiguration has already been created).
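
The steps above can be simulated with a small Python model. It assumes, as the steps imply, that the installer creates the validating configuration before the mutating one. The set-based cluster, the function names and the resource name `my-operator` are hypothetical.

```python
class Conflict(Exception):
    """Stands in for the HTTP 409 'Conflict' returned on a duplicate create."""


def create(cluster, kind, name):
    """Create-only call, as a Save on a resource without a uid would issue."""
    key = (kind, name)
    if key in cluster:
        raise Conflict(f"{kind}/{name} already exists")
    cluster.add(key)


def install_webhooks(cluster, name):
    """One installation attempt: validator first, then mutator."""
    create(cluster, "ValidatingWebhookConfiguration", name)
    create(cluster, "MutatingWebhookConfiguration", name)


def run_init_container(cluster, name, attempts):
    """Retry the installation like a restarting init container; collect errors."""
    errors = []
    for _ in range(attempts):
        try:
            install_webhooks(cluster, name)
            return errors
        except Conflict as exc:
            errors.append(str(exc))
    return errors


# Step 1: a MutatingWebhookConfiguration of the right name exists beforehand.
cluster = {("MutatingWebhookConfiguration", "my-operator")}
first = run_init_container(cluster, "my-operator", attempts=1)

# Step 2: remove the obstacle.
cluster.discard(("MutatingWebhookConfiguration", "my-operator"))

# Step 3: every retry now fails on the validator left behind by the first attempt.
later = run_init_container(cluster, "my-operator", attempts=3)
```

In this model the first attempt fails on the mutating configuration, and all later attempts fail on the validating configuration, so the installation never recovers on its own.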

Expected behavior

The webhook installation should succeed eventually.

Screenshots

The exception thrown by the webhook installer:

Create validator definition.
k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Conflict'
   at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.CreateClusterCustomObjectWithHttpMessagesAsync(Object body, String group, String version, String plural, String dryRun, String fieldManager, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.GenericClient.CreateAsync[T](T obj, CancellationToken cancel)
   at KubeOps.KubernetesClient.KubernetesClient.Create[TResource](TResource resource)
   at KubeOps.Operator.Commands.Management.Webhooks.Install.OnExecuteAsync(CommandLineApplication app)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.InvokeAsync(MethodInfo method, Object instance, Object[] arguments)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.OnExecute(ConventionContext context, CancellationToken cancellationToken)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.<>c__DisplayClass0_0.<<Apply>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at McMaster.Extensions.CommandLineUtils.CommandLineApplication.ExecuteAsync(String[] args, CancellationToken cancellationToken)
   at Program.<Main>$(String[] args) in /build/Controller/Program.cs:line 105
Unhandled exception. k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Conflict'
   at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.CreateClusterCustomObjectWithHttpMessagesAsync(Object body, String group, String version, String plural, String dryRun, String fieldManager, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.GenericClient.CreateAsync[T](T obj, CancellationToken cancel)
   at KubeOps.KubernetesClient.KubernetesClient.Create[TResource](TResource resource)
   at KubeOps.Operator.Commands.Management.Webhooks.Install.OnExecuteAsync(CommandLineApplication app)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.InvokeAsync(MethodInfo method, Object instance, Object[] arguments)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.OnExecute(ConventionContext context, CancellationToken cancellationToken)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.<>c__DisplayClass0_0.<<Apply>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at McMaster.Extensions.CommandLineUtils.CommandLineApplication.ExecuteAsync(String[] args, CancellationToken cancellationToken)
   at Program.<Main>$(String[] args) in /build/Controller/Program.cs:line 105
   at Program.<Main>(String[] args)

Additional Context

Kubernetes: v1.23
KubeOps: 7.0.6

@anekdoti anekdoti added the bug Something isn't working label Feb 8, 2023
@goncalo-oliveira

I think I'm hitting this as well...

$ kubectl logs my-operator-8657dd45c6-f5xsp
Defaulted container "operator" out of: operator, webhook-installer (init)
Error from server (BadRequest): container "operator" in pod "my-operator-8657dd45c6-f5xsp" is waiting to start: PodInitializing
$ kubectl logs my-operator-8657dd45c6-f5xsp -c webhook-installer
info: ApplicationStartup[0]
      Registered validation webhook.
Download cfssl / cfssljson for linux.
Make unix binaries executable.
Generating server certificate.
2023/02/14 16:54:26 [INFO] generate received request
2023/02/14 16:54:26 [INFO] received CSR
2023/02/14 16:54:26 [INFO] generating key: ecdsa-256
2023/02/14 16:54:27 [INFO] encoded CSR
2023/02/14 16:54:27 [INFO] signed certificate
Files in /certs:
/certs/server-key.pem
/certs/ca.pem
/certs/server.csr
/certs/server.pem
Create service.
Create validator definition.
Unhandled exception. k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Conflict'
   at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.CreateClusterCustomObjectWithHttpMessagesAsync(Object body, String group, String version, String plural, String dryRun, String fieldManager, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.GenericClient.CreateAsync[T](T obj, CancellationToken cancel)
   at KubeOps.KubernetesClient.KubernetesClient.Create[TResource](TResource resource)
   at KubeOps.Operator.Commands.Management.Webhooks.Install.OnExecuteAsync(CommandLineApplication app)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.InvokeAsync(MethodInfo method, Object instance, Object[] arguments)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.OnExecute(ConventionContext context, CancellationToken cancellationToken)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.<>c__DisplayClass0_0.<<Apply>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at McMaster.Extensions.CommandLineUtils.CommandLineApplication.ExecuteAsync(String[] args, CancellationToken cancellationToken)
   at Program.<Main>$(String[] args) in /operator/Program.cs:line 34
   at Program.<Main>(String[] args)

Even after clearing the resources with the command below, I'm unable to reinstall the operator.

$ kubectl delete -k src/config/install

The package versions I'm using:

    <PackageReference Include="KubeOps" Version="7.0.7" />
    <PackageReference Include="KubeOps.KubernetesClient" Version="7.0.7" />
    <PackageReference Include="KubernetesClient" Version="10.0.31" />

Running on minikube

$ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.26.1
Kustomize Version: v4.5.7
Server Version: v1.26.1
