
Image-Automation-Controller doesn't correctly set status of removed resources to "deleted" in exposed metrics #501

Open
mimparat132 opened this issue Mar 23, 2023 · 1 comment


mimparat132 commented Mar 23, 2023

Describe the bug

Our team removed all resources for a project called platform-toy. The imageupdateautomation object and the associated gitrepository objects were removed both from the cluster and from the Git repository that Flux is connected to.

After removing all resources related to the project, we began receiving alerts that the imageupdateautomation object could not reconcile, even though the object no longer existed.

Further investigation revealed that, after the object was removed from the cluster, the metric:
"gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"}"
exposed by the image-automation-controller still had the value 1, while the metric:
"gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"}"
had the value 0.

Our alerting is configured to alert us if:
"gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"}"
is set to 1
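To make the alert condition concrete, here is a minimal Python sketch that parses sample lines in the Prometheus text exposition format and lists every object whose Ready/False series equals 1. It assumes the alert is equivalent to the PromQL expression `gotk_reconcile_condition{type="Ready",status="False"} == 1`; the actual rule is not shown in this report. The function names are hypothetical, and the parser is deliberately simple: it skips comment lines and trailing timestamps and does not handle escaped quotes in label values.

```python
import re

# One labelled sample line, e.g.
# gotk_reconcile_condition{kind="...",name="...",status="False",type="Ready"} 1
SAMPLE_RE = re.compile(
    r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
# A single label pair such as status="False" (escaped quotes are not supported).
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric, labels, value) for every labelled sample line in text."""
    for line in text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if m:
            labels = dict(LABEL_RE.findall(m.group("labels")))
            yield m.group("metric"), labels, float(m.group("value"))


def ready_false_alerts(text):
    """Return (kind, namespace, name) for every object whose Ready=False series is 1."""
    return [
        (labels["kind"], labels["namespace"], labels["name"])
        for metric, labels, value in parse_samples(text)
        if metric == "gotk_reconcile_condition"
        and labels.get("type") == "Ready"
        and labels.get("status") == "False"
        and value == 1
    ]
```

Fed the four lines captured after the deletion (under "Screenshots and recordings"), it still reports platform-toy, which matches the alert that kept firing.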

It seems that after an imageupdateautomation object is removed from the cluster, the image-automation-controller does not recognize that the object has been deleted and does not update its metrics accordingly.

After restarting the image-automation-controller, the controller no longer reports the platform-toy imageupdateautomation object, and the alert stops firing because the metric is no longer exposed.

Steps to reproduce

  1. Create the target namespace: platform-toy
  2. Deploy an imageupdateautomation object with name platform-toy
  3. Delete the imageupdateautomation object and namespace from the cluster
  4. Check the "/metrics" endpoint of the image-automation-controller to see what metrics are being exposed
  5. The image-automation-controller will expose the metric:
    "gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"}" 1

Expected behavior

The image-automation-controller should expose the following metric:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"} 1
Alternatively, the image-automation-controller should remove the resource from its metrics endpoint entirely.
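The two acceptable outcomes described above can be modelled with a toy in-memory gauge. This is a hypothetical Python sketch of the bookkeeping, not the image-automation-controller's actual code: `ConditionGauge`, `record` and `forget` are illustrative names, and the only thing taken from this report is the label scheme of one series per status (True, False, Unknown, Deleted).

```python
from dataclasses import dataclass, field

# The four status values seen in the exposed metrics above.
STATUSES = ("True", "False", "Unknown", "Deleted")


@dataclass
class ConditionGauge:
    """Toy stand-in for a labelled gauge: maps label tuples to values."""
    series: dict = field(default_factory=dict)

    def record(self, kind, namespace, name, cond_type, status, deleted=False):
        # Exactly one of the four status series is set to 1;
        # when the object has been deleted, that series is "Deleted".
        active = "Deleted" if deleted else status
        for s in STATUSES:
            self.series[(kind, namespace, name, cond_type, s)] = 1.0 if s == active else 0.0

    def forget(self, kind, namespace, name):
        # Alternative outcome: drop every series belonging to the object.
        for key in [k for k in self.series if k[:3] == (kind, namespace, name)]:
            del self.series[key]

    def value(self, kind, namespace, name, cond_type, status):
        # Returns None when the series is no longer exposed.
        return self.series.get((kind, namespace, name, cond_type, status))
```

Either `record(..., deleted=True)` or `forget(...)` would clear a status="False" alert; the behaviour reported in this issue corresponds to calling neither, so the last recorded False=1 value stays exposed.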

Screenshots and recordings

The following is the console output before the imageupdateautomation object is removed from the cluster:
$ kubectl get imageupdateautomation -n platform-toy
NAME           LAST RUN
platform-toy

The metrics exposed by the image-automation-controller are the following:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"} 1
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="True",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Unknown",type="Ready"} 0

The following console output is after the resources have been deleted:
$ kubectl get imageupdateautomation -n platform-toy
No resources found in platform-toy namespace.

The metrics exposed by the image-automation-controller are the following after deleting the resource:
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Deleted",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="False",type="Ready"} 1
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="True",type="Ready"} 0
gotk_reconcile_condition{kind="ImageUpdateAutomation",name="platform-toy",namespace="platform-toy",status="Unknown",type="Ready"} 0
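The mismatch above (no object returned by kubectl get, but its series still exposed) can be checked mechanically. This is a minimal Python sketch, assuming you already have the set of existing objects as (kind, namespace, name) tuples, for example collected from kubectl get output, which the sketch does not do itself; `stale_objects` is a hypothetical helper.

```python
import re

# A single label pair such as namespace="platform-toy".
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def stale_objects(metrics_text, existing):
    """Return sorted (kind, namespace, name) keys that still have
    gotk_reconcile_condition series but are absent from `existing`."""
    exposed = set()
    for line in metrics_text.splitlines():
        if not line.startswith("gotk_reconcile_condition{"):
            continue  # skip other metrics and # HELP / # TYPE comments
        labels = dict(LABEL_RE.findall(line))
        exposed.add((labels["kind"], labels["namespace"], labels["name"]))
    return sorted(exposed - set(existing))
```

Run against the output above with an empty set of existing objects, it flags platform-toy as stale; with platform-toy still present in the cluster it returns an empty list.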

OS / Distro

VMware Photon OS/Linux

Flux version

flux version 0.38.2

Flux check

N/A

Git provider

gitlab

Container Registry provider

artifactory

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
makkes (Member) commented Mar 28, 2023

This should be resolved by #364

makkes transferred this issue from fluxcd/flux2 on Mar 28, 2023