appuio_cloud_memory query fails on unrelated node label changes #104

Closed
bastjan opened this issue Dec 2, 2022 · 3 comments · Fixed by #111
Labels
bug Something isn't working

Comments

@bastjan
Contributor

bastjan commented Dec 2, 2022

Description

When a node label changes, the old and new kube_node_labels series overlap for a short time, because it takes 5 minutes for Prometheus to mark a series as non-active. During that overlap the queries crash with many-to-many errors. We tried fixing this for query-related labels in #99, but that fix does not cover changes to unrelated labels.
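
To illustrate the failure mode, here is a minimal Python sketch of the uniqueness check that vector matching on `node` performs on the right-hand side. It is not Prometheus code; the helper names `group_by_match_labels` and `check_unique_right_side` are made up for this example, and the label sets are trimmed down from the two series in the error message under Logs, which differ only in `label_csi_cloudscale_ch_zone`.

```python
from collections import defaultdict

# Two kube_node_labels series for the same node, trimmed from the error log.
# They differ only in a label that the query itself does not care about.
with_zone = {
    "__name__": "kube_node_labels",
    "node": "flex-7b9a",
    "label_csi_cloudscale_ch_zone": "lpg1",
}
without_zone = {
    "__name__": "kube_node_labels",
    "node": "flex-7b9a",
}


def group_by_match_labels(series_list, on):
    """Group series by the labels used for vector matching (hypothetical helper)."""
    groups = defaultdict(list)
    for labels in series_list:
        key = tuple((name, labels.get(name, "")) for name in on)
        groups[key].append(labels)
    return groups


def check_unique_right_side(series_list, on):
    """Raise if any match group contains more than one series (hypothetical helper)."""
    for key, members in group_by_match_labels(series_list, on).items():
        if len(members) > 1:
            raise ValueError(
                f"found duplicate series for the match group {dict(key)}"
            )


# Outside the overlap window only one series is active, so matching works.
check_unique_right_side([with_zone], on=["node"])

# Inside the overlap window both series are active and matching on node fails.
try:
    check_unique_right_side([with_zone, without_zone], on=["node"])
    overlap_error = None
except ValueError as err:
    overlap_error = str(err)

print(overlap_error)
```

The point of the sketch is that the duplicate is keyed on `node` alone, so any label change on the node produces it, whether or not the query uses that label.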

Screenshot 2022-12-02 at 16 32 37

Additional Context

No response

Logs

❯ oc -n appuio-cloud-reporting  --as=cluster-admin debug job/backfill-appuio-cloud-memory-27831795 -- appuio-cloud-reporting report --query-name=appuio_cloud_memory --begin 2022-12-01T12:00:00Z
Starting pod/backfill-appuio-cloud-memory-27831795-debug, command was: sh -c appuio-cloud-reporting report --begin=$(date -d "now -3 hours" -u +"%Y-%m-%dT%H:00:00Z") --repeat-until=$(date -u -Iseconds) --query-name=appuio_cloud_memory
2022-12-02T15:31:05.227Z | INFO | appuio-cloud-reporting | appuio-cloud-reporting/logger.go:40 | Starting up appuio-cloud-reporting | {"version": "0.7.0", "date": "2022-12-02", "commit": "f312391a75667650860045401cfb59c7fa1a6585", "go_os": "linux", "go_arch": "amd64", "go_version": "go1.19.1", "uid": 65536, "gid": 0}
2022-12-02T15:31:05.238Z | INFO | appuio-cloud-reporting | appuio-cloud-reporting/report_command.go:122 | Running report...
2022-12-02T15:31:06.849Z | ERROR | appuio-cloud-reporting | v2@v2.19.2/app.go:618 | fatal error | {"error": "failed to run query 'appuio_cloud_memory' at '2022-12-01T12:00:00Z': failed to query prometheus: execution: found duplicate series for the match group {node=\"flex-7b9a\"} on the right hand-side of the operation: [{__name__=\"kube_node_labels\", cluster_id=\"c-appuio-cloudscale-lpg-2\", container=\"kube-rbac-proxy-main\", endpoint=\"https-main\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_os=\"linux\", label_csi_cloudscale_ch_zone=\"lpg1\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"flex-7b9a\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhcos\", namespace=\"openshift-monitoring\", node=\"flex-7b9a\", prometheus=\"openshift-monitoring/k8s\", receive=\"true\", service=\"kube-state-metrics\", tenant_id=\"c-appuio-cloudscale-lpg-2\"}, {__name__=\"kube_node_labels\", cluster_id=\"c-appuio-cloudscale-lpg-2\", container=\"kube-rbac-proxy-main\", endpoint=\"https-main\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_os=\"linux\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"flex-7b9a\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhcos\", namespace=\"openshift-monitoring\", node=\"flex-7b9a\", prometheus=\"openshift-monitoring/k8s\", receive=\"true\", service=\"kube-state-metrics\", tenant_id=\"c-appuio-cloudscale-lpg-2\"}];many-to-many matching not allowed: matching labels must be unique on one side"}

Expected Behavior

Working appuio_cloud_memory query.

Steps To Reproduce

No response

Versions

unrelated to version

@bastjan bastjan added the bug Something isn't working label Dec 2, 2022
@bastjan
Contributor Author

bastjan commented Dec 2, 2022

I tried using min by, which works on its own but times out in the full query (#105).
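
For readers unfamiliar with the idea, this is a rough Python sketch of what a `min by (node)` aggregation does to overlapping series. It only models the semantics and is not the actual query from #105; the `min_by` helper, the sample values, and the node name `flex-0001` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical samples at one evaluation timestamp: (labels, value) pairs.
# The first two entries model the overlapping series for the same node.
samples = [
    ({"node": "flex-7b9a", "label_csi_cloudscale_ch_zone": "lpg1"}, 1.0),
    ({"node": "flex-7b9a"}, 1.0),
    ({"node": "flex-0001"}, 1.0),  # hypothetical second node
]


def min_by(samples, by):
    """Model PromQL's `min by (...)`: keep only the `by` labels, take the minimum per group."""
    grouped = defaultdict(list)
    for labels, value in samples:
        key = tuple((name, labels.get(name, "")) for name in by)
        grouped[key].append(value)
    return [(dict(key), min(values)) for key, values in grouped.items()]


# After aggregation there is exactly one series per node, so a join on node
# no longer sees duplicates on the right-hand side.
deduplicated = min_by(samples, by=["node"])
print(deduplicated)
```

Aggregating drops every label not listed in `by`, so any label the full query needs from kube_node_labels would have to be added to the `by` clause.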

@bastjan
Contributor Author

bastjan commented Dec 6, 2022

While we're at it, we can also fix the kube_persistentvolume_info duplicates that appear after platform upgrades:
Screenshot 2022-12-06 at 15 06 03

@bastjan
Contributor Author

bastjan commented Dec 21, 2022

Memory query fixed in #105.
