Merge Gateway: Prometheus virtual host metrics randomly assigned to the wrong host with multiple listeners #3305

Open · owenhaynes opened this issue Apr 29, 2024 · 1 comment
Labels: help wanted (Extra attention is needed)

Comments

@owenhaynes
Contributor

Description:
Prometheus virtual host metrics are being assigned incorrectly: in some cases the wrong host name ends up attached to a virtual host's metrics.

I am not sure what the correct behaviour is. In some cases we just get metrics like envoy_vhost_vcluster_upstream_rq_retry, and in the current case we get vhost.<virtual host name>.vcluster.<virtual cluster name>, which is what the Envoy docs describe, but with the wrong virtual host name attached to the virtual cluster.

A. The emitted envoy_vhost_vcluster_upstream_rq_retry metrics carry both the envoy_virtual_cluster and envoy_virtual_host labels.

B. The vhost.<virtual host name>.vcluster.<virtual cluster name> metrics only have the envoy_virtual_host label attached, with no virtual cluster label.

I hope A is the correct behaviour, as it's easier to build dashboards from.
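A simplified model may explain why we sometimes get form B. Assuming Envoy's default virtual host and virtual cluster tag extractors split the stat name on "." and capture exactly one token after vhost (and one after vcluster only when it directly follows the virtual host token), any "." inside the virtual host name breaks extraction. This Go sketch is not Envoy's code: extractVhostTags, promName and both stat names are hypothetical, and the second name was chosen only because it reproduces the metric from the dump in this issue under this model.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidPromChars matches characters that are not allowed in a Prometheus metric name.
var invalidPromChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// promName joins the remaining stat tokens with "_" and sanitizes the result.
func promName(tokens []string) string {
	return "envoy_" + invalidPromChars.ReplaceAllString(strings.Join(tokens, "_"), "_")
}

// extractVhostTags models tokenized tag extraction (an assumption, not Envoy's
// code): the stat name is split on ".", the token after "vhost" becomes the
// envoy_virtual_host tag, and the token after "vcluster" becomes the
// envoy_virtual_cluster tag only if "vcluster" is the third token.
func extractVhostTags(stat string) (string, map[string]string) {
	tokens := strings.Split(stat, ".")
	tags := map[string]string{}
	if len(tokens) < 2 || tokens[0] != "vhost" {
		return promName(tokens), tags
	}
	tags["envoy_virtual_host"] = tokens[1]
	rest := []string{"vhost"}
	if len(tokens) >= 4 && tokens[2] == "vcluster" {
		tags["envoy_virtual_cluster"] = tokens[3]
		rest = append(rest, "vcluster")
		rest = append(rest, tokens[4:]...)
	} else {
		// A "." inside the virtual host name leaves extra tokens here, so they
		// end up in the metric name and the vcluster tag is never extracted.
		rest = append(rest, tokens[2:]...)
	}
	return promName(rest), tags
}

func main() {
	for _, stat := range []string{
		// Hypothetical stat names; the second has "." inside the virtual host name.
		"vhost.api-gateway/ab/ab_foo_com.vcluster.ab_foo_com.upstream_rq_retry",
		"vhost.api-gateway/ab/ab.foo.com/ab_foo_com.vcluster.ab_foo_com.upstream_rq_retry",
	} {
		name, tags := extractVhostTags(stat)
		fmt.Println(name, tags)
	}
}
```

Under this model the first name becomes envoy_vhost_vcluster_upstream_rq_retry with both labels (form A), while the second becomes envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry with only envoy_virtual_host="api-gateway/ab/ab" (form B).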

Repro steps:

Take the following example:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: foo
spec:
  gatewayClassName: merge-gateway
  listeners:
  - name: foo.com
    protocol: HTTP
    hostname: foo.com
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
  - name: bar.com
    protocol: HTTP
    hostname: bar.com
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: foobar-combined
spec:
  parentRefs:
    - name: foo
      group: gateway.networking.k8s.io
      kind: Gateway 
  hostnames:
  - foo.com
  - foobar.com
  rules:
  - backendRefs:
    - kind: Service
      group: ''
      name:  my-svc
      port: 80
      weight: 1
    matches:  
    - path:
        type: PathPrefix
        value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: a
spec:
  gatewayClassName: merge-gateway
  listeners:
  - name: a.foo.com
    protocol: HTTP
    hostname: foo.com
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: a
spec:
  parentRefs:
    - name: foo
      group: gateway.networking.k8s.io
      kind: Gateway 
  hostnames:
  - a.foo.com
  rules:
  - backendRefs:
    - kind: Service
      group: ''
      name:  my-svc
      port: 80
      weight: 1
    matches:  
    - path:
        type: PathPrefix
        value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ab
spec:
  gatewayClassName: merge-gateway
  listeners:
  - name: ab.foo.com
    protocol: HTTP
    hostname: ab.foo.com
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: foobar-combined
spec:
  parentRefs:
    - name: ab
      group: gateway.networking.k8s.io
      kind: Gateway 
  hostnames:
  - ab.foo.com
  rules:
  - backendRefs:
    - kind: Service
      group: ''
      name:  my-svc
      port: 80
      weight: 1
    matches:  
    - path:
        type: PathPrefix
        value: /

It's hard to pinpoint what's going on, and which name it picks seems random. For example, I got envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry{envoy_virtual_host="api-gateway/ab/ab"} 0
as the metric name.

Metrics dump

# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry counter
envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry_limit_exceeded counter
envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry_limit_exceeded{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry_overflow counter
envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry_overflow{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry_success counter
envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_retry_success{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_timeout counter
envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_timeout{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_total counter
envoy_vhost_foo_com_ab_foo_com_vcluster_ab_foo_com_upstream_rq_total{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_retry counter
envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_retry{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_retry_limit_exceeded counter
envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_retry_limit_exceeded{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_retry_overflow counter
envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_retry_overflow{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_retry_success counter
envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_retry_success{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_timeout counter
envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_timeout{envoy_virtual_host="api-gateway/ab/ab"} 0
# TYPE envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_total counter
envoy_vhost_foo_com_ab_foo_com_vcluster_other_upstream_rq_total{envoy_virtual_host="api-gateway/ab/ab"} 0

It looks like it's merging the virtual clusters.
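To find every affected series in a scrape, a small filter can flag samples whose metric name still contains a vcluster segment but which carry no envoy_virtual_cluster label, i.e. where the virtual cluster name was folded into the metric name as in the dump above. This is a hypothetical Go helper (misTaggedVclusterSamples), a sketch that assumes Prometheus text exposition input such as the output of Envoy's /stats/prometheus admin endpoint.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// misTaggedVclusterSamples returns the metric names from Prometheus text
// exposition lines that contain a vcluster segment but carry no
// envoy_virtual_cluster label.
func misTaggedVclusterSamples(lines []string) []string {
	var out []string
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and # TYPE / # HELP comments
		}
		// The metric name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 {
			continue
		}
		name := line[:end]
		if strings.HasPrefix(name, "envoy_vhost_") &&
			strings.Contains(name, "_vcluster_") &&
			!strings.Contains(line, "envoy_virtual_cluster=") {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Reads the exposition text from stdin and prints each flagged metric name.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	for _, name := range misTaggedVclusterSamples(lines) {
		fmt.Println(name)
	}
}
```

For example, piping curl -s http://localhost:19000/stats/prometheus into it (assuming the admin address used later in this issue) should list series such as the envoy_vhost_foo_com_ab_foo_com_vcluster_other_* metrics, while correctly tagged envoy_vhost_vcluster_* series are skipped.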

Environment:
Kubernetes 1.29.0
Envoy Gateway 1.0.1 (merged gateways)

@owenhaynes
Contributor Author

owenhaynes commented Apr 29, 2024

I have also seen that Gateway resource names containing "." cause the same kind of issue as above:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: foo.foo

From http://localhost:19000/stats:
vhost.my-ns/foo.foo-public
I assume the same applies to HTTPRoute names as well (I haven't checked). Do they need to be escaped like we do for hostnames?

@arkodg arkodg added this to the v1.1.0-rc1 milestone May 23, 2024
@arkodg arkodg added help wanted Extra attention is needed and removed triage labels May 23, 2024