PodMonitor on deployment does not add scrape targets #2895

Open
thefirstofthe300 opened this issue Apr 24, 2024 · 4 comments
Labels
area:target-allocator, bug

Comments

@thefirstofthe300

Component(s)

collector, target allocator

What happened?

Description

When using the target allocator with a PodMonitor that targets pods in a deployment, I am not seeing any metrics being exported. Looking at the /scrape_configs endpoint, I see the scrape_config is created, but the associated job has no targets.

Kubernetes Version

1.28.3

Operator version

0.97.1

Collector version

0.97.0

Environment information

No response

Log output

{"level":"info","ts":"2024-04-23T21:59:15Z","msg":"Starting the Target Allocator"}
{"level":"info","ts":"2024-04-23T21:59:15Z","msg":"Waiting for caches to sync for namespace"}
{"level":"info","ts":"2024-04-23T21:59:15Z","logger":"allocator","msg":"Starting server..."}
{"level":"info","ts":"2024-04-23T21:59:15Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:15Z","msg":"Caches are synced for namespace"}
{"level":"info","ts":"2024-04-23T21:59:15Z","msg":"Waiting for caches to sync for podmonitors"}
{"level":"info","ts":"2024-04-23T21:59:15Z","msg":"Caches are synced for podmonitors"}
{"level":"info","ts":"2024-04-23T21:59:15Z","msg":"Waiting for caches to sync for servicemonitors"}
{"level":"info","ts":"2024-04-23T21:59:15Z","msg":"Caches are synced for servicemonitors"}
{"level":"info","ts":"2024-04-23T21:59:18Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:18Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:18Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:18Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:18Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:18Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:19Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:19Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:19Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:20Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:22Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:22Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:22Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:22Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:22Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:22Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:23Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:23Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:23Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:23Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:23Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:23Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:56Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:56Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:59Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:59Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:59Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T21:59:59Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:01Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:01Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:01Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:01Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:04Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:04Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:04Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:04Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:04Z","logger":"allocator","msg":"Node name is missing from the spec. Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:00:04Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:15:04Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":"2024-04-23T22:30:04Z","logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}

Additional context

PodMonitor

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-autoscaler
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: clusterapi-cluster-autoscaler
  podMetricsEndpoints:
    - path: /metrics
      port: "8085"
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_label_app_kubernetes_io_instance]
          action: replace
          targetLabel: __autoscaler_target
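
For reference, podMetricsEndpoints[].port in the PodMonitor API matches the name of a container port on the selected pods; the generated scrape config below translates it into a keep relabeling on __meta_kubernetes_pod_container_port_name, so the job only gains targets when the pods declare a port with that exact name. A minimal sketch of a PodMonitor/pod-template pair that would produce targets follows; the "metrics" port name, the object names, and the image are assumptions for illustration, not taken from this report.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-autoscaler          # hypothetical
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: clusterapi-cluster-autoscaler
  podMetricsEndpoints:
    - path: /metrics
      port: metrics                 # refers to the container port *name* below
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-autoscaler          # hypothetical
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: clusterapi-cluster-autoscaler
  template:
    metadata:
      labels:
        app.kubernetes.io/name: clusterapi-cluster-autoscaler
    spec:
      containers:
        - name: autoscaler          # hypothetical
          image: example.invalid/cluster-autoscaler:latest   # hypothetical
          ports:
            - name: metrics         # the name the PodMonitor's port field matches
              containerPort: 8085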

Scrape config

curl -s http://opentelemetry-targetallocator/scrape_configs | jq
{
  ...<snip>...,  
  "podMonitor/clusters-system/cluster-autoscaler/0": {
    "enable_compression": true,
    "enable_http2": true,
    "follow_redirects": true,
    "honor_timestamps": true,
    "job_name": "podMonitor/clusters-system/cluster-autoscaler/0",
    "kubernetes_sd_configs": [
      {
        "enable_http2": true,
        "follow_redirects": true,
        "kubeconfig_file": "",
        "namespaces": {
          "names": [
            "clusters-system"
          ],
          "own_namespace": false
        },
        "role": "pod"
      }
    ],
    "metrics_path": "/metrics",
    "relabel_configs": [
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "job"
        ],
        "target_label": "__tmp_prometheus_job_name"
      },
      {
        "action": "drop",
        "regex": "(Failed|Succeeded)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_phase"
        ]
      },
      {
        "action": "keep",
        "regex": "(clusterapi-cluster-autoscaler);true",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_label_app_kubernetes_io_name",
          "__meta_kubernetes_pod_labelpresent_app_kubernetes_io_name"
        ]
      },
      {
        "action": "keep",
        "regex": "8085",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_container_port_name"
        ]
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_namespace"
        ],
        "target_label": "namespace"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_container_name"
        ],
        "target_label": "container"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_name"
        ],
        "target_label": "pod"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "clusters-system/cluster-autoscaler",
        "separator": ";",
        "target_label": "job"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "8085",
        "separator": ";",
        "target_label": "endpoint"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_label_app_kubernetes_io_instance"
        ],
        "target_label": "__autoscaler_target"
      },
      {
        "action": "hashmod",
        "modulus": 1,
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__address__"
        ],
        "target_label": "__tmp_hash"
      },
      {
        "action": "keep",
        "regex": "$(SHARD)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__tmp_hash"
        ]
      }
    ],
    "scheme": "http",
    "scrape_interval": "30s",
    "scrape_protocols": [
      "OpenMetricsText1.0.0",
      "OpenMetricsText0.0.1",
      "PrometheusText0.0.4"
    ],
    "scrape_timeout": "10s",
    "track_timestamps_staleness": false
  },
  ...<snip>...
}
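
Given the keep relabeling on __meta_kubernetes_pod_container_port_name above, the job only receives targets if the selected pods actually expose a container port named "8085". A quick hedged check of which port names the pods declare (assumes kubectl access to the cluster; the label matches the PodMonitor selector above):

curl is not needed here; against the cluster directly:

kubectl -n clusters-system get pods \
  -l app.kubernetes.io/name=clusterapi-cluster-autoscaler \
  -o jsonpath='{range .items[*].spec.containers[*].ports[*]}{.name}{" -> "}{.containerPort}{"\n"}{end}'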

Job targets

curl -s http://opentelemetry-targetallocator/jobs | jq
{
  "podMonitor/aws-cni-system/aws-node/0": {
    "_link": "/jobs/podMonitor%2Faws-cni-system%2Faws-node%2F0/targets"
  },
  "podMonitor/observability-system/opentelemetry-collector/0": {
    "_link": "/jobs/podMonitor%2Fobservability-system%2Fopentelemetry-collector%2F0/targets"
  }
}
@thefirstofthe300 added the bug and needs triage labels on Apr 24, 2024
@thefirstofthe300
Author

I have attempted to accomplish the same goal using a ServiceMonitor, but I'm running into #2891.

@jaronoff97 added the area:target-allocator label and removed the needs triage label on Apr 24, 2024
@thefirstofthe300
Author

I'm seeing the same issue with a ServiceMonitor now, this time with the control-plane etcd (I am running the OTel collector on the control-plane nodes):

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd
  namespace: observability-system
spec:
  selector:
    matchLabels:
      component: etcd
      tier: control-plane
  endpoints:
    - port: "2381"
  namespaceSelector:
    any: true
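
As with the PodMonitor, endpoints[].port on a ServiceMonitor matches a named port on the Services selected by the labels above, so a target only appears when such a Service exists and carries a port whose name equals the ServiceMonitor's port value. A minimal sketch of a Service shape that would pair with such a ServiceMonitor; the object name, namespace, and the "metrics" port name are assumptions for illustration, not taken from this report:

apiVersion: v1
kind: Service
metadata:
  name: etcd-metrics               # hypothetical
  namespace: kube-system           # hypothetical
  labels:
    component: etcd
    tier: control-plane
spec:
  clusterIP: None
  selector:
    component: etcd
    tier: control-plane
  ports:
    - name: metrics                # the name the ServiceMonitor's port field would match
      port: 2381
      targetPort: 2381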

@thefirstofthe300
Author

If I restart the target allocator pod, the "Node name is missing from the spec" logs don't reappear, which makes me think those logs may relate to the window after a pod has been created but before it is scheduled.

@thefirstofthe300
Author

I have probably muddied this ticket with the last two posts. They're not relevant to this bug, so please ignore them.
