
Instrumenting Prometheus metrics on an ECS service with multiple instances of task definition #63

Open
Ma11hewThomas opened this issue Jun 14, 2022 · 0 comments

Ma11hewThomas commented Jun 14, 2022

Hello,

I'm working on a prototype to instrument application metrics for Prometheus from an application hosted in ECS as an Application Load Balanced Fargate service. The AWS OTel (ADOT) collector runs as a sidecar container, as below:

(attached image: diagram of the ECS task with the application container and the ADOT collector sidecar)

This works well with a single task; however, with multiple instances of the task definition, I'm unable to distinguish which instance a metric was scraped from, resulting in inaccurate data.

Each metric has an instance label, but it holds the same value for every instance: 0.0.0.0:8080, the scrape target.

As a result, each collector writes an identical metric-and-label set to AWS Managed Prometheus, and I have found no way to tell the series apart.

I tried the following exporter config:

resource_to_telemetry_conversion:
  enabled: true

This added the labels service_name and service_instance_id; however, the values are not unique: aws-otel-app (the job name) and 0.0.0.0:8080 (the scrape target) for all metrics.

I can see the target metric has the following ECS-related labels:
aws_ecs_cluster_name, aws_ecs_launchtype, aws_ecs_service_name, aws_ecs_task_arn, aws_ecs_task_family, aws_ecs_task_id, aws_ecs_task_known_status, aws_ecs_task_launch_type, aws_ecs_task_pull_started_at, aws_ecs_task_pull_stopped_at, aws_ecs_task_revision.

Applying the aws_ecs_task_id label to all other metrics would let me tell instances apart, but I have been unable to do this successfully.
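One approach I'm considering (an untested sketch: ECS_TASK_ID is not a built-in variable, I would have to inject it into the collector container's environment myself, e.g. from the task definition or an entrypoint script that reads ECS_CONTAINER_METADATA_URI_V4):

```yaml
# Sketch only: attach a per-task label at scrape time. The collector
# expands ${ENV_VAR} references in its config, so ECS_TASK_ID must be
# set in the sidecar's environment (user-provided, not built in).
scrape_configs:
  - job_name: "aws-otel-app"
    honor_labels: true
    static_configs:
      - targets: ["0.0.0.0:8080"]
        labels:
          ecs_task_id: ${ECS_TASK_ID}
```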

I appreciate any help or guidance,
Thanks,
Matt

ADOT collector config:

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 30s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "aws-otel-app"
          honor_labels: true
          static_configs:
            - targets: ["0.0.0.0:8080"]
             
  awsecscontainermetrics:
    collection_interval: 10s
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:55681
processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
  memory_limiter:
    limit_mib: 100
    check_interval: 5s
exporters:
  awsprometheusremotewrite:
    endpoint: https://aps-workspaces.eu-west-2.amazonaws.com/workspaces/ws-{workspace-id}/api/v1/remote_write
    aws_auth:
      region: {region}
      service: aps
    resource_to_telemetry_conversion:
      enabled: true
  logging:
    loglevel: info
  awsxray:
    region: {region}
    index_all_attributes: true
extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, awsprometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, awsprometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [awsxray]
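Separately, a sketch of a resourcedetection-based variant I'm considering (untested on my side; the resourcedetection processor with the ecs detector comes from opentelemetry-collector-contrib). The idea is that it attaches ECS task metadata such as aws.ecs.task.arn as resource attributes, which resource_to_telemetry_conversion should then turn into unique per-task labels:

```yaml
# Sketch only: detect ECS resource attributes and add the processor to
# the prometheus metrics pipeline so every scraped metric carries
# task-unique attributes (e.g. aws.ecs.task.arn).
processors:
  resourcedetection:
    detectors: [ecs]
    timeout: 2s
    override: false
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection]
      exporters: [logging, awsprometheusremotewrite]
```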