
ADOT Collector/instrumentation not creating X-Ray spans on ECS Fargate, NodeJS app #946

Open
AA-morganh opened this issue Feb 1, 2024 · 1 comment


AA-morganh commented Feb 1, 2024

Hi, I'm having an issue with ADOT on ECS Fargate. I'm seeing CloudWatch logs, metrics, and Container Insights metrics, as well as some start-up FS spans in X-Ray, but I'm not getting any application spans in X-Ray. My auto-instrumentation code is as follows:

/*instrumentation.ts*/
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-proto';
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { AWSXRayPropagator } from '@opentelemetry/propagator-aws-xray';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { AWSXRayIdGenerator } from '@opentelemetry/id-generator-aws-xray';

if (!process.env.DISABLE_TELEMETRY) {
  // For troubleshooting, set the log level to DiagLogLevel.DEBUG
  diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO);

  const traceExporter = process.env.OTLP_COLLECTOR_TRACE_URL
    ? new OTLPTraceExporter({
        url: process.env.OTLP_COLLECTOR_TRACE_URL,
      })
    : new OTLPTraceExporter({ url: 'http://127.0.0.1:4318/v1/traces' });

  const metricReader = new PeriodicExportingMetricReader({
    exporter: process.env.OTLP_COLLECTOR_METRICS_URL
      ? new OTLPMetricExporter({
          url: process.env.OTLP_COLLECTOR_METRICS_URL,
        })
      : new OTLPMetricExporter({ url: 'http://127.0.0.1:4318/v1/metrics' }),
  });

  const spanProcessor = new BatchSpanProcessor(traceExporter);

  const sdk = new NodeSDK({
    textMapPropagator: new AWSXRayPropagator(),
    traceExporter: traceExporter,
    metricReader: metricReader,
    spanProcessor: spanProcessor,
    idGenerator: new AWSXRayIdGenerator(),
    instrumentations: [getNodeAutoInstrumentations()],
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: 'MyService',
      [SemanticResourceAttributes.SERVICE_VERSION]: '1.0',
    }),
  });

  sdk.start();

  process.on('SIGTERM', () => {
    sdk
      .shutdown()
      .then(() => console.log('Tracing and Metrics terminated'))
      .catch((error) => console.log('Error terminating tracing and metrics', error))
      .finally(() => process.exit(0));
  });
}

export default {};
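
(For illustration, a minimal sketch of an entry point that loads this file, assuming a plain CommonJS build serving on port 8080 as in the task definition below; the real app is more involved:)

/* index.ts: illustrative entry point; the instrumentation module must be imported before anything it should patch */
import './instrumentation';
import { createServer } from 'node:http';

createServer((req, res) => {
  // responds to the container health check defined in the task definition below
  res.writeHead(200);
  res.end('ok');
}).listen(8080);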

My TaskDef looks like this (my CI replaces a bunch of tokens in here):

{
  "family": "myService",
  "containerDefinitions": [
    {
      "name": "myService",
      "image": "REPLACE_REPOSITORY_URI:REPLACE_IMAGE_TAG",
      "healthCheck": {
            "command": ["CMD-SHELL", "wget -q -S -O - localhost:8080/healthcheck"],
            "interval": 5,
            "retries": 10,
            "timeout": 3
      },
      "portMappings": [
        {
            "containerPort": 8080,
            "hostPort": 8080,
            "protocol": "tcp"
        }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "/ecs/REPLACE_STAGE-myService",
            "awslogs-region": "REPLACE_AWS_REGION",
            "awslogs-stream-prefix": "ecs"
        },
        "secretOptions": []
      },
      "dependsOn": [{
        "containerName": "aws-otel-collector",
        "condition": "HEALTHY"
      }],
      "environment": [
                {
                  "name": "ACCOUNT_ID",
                  "value": "REPLACE_AWS_ACCOUNT_ID"
                },
                {
                  "name": "REGION",
                  "value": "REPLACE_AWS_REGION"
                },
                {
                  "name": "STAGE",
                  "value": "REPLACE_STAGE"
                },
                {
                  "name": "NO_COLOR",
                  "value": "NO_COLOR"
                },
                {
                  "name": "LatestSchema",
                  "value": "REPLACE_LATEST_SCHEMA"
                },
                {
                  "name": "JWT_SECRET",
                  "value": "REPLACE_SECRET_ARN"
                }
        ]
    },
    {
      "name": "aws-otel-collector",
      "image": "REPLACE_AWS_ACCOUNT_ID.dkr.ecr.REPLACE_AWS_REGION.amazonaws.com/ecr-public/aws-observability/aws-otel-collector:latest",
      "essential": true,
      "command": [
                "--set=service.telemetry.logs.level=DEBUG", "--config=/etc/ecs/container-insights/otel-task-metrics-config.yaml"
      ],
      "user": "0:0",
      "healthCheck": {
            "command": ["/healthcheck"],
            "interval": 5,
            "retries": 10,
            "timeout": 3
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "/ecs/REPLACE_STAGE-aws-otel-sidecar-collector",
            "awslogs-region": "REPLACE_AWS_REGION",
            "awslogs-stream-prefix": "ecs"
        },
        "secretOptions": []
      }
    },
    {
      "name": "aws-otel-emitter",
      "image": "REPLACE_AWS_ACCOUNT_ID.dkr.ecr.REPLACE_AWS_REGION.amazonaws.com/ecr-public/aws-otel-test/aws-otel-goxray-sample-app:latest",
      "essential": false,
      "healthCheck": {
            "command": ["CMD-SHELL", "curl -f http://localhost:5000 || exit 1"],
            "interval": 5,
            "retries": 10,
            "timeout": 3
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "/ecs/REPLACE_STAGE-aws-otel-sidecar-emitter",
            "awslogs-region": "REPLACE_AWS_REGION",
            "awslogs-stream-prefix": "ecs"
        },
        "secretOptions": []
      }
    }
  ],
  "taskRoleArn": "REPLACE_TASK_ROLE_ARN",
  "executionRoleArn": "REPLACE_EXECUTION_ROLE_ARN",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}

I have verbose logging enabled on both the SDK and the collector, and I'm not seeing anything that looks suspicious to me, other than the fact that my expected automatic and manual spans never appear.
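
For reference, by "manual spans" I mean nothing exotic; roughly this kind of thing (a sketch, not my exact application code):

/* manual-span sketch: assumes the instrumentation file above has already started the SDK */
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('MyService');

export async function doWork(): Promise<void> {
  await tracer.startActiveSpan('doWork', async (span) => {
    try {
      // ... actual application work happens here ...
    } finally {
      span.end();
    }
  });
}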

On a local docker-compose setup with a plain upstream OpenTelemetry Collector, I do see my spans making it to a Grafana Tempo instance, so I think the instrumentation itself is largely set up correctly. Any guidance would be a huge help.


bmxpiku commented Mar 25, 2024

I think we're facing the same issue right now; something changed and broke the collector after a minor Node version update.

EDIT:
This turned out to be caused by migrating the whole project to ESM; downgrading to Node 18.16 and using the experimental loader flag fixed it:

# https://gajus.com/blog/how-to-add-sentry-tracing-to-your-node-js-app#nodejs-esm-modules
# https://github.com/open-telemetry/opentelemetry-js/issues/4392
# https://github.com/open-telemetry/opentelemetry-js/issues/4547
# https://github.com/open-telemetry/opentelemetry-js/issues/4553
CMD ["node", "--experimental-loader=@opentelemetry/instrumentation/hook.mjs", "dist/index.js"]

Though I talked with my team and we won't use it for the production build, so I think we'll have to live without X-Ray for a while.
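
An option we haven't actually tried (so treat it as an assumption rather than a verified fix): on Node versions that support module.register (roughly 18.19+ / 20.6+), the same ESM hook can be registered from a small preload module passed via --import instead of using --experimental-loader:

/* register-otel-hook.mjs: hypothetical preload module; run with: node --import ./register-otel-hook.mjs dist/index.js */
import { register } from 'node:module';

// Registers the same loader hook that --experimental-loader would install,
// so @opentelemetry/instrumentation can patch ESM imports.
register('@opentelemetry/instrumentation/hook.mjs', import.meta.url);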
