
8.0.0-beta.6 massive performance degradation/connections dropped #11897

Open · 3 tasks done · gajus opened this issue May 4, 2024 · 9 comments

Comments

gajus (Contributor) commented May 4, 2024

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/node

SDK Version

8.0.0-beta.6

Framework Version

No response

Link to Sentry event

No response

SDK Setup

N/A

Steps to Reproduce

We upgraded from 8.0.0-alpha.9 to 8.0.0-beta.6.

Expected Result

Everything to work.

Actual Result

Experienced major performance degradation.

There are no errors per se; however, service performance has ground to a halt.

The most noticeable symptom was occasional errors coming from the pg module complaining about timeouts, e.g.

Error: timeout expired
    at Timeout._onTimeout (/srv/node_modules/.pnpm/pg@8.11.5/node_modules/pg/lib/client.js:106:28)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
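
For reference, this "timeout expired" error is pg's connection-establishment timeout. A minimal sketch of where that limit comes from (the pool setup and values here are illustrative, not our actual config):

import { Pool } from 'pg';

// pg raises "Error: timeout expired" (pg/lib/client.js) when a client cannot
// finish connecting within connectionTimeoutMillis.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  connectionTimeoutMillis: 10_000, // illustrative value
});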

Worth noting that we also rely heavily on Zod; I noticed it was mentioned in the upgrade changelog.

mydea (Member) commented May 6, 2024

Hey, thank you for reporting this!

zodIntegration has been added, but it is not enabled by default, so that's probably not related to the issue you are describing.

We did bump a bunch of underlying instrumentation, including @opentelemetry/instrumentation-pg 🤔 But I don't see anything in there that would explain why things would become slower. Just to clarify: you only bumped @sentry/node and no other packages?

One thing you could try is forcing it to use v0.38.0 of @opentelemetry/instrumentation-pg, which is what we used before. If you are using yarn, you can do this in your package.json like so:

{
  "resolutions": {
    "@sentry/node/@opentelemetry/instrumentation-pg": "^0.38.0"
  }
}
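
Since your stack trace shows .pnpm paths: if you are on pnpm, the rough equivalent would be the pnpm.overrides field in package.json (a sketch; I have not verified it against your setup):

{
  "pnpm": {
    "overrides": {
      "@opentelemetry/instrumentation-pg": "^0.38.0"
    }
  }
}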

I think this should let you test whether that has an impact on performance 🤔 But of course it could also be another instrumentation; it's hard to tell. I guess you don't have a public reproduction of this that we could take a look at?

gajus (Contributor, Author) commented May 9, 2024

Just in case: the same issue occurs with rc2. Our app grinds to a complete halt after we deploy it; average response time increases from 150ms to over 1500ms.

What's weird is that CPU usage remains about the same. I would have suspected that some instrumentation logic is introducing significant overhead, but that does not appear to be the case.

I have not had much luck replicating this in isolation, i.e. we are only seeing this when deployed to production, which would indicate that this issue surfaces under high concurrency.

Will continue to troubleshoot and report back if we can locate the root cause.

mydea (Member) commented May 13, 2024

Thanks for the added info!

Could you share your Sentry.init() code, and also maybe your app setup - e.g. what framework (express, fastify, ...) are you using, what middleware, other relevant config, ...?

gajus (Contributor, Author) commented May 15, 2024

Could you share your Sentry.init() code, and also maybe your app setup - e.g. what framework (express, fastify, ...) are you using, what middleware, other relevant config, ...?

import { RequestError } from '@contra/http-client';
import { init, rewriteFramesIntegration } from '@sentry/node';
import { nodeProfilingIntegration } from '@sentry/profiling-node';
import { type Integration } from '@sentry/types';

export { findActiveSpan } from './findActiveSpan.js';
export { getTraceAndBaggage } from './getTraceAndBaggage.js';
export { startSpan } from '@sentry/core';
export {
  type Breadcrumb,
  captureException,
  captureMessage,
  close,
  setupFastifyErrorHandler,
  setUser,
  withScope,
} from '@sentry/node';
export { type Extras } from '@sentry/types';

const {
  NODE_ENV,
  NODE_PROFILING,
  // eslint-disable-next-line n/no-process-env
} = process.env;

type ErrorDetail = {
  message: string;
  path: string[];
};

// Type Guard for ErrorDetail
const isErrorDetail = (error: unknown): error is ErrorDetail => {
  return (
    typeof error === 'object' &&
    error !== null &&
    'message' in error &&
    'path' in error &&
    typeof error.message === 'string' &&
    Array.isArray(error.path)
  );
};

// Extracts error message from JSON error string
const extractErrorMessage = (errorString: string): string | null => {
  try {
    const outerArray = JSON.parse(errorString);
    const parsed = outerArray[0];

    if (Array.isArray(parsed) && parsed.every(isErrorDetail)) {
      return parsed.map((error) => error.message).join(', ');
    } else {
      return parsed.message;
    }
  } catch {
    return null;
  }
};

// Extracts path from JSON error string and removes index of edges
const extractErrorPath = (errorString: string): string | null => {
  try {
    const outerArray = JSON.parse(errorString);
    const parsed = outerArray[0];

    if (Array.isArray(parsed) && parsed.every(isErrorDetail)) {
      return parsed
        .map((error) => error.path.join('.').replaceAll(/\d+/gu, ''))
        .filter(Boolean)
        .join(', ');
    } else {
      return parsed.path.join('.').replaceAll(/\d+/gu, '');
    }
  } catch {
    return null;
  }
};

// Extracts error details
const extractErrorDetails = (error: unknown): Record<string, string | null> => {
  let errorMessage: string | null = null;
  let errorPath: string | null = null;

  errorMessage = extractErrorMessage(typeof error === 'string' ? error : '');
  errorPath = extractErrorPath(typeof error === 'string' ? error : '');

  return { errorMessage, errorPath };
};

const formatFingerprint = (fingerprint: string[]): string[] => {
  return fingerprint.map((item) => {
    const { errorMessage, errorPath } = extractErrorDetails(item);
    if (errorMessage && errorPath) {
      return `{message: "${errorMessage}", path: "${errorPath}"}`;
    }

    return item;
  });
};

export type SentryTracing = {
  shutdown: () => Promise<void>;
};

export const createSentry = ({
  integrations,
  root,
}: {
  integrations?: (integrations: Integration[]) => Integration[];
  root: string;
}) => {
  init({
    beforeSend(event, hint) {
      let fingerprint =
        typeof hint.originalException === 'string'
          ? [hint.originalException]
          : hint.originalException instanceof Error
            ? [hint.originalException?.message]
            : undefined;

      if (!event.contexts) {
        event.contexts = {};
      }

      if (hint.originalException instanceof RequestError) {
        event.contexts.httpRequest = {
          method: hint.originalException.request?.options.method,
          // eslint-disable-next-line @typescript-eslint/no-explicit-any
          source: (hint.originalException as any)?.sourceTrace,
          url: hint.originalException.request?.requestUrl,
        };
        event.contexts.httpResponse = {
          body: hint.originalException?.response?.body ?? null,
          statusCode: hint.originalException?.response?.statusCode,
        };
        fingerprint = [
          hint.originalException.message,
          hint.originalException.request?.options.method ?? '',
          // @ts-expect-error TODO why origin does not exist?
          String(hint.originalException.request?.options.url?.origin),
        ];
      }

      if (fingerprint) {
        if (event.contexts.graphql) {
          fingerprint.push(event.contexts.graphql.path as string);
        } else if (event.contexts.temporal) {
          fingerprint.push(event.contexts.temporal.workflowType as string);
        }
      }

      if (fingerprint) {
        event.fingerprint = formatFingerprint(fingerprint);
      } else {
        event.fingerprint = fingerprint;
      }

      return event;
    },
    integrations: (defaultIntegrations) => {
      const customIntegrations = [
        ...defaultIntegrations,
        rewriteFramesIntegration({
          root,
        }),
      ];

      if (NODE_PROFILING === 'true') {
        customIntegrations.push(nodeProfilingIntegration());
      }

      if (integrations) {
        return integrations(customIntegrations);
      }

      return customIntegrations;
    },
    normalizeDepth: 5,
    spotlight: NODE_ENV === 'development',
    tracesSampleRate: 1,
  });

  return {
    shutdown: async () => {
      // This was made redundant by Sentry abstracting away opentelemetry
    },
  };
};
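
For context, this module is imported before anything else in the service; rough usage looks like this (the file name and root value here are illustrative):

// instrument.ts – loaded first, before the rest of the app
import { createSentry } from './sentry.js';

export const sentry = createSentry({
  root: process.cwd(),
});

// sentry.shutdown() is awaited during graceful shutdown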

gajus (Contributor, Author) commented May 15, 2024

Same issue with v8.

@mydea I managed to get a trace while we were testing v8.

Looks like there are massive gaps between spans.

https://contra.sentry.io/performance/trace/7564ad7a9f280bad7145d336820752d8/?fov=0%2C20355.000019073486

[Screenshot: trace waterfall showing large gaps between spans, captured 2024-05-15]

gajus (Contributor, Author) commented May 15, 2024

The resource consumption profile does not change dramatically.

This is a snapshot from before the deployment:

NAME                                                   CPU(cores)   MEMORY(bytes)
contra-api-684b67464f-6kdbx                            324m         1080Mi
contra-api-684b67464f-7w99p                            269m         1092Mi
contra-api-684b67464f-9dc9d                            121m         1018Mi
contra-api-684b67464f-9lrk9                            181m         1068Mi
contra-api-684b67464f-9m9sv                            198m         1238Mi
contra-api-684b67464f-drmtj                            205m         1263Mi
contra-api-684b67464f-gnxm7                            135m         1302Mi
contra-api-684b67464f-hz4wl                            236m         1226Mi
contra-api-684b67464f-jctc6                            140m         1352Mi
contra-api-684b67464f-jvscg                            117m         923Mi
contra-api-684b67464f-phnms                            193m         1048Mi
contra-api-684b67464f-q85h5                            172m         1003Mi
contra-api-684b67464f-q87h5                            88m          904Mi
contra-api-684b67464f-s6qwt                            115m         1277Mi
contra-api-684b67464f-x6rvs                            290m         1410Mi

This is after:

contra-api-75df777468-6qt2c                            119m         1524Mi
contra-api-75df777468-7sw7t                            119m         1143Mi
contra-api-75df777468-8nsmg                            253m         1310Mi
contra-api-75df777468-9r4x4                            221m         1462Mi
contra-api-75df777468-cgs9s                            227m         1535Mi
contra-api-75df777468-ctv2h                            195m         1216Mi
contra-api-75df777468-fh7d2                            230m         1323Mi
contra-api-75df777468-fkwcd                            261m         1585Mi
contra-api-75df777468-kf8vm                            288m         1390Mi
contra-api-75df777468-n9tlz                            250m         1451Mi
contra-api-75df777468-pc78z                            397m         1554Mi
contra-api-75df777468-rtvkz                            336m         1529Mi
contra-api-75df777468-v6t8l                            201m         1444Mi
contra-api-75df777468-wz984                            274m         1250Mi
contra-api-75df777468-xtmlm                            67m          1111Mi

Memory usage is definitely higher on average, but nothing that would make me wince.

gajus (Contributor, Author) commented May 15, 2024

Just like last time, we see a big uptick in timeouts while trying to establish pg connections.

Logs are flooded with:

Error: timeout expired
    at Timeout._onTimeout (/srv/node_modules/.pnpm/pg@8.11.5/node_modules/pg/lib/client.js:106:28)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)

gajus (Contributor, Author) commented May 15, 2024

@mydea What would be the correct way to disable all auto-instrumentations and then re-add them one by one?

I know how to do this with OpenTelemetry directly, but not when it is abstracted away as it is with Sentry.

export OTEL_NODE_ENABLED_INSTRUMENTATIONS=""

doesn't work.

mydea (Member) commented May 16, 2024

@mydea What would be the correct way to disable all auto-instrumentations and then re-add them one by one?

I know how to do this with OpenTelemetry directly, but not when it is abstracted away as it is with Sentry.

export OTEL_NODE_ENABLED_INSTRUMENTATIONS=""

doesn't work.

You can set defaultIntegrations: false (see https://docs.sentry.io/platforms/javascript/guides/node/configuration/integrations/#modifying-default-integrations) and then add individual integrations back.
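
A rough sketch of that approach (the re-added integrations here are just examples):

Sentry.init({
  defaultIntegrations: false,
  integrations: [
    // re-add integrations one at a time to narrow down the culprit, e.g.:
    Sentry.httpIntegration(),
    Sentry.expressIntegration(),
  ],
});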

Alternatively, you can just filter out individual integrations like this:

Sentry.init({
  integrations: function(integrations) {
    return integrations.filter(integration => integration.name !== 'Postgres');
  }
});

(similar to what you are already doing in your init, but removing the ones you want to exclude).

It would be interesting to know whether this is related only to the pg instrumentation 🤔 At a glance, I don't see anything in the trace you sent that looks wrong 🤔
