Skip to content

Commit

Permalink
ref(node): Improve Express URL Parameterization (#5450)
Browse files Browse the repository at this point in the history
Improve the URL parameterization for transaction names in our Express integration.

Previously, we would only obtain a parameterized version of the incoming request's URL **after** the registered handler(s) finished their job, right before we called `transaction.finish()`. This is definitely too late for DSC propagation, if the handlers made any outgoing request. In that case, the DSC (propagated via the `baggage` header) would not contain a transaction name.

As of this PR, we patch the `Express.Router` prototype, more precisely the `process_params` function. This function contains a reference to the incoming `req` object as well as to the `layer` that matched the route. We hook into the function to extract the matched and parameterized partial route of the layer and we attach it to the `req` object. This happens for every matched layer, until the entire route is resolved, in which stichtched together the entire parameterized route. 

For the time being, we deliberately ignore route registrations with wildcards (e.g. `app.get('/*', ...)`) when stitching together the parameterized route. After we added a new part to the reconstructed route, we check if it has the same number of segments as the original (raw) URL. In case it has, we assume that the parameterized route is complete and we update the active transaction with the parameterized name. Additionally, we set the transaction source to `route` at this point. In case we never get to the point where we have an equal amount of URL segments, the transaction name is never updated, meaning its source stays `url` (and therefore, it's not added to the DSC). In that case, we continue to update the transaction name like before right before the end of the transaction.

After reading the Express source code, we confirmed that the process for resolving parts of a route and handling it is performed sequentially **for each matched route segment**. This means that each matched layer's handle function is invoked before the next part of the URL is matched. We therefore still have timing issues around DSC population if any of those intermediate handler functions make outgoing requests. A simple example:

The incoming request is `/api/users/123`
* We intercept the request and start the transaction with the name `/api/users/123` and source `url`
* The first layer that matches is matches the route `/*`.
  * We start reconstructing the parameterized route with `/`
  * The handle function of this layer is invoked (e.g. to check authentication)
    *  the handler makes an XHR request 
      * at this point, we populate and freeze the DSC (without transaction name)
* The second handler matches the route `/api/users/123`
  * We obtain the parameterized route and our reconstructed route is now `/api/users/:id`
  * Now we have 3 segments in the original and in the reconstructed route
  * We update the transaction name with `/api/users/:id` and set its source to `route`
  * The handle function of this layer is invoked
    * every request that might happen here will still propagate the DSC from layer 1 because it can't be modified
* We finish the transaction

This example shows that we still have timing issues w.r.t DSC propagation. That is, assuming that the Node SDK is in this case the head of the trace and it didn't intercept incoming DSC from the incoming request.
However, with this PR we now at least have the chance to get the correct transaction name early enough.

ref: #5342
  • Loading branch information
Lms24 committed Jul 26, 2022
1 parent 3f94dc1 commit 5e9327f
Show file tree
Hide file tree
Showing 9 changed files with 292 additions and 40 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,7 @@ test('Should populate and propagate sentry baggage if sentry-trace header does n
host: 'somewhere.not.sentry',
// TraceId changes, hence we only expect that the string contains the traceid key
baggage: expect.stringContaining(
'sentry-environment=prod,sentry-release=1.0,sentry-public_key=public,sentry-trace_id=',
'sentry-environment=prod,sentry-release=1.0,sentry-transaction=GET%20%2Ftest%2Fexpress,sentry-public_key=public,sentry-trace_id=',
),
},
});
Expand All @@ -99,7 +99,7 @@ test('Should populate Sentry and ignore 3rd party content if sentry-trace header
host: 'somewhere.not.sentry',
// TraceId changes, hence we only expect that the string contains the traceid key
baggage: expect.stringContaining(
'sentry-environment=prod,sentry-release=1.0,sentry-public_key=public,sentry-trace_id=',
'sentry-environment=prod,sentry-release=1.0,sentry-transaction=GET%20%2Ftest%2Fexpress,sentry-public_key=public,sentry-trace_id=',
),
},
});
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
import * as Sentry from '@sentry/node';
import * as Tracing from '@sentry/tracing';
import cors from 'cors';
import express from 'express';
import http from 'http';

const app = express();

export type TestAPIResponse = { test_data: { host: string; 'sentry-trace': string; baggage: string } };

Sentry.init({
dsn: 'https://public@dsn.ingest.sentry.io/1337',
release: '1.0',
environment: 'prod',
integrations: [new Sentry.Integrations.Http({ tracing: true }), new Tracing.Integrations.Express({ app })],
tracesSampleRate: 1.0,
});

Sentry.setUser({ id: 'user123', segment: 'SegmentA' });

app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

app.use(cors());

app.get('/test/express', (_req, res) => {
const transaction = Sentry.getCurrentHub().getScope()?.getTransaction();
if (transaction) {
transaction.traceId = '86f39e84263a4de99c326acab3bfe3bd';
transaction.metadata.source = undefined;
}
const headers = http.get('http://somewhere.not.sentry/').getHeaders();

// Responding with the headers outgoing request headers back to the assertions.
res.send({ test_data: headers });
});

app.use(Sentry.Handlers.errorHandler());

export default app;
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
import * as path from 'path';

import { getAPIResponse, runServer } from '../../../../utils/index';
import { TestAPIResponse } from '../server';

test('Does not include transaction name if transaction source is not set', async () => {
const url = await runServer(__dirname, `${path.resolve(__dirname, '.')}/server.ts`);

const response = (await getAPIResponse(new URL(`${url}/express`))) as TestAPIResponse;
const baggageString = response.test_data.baggage;

expect(response).toBeDefined();
expect(response).toMatchObject({
test_data: {
host: 'somewhere.not.sentry',
},
});
expect(baggageString).not.toContain('sentry-transaction=');
});
Original file line number Diff line number Diff line change
Expand Up @@ -13,24 +13,8 @@ test('should attach a `baggage` header to an outgoing request.', async () => {
test_data: {
host: 'somewhere.not.sentry',
baggage:
'sentry-environment=prod,sentry-release=1.0,sentry-user_segment=SegmentA' +
'sentry-environment=prod,sentry-release=1.0,sentry-transaction=GET%20%2Ftest%2Fexpress,sentry-user_segment=SegmentA' +
',sentry-public_key=public,sentry-trace_id=86f39e84263a4de99c326acab3bfe3bd,sentry-sample_rate=1',
},
});
});

test('Does not include transaction name if transaction source is not set', async () => {
const url = await runServer(__dirname, `${path.resolve(__dirname, '.')}/server.ts`);

const response = (await getAPIResponse(new URL(`${url}/express`))) as TestAPIResponse;
const baggageString = response.test_data.baggage;

expect(response).toBeDefined();
expect(response).toMatchObject({
test_data: {
host: 'somewhere.not.sentry',
},
});
expect(baggageString).not.toContain('sentry-user_id=');
expect(baggageString).not.toContain('sentry-transaction=');
});
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ test('should ignore sentry-values in `baggage` header of a third party vendor an
test_data: {
host: 'somewhere.not.sentry',
baggage: expect.stringContaining(
'other=vendor,foo=bar,third=party,last=item,sentry-environment=prod,sentry-release=1.0,sentry-public_key=public',
'other=vendor,foo=bar,third=party,last=item,sentry-environment=prod,sentry-release=1.0,sentry-transaction=GET%20%2Ftest%2Fexpress,sentry-public_key=public',
),
},
});
Expand Down
5 changes: 3 additions & 2 deletions packages/node/src/handlers.ts
Original file line number Diff line number Diff line change
Expand Up @@ -40,12 +40,13 @@ export function tracingHandler(): (

const baggage = parseBaggageSetMutability(rawBaggageString, traceparentData);

const [name, source] = extractPathForTransaction(req, { path: true, method: true });
const transaction = startTransaction(
{
name: extractPathForTransaction(req, { path: true, method: true }),
name,
op: 'http.server',
...traceparentData,
metadata: { baggage },
metadata: { baggage, source },
},
// extra context passed to the tracesSampler
{ request: extractRequestData(req) },
Expand Down
99 changes: 98 additions & 1 deletion packages/node/test/requestdata.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,12 @@

// TODO (v8 / #5257): Remove everything above

import { Event, User } from '@sentry/types';
import { Event, TransactionSource, User } from '@sentry/types';
import {
addRequestDataToEvent,
AddRequestDataToEventOptions,
CrossPlatformRequest,
extractPathForTransaction,
extractRequestData as newExtractRequestData,
} from '@sentry/utils';
import * as cookie from 'cookie';
Expand Down Expand Up @@ -485,3 +486,99 @@ describe.each([oldExtractRequestData, newExtractRequestData])(
});
},
);

describe('extractPathForTransaction', () => {
it.each([
[
'extracts a parameterized route and method if available',
{
method: 'get',
baseUrl: '/api/users',
route: { path: '/:id/details' },
originalUrl: '/api/users/123/details',
} as CrossPlatformRequest,
{ path: true, method: true },
'GET /api/users/:id/details',
'route' as TransactionSource,
],
[
'ignores the method if specified',
{
method: 'get',
baseUrl: '/api/users',
route: { path: '/:id/details' },
originalUrl: '/api/users/123/details',
} as CrossPlatformRequest,
{ path: true, method: false },
'/api/users/:id/details',
'route' as TransactionSource,
],
[
'ignores the path if specified',
{
method: 'get',
baseUrl: '/api/users',
route: { path: '/:id/details' },
originalUrl: '/api/users/123/details',
} as CrossPlatformRequest,
{ path: false, method: true },
'GET',
'route' as TransactionSource,
],
[
'returns an empty string if everything should be ignored',
{
method: 'get',
baseUrl: '/api/users',
route: { path: '/:id/details' },
originalUrl: '/api/users/123/details',
} as CrossPlatformRequest,
{ path: false, method: false },
'',
'route' as TransactionSource,
],
[
'falls back to the raw URL if no parameterized route is available',
{
method: 'get',
baseUrl: '/api/users',
originalUrl: '/api/users/123/details',
} as CrossPlatformRequest,
{ path: true, method: true },
'GET /api/users/123/details',
'url' as TransactionSource,
],
])(
'%s',
(
_: string,
req: CrossPlatformRequest,
options: { path?: boolean; method?: boolean },
expectedRoute: string,
expectedSource: TransactionSource,
) => {
const [route, source] = extractPathForTransaction(req, options);

expect(route).toEqual(expectedRoute);
expect(source).toEqual(expectedSource);
},
);

it('overrides the requests information with a custom route if specified', () => {
const req = {
method: 'get',
baseUrl: '/api/users',
route: { path: '/:id/details' },
originalUrl: '/api/users/123/details',
} as CrossPlatformRequest;

const [route, source] = extractPathForTransaction(req, {
path: true,
method: true,
customRoute: '/other/path/:id/details',
});

expect(route).toEqual('GET /other/path/:id/details');
expect(source).toEqual('route');
});
});
102 changes: 101 additions & 1 deletion packages/tracing/src/integrations/node/express.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
/* eslint-disable max-lines */
import { Integration, Transaction } from '@sentry/types';
import { logger } from '@sentry/utils';
import { CrossPlatformRequest, extractPathForTransaction, logger } from '@sentry/utils';

type Method =
| 'all'
Expand Down Expand Up @@ -32,6 +33,32 @@ type Router = {
[method in Method]: (...args: any) => any; // eslint-disable-line @typescript-eslint/no-explicit-any
};

/* Extend the CrossPlatformRequest type with a patched parameter to build a reconstructed route */
type PatchedRequest = CrossPlatformRequest & { _reconstructedRoute?: string };

/* Type used for pathing the express router prototype */
type ExpressRouter = Router & {
_router?: ExpressRouter;
stack?: Layer[];
lazyrouter?: () => void;
settings?: unknown;
process_params: (
layer: Layer,
called: unknown,
req: PatchedRequest,
res: ExpressResponse,
done: () => void,
) => unknown;
};

/* Type used for pathing the express router prototype */
type Layer = {
match: (path: string) => boolean;
handle_request: (req: PatchedRequest, res: ExpressResponse, next: () => void) => void;
route?: { path: string };
path?: string;
};

interface ExpressResponse {
once(name: string, callback: () => void): void;
}
Expand Down Expand Up @@ -83,6 +110,7 @@ export class Express implements Integration {
return;
}
instrumentMiddlewares(this._router, this._methods);
instrumentRouter(this._router as ExpressRouter);
}
}

Expand Down Expand Up @@ -211,3 +239,75 @@ function patchMiddleware(router: Router, method: Method): Router {
function instrumentMiddlewares(router: Router, methods: Method[] = []): void {
methods.forEach((method: Method) => patchMiddleware(router, method));
}

/**
* Patches the prototype of Express.Router to accumulate the resolved route
* if a layer instance's `match` function was called and it returned a successful match.
*
* @see https://github.com/expressjs/express/blob/master/lib/router/index.js
*
* @param appOrRouter the router instance which can either be an app (i.e. top-level) or a (nested) router.
*/
function instrumentRouter(appOrRouter: ExpressRouter): void {
// This is how we can distinguish between app and routers
const isApp = 'settings' in appOrRouter;

// In case the app's top-level router hasn't been initialized yet, we have to do it now
if (isApp && appOrRouter._router === undefined && appOrRouter.lazyrouter) {
appOrRouter.lazyrouter();
}

const router = isApp ? appOrRouter._router : appOrRouter;
const routerProto = Object.getPrototypeOf(router) as ExpressRouter;

const originalProcessParams = routerProto.process_params;
routerProto.process_params = function process_params(
layer: Layer,
called: unknown,
req: PatchedRequest,
res: ExpressResponse & SentryTracingResponse,
done: () => unknown,
) {
// Base case: We're in the first part of the URL (thus we start with the root '/')
if (!req._reconstructedRoute) {
req._reconstructedRoute = '';
}

// If the layer's partial route has params, the route is stored in layer.route. Otherwise, the hardcoded path
// (i.e. a partial route without params) is stored in layer.path
const partialRoute = layer.route?.path || layer.path || '';

// Normalize the partial route so that it doesn't contain leading or trailing slashes
// and exclude empty or '*' wildcard routes.
// The exclusion of '*' routes is our best effort to not "pollute" the transaction name
// with interim handlers (e.g. ones that check authentication or do other middleware stuff).
// We want to end up with the parameterized URL of the incoming request without any extraneous path segments.
const finalPartialRoute = partialRoute
.split('/')
.filter(segment => segment.length > 0 && !segment.includes('*'))
.join('/');

// If we found a valid partial URL, we append it to the reconstructed route
if (finalPartialRoute.length > 0) {
req._reconstructedRoute += `/${finalPartialRoute}`;
}

// Now we check if we are in the "last" part of the route. We determine this by comparing the
// number of URL segments from the original URL to that of our reconstructed parameterized URL.
// If we've reached our final destination, we update the transaction name.
const urlLength = req.originalUrl?.split('/').filter(s => s.length > 0).length;
const routeLength = req._reconstructedRoute.split('/').filter(s => s.length > 0).length;
if (urlLength === routeLength) {
const transaction = res.__sentry_transaction;
if (transaction && transaction.metadata.source !== 'custom') {
// If the request URL is '/' or empty, the reconstructed route will be empty.
// Therefore, we fall back to setting the final route to '/' in this case.
const finalRoute = req._reconstructedRoute || '/';

transaction.setName(...extractPathForTransaction(req, { path: true, method: true, customRoute: finalRoute }));
}
}

return originalProcessParams.call(this, layer, called, req, res, done);
};
}

0 comments on commit 5e9327f

Please sign in to comment.