Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve paramaterization of transaction names #5342

Closed
9 of 10 tasks
smeubank opened this issue Jun 30, 2022 · 8 comments
Closed
9 of 10 tasks

Improve paramaterization of transaction names #5342

smeubank opened this issue Jun 30, 2022 · 8 comments

Comments

@smeubank
Copy link
Member

smeubank commented Jun 30, 2022

Describe the idea

We need to revisit how the JS SDK captures transactions (URL routes) and sends them to sentry.

Reasons for doing this and things we need to consider:

  • parameterization: We should parameterize URLs whenever possible

Because

  • grouping: transaction name influences transaction grouping; raw URL names lead to ungroupted transactions
  • indexing and high cardinality: Raw URL transaction names lead to a high cardinality of transactions (because they're not grouped)
  • PII: Raw URL transaction names can contain sensitive data/PII (e.g. IDs, auth tokens, etc.)
  • Dynamic Sampling: Propagating raw URLs in DSC vs. showing parameterized routes in the Sentry UI (and the DS settings) creates a lot of user confusion

Best effort has been decent so far, but the fallback to unparameterized or whole URL maybe sub-optimal.

Examples:

  • Low-cardinality transaction name:
    • {"transaction": "/users/{username}", "transaction_source": "route"}
  • Presumably a high-cardinality transaction name:
    • {"transaction": "/users/123235", "transaction_source": "uri"}
  • User-defined transaction name, cardinality unknown:
    • {"transaction": "my_transaction_name", "transaction_source": "custom"}

Requirement:

what should be sent and where?

baggage header

  • error envelope payload

  • in the envelope header

Possible implementation

There's a couple of things we can do or at least check:

Existing Routing Instrumentations with parameterization

As listed in #5345, we have a lot of popular routers covered with routing instrumentations. However, we might be able to improve paramenterizations in some of them. Hence, for each instrumentation

  • check if there is a way to parameterize earlier
  • try to match routes better (if it turns out that there are cases where our current matching fails)
  • (send source information; tracked in Add support for transaction source #5345)

Existing Routing Instrumentations without parameterization

TODO: Check to which routers this applies

There are some routing instrumentations that don't parameterize currently.

  • add parameterization whenever possible

TBD: Approximative Parameterization

This has been discussed quite a bit in the past but given that we have to make our best effort for parameterization, let's revisit this topic. The idea is seemingly simple: We could try to add a mechanism that takes a raw URL and tries to guess parts of that URL that might be parameters (e.g. IDs, tokens, etc). The mechanism would then replace these parts with a generic param placeholder.

Example:

/users/1235/credentials ==> users/:id/credentials

There are a lot of possible issues with this because obviously, there are going to be loads of edge cases, where this approximation might be off or miss parameters completely.

Why is this challenging?

  • In the product, we have to explain why DSC might have another (or no) transaction name than what's visible in the UI
  • We have to explain why unparameterized routes are sent and even full URLs.
  • Some frameworks and routers provide unique challenges
  • What about custom routing instrumentations we don't control?

Places to improve parameterization

@smeubank
Copy link
Member Author

smeubank commented Jul 6, 2022

@Lms24 let me know when you have time, this needs refinement. We can loop in others of course

  • identify ways we could quantify this problem
  • identify problem areas we want and importantly could address

@Lms24
Copy link
Member

Lms24 commented Jul 6, 2022

Always happy to refine/discuss this. Since this became a very important issue, we should sync with everyone here who has context on how transaction names/parameterization/routing instrumentation currently works

@Lms24
Copy link
Member

Lms24 commented Jul 7, 2022

@smeubank fleshed out the issue description. Feel free to add/change stuff

@Lms24 Lms24 changed the title (draft)Re-visit transactions name capturing (draft)Re-visit transaction name capturing Jul 7, 2022
@vladanpaunovic vladanpaunovic changed the title (draft)Re-visit transaction name capturing Improve parametariaztion of transaction names Jul 7, 2022
@AbhiPrasad AbhiPrasad changed the title Improve parametariaztion of transaction names Improve paramaterization of transaction names Jul 8, 2022
@jamesbvaughan
Copy link

Apologies if there is a better place for me to ask about this, but I have been unable to get parameterization to work with react-router v6.

Here are some excerpts of what I believe to be the relevant code:

Sentry.init({
  ...
  integrations: [
    new BrowserTracing({
      routingInstrumentation: Sentry.reactRouterV6Instrumentation(
        useEffect,
        useLocation,
        useNavigationType,
        createRoutesFromChildren,
        matchRoutes,
      ),
    }),
  ],
});

const SentryRoutes = Sentry.withSentryReactRouterV6Routing(Routes);

const MyRoutes = () => {
  return (
    <SentryRoutes>
      <Route index element={<Navigate to="/projects" />} />

      <Route path="/account" element={<AccountPage />} />

      <Route path="/projects">
        <Route index element={<ProjectBrowser />} />

        <Route path=":projectId" element={<ProjectPage />}>
          <Route index element={<ProjectPageRoot />} />

          <Route element={<EditorPage />}>
            <Route path="views/:viewId" element={<ViewCanvas />} />
            <Route path="spaces/:spaceId" element={<SpaceCanvas />} />
          </Route>
        </Route>
      </Route>

      <Route path="*" element={<NoMatchPage />} />
    </SentryRoutes>
  );
};

export default MyRoutes

As far as I can tell, I'm not doing anything unusual, except possibly the use of react router's Outlets and Layout Routes.

With this configuration I would expect to see a single transaction for /projects/*/views/*, but instead I'm seeing separate transactions for each individual page.

Is there something obvious that I'm doing wrong here, or does the sentry integration not support parameterization with this usage?

Lms24 added a commit that referenced this issue Jul 26, 2022
Improve the URL parameterization for transaction names in our Express integration.

Previously, we would only obtain a parameterized version of the incoming request's URL **after** the registered handler(s) finished their job, right before we called `transaction.finish()`. This is definitely too late for DSC propagation, if the handlers made any outgoing request. In that case, the DSC (propagated via the `baggage` header) would not contain a transaction name.

As of this PR, we patch the `Express.Router` prototype, more precisely the `process_params` function. This function contains a reference to the incoming `req` object as well as to the `layer` that matched the route. We hook into the function to extract the matched and parameterized partial route of the layer and we attach it to the `req` object. This happens for every matched layer, until the entire route is resolved, in which stichtched together the entire parameterized route. 

For the time being, we deliberately ignore route registrations with wildcards (e.g. `app.get('/*', ...)`) when stitching together the parameterized route. After we added a new part to the reconstructed route, we check if it has the same number of segments as the original (raw) URL. In case it has, we assume that the parameterized route is complete and we update the active transaction with the parameterized name. Additionally, we set the transaction source to `route` at this point. In case we never get to the point where we have an equal amount of URL segments, the transaction name is never updated, meaning its source stays `url` (and therefore, it's not added to the DSC). In that case, we continue to update the transaction name like before right before the end of the transaction.

After reading the Express source code, we confirmed that the process for resolving parts of a route and handling it is performed sequentially **for each matched route segment**. This means that each matched layer's handle function is invoked before the next part of the URL is matched. We therefore still have timing issues around DSC population if any of those intermediate handler functions make outgoing requests. A simple example:

The incoming request is `/api/users/123`
* We intercept the request and start the transaction with the name `/api/users/123` and source `url`
* The first layer that matches is matches the route `/*`.
  * We start reconstructing the parameterized route with `/`
  * The handle function of this layer is invoked (e.g. to check authentication)
    *  the handler makes an XHR request 
      * at this point, we populate and freeze the DSC (without transaction name)
* The second handler matches the route `/api/users/123`
  * We obtain the parameterized route and our reconstructed route is now `/api/users/:id`
  * Now we have 3 segments in the original and in the reconstructed route
  * We update the transaction name with `/api/users/:id` and set its source to `route`
  * The handle function of this layer is invoked
    * every request that might happen here will still propagate the DSC from layer 1 because it can't be modified
* We finish the transaction

This example shows that we still have timing issues w.r.t DSC propagation. That is, assuming that the Node SDK is in this case the head of the trace and it didn't intercept incoming DSC from the incoming request.
However, with this PR we now at least have the chance to get the correct transaction name early enough.

ref: #5342
@Lms24
Copy link
Member

Lms24 commented Jul 27, 2022

Hi @jamesbvaughan, would you mind opening a dedicated issue for this? Makes it easier to track this for us.
cc @AbhiPrasad maybe you have a quick idea?

@jamesbvaughan
Copy link

I've created #5513 to track my issue separately. Thanks!

@github-actions
Copy link
Contributor

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@vladanpaunovic
Copy link
Contributor

Tracking the nextjs and remaining issues in #5505

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants