Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[network-plugin] Feat: Capture network events #1105

Open
wants to merge 43 commits into
base: master
Choose a base branch
from

Conversation

jlalmes
Copy link
Contributor

@jlalmes jlalmes commented Jan 31, 2023

Closes #552

This implementation records fetch and XMLHttpRequest by patching their object & methods. We record document navigation using PerformanceNavigationTiming and we use PerformanceResourceTiming for recording everything else (script, img, link etc.) via PerformanceObserver API.

Recording

type NetworkRequest = {
  url: string;
  method?: string;
  initiatorType: InitiatorType;
  status?: number;
  startTime: number;
  endTime: number;
  requestHeaders?: Headers;
  requestBody?: string | null;
  responseHeaders?: Headers;
  responseBody?: string | null;
};

rrweb.record({
  ...,
  plugins: [
    getRecordNetworkPlugin({
      initiatorTypes: ['fetch', 'xmlhttprequest'], // omit to record all types
      ignoreRequestFn: (request: NetworkRequest) => false
      recordHeaders: true, // defaults to false
      recordBody: true, // defaults to false
      recordInitialRequests: true, // defaults to false
    })
  ]
})
  • initiatorTypes is an array specifying which request types to record
  • ignoreRequestFn is a function that allow you to block recording an event for a request (you don't want to create a loop recording the request you send to store rrweb events on your server)
  • recordHeaders will record an request/response headers for fetch and xmlhttprequest
  • recordBody will record an request/response bodies for fetch and xmlhttprequest
  • recordInitialRequests will record an event for all requests prior to rrweb.record() being called

Replaying

new rrweb.Replayer(events, {
  ...,
  plugins: [
    getReplayNetworkPlugin({
      onNetworkData: (data: { requests: NetworkRequest[], isInitial?: boolean }) => { // required
        // do something
      }, 
    })
  ]
})
  • You must supply your own onNetworkData function.

To-do

  • Tests

@jlalmes jlalmes changed the title record xhr [network-plugin] Feat: Add network recording plugin Jan 31, 2023
@jlalmes jlalmes changed the title [network-plugin] Feat: Add network recording plugin [network-plugin] Feat: Capture network events Jan 31, 2023
@jlalmes jlalmes marked this pull request as ready for review February 1, 2023 13:48
@jlalmes jlalmes marked this pull request as draft February 1, 2023 15:29
@YunFeng0817
Copy link
Member

YunFeng0817 commented Feb 8, 2023

Thanks for applying all review suggestions. Although there are still a few small points that need to tweak, it's very close to the final version.

FYI here is a TODO list:

  • improve the content type filter
  • test cases
  • Chinese docs (I will add them later)

I'm so sorry that, for the release of 2.0.0-alpha.5, I also don't have a specific date to tell. These days other team members are all busy with their work and some important pull requests for bug fixes are delayed.

Copy link
Member

@YunFeng0817 YunFeng0817 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With more tests, I think this pull request is good to go

@Juice10
Copy link
Contributor

Juice10 commented Feb 9, 2023

This looks great! I agree with @Mark-Fenng, if we have some more tests this will be ready to merge. Also I'm happy to spend some time working on the 2.0.0-alpha.5 release soon

Co-authored-by: Ben White <ben@benjackwhite.co.uk>
@changeset-bot
Copy link

changeset-bot bot commented Feb 20, 2023

⚠️ No Changeset found

Latest commit: 8863576

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@benjackwhite
Copy link
Contributor

@jlalmes need some help getting this across the finish line? This would be such a good improvement to the custom solution we have for network recording and things here look pretty solid, just needing some good tests it seems.

@yaoyuanrong
Copy link

Is this plugin released?

@yaoyuanrong
Copy link

yaoyuanrong commented May 9, 2023

I really need this feature。I think this is a very powerful feature @jlalmes

Copy link
Contributor

@benjackwhite benjackwhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw this is starting to get reviewed and wanted to throw in a thought regarding ignoreRequestFn to see if we can't make it more flexible, as well as more inline with the text and input style approach.

Comment on lines +466 to +468
const requests = data.requests.filter(
(request) => !networkOptions.ignoreRequestFn(request),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if this was both filtering and cleansing.

The most typical case for privacy controls is:
"I want to filter out requests to this place"
"I wan't to ensure certain headers aren't included"
"I want to filter out PII from the response bodies".

This would all doable with a slight modification of the function:

Suggested change
const requests = data.requests.filter(
(request) => !networkOptions.ignoreRequestFn(request),
);
const requests = data.requests.reduce(
(list, request) => {
const maskedRequest = networkOptions.maskRequestFn(request)
return maskedRequest ? [...list, maskedRequest] : list
}, []
);

Which then allows the function to be something like

maskRequestFn: (request: NetworkRequest): NetworkRequest | undefined  => {
  if (request.url.contains("rrweb-collector-api.com") {
    return
 }
 delete request.requestHeaders["Authorization"]
 request.responseBody = maskTextFn(request.responseBody)
 return request
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this could be added in a follow up PR as well. Happy to do that if that is the decision.

if (attempt > 10) {
throw new Error('Cannot find performance entry');
}
const urlPerformanceEntries = win.performance.getEntriesByName(
Copy link
Contributor

@pauldambra pauldambra Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly I think we'd need to make this optional...

If people already have their own code (or another dependency) tracking performance entries they might be calling https://developer.mozilla.org/en-US/docs/Web/API/Performance/clearResourceTimings already so we'd have a race.

Since this usage comes after the response if someone is clearing the buffer as soon as they're able to then I guess that this code would lose the race.

I think it's cheap enough to have the observer running no matter what that you could have recordRequestPerformance as a config option and return early here so the signature becomes Promise<PerformanceRequestTiming | null>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake - 🤯 I've just read back through this and seen that the thrown error (and any other error here) are captured and swallowed where the returned promise is handled.

My preference would be to return null so that the flow is more clear but at least the error here doesn't bubble out

@jasonfill
Copy link

I would love this feature! Just throwing in a 👍 for this.

Comment on lines +81 to +92
type NetworkRequest = {
url: string;
method?: string;
initiatorType: InitiatorType;
status?: number;
startTime: number;
endTime: number;
requestHeaders?: Headers;
requestBody?: Body;
responseHeaders?: Headers;
responseBody?: Body;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was experimenting with this again...

We already capture Network timing and the PerformanceEntry spec uses name where this NetworkRequest timing uses url

It is more natural for the typing here to be something like

export type CapturedNetworkRequest = Omit<PerformanceEntry, 'toJSON'> & {
    method?: string
    initiatorType?: InitiatorType
    status?: number
    startTime?: number
    endTime?: number
    requestHeaders?: Headers
    requestBody?: Body
    responseHeaders?: Headers
    responseBody?: Body
    // was this captured before fetch/xhr could have been wrapped
    isInitial?: boolean
}

That way you can extend this to capture PerformanceEntrys and/or headers and/or body and only use one type

@pathik-appachhi
Copy link

Hello,

I have recently started using rrrweb-io via the cdn . It would be very helpful to know when this plugin will be available on the CDN link.

@ShayMalchi
Copy link

@jlalmes @YunFeng0817 Seems like this PR was approved long ago, can you please share what is the current blocker for merging? Many people in this thread are interested in it, and seems like most suggestions could be handled in followup PRs. Thank you!

@pauldambra
Copy link
Contributor

I'll just re-up this comment #1105 (comment)

It's easier for me if we have name not URL. Since the performance observer uses name and emits more than just things with URLs.

@pauldambra
Copy link
Contributor

We have network payload capture in prod now based on this PR - thanks so much @jlalmes 🙌

You can see the code here https://github.com/PostHog/posthog-js/blob/main/src/loader-recorder-v2.ts

We couldn't use it quite as it is... and I've had to make a couple of bugfixes since releasing it - mostly around processing bodies on unexpected types of network requests

My plan is to let it settle in our product for a week or two. And then I'll open a PR based on this one that would let us inject the things we'd varied - which might also be things other people would want to vary...

It won't be massively different - and of course the version that works for us might not be preferable generally

@sharan-ravi
Copy link

sharan-ravi commented Feb 20, 2024

@jlalmes @pauldambra
Thank you so much for this!

I was running into an issue with some of my requests not being recorded with this plugin, after digging deeper, turns out in case of certain requests, if the server doesn't return a Timing-Allow-Origin response header, we wouldn't be able to track the performance metrics. Given this plugin throws an error when performance metrics are not available, these requests were not being recorded.

While the performance metrics are not available, Other data on these requests are still available, so in my case I had to do something like this to still record these requests.

getRequestPerformanceEntry(win, 'fetch', req.url, after, before)
                    .then((entry) => {
                        if (_isNull(entry)) {
                            const requests = prepareRequestWithoutPerformance(req, networkRequest)
                            cb({ requests })
                        } else {
                            const requests = prepareRequest(entry, req.method, res?.status, networkRequest)
                            cb({ requests })
                        } })

function prepareRequestWithoutPerformance(
    req: Request,
    networkRequest: Partial<CapturedNetworkRequest>,
): CapturedNetworkRequest[] {

    const reqJson = {
        url: req.url,
        method: req.method,
        requestHeaders: networkRequest.requestHeaders,
        requestBody: networkRequest.requestBody,
        responseHeaders: networkRequest.responseHeaders,
        responseBody: networkRequest.responseBody,
    }

    return [reqJson]
}

Is this expected?

@Juice10
Copy link
Contributor

Juice10 commented Apr 18, 2024

My plan is to let it settle in our product for a week or two. And then I'll open a PR based on this one that would let us inject the things we'd varied - which might also be things other people would want to vary...

It won't be massively different - and of course the version that works for us might not be preferable generally

@pauldambra would you still be interested to open a PR based on the version you have running in production?

FYI: #1033 just got finished which moves the all of the plugins to the packages/plugins/... directory. Might be worth considering creating the PR based on that as well

@akhsri
Copy link

akhsri commented May 2, 2024

Hi @jlalmes , When can we see this in rrweb package? it would be very powerful and helpful feature!

@pauldambra
Copy link
Contributor

@Juice10 definitely... our implementation has been pretty stable for a while now... it's just time that's stopping me 🙈

I think the "tricky" thing will be separating our config/mapping needs out of the plugin but that's pretty mechanical non-ground-breaking stuff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XHR Capture