Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(benchpress): correctly report memory usage numbers #50760

Conversation

pkozlowski-opensource
Copy link
Member

@pkozlowski-opensource pkozlowski-opensource commented Jun 19, 2023

This PR fixes GC memory numbers reported by benchpress, where previously reported amount was always 0.

The problem

To understand the fix we need to look into all the layers of the benchpress data processing from numbers reported by a browser, all the way up to the UI. Generally speaking there are 3 "interesting" layers involved:

  • a browser
  • a webdriver extension
  • perflog metrics processing inside benchpress

Let's look at the GC memory numbers and data structures at each layer.

Firstly, the browser is going to report an event similar to:

{
    "args": {
      "type": "finalize incremental marking via stack guard",
      "usedHeapSizeAfter": 19686548,
      "usedHeapSizeBefore": 23869760
    },
    "cat": "devtools.timeline,v8",
    "dur": 2250,
    "name": "MajorGC",
    "ph": "X",
    "pid": 89342,
    "tdur": 2090,
    "tid": 259,
    "ts": 367056011625,
    "tts": 239430
  },

Without going into the details, we can see that a GC event reports duration and before / after heap sizes (usedHeapSizeBefore, usedHeapSizeAfter). It seems like at this point we've got all the necessary information to report in the UI.

The data reported by the browser are then normalized to the PerfLogEvent data structure and the normalization process would produce:

 {
    pid: 89342,
    ph: 'X',
    cat: 'timeline',
    ts: 367056011.625,
    dur: 2.25,
    name: 'gc',
    args: { majorGc: true, usedHeapSize: 19686548 }
  },

This is the source of the bug as we are loosing information (notice that we only capture usedHeapSizeAfter !).

The final transformation step takes place in the metrics processing where the X event is split into a pair of begin / end events with the timestamp of the end event update to account for the duration:

{
    pid: 89342,
    ph: 'B',
    cat: 'timeline',
    ts: 367056011.625,
    dur: 2.25,
    name: 'gc',
    args: { majorGc: true, usedHeapSize: 19686548 }
  },
{
    pid: 89342,
    ph: 'E',
    cat: 'timeline',
    ts: 367056013.875,
    dur: 2.25,
    name: 'gc',
    args: { majorGc: true, usedHeapSize: 19686548 }
  },

Notice that we are still loosing information here - one events was just split into a pair of events but each item in this pair only contains usedHeapSize.

Given this data the GC reporting logic can only calculate 0 as the logic looks like follows:

const amount =
                (startEvent['args']!['usedHeapSize']! - event['args']!['usedHeapSize']!) / 1000;

The fix

The fix sketched in this PR adds a new gcAmount filed to the PerfLogEvent data structure and populates this data structure in the ChromeDriverExtension.

This approach works only for Chrome, obviously. Open questions:

  • what about other browsers?
  • what about the Chrome version update - is the latest version of Chrome producing the same events?

@ngbot ngbot bot added this to the Backlog milestone Jun 19, 2023
@pkozlowski-opensource pkozlowski-opensource marked this pull request as ready for review June 19, 2023 14:25
@pullapprove pullapprove bot requested a review from josephperrott June 19, 2023 14:25
@pkozlowski-opensource pkozlowski-opensource added state: blocked action: review The PR is still awaiting reviews from at least one requested reviewer labels Jun 20, 2023
@pkozlowski-opensource
Copy link
Member Author

This PR should not be merged before #50772 lands - so we can verify that the latest Chromium version is producing the same traces.

'majorGc': majorGc,
'usedHeapSize': heapSizeAfter !== undefined ? heapSizeAfter : heapSizeBefore,
'gcAmount': heapSizeBefore !== undefined && heapSizeAfter !== undefined ?
heapSizeBefore - heapSizeAfter :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, likely we should strictly error if we end up seeing an event that doesn't make sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! So a good question here is where we can confidently error. I guess when both usedHeapSizeBefore and usedHeapSizeAfter are missing?

Pushed a fixup commit.

This PR fixes GC memory numbers reported by benchpress,
where previously reported amount was always 0.

This is due to the fact that Chrome browser reports GC
events as a single X event now, instead of a B / E pair
of events.
const normalizedArgs = {
'majorGc': majorGc,
'usedHeapSize': heapSizeAfter !== undefined ? heapSizeAfter : heapSizeBefore,
'gcAmount': heapSizeBefore !== undefined && heapSizeAfter !== undefined ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These undefined checks can be removed now right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - in case of the B / E events we can still get only one of those values and we can calculate amount given both only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I was misreading the check above to ensure that always both values are available. So, am I understanding correctly that we are still getting B/E events for GC? I thought it was only X

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devversion X is the only event I've seen while running benchmarks. But reading and re-reading the Trace Event Format doc I'm getting the impression that X is just a "shorthand" for the B + E combination.

From https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/edit#heading=h.lpfof2aylapb :

Each complete event logically combines a pair of duration (B and E) events. The complete events are designated by the X phase type. In a trace that most of the events are duration events, using complete events to replace the duration events can reduce the size of the trace to about half.

So it looks to me like X meant to be an optimization and I can't exclude that some browsers / versions / scenarios produce B + E (as it would be valid, if I read the doc correctly).

At least this is my current understanding / thinking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'm just a little worried that at some point we'd end up seeing B/E events but the logic we have right now is not guaranteed to be correct as we are only experimenting with B/E- so I was hoping a little we could just make this an error and look into if we know B/E are reported for GC

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do currently have tests for the B / E combo as, from I'm vaguely piece together, it was the initial browser behaviour. I'm not sure if any of existing browser still produce B / E combo. We could:

  • prepare an exploratory CL to "see what breaks"
  • check with the Chrome team (good idea anyway).

I was just unsure if we should explicitly break for something that sounds like valid (albeit probably historical) behaviour.

The bottom line here is that we are running with incomplete information. I'm going to reach out to the Chrome team and try to clarify.


const normalizedArgs = {
'majorGc': majorGc,
'usedHeapSize': heapSizeAfter !== undefined ? heapSizeAfter : heapSizeBefore,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional nit if you're making other changes:

Suggested change
'usedHeapSize': heapSizeAfter !== undefined ? heapSizeAfter : heapSizeBefore,
'usedHeapSize': heapSizeAfter ?? heapSizeBefore,

const normalizedArgs = {
'majorGc': majorGc,
'usedHeapSize': heapSizeAfter !== undefined ? heapSizeAfter : heapSizeBefore,
'gcAmount': heapSizeBefore !== undefined && heapSizeAfter !== undefined ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I was misreading the check above to ensure that always both values are available. So, am I understanding correctly that we are still getting B/E events for GC? I thought it was only X

Copy link
Member

@josephperrott josephperrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pkozlowski-opensource pkozlowski-opensource added target: minor This PR is targeted for the next minor release action: merge The PR is ready for merge by the caretaker merge: caretaker note Alert the caretaker performing the merge to check the PR for an out of normal action needed or note and removed action: review The PR is still awaiting reviews from at least one requested reviewer labels Jun 26, 2023
@pkozlowski-opensource
Copy link
Member Author

caretaker note: this is a change to the benchpress internals - something we didn't do for a while. I believe that this fix should not negatively impact any existing usages but we've got limited visibility on benchpress usage. I can run a TGP to have better overview.

All in all I would sync this change separately so it is easy to rollback if needed - one scenario I'm thinking of is some automated monitoring that might pick up change in the reported GC numbers that would shoot up from zero to the real usage.

@AndrewKushnir
Copy link
Contributor

This PR was merged into the repository by commit dd850b2.

sunilbaba pushed a commit to sunilbaba/angular that referenced this pull request Jul 26, 2023
This PR fixes GC memory numbers reported by benchpress,
where previously reported amount was always 0.

This is due to the fact that Chrome browser reports GC
events as a single X event now, instead of a B / E pair
of events.

PR Close angular#50760
@angular-automatic-lock-bot
Copy link

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Jul 27, 2023
thomasturrell pushed a commit to thomasturrell/angular that referenced this pull request Aug 29, 2023
This PR fixes GC memory numbers reported by benchpress,
where previously reported amount was always 0.

This is due to the fact that Chrome browser reports GC
events as a single X event now, instead of a B / E pair
of events.

PR Close angular#50760
ChellappanRajan pushed a commit to ChellappanRajan/angular that referenced this pull request Jan 23, 2024
This PR fixes GC memory numbers reported by benchpress,
where previously reported amount was always 0.

This is due to the fact that Chrome browser reports GC
events as a single X event now, instead of a B / E pair
of events.

PR Close angular#50760
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
action: merge The PR is ready for merge by the caretaker area: benchpress merge: caretaker note Alert the caretaker performing the merge to check the PR for an out of normal action needed or note target: minor This PR is targeted for the next minor release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants