Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add performance events for test-service-load client election #21086

Merged
merged 4 commits into from
May 16, 2024

Conversation

kian-thompson
Copy link
Contributor

We are seeing weird gaps in summarizer in test-service-load runs. This PR adds optional performance events (only enabled in test-service-load) to track adding, removing, and electing clients.

@github-actions github-actions bot added area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch and removed area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc labels May 15, 2024
@msfluid-bot
Copy link
Collaborator

msfluid-bot commented May 15, 2024

@fluid-example/bundle-size-tests: +4 KB
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 452.91 KB 453.91 KB +1023 Bytes
azureClient.js 550.15 KB 551.15 KB +1023 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 256.75 KB 257.75 KB +1023 Bytes
fluidFramework.js 359.58 KB 359.58 KB No change
loader.js 132.91 KB 132.91 KB No change
map.js 41.45 KB 41.45 KB No change
matrix.js 143.67 KB 143.67 KB No change
odspClient.js 518.69 KB 519.69 KB +1023 Bytes
odspDriver.js 97.29 KB 97.29 KB No change
odspPrefetchSnapshot.js 42.15 KB 42.15 KB No change
sharedString.js 160.19 KB 160.19 KB No change
sharedTree.js 359.58 KB 359.58 KB No change
Total Size 3.19 MB 3.2 MB +4 KB

Baseline commit: d42d16d

Generated by 🚫 dangerJS against 3f30208

@@ -426,12 +428,15 @@ export class OrderedClientElection
change = true;
}
if (change) {
this.sendPerformanceEvent("Election", client, sequenceNumber);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think every time we get here, we should log an event (i.e. not sample / hide it behind feature gate).
That's not many events - we issue a ton events before we get here (overall).

Analyzing code, I think the only reason "ElectedClientNotSummarizing" is not firing is because this.clientElection.electionSequenceNumber keep advancing, and that means tryElectingClient() is called (FYI - the only place it path be called in case of that stress test failure looks like from resetElectedClient()).

Given that we could run into these issues in prod as well, I'd rather prep us for more actionable future even if we do not enable new feature gate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracing the code, this function should never be called with this._electedClient === client, so maybe we should replace it with assert? Maybe incrementElectedClient() / resetElectedClient() can result in calling with existing client (not sure), but they are not called anywhere.

The more I think about it, the more I want to see the following telemetry in prod (properties on event that fires every time this function is called):

  • reason (passed from caller) for summarizer election (i.e. summarizer client removed to quorum, summarizer parent removed from quorum, new summarizer client added).
  • old value of this._electedClient?.clientId
  • new value (client?.clientId).
  • if new client is a summarizer client or not.
  • other properties that you are adding.

@github-actions github-actions bot added area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc labels May 15, 2024
@kian-thompson kian-thompson merged commit d544407 into microsoft:main May 16, 2024
30 checks passed
@kian-thompson kian-thompson deleted the log-quorum branch May 16, 2024 18:15
kekachmar pushed a commit to kekachmar/FluidFramework that referenced this pull request May 21, 2024
…ft#21086)

We are seeing weird gaps in summarizer in test-service-load runs. This
PR adds optional performance events (only enabled in test-service-load)
to track adding, removing, and electing clients.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants