-
Notifications
You must be signed in to change notification settings - Fork 517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add performance events for test-service-load client election #21086
Conversation
⯅ @fluid-example/bundle-size-tests: +4 KB
Baseline commit: d42d16d |
packages/runtime/container-runtime/src/summary/orderedClientElection.ts
Outdated
Show resolved
Hide resolved
@@ -426,12 +428,15 @@ export class OrderedClientElection | |||
change = true; | |||
} | |||
if (change) { | |||
this.sendPerformanceEvent("Election", client, sequenceNumber); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think every time we get here, we should log an event (i.e. not sample / hide it behind feature gate).
That's not many events - we issue a ton events before we get here (overall).
Analyzing code, I think the only reason "ElectedClientNotSummarizing" is not firing is because this.clientElection.electionSequenceNumber keep advancing, and that means tryElectingClient() is called (FYI - the only place it path be called in case of that stress test failure looks like from resetElectedClient()).
Given that we could run into these issues in prod as well, I'd rather prep us for more actionable future even if we do not enable new feature gate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracing the code, this function should never be called with this._electedClient === client, so maybe we should replace it with assert? Maybe incrementElectedClient() / resetElectedClient() can result in calling with existing client (not sure), but they are not called anywhere.
The more I think about it, the more I want to see the following telemetry in prod (properties on event that fires every time this function is called):
reason
(passed from caller) for summarizer election (i.e. summarizer client removed to quorum, summarizer parent removed from quorum, new summarizer client added).- old value of this._electedClient?.clientId
- new value (client?.clientId).
- if new client is a summarizer client or not.
- other properties that you are adding.
…ft#21086) We are seeing weird gaps in summarizer in test-service-load runs. This PR adds optional performance events (only enabled in test-service-load) to track adding, removing, and electing clients.
We are seeing weird gaps in summarizer in test-service-load runs. This PR adds optional performance events (only enabled in test-service-load) to track adding, removing, and electing clients.