-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update diagnostics service to version 1 #7059
Conversation
common/diagnostics/Metrics.java
Outdated
this.usage = new CurrentCounts(); | ||
this.userErrors = new UserErrorStatistics(); | ||
public void takeSnapshot() { | ||
this.base.updateSinceTimestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added sinceTimestamp
in the json
data in the end as it was strange to see diffs in the diagnostics on the monitoring page. It's not used anywhere in the external code, but could potentially be in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the monitoring page was supposed to use the same data as Prometheus, with counters steadily ticking up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, but I just didn't know about it. Let's confirm with anyone who might be interested in this page. It would make sense, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it as you proposed. WIll update the examples and description tomorrow's morning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review
server/test/parameters/config/config-correct-optional-values.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was going to be renamed to smth like DiagnosticsStore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided not to rename anything before 3.0 as it could become messy with multiple changes. We are used to the current naming and it will be easer to reimplement and rename it in 3.0 this way, but I'm open to renaming it if it won't disturb other engineers even more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think anyone would care, hardly anyone would refer to this class outside of significant changes in the service layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'd like to leave it as Metrics
for now as it's at least another anchor which could be shown when you search for metrics
and diagnostics store
is a kind of another entity we'd introduce with this renaming. But I wouldn't resist if you insisted that Metrics
is a bad option to leave in the codebase.
common/diagnostics/Metrics.java
Outdated
this.usage = new CurrentCounts(); | ||
this.userErrors = new UserErrorStatistics(); | ||
public void takeSnapshot() { | ||
this.base.updateSinceTimestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the monitoring page was supposed to use the same data as Prometheus, with counters steadily ticking up?
server/TypeDBService.java
Outdated
UUID sessionID = byteStringAsUUID(request.getSessionId()); | ||
SessionService sessionSvc = sessionServices.get(sessionID); | ||
if (sessionSvc == null) throw TypeDBException.of(SESSION_NOT_FOUND, sessionID); | ||
databaseName = sessionSvc.session().database().name(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not as careful here as in TransactionService, eh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's inside try
block and it would be a valid throw if the request didn't contain any of session
, database
, or its name
. That's what I thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm but now if sessionSvc != null
, but database
for some reason is, sessionSvc.close()
won't be called. I don't know if that's even possible, but it is changing the behaviour in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I didn't pay enough attention to the business logic here, my bad. And having a throw
because of this line after everything would be dumb as well. Will be careful here as well then...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's actually add a databaseName()
getter to the session service so that we can at least have one fewer null check in these places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple comments to address, but otherwise LGTM!
… the id generations
… scheduling. Add disabled reporting case
…hour whatever it takes
…en in JSON. Refactor
Usage and product changes
We introduce an updated version of diagnostics sent from a
TypeDB
server.config.yml
gets a new fielddeploymentID
for thediagnostics
section. This field should be used for collecting the data from multiple servers of a singleTypeDB Cloud
deployment.JSON
reporting, we calculated diffs between the current timestamp and thesinceTimestamp
(the previous hour when the data had to be sent: it's updated even if we had errors sending the data for simplicity). For thePrometheus
data, we send raw counts asPrometheus
calculates diffs based on its queries and expects raw diagnostics from our side.JSON
monitoring, we show only the incrementing counters from the start of the server just as for the Prometheus diagnostics data (also available through the monitoring page). This way, the content is different from the reporting data.schema
anddata
diagnostics about each specific database are sent only from the primary replica of a deployment at the moment of the diagnostics collection. Theconnection
peak values diagnostics regarding a database are still reported by a non-primary replica if the database exists or there were established transactions within the last hour before the database had been deleted.Example diagnostics data for Prometheus (http://localhost:4104/metrics?format=prometheus):
Example diagnostics JSON data from monitoring (http://localhost:4104/metrics?format=JSON):
Example of diagnostics JSON data sent when the reporting flag is turned on:
Example of diagnostics JSON data sent once when the reporting flag is turned off:
Implementation
There is no huge refactoring as it's planned to be a cleaner feature in the incoming 3.0.