-
Notifications
You must be signed in to change notification settings - Fork 749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL Support for Backlog Counter #5915
SQL Support for Backlog Counter #5915
Conversation
common/persistence/task_manager.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
taskQueueInfo.ApproximateBacklogCount = max(taskQueueInfo.ApproximateBacklogCount, int64(numTasks)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Cassandra: numTasks
would be 0
and the actual backlog value would be inside of taskQueueInfo.ApproximateBacklogCount
.
For SQL: Vice versa.
12883be
to
511b017
Compare
@@ -475,4 +475,4 @@ func makeCacheKey( | |||
WorkflowKey: definition.NewWorkflowKey(namespaceID.String(), execution.GetWorkflowId(), execution.GetRunId()), | |||
ShardUUID: shardContext.GetOwner(), | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running make goimports; make lint
was complaining by requiring me to include this.
… test to verify the workings of this
Note : Discussions with @dnr and @ShahabT reached a consensus which requires a config to be added in order to allow customers to trigger the option of calling |
service/matching/backlog_manager.go
Outdated
return int64(tasksPresent), nil | ||
} | ||
|
||
func (c *backlogManagerImpl) getApproximateBacklogCount(ctx context.Context) (int64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think above this method we should describe the situations that we may under or over count. (when CountTasksExact is not used). These situations comes to my mind:
- Overcount: backlog manager is killed without having the chance to persist latest ack level is
- Undercount: ownership transfers and the old owner writes more tasks to backlog between the GetTaskQueue and UpdateTaskQueue calls of the new owner lease takeover.
- Undercount: the scenario which moving ackLevel to db would have solved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for point (1), you mean the situation in which Cassandra randomly TTL'es out tasks which in-turn leads to the counter being an overestimate right?
What changed?
Support for the backlog counter, which was added in #5593, results in Cassandra persisting the backlog counts when tasks are created. However, the same is not being done by SQL databases as we don't persist task queue related information while creating tasks. Thus, if we fetch information immediately after we have created tasks, the backlog counter for SQL databases would read
0
even when we do have tasks in our backlog.To fix this, a Count(*) query has been implemented whenever we fetch task queues.
Why?
This is required for the backlog counter accurate for both Postgres and SQL databases.
How did you test it?
Potential risks
None, since the backlog counter work has not been released yet.
Is hotfix candidate?
No