Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Support for Backlog Counter #5915

Merged
merged 16 commits into from May 20, 2024

Conversation

Shivs11
Copy link
Contributor

@Shivs11 Shivs11 commented May 13, 2024

What changed?

Support for the backlog counter, which was added in #5593, results in Cassandra persisting the backlog counts when tasks are created. However, the same is not being done by SQL databases as we don't persist task queue related information while creating tasks. Thus, if we fetch information immediately after we have created tasks, the backlog counter for SQL databases would read 0 even when we do have tasks in our backlog.

To fix this, a Count(*) query has been implemented whenever we fetch task queues.

Why?

This is required for the backlog counter accurate for both Postgres and SQL databases.

How did you test it?

  • Integration tests testing the functionality have been added. These verify if things on the postgres end also work as expected.
  • Existing suite of test cases.

Potential risks

None, since the backlog counter work has not been released yet.

Is hotfix candidate?

No

if err != nil {
return nil, err
}
taskQueueInfo.ApproximateBacklogCount = max(taskQueueInfo.ApproximateBacklogCount, int64(numTasks))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Cassandra: numTasks would be 0 and the actual backlog value would be inside of taskQueueInfo.ApproximateBacklogCount.

For SQL: Vice versa.

@Shivs11 Shivs11 marked this pull request as ready for review May 13, 2024 22:56
@Shivs11 Shivs11 requested a review from a team as a code owner May 13, 2024 22:56
@Shivs11 Shivs11 requested review from ShahabT, dnr and carlydf May 13, 2024 22:56
common/persistence/persistence_interface.go Outdated Show resolved Hide resolved
common/persistence/task_manager.go Outdated Show resolved Hide resolved
@@ -475,4 +475,4 @@ func makeCacheKey(
WorkflowKey: definition.NewWorkflowKey(namespaceID.String(), execution.GetWorkflowId(), execution.GetRunId()),
ShardUUID: shardContext.GetOwner(),
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running make goimports; make lint was complaining by requiring me to include this.

@Shivs11
Copy link
Contributor Author

Shivs11 commented May 16, 2024

Note : Discussions with @dnr and @ShahabT reached a consensus which requires a config to be added in order to allow customers to trigger the option of calling Count(*) or not. This shall be added *after this branch merges into shivam/backlog-count-updated and once this branch is looking to be merging into main. This is because of the fact that the current branch is not up-to-date with main, which has undergone config-changes recently. Thus, to avoid repetitive work, this shall be done when merging the feature into main.

service/matching/backlog_manager.go Outdated Show resolved Hide resolved
common/persistence/persistence_interface.go Outdated Show resolved Hide resolved
return int64(tasksPresent), nil
}

func (c *backlogManagerImpl) getApproximateBacklogCount(ctx context.Context) (int64, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think above this method we should describe the situations that we may under or over count. (when CountTasksExact is not used). These situations comes to my mind:

  • Overcount: backlog manager is killed without having the chance to persist latest ack level is
  • Undercount: ownership transfers and the old owner writes more tasks to backlog between the GetTaskQueue and UpdateTaskQueue calls of the new owner lease takeover.
  • Undercount: the scenario which moving ackLevel to db would have solved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for point (1), you mean the situation in which Cassandra randomly TTL'es out tasks which in-turn leads to the counter being an overestimate right?

@Shivs11 Shivs11 requested a review from ShahabT May 17, 2024 15:31
@Shivs11 Shivs11 merged commit 4c03b85 into shivam/backlog-count-updated May 20, 2024
41 checks passed
@Shivs11 Shivs11 deleted the shivam_SQL_Count_Support branch May 20, 2024 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants