Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify tracking of task counts stats in TaskQueue #16423

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

kfaraz
Copy link
Contributor

@kfaraz kfaraz commented May 9, 2024

Description

The current model of tracking task counts for the TaskCountStatsMonitor is too limiting and verbose.

Changes

  • Remove deprecated methods from TaskCountStatsProvider
  • Rename CoordinatorRunStats to DruidRunStats as it is now being extensively used outside the coordinator.
  • Simplify collection of stats in TaskQueue.
    • Remove methods getRunningTaskCount, getWaitingTaskCount, etc.
    • Collect all stats in a single DruidRunStats instance which is reset upon invocation of getQueueStats()
  • Add dimension taskType to task count metrics

Classes to review

  • TaskQueue
  • TaskCountStatsProvider
  • TaskCountStatsMonitor
  • TaskMaster

Further work

  • There are several other count stats providers which may benefit from a similar cleanup. I will try to consolidate all of them in a follow up PR.

Design considerations

The few advantages of the previous design were that:

  • The metrics emitted by the monitor were clearly listed out in the monitor itself. This can still be retained by adding logic in the monitor to filter out the metrics contained in the DruidRunStats.
  • The individual methods in the interface made it necessary for the concrete implementation to provide those metrics. But with a plain getStats() method which returns a packaged DruidRunStats, it is possible for the concrete implementation to return any or no metrics inside the DruidRunStats. Still thinking of a clean way to enforce this. I suppose we could add logic in the monitor to ensure that all required metrics have been sent by the implementation.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz kfaraz requested a review from abhishekrb19 May 9, 2024 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant