Adding new replica lag metric #16415

Open
wants to merge 6 commits into
base: master

Pankaj260100
Contributor

@Pankaj260100 Pankaj260100 commented May 8, 2024

Description

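  • Adds a new replica lag metric with taskId, datasource, and stream dimensions.
For each partition, a replica's lag is the gap between the highest offset reported by any replica in the task group and that replica's own offset. The metric is the sum of these gaps over all partitions.
For example, suppose the offsets of a task group with 3 partitions are:
replica0: {0: 100, 1: 200, 3: 300}
replica1: {0: 120, 1: 180, 3: 250}
The lag for replica0 is (120 - 100) + 0 + 0 = 20.
The lag for replica1 is 0 + (200 - 180) + (300 - 250) = 70.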
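
The arithmetic in the example above can be sketched in a few lines of Java. This is only an illustration, not the code in this PR: it assumes that lag is measured per partition against the highest offset reported by any replica, and the class name `ReplicaLagExample` and method `computeReplicaLag` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicaLagExample
{
  /**
   * For every replica, sums over all partitions how far the replica trails
   * the highest offset reported by any replica of the same task group.
   * (Assumed definition, inferred from the worked example.)
   */
  public static Map<String, Long> computeReplicaLag(Map<String, Map<Integer, Long>> offsetsByReplica)
  {
    // Highest offset seen for each partition across all replicas.
    Map<Integer, Long> maxOffsets = new HashMap<>();
    for (Map<Integer, Long> offsets : offsetsByReplica.values()) {
      offsets.forEach((partition, offset) -> maxOffsets.merge(partition, offset, Long::max));
    }

    // Lag of a replica = sum over its partitions of (max offset - own offset).
    Map<String, Long> lagByReplica = new HashMap<>();
    for (Map.Entry<String, Map<Integer, Long>> replica : offsetsByReplica.entrySet()) {
      long lag = 0;
      for (Map.Entry<Integer, Long> partition : replica.getValue().entrySet()) {
        lag += maxOffsets.get(partition.getKey()) - partition.getValue();
      }
      lagByReplica.put(replica.getKey(), lag);
    }
    return lagByReplica;
  }

  public static void main(String[] args)
  {
    Map<String, Map<Integer, Long>> offsets = new HashMap<>();
    offsets.put("replica0", Map.of(0, 100L, 1, 200L, 3, 300L));
    offsets.put("replica1", Map.of(0, 120L, 1, 180L, 3, 250L));

    Map<String, Long> lag = computeReplicaLag(offsets);
    System.out.println("replica0 lag = " + lag.get("replica0")); // 20
    System.out.println("replica1 lag = " + lag.get("replica1")); // 70
  }
}
```

With a single replica, every partition's highest offset is the replica's own, so this definition always yields a lag of 0.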

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

Release note


Key changed/added classes in this PR
  • MyFoo
  • OurBar
  • TheirBaz

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@AmatyaAvadhanula
Contributor

@Pankaj260100

  1. This metric might not be of interest to every user, especially those operating with exactly 1 replica. Could you please add a config in the supervisor to enable or disable emission of this metric? (Alternatively, emit this only when there are multiple replicas.)
  2. Please add relevant documentation and modify the PR description appropriately.
  3. It would be great if you could try adding this metric for all the streaming supervisors as a generic method.

@cryptoe
Contributor

cryptoe commented May 12, 2024

This metric might not be of interest for every user. Especially those operating with exactly 1 replica. Could you please add a config in the supervisor to allow or disable emission of this metric? (Alternatively, emit this only when there are multiple replicas)

@AmatyaAvadhanula If you think this metric is useful, let's enable it for all. If it's not useful, then we should not add it. Adding another context parameter to the supervisor spec seems like one more tuning knob, which we want to avoid.


4 participants