-
-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Skip calculate lag when a topic has no offset commits #718
Conversation
The current code is deliberate. If a consumer in a group is assigned a topic, then if that topic/partition has no commit, it means the entire topic/partition is lagged. The original implementation did not consider topics with no commits as candidates to have lag. This was problematic because people were trying to calculate lag for a group that was bugged and never actually committed. The lag showed as zero, but technically by definition of how lag is calculated, the calculation should have shown the entire topic lagging. |
We have too may topics that cleanup.policy=delete, so the topics will have zero messages after some time, when a consumer starts, it will never commit offset because no messages consumed, so the current code will calculate a very big lag number other than zero. So how can i fix this scenario? Should i implement the calculation myself like kowl did? Maybe we can get partition water marks to get how many messages in topic, but this method requires more requests to kafka server. |
We've actually had multiple reports of Kowl's (now Console) lag page being inaccurate -- for basically the reason that I do it differently in the code above. However, I think given what you're saying, a different fix is to subtract the log start offset from the lag. If a topic has log end offset 30 and log start offset 30 and no commit, I think today it'll show a lag of 30, whereas this can be changed to show a lag of 0. |
…on-zero This adds one more round trip in the `Lag` function to request start offsets, but this improves the case when topics have segments deleted over time. We _could_ go through some detailed way of avoiding requests for partitions that have commits, but having the start offset is pretty valuable a lot of time time anyway -- and we keep the logic simpler. Now, * if a topic is assigned to a group, * and a partition in that topic has no commits, * and the log-start-offset for the partition is non-zero Lag will no longer show the entire partition as lagging from zero, but instead will show more accurately the log-end-offset - log-start-offset. If LEO == LSO, the lag is now zero, rather than LEO itself. This also no longer says lag is negative if a commit is past the end of a partition; instead the lag is zero. Closes #718.
…on-zero This adds one more round trip in the `Lag` function to request start offsets, but this improves the case when topics have segments deleted over time. We _could_ go through some detailed way of avoiding requests for partitions that have commits, but having the start offset is pretty valuable a lot of time time anyway -- and we keep the logic simpler. Now, * if a topic is assigned to a group, * and a partition in that topic has no commits, * and the log-start-offset for the partition is non-zero Lag will no longer show the entire partition as lagging from zero, but instead will show more accurately the log-end-offset - log-start-offset. If LEO == LSO, the lag is now zero, rather than LEO itself. This also no longer says lag is negative if a commit is past the end of a partition; instead the lag is zero. Closes #718.
Check out #744, I think that solves this. I'm going to release kadm soon. |
If topic has no commit offset for a consumer group, then we should not calculate lag for it.
For example, topic A current high water offset is 100, but has no message because of delete mode
And there will not be any offset commits for consumer group because there is no messages to consume
Function CalculateGroupLag will calculate lag 100 for the consumer group, which is wrong.