Better estimate for SQS time to drain metrics #390

Open
r0b0ji opened this issue Jul 4, 2023 · 1 comment
Labels
bug Something isn't working

r0b0ji commented Jul 4, 2023

Version

v5.2.3

Steps and/or minimal code example to reproduce

This is not actually a bug, but a better and simpler computation exists. Currently, the SQS time-to-drain metric is calculated as shown in [1], which is indirect. A better estimate can be calculated using the RATE function [2].

  1. metricTimeToDrain() {
       return this.metricFactory.createMetricMath(
         "(visibleMessages / (consumptionVolume - incomingVolume)) * (PERIOD(consumptionVolume))",
         {
           visibleMessages: this.metricApproximateVisibleMessageCount(),
           incomingVolume: this.metricIncomingMessageCount(),
           consumptionVolume: this.metricDeletedMessageCount(),
         },
         "Time to Drain (seconds) (avg: ${AVG}, max: ${MAX})"
       );
     }
  2. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html
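
To make the current expression in [1] concrete, here is a minimal sketch that evaluates the same arithmetic on plain numbers. The helper name `currentTimeToDrain` and the sample values are made up for illustration and are not part of the library.

```typescript
// Hypothetical helper mirroring the metric math expression in [1]:
// (visibleMessages / (consumptionVolume - incomingVolume)) * PERIOD
function currentTimeToDrain(
  visibleMessages: number,
  incomingVolume: number,
  consumptionVolume: number,
  periodSeconds: number
): number {
  // Net number of messages removed from the queue during one period.
  const netDrainPerPeriod = consumptionVolume - incomingVolume;
  return (visibleMessages / netDrainPerPeriod) * periodSeconds;
}

// Illustrative values only: 1000 visible messages, 300 s period.
// Draining queue: 300 deleted vs 100 sent per period.
const draining = currentTimeToDrain(1000, 100, 300, 300);
// Growing queue: 100 deleted vs 300 sent per period, so the sign flips.
const growing = currentTimeToDrain(1000, 300, 100, 300);

console.log(draining, growing);
```

With these numbers the draining case gives 1500 seconds, while the growing case gives -1500 seconds, which is the kind of negative datapoint discussed later in this issue.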

Expected behavior

Time to drain should be computed directly from the rate at which the queue drains.

Actual behavior

The current computation estimates it indirectly from several separate metrics, which is less accurate.

Other details

A sample code for this is

{
    "metrics": [
        [ { "expression": "m1/ABS(RATE(m1))", "label": "TimeToDrain (sec)", "id": "e1", "region": "us-east-1" } ],
        [ "AWS/SQS", "ApproximateNumberOfMessagesVisible", "QueueName", "some-test-queue", { "id": "m1", "visible": false, "region": "us-east-1" } ]
    ],
    "view": "timeSeries",
    "stacked": false,
    "region": "us-east-1",
    "stat": "Average",
    "period": 300
}
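
The expression `m1/ABS(RATE(m1))` can be sketched outside CloudWatch as follows. This is only an approximation under stated assumptions: it treats RATE as the change between consecutive datapoints divided by the elapsed seconds, and it returns `undefined` when the rate is zero, which is a choice made for this sketch rather than documented CloudWatch behavior. The `Datapoint` type, function names, and sample series are hypothetical.

```typescript
// Hypothetical shape for one sample of ApproximateNumberOfMessagesVisible.
interface Datapoint {
  timestampSeconds: number;
  value: number;
}

// Approximates m1 / ABS(RATE(m1)) for each datapoint after the first.
function rateBasedTimeToDrain(series: Datapoint[]): (number | undefined)[] {
  const result: (number | undefined)[] = [];
  let prev: Datapoint | undefined;
  for (const curr of series) {
    if (prev !== undefined) {
      // Per-second rate of change between consecutive datapoints.
      const rate =
        (curr.value - prev.value) / (curr.timestampSeconds - prev.timestampSeconds);
      const absRate = Math.abs(rate);
      // Assumption of this sketch: no estimate when the queue depth is flat.
      result.push(absRate === 0 ? undefined : curr.value / absRate);
    }
    prev = curr;
  }
  return result;
}

// Illustrative series sampled every 300 s.
const visibleSeries: Datapoint[] = [
  { timestampSeconds: 0, value: 1200 },
  { timestampSeconds: 300, value: 900 },
  { timestampSeconds: 600, value: 900 },
  { timestampSeconds: 900, value: 300 },
];

const estimates = rateBasedTimeToDrain(visibleSeries);
console.log(estimates);
```

For this series the queue drops by 1 message per second, then stays flat, then drops by 2 messages per second, so the estimates are 900 seconds, no value, and 150 seconds.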
r0b0ji added the bug label Jul 4, 2023

r0b0ji commented Jul 4, 2023

Also, the original formula needs to take the absolute value of the difference, so that a negative rate does not skew the average and other statistics of the time-to-drain metric. Time to drain can't be negative; if there are no messages, it is 0. The current formula, however, produces negative datapoints (the graph's minimum is capped at 0, but the datapoints themselves are still negative), which reduces the average.
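
The effect on the average can be illustrated with a small sketch. The datapoints below are invented for illustration, and `average` is a hypothetical helper, not something from the library.

```typescript
// Plain arithmetic mean of a non-empty list of datapoints.
function average(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Illustrative time-to-drain datapoints in seconds. The negative value
// stands for a period where more messages arrived than were deleted.
const signedDatapoints = [1500, -3000, 600];

// Average as the current formula would report it.
const signedAverage = average(signedDatapoints);
// Average after taking the absolute value of each datapoint.
const absoluteAverage = average(signedDatapoints.map((v) => Math.abs(v)));

console.log(signedAverage, absoluteAverage);
```

Here a single negative datapoint pulls the average down to -300 seconds, whereas the absolute values average to 1700 seconds.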
