Fix Grafana dashboards. #7121

tmacam · 2023-10-30T23:48:46Z

Description

The current grafana dashboards do not work in a fresh cluster where prometheus and grafana are installed using helm following Dapr Docs (see 1, 2). They refer to metrics that are not available in such install.

In short, based on bug-report from dapr/test-infra#204, the proposed fix can be summed by:

sed -i \
    -e 's/\bkubernetes_name\b/service/g' \
    -e 's/\bkubernetes_namespace\b/namespace/g' \
    -e 's/\bkubernetes_node\b/node/g' \
    -e 's/\bkubernetes_pod_name\b/pod/g' \
    *.json

Additionally:

Removes refresh rates smaller than 1 minute.
Sets default interval range to 14 days in the past to now
Sets default template values to match the longhaul clusters.

Issue reference

Please reference the issue this PR will close: #7120

Testing done

Imported proposed JSON files to longhaul environments. See dapr/test-infra#167

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

Code compiles correctly
Created/updated tests
Unit tests passing
End-to-end tests passing
Extended the documentation / Created issue in the https://github.com/dapr/docs/ repo: dapr/docs#[issue number]
Specification has been updated / Created issue in the https://github.com/dapr/docs/ repo: dapr/docs#[issue number]
Provided sample for the feature / Created issue in the https://github.com/dapr/docs/ repo: dapr/docs#[issue number]

codecov · 2023-10-31T00:05:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5e9dc0c) 65.04% compared to head (a98f3f1) 64.98%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7121      +/-   ##
==========================================
- Coverage   65.04%   64.98%   -0.07%     
==========================================
  Files         221      221              
  Lines       20799    20799              
==========================================
- Hits        13529    13516      -13     
- Misses       6133     6143      +10     
- Partials     1137     1140       +3

see 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

tmacam · 2023-10-31T21:41:52Z

Gathering all diffs from dapr/test-infra#204 we got to the following regex:

sed -i \
    -e 's/\bkubernetes_name\b/service/g' \
    -e 's/\bkubernetes_namespace\b/namespace/g' \
    -e 's/\bkubernetes_node\b/node/g' \
    -e 's/\bkubernetes_pod_name\b/pod/g' \
    *.json

mukundansundar

why is the grafana dashboard only limited to longhaul_test? And why are the service names hardcoded?
Isn't this a general grafana dashboard that can be optionally installed on any dapr cluster?
cc @dapr/maintainers-dapr ?

mukundansundar · 2023-11-02T05:58:48Z

grafana/grafana-actor-dashboard.json

+          "text": "longhaul-test",
+          "value": "longhaul-test"


why is this specific to longhaul_test

It doesn't have to be specific to longhauls, but if it is going to be an arbitrary string, why not have it be something useful?

While I think it is safe to keep these longhaul settings for the reasons I mentioned before, I added a98f3f1 to remove those specific settings.

tmacam · 2023-11-02T16:50:55Z

@mukundansundar

why is the grafana dashboard only limited to longhaul_test?

It isn't. This PR accomplishes 2 things:

It fixes the names of the metrics being used so they refer to ones that exist on a default prometheus setup
sets defaults to so they work out-of-the-box with the longhaul-tests environments.

The last item is the contentious one so let me dive on it. While the last item might make you believe that we are limiting it to longhaul-tests, this isn't really the case.

First because grafana pics up the contents selectable for the variables namespace (which was a bad name, should prob be renamed as it collides with a metric name) and dapr_app_id from their associated queries. Upon loading this dashboard, the available options for these two variables will be collected by the available values set to the metrics they track. The user has only to manually select something for these two fields. That will be required anyway, both with the current code (unless applications are also run in a namespace called pipeline ) or after this PR (unless applications are also run in a namespace called longhaul-test).

Second, you might observe that there already are hardcoded value for these 2 these variables: namespace is set to pipeline and dapr_app_id is set to ["loadtestclient", "stateactor"]. So effectively we are just changing an arbitrary setting (that doesn't work out of the box and that doesn't refer to anything meaningful to the project) by another arbitrary setting so that it could work out of the box for the project.

And why are the service names hardcoded?

So that, upon install in a fresh longhaul cluster + grafana install, all apps existing are selected by default. I am just saving us a few clicks.

Isn't this a general grafana dashboard that can be optionally installed on any dapr cluster?

My point is that it is just as generic as it was before. Upon loading, those variables whose values we are pre-feeding will be gathered by the metric queries associated with them.

The current grafana dashboards do not work in a fresh cluster where prometheus and grafana are installed using helm following Dapr Docs (see [1], [2]). They refer to metrics that are not available in such install. In short, based on bug-report from dapr/test-infra#204, the proposed fix can be summed by: ```bash sed -i \ -e 's/\bkubernetes_name\b/service/g' \ -e 's/\bkubernetes_namespace\b/namespace/g' \ -e 's/\bkubernetes_node\b/node/g' \ -e 's/\bkubernetes_pod_name\b/pod/g' \ *.json ``` Additionally: * Removes refresh rates smaller than 1 minute. * Sets default interval range to 14 days in the past to now * Sets default template values to match the longhaul clusters. Fixes dapr#7120 [1]: https://docs.dapr.io/operations/observability/metrics/prometheus/#setup-prometheus-on-kubernetes [2]: https://docs.dapr.io/operations/observability/metrics/grafana/#setup-on-kubernetes Signed-off-by: Tiago Alves Macambira <tmacam@burocrata.org>

Signed-off-by: Tiago Alves Macambira <tmacam@burocrata.org>

tmacam force-pushed the FixGrafana branch from f4c0b64 to 21b5034 Compare October 31, 2023 22:09

tmacam mentioned this pull request Oct 31, 2023

Move longhaul to the same Azure subscription we use for E2E tests dapr/test-infra#167

Closed

tmacam marked this pull request as ready for review October 31, 2023 22:35

tmacam requested review from a team as code owners October 31, 2023 22:35

mukundansundar requested changes Nov 2, 2023

View reviewed changes

tmacam added 2 commits November 3, 2023 12:34

Remove longhaul related settings.

a98f3f1

Signed-off-by: Tiago Alves Macambira <tmacam@burocrata.org>

tmacam force-pushed the FixGrafana branch from 21b5034 to a98f3f1 Compare November 3, 2023 23:45

mukundansundar approved these changes Nov 4, 2023

View reviewed changes

mukundansundar added automerge Allows DaprBot to automerge and update PR if all approvals are in place autoupdate DaprBot will keep the Pull Request up to date with master branch labels Nov 4, 2023

mukundansundar merged commit c67ab59 into dapr:master Nov 4, 2023
31 of 32 checks passed

tmacam deleted the FixGrafana branch November 7, 2023 00:58

JoshVanL added this to the v1.13 milestone Feb 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Grafana dashboards. #7121

Fix Grafana dashboards. #7121

tmacam commented Oct 30, 2023 •

edited

codecov bot commented Oct 31, 2023 •

edited

tmacam commented Oct 31, 2023 •

edited

mukundansundar left a comment

mukundansundar Nov 2, 2023

tmacam Nov 2, 2023 •

edited

tmacam Nov 3, 2023

tmacam commented Nov 2, 2023

Fix Grafana dashboards. #7121

Fix Grafana dashboards. #7121

Conversation

tmacam commented Oct 30, 2023 • edited

Description

Issue reference

Testing done

Checklist

codecov bot commented Oct 31, 2023 • edited

Codecov Report

tmacam commented Oct 31, 2023 • edited

mukundansundar left a comment

Choose a reason for hiding this comment

mukundansundar Nov 2, 2023

Choose a reason for hiding this comment

tmacam Nov 2, 2023 • edited

Choose a reason for hiding this comment

tmacam Nov 3, 2023

Choose a reason for hiding this comment

tmacam commented Nov 2, 2023

tmacam commented Oct 30, 2023 •

edited

codecov bot commented Oct 31, 2023 •

edited

tmacam commented Oct 31, 2023 •

edited

tmacam Nov 2, 2023 •

edited