Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Grafana dashboards. #7121

Merged
merged 2 commits into from Nov 4, 2023
Merged

Fix Grafana dashboards. #7121

merged 2 commits into from Nov 4, 2023

Conversation

tmacam
Copy link
Contributor

@tmacam tmacam commented Oct 30, 2023

Description

The current grafana dashboards do not work in a fresh cluster where prometheus and grafana are installed using helm following Dapr Docs (see 1, 2). They refer to metrics that are not available in such install.

In short, based on bug-report from dapr/test-infra#204, the proposed fix can be summed by:

sed -i \
    -e 's/\bkubernetes_name\b/service/g' \
    -e 's/\bkubernetes_namespace\b/namespace/g' \
    -e 's/\bkubernetes_node\b/node/g' \
    -e 's/\bkubernetes_pod_name\b/pod/g' \
    *.json

Additionally:

  • Removes refresh rates smaller than 1 minute.
  • Sets default interval range to 14 days in the past to now
  • Sets default template values to match the longhaul clusters.

Issue reference

Please reference the issue this PR will close: #7120

Testing done

Imported proposed JSON files to longhaul environments. See dapr/test-infra#167

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

@codecov
Copy link

codecov bot commented Oct 31, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5e9dc0c) 65.04% compared to head (a98f3f1) 64.98%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7121      +/-   ##
==========================================
- Coverage   65.04%   64.98%   -0.07%     
==========================================
  Files         221      221              
  Lines       20799    20799              
==========================================
- Hits        13529    13516      -13     
- Misses       6133     6143      +10     
- Partials     1137     1140       +3     

see 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tmacam
Copy link
Contributor Author

tmacam commented Oct 31, 2023

Gathering all diffs from dapr/test-infra#204 we got to the following regex:

sed -i \
    -e 's/\bkubernetes_name\b/service/g' \
    -e 's/\bkubernetes_namespace\b/namespace/g' \
    -e 's/\bkubernetes_node\b/node/g' \
    -e 's/\bkubernetes_pod_name\b/pod/g' \
    *.json

Copy link
Contributor

@mukundansundar mukundansundar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the grafana dashboard only limited to longhaul_test? And why are the service names hardcoded?
Isn't this a general grafana dashboard that can be optionally installed on any dapr cluster?
cc @dapr/maintainers-dapr ?

Comment on lines 807 to 808
"text": "longhaul-test",
"value": "longhaul-test"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this specific to longhaul_test

Copy link
Contributor Author

@tmacam tmacam Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be specific to longhauls, but if it is going to be an arbitrary string, why not have it be something useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think it is safe to keep these longhaul settings for the reasons I mentioned before, I added a98f3f1 to remove those specific settings.

@tmacam
Copy link
Contributor Author

tmacam commented Nov 2, 2023

@mukundansundar

why is the grafana dashboard only limited to longhaul_test?

It isn't. This PR accomplishes 2 things:

  1. It fixes the names of the metrics being used so they refer to ones that exist on a default prometheus setup
  2. sets defaults to so they work out-of-the-box with the longhaul-tests environments.

The last item is the contentious one so let me dive on it. While the last item might make you believe that we are limiting it to longhaul-tests, this isn't really the case.

First because grafana pics up the contents selectable for the variables namespace (which was a bad name, should prob be renamed as it collides with a metric name) and dapr_app_id from their associated queries. Upon loading this dashboard, the available options for these two variables will be collected by the available values set to the metrics they track. The user has only to manually select something for these two fields. That will be required anyway, both with the current code (unless applications are also run in a namespace called pipeline ) or after this PR (unless applications are also run in a namespace called longhaul-test).

Second, you might observe that there already are hardcoded value for these 2 these variables: namespace is set to pipeline and dapr_app_id is set to ["loadtestclient", "stateactor"]. So effectively we are just changing an arbitrary setting (that doesn't work out of the box and that doesn't refer to anything meaningful to the project) by another arbitrary setting so that it could work out of the box for the project.

And why are the service names hardcoded?

So that, upon install in a fresh longhaul cluster + grafana install, all apps existing are selected by default. I am just saving us a few clicks.

Isn't this a general grafana dashboard that can be optionally installed on any dapr cluster?

My point is that it is just as generic as it was before. Upon loading, those variables whose values we are pre-feeding will be gathered by the metric queries associated with them.

The current grafana dashboards do not work in a fresh cluster where
prometheus and grafana are installed using helm following Dapr Docs
(see [1], [2]). They refer to metrics that are not available in
such install.

In short, based on bug-report from dapr/test-infra#204, the proposed
fix can be summed by:

```bash
sed -i \
    -e 's/\bkubernetes_name\b/service/g' \
    -e 's/\bkubernetes_namespace\b/namespace/g' \
    -e 's/\bkubernetes_node\b/node/g' \
    -e 's/\bkubernetes_pod_name\b/pod/g' \
    *.json
```

Additionally:

* Removes refresh rates smaller than 1 minute.
* Sets default interval range to 14 days in the past to now
* Sets default template values to match the longhaul clusters.

Fixes dapr#7120

[1]: https://docs.dapr.io/operations/observability/metrics/prometheus/#setup-prometheus-on-kubernetes
[2]: https://docs.dapr.io/operations/observability/metrics/grafana/#setup-on-kubernetes

Signed-off-by: Tiago Alves Macambira <tmacam@burocrata.org>
Signed-off-by: Tiago Alves Macambira <tmacam@burocrata.org>
@mukundansundar mukundansundar added automerge Allows DaprBot to automerge and update PR if all approvals are in place autoupdate DaprBot will keep the Pull Request up to date with master branch labels Nov 4, 2023
@mukundansundar mukundansundar merged commit c67ab59 into dapr:master Nov 4, 2023
31 of 32 checks passed
@tmacam tmacam deleted the FixGrafana branch November 7, 2023 00:58
@JoshVanL JoshVanL added this to the v1.13 milestone Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Allows DaprBot to automerge and update PR if all approvals are in place autoupdate DaprBot will keep the Pull Request up to date with master branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Grafana dashboards do not work on a fresh helm install
3 participants