
Add more prometheus labels (including area, device) #113849

Open · wants to merge 47 commits into base: dev
Conversation


@jzucker2 jzucker2 commented Mar 19, 2024

Proposed change

I was inspired by a closed pull request (#85553) to add some more labels.
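For illustration, an exported sample with an added area label might look like this (the metric name, entity, value, and area here are made up for this example):

```
homeassistant_sensor_temperature_celsius{domain="sensor",entity="sensor.outside_temperature",friendly_name="Outside Temperature",area="living_room"} 15.6
```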

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

@home-assistant home-assistant bot left a comment

Hi @jzucker2

It seems you haven't yet signed a CLA. Please do so here.

Once you do that we will be able to review and accept this pull request.

Thanks!

@home-assistant

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@home-assistant home-assistant bot marked this pull request as draft March 19, 2024 23:00
@home-assistant

Hey there @knyar, mind taking a look at this pull request as it has been labeled with an integration (prometheus) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of prometheus can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign prometheus Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

@jzucker2
Author

> Hi @jzucker2
>
> It seems you haven't yet signed a CLA. Please do so here.

Signed!

@knyar knyar (Contributor) left a comment

Thank you.

Most of the tests check label values, so you will need to adjust those as well.

@@ -345,6 +361,11 @@ def _labels(state: State) -> dict[str, Any]:
"entity": state.entity_id,
"domain": state.domain,
"friendly_name": state.attributes.get(ATTR_FRIENDLY_NAME),
"area": state.attributes.get(ATTR_AREA_ID),
# TODO: how to get zone here?
Contributor:

Reading a bit more about zones, they seem to be a geofencing concept, with only person and device_tracker entities supporting zones.

So perhaps we should not add zone labels for all metrics & entities.

Author:

I just removed zone for now. We can address that in another PR if the need arises; I'm not using it currently.

"area": state.attributes.get(ATTR_AREA_ID),
# TODO: how to get zone here?
# "zone": state.attributes.get(ATTR_AREA_ID),
"entity_id": state.object_id,
Contributor:

If this is called object_id internally, could you please use object_id for the label name as well?

@@ -345,6 +361,11 @@ def _labels(state: State) -> dict[str, Any]:
"entity": state.entity_id,
"domain": state.domain,
"friendly_name": state.attributes.get(ATTR_FRIENDLY_NAME),
"area": state.attributes.get(ATTR_AREA_ID),
Contributor:

When the attribute is unset, the label value should be empty rather than the string None.

Suggested change:
- "area": state.attributes.get(ATTR_AREA_ID),
+ "area": state.attributes.get(ATTR_AREA_ID) or "",
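For context on why the fallback matters: `dict.get` returns `None` for a missing attribute, and a `None` label value ends up rendered as the literal string `None` in the exported metric, while `or ""` yields an empty label value. A minimal sketch with a plain dict standing in for `state.attributes`:

```python
attributes = {}  # an entity with no area assigned

# Without the fallback: .get() returns None, which later stringifies to "None"
label_without_fallback = str(attributes.get("area_id"))

# With the fallback: missing attributes become an empty label value
label_with_fallback = attributes.get("area_id") or ""

print(label_without_fallback)  # prints "None"
print(label_with_fallback)     # prints an empty string
```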

# TODO: how to get zone here?
# "zone": state.attributes.get(ATTR_AREA_ID),
"entity_id": state.object_id,
"device_class": state.attributes.get(ATTR_DEVICE_CLASS),
Contributor:

Not sure if device class can be unset, but just in case it is:

Suggested change:
- "device_class": state.attributes.get(ATTR_DEVICE_CLASS),
+ "device_class": state.attributes.get(ATTR_DEVICE_CLASS) or "",

.gitignore: review comment resolved (outdated)
.dockerignore: review comment resolved (outdated)
@jzucker2 jzucker2 marked this pull request as ready for review March 29, 2024 20:52
@jzucker2 jzucker2 requested a review from a team as a code owner March 29, 2024 20:52
@jzucker2
Author

Hi! Thank you for the earlier feedback.

I added the labels (except for zone) and also updated the unit tests. It ended up being a huge effort to update the unit tests; they are unintentionally brittle with respect to label changes. I refactored things into a helper class. I know that somewhat reduces the legibility of a single test and adds abstraction, but it should make other label changes much easier going forward.

As a result of finding the helper necessary, I added some simple smoke tests for it. I also think the helper class could be cleaner, but it does work, and I was hoping to propose some other changes around prometheus going forward and could clean it up then.

Let me know what you think! It's exciting to try and contribute back to a project I use so much.

@MartinHjelmare MartinHjelmare removed the request for review from a team March 29, 2024 23:03
@knyar knyar (Contributor) left a comment

> I added the labels (except for zone) and also updated the unit tests. It ended up being a huge effort to update the unit tests, they are unintentionally brittle for label changes. I refactored things into a helper class. I know that kind of reduces the legibility of a single test and abstracts a bit, but it should make other label changes much easier going forward.

Thank you!

I don't mind refactoring tests, but I would advise against combining such refactoring in a single PR with other changes.

At the moment it's hard to see the impact of the code change you have made, because all tests have changed as well. Coupling two changes like this makes reviewing (and potentially reverting) each of them more difficult.

May I suggest splitting test refactoring into a separate PR? I don't mind which of the two you'd like to discuss first. I shared some feedback on the specifics of the test refactoring here, and will be happy to discuss more in a separate issue or PR.

@@ -344,6 +359,9 @@ def _labels(state: State) -> dict[str, Any]:
"entity": state.entity_id,
"domain": state.domain,
"friendly_name": state.attributes.get(ATTR_FRIENDLY_NAME),
Contributor:

While we're changing all the metrics, we should probably fix friendly_name="None":

Suggested change:
- "friendly_name": state.attributes.get(ATTR_FRIENDLY_NAME),
+ "friendly_name": state.attributes.get(ATTR_FRIENDLY_NAME) or "",

@@ -116,6 +119,25 @@ async def generate_latest_metrics(client):
return body


@pytest.mark.parametrize("namespace", [""])
async def test_metrics_labels_list(
Contributor:

This test checks that two lists defined in two different modules have the same values. Do you have a bug or a failure condition in mind that would be caught by this test and not by any of the other tests we have?

Comment on lines +168 to +174
MetricsTestHelper._perform_sensor_metric_assert(
"homeassistant_sensor_temperature_celsius",
"12.3",
"Outside Temperature",
"outside_temperature",
body,
device_class=SensorDeviceClass.TEMPERATURE,
@knyar knyar commented Apr 7, 2024

Looking at the diff here, I think for an individual assertion the new version is less legible and clear.

The previous version was very simple: it confirmed that a particular string was present in a list.

In the new version, usage of positional arguments makes it difficult to understand what those arguments are. Perhaps the helpers should accept a dict with label keys & values, to make it more obvious what is being checked? E.g.:

assert_metric_present(body, name="homeassistant_sensor_temperature_celsius", value=12.3, labels=dict(
  entity="sensor.outside_temperature",
  domain="sensor",
  friendly_name="Outside Temperature",
  object_id="outside_temperature",
  device_class="temperature",
))
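One way such a keyword-based helper could be sketched (the function and parameter names here are illustrative, not actual project code):

```python
def format_metric_line(name, value, labels):
    """Render an exposition-format sample line with labels in the given order."""
    label_str = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{label_str}}} {value}"


def assert_metric_present(body, name, value, labels):
    """Assert that the exact rendered sample line appears in the scraped body."""
    line = format_metric_line(name, value, labels)
    assert line in body, f"metric line not found: {line}"
```

Each label is then spelled out at the call site, which keeps the assertion close to the raw string the previous tests checked.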

"Radio Energy",
"radio_energy",
body,
device_class=SensorDeviceClass.POWER,
Contributor:

Could you please use strings for checking label values instead of referring to constants in other modules?

"""Helper methods for prometheus tests."""


class MetricsTestHelper:
Contributor:

What's the purpose of this class? It seems like all of its methods can just be separate functions in this module.

device_class=device_class,
metric_value=metric_value,
)
if positive_comparison:
Contributor:

It might be more convenient to provide separate assertion functions (e.g. assert_metric_present vs assert_metric_absent) rather than make the user set a bool.
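A sketch of that split, shown over pre-rendered metric lines for brevity (names follow the suggestion above; this is not actual project code):

```python
def assert_metric_present(body, line):
    """Assert that an exposition-format line appears in the scraped body."""
    assert line in body, f"expected metric line not found: {line}"


def assert_metric_absent(body, line):
    """Assert that an exposition-format line does not appear in the scraped body."""
    assert line not in body, f"unexpected metric line found: {line}"
```

Two explicit helpers make the intent readable at the call site, instead of a boolean flag whose meaning the reader has to look up.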

device_class=None,
):
device_class_label_line = cls.get_device_class_label_line(device_class)
final_metric_value = f" {metric_value}" if metric_value else ""
Contributor:

Similar to my comment about positive_comparison, perhaps assertions without a value should be handled by separate functions.

'humidifier_target_humidity_percent{domain="humidifier",'
'entity="humidifier.humidifier",'
'friendly_name="Humidifier"} 68.0' in body
MetricsTestHelper._perform_humidifier_metric_assert(
Contributor:

These device_class specific assertion methods seem like an unnecessary layer of abstraction. The _perform_humidifier_metric_assert is only used a handful of times here - it might be simpler to use a generic function, or have a helper function that's defined closer to where it's used.

@jzucker2
Author

jzucker2 commented Apr 8, 2024

>> I added the labels (except for zone) and also updated the unit tests. It ended up being a huge effort to update the unit tests, they are unintentionally brittle for label changes. I refactored things into a helper class. I know that kind of reduces the legibility of a single test and abstracts a bit, but it should make other label changes much easier going forward.
>
> Thank you!
>
> I don't mind refactoring tests, but I would advise against combining such refactoring in a single PR with other changes.
>
> At the moment it's hard to see the impact of the code change you have made, because all tests have changed as well. Coupling two changes like this makes reviewing (and potentially reverting) each of them more difficult.
>
> May I suggest splitting test refactoring into a separate PR? I don't mind which of the two you'd like to discuss first. I shared some feedback on the specifics of the test refactoring here, and will be happy to discuss more in a separate issue or PR.

I'm willing to split this into two PRs and keep test refactoring and logic changes separate. How about we focus on test refactoring first, because simply adding "area" as a label to prometheus will break every single test, and I'd love to address that first. (I'm also interested in a later pass to include the newer tagging and labeling features in Home Assistant in prometheus metrics, but let's discuss that later.)

Maybe I can make this PR the test refactoring PR, since most of the work is around that, and break out the logic changes into a follow-up PR?

Thanks so much for the responses so far! I'm really looking forward to having these features in my Home Assistant setup.

@knyar
Contributor

knyar commented Apr 10, 2024

> Maybe I can make this PR the test refactoring PR since most of the work is around that. And I break out the logic changes into a follow up PR?

That sounds good! I've left a few comments about tests in my previous message. In general I think we should be careful about over-abstraction here: the purpose of these tests is to verify that exported metrics have the format that we expect, and to allow folks to understand and debug them I would suggest erring on the side of keeping them simple & explicit even if they seem verbose.

Do you have a list of goals for this refactoring or problems you've identified and want to fix? I think this might be helpful to describe (even if at a high level).

For example, if you find the current approach of hand-crafting full metric strings annoying, I think an alternative might be to parse metric output and examine it in a more structured way.
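As an illustration of that structured approach, a small stdlib-only parser could turn exposition lines into tuples for tests to match on (a sketch; the prometheus_client package also ships a full text parser that could be used instead):

```python
import re

# Matches lines like: name{key="value",key2="value2"} 12.3
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)$')


def parse_samples(body):
    """Parse exposition-format lines into (name, labels, value) tuples."""
    samples = []
    for line in body:
        match = _SAMPLE_RE.match(line)
        if not match:
            continue  # skip HELP/TYPE comments and unlabelled lines
        name, label_str, value = match.groups()
        labels = dict(re.findall(r'([a-zA-Z0-9_]+)="([^"]*)"', label_str))
        samples.append((name, labels, float(value)))
    return samples
```

Tests could then assert on individual label values without hand-crafting the full metric line.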

One thing that I find pretty inconvenient in our current testing approach is that most tests are broken into two functions: a fixture that creates Home Assistant entities, and the test function itself. These two functions are located in different parts of the same file, making it difficult to understand and modify a single test. Many fixtures are only used once. Perhaps moving fixtures closer to the test functions, or combining them in some cases, would be simpler? I'm not sure what's common in the Home Assistant ecosystem; I've not looked at how other components structure their testing, but I think it's worth examining and trying to be consistent with it if we are making large changes here.

@jzucker2
Author

>> Maybe I can make this PR the test refactoring PR since most of the work is around that. And I break out the logic changes into a follow up PR?
>
> That sounds good! I've left a few comments about tests in my previous message. In general I think we should be careful about over-abstraction here: the purpose of these tests is to verify that exported metrics have the format that we expect, and to allow folks to understand and debug them I would suggest erring on the side of keeping them simple & explicit even if they seem verbose.
>
> Do you have a list of goals for this refactoring or problems you've identified and want to fix? I think this might be helpful to describe (even if at a high level).
>
> For example, if you find the current approach of hand-crafting full metric strings annoying, I think an alternative might be to parse metric output and examine it in a more structured way.
>
> One thing that I find pretty inconvenient in our current testing approach is that most tests are broken into two functions: a fixture that creates Home Assistant entities, and the test function itself. These two functions are located in different parts of the same file, making it difficult to understand and modify a single test. Many fixtures are only used once. Perhaps moving fixtures closer to the test functions, or maybe combining them in some cases would be simpler? Not sure what's common here in Home Assistant ecosystem - I've not looked at how other components structure their testing, but I think it's worth examining and trying to be consistent with if we are making large changes here.

So, the long-term goal: I wanted to add two or three labels here and ended up needing to adjust ~800 lines of tests. But the tests themselves are pretty good, so I'm just hoping we can have test helpers that are still legible and human-readable, yet flexible enough to make prometheus label changes reasonable going forward.

I do agree that I overcomplicated this first pass. I do see what you mean about fixtures and stuff.

I also saw your comment about following patterns elsewhere; it seems this project is so big that there's not exactly one way of doing things. This approach did seem to make sense for the most part, though I definitely think I over-abstracted the refactor.
