Keep observation data valid for 60 min and retry with no data for nws #117109

MatthewFlamm · 2024-05-08T19:57:49Z

Proposed change

Retry when no data is returned during an update. Use a custom DataUpdateCoordinator to mark data as unavailable after 60 minutes for observations.

Unlike recent PRs for nws, this one should go in a new minor release.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Ruff (ruff format homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

home-assistant · 2024-05-08T19:57:55Z

Hey there @kamiyo, mind taking a look at this pull request as it has been labeled with an integration (nws) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of nws can trigger bot actions by commenting:

@home-assistant close Closes the pull request.
@home-assistant rename Awesome new title Renames the pull request.
@home-assistant reopen Reopen the pull request.
@home-assistant unassign nws Removes the current integration label and assignees on the pull request, add the integration domain after the command.
@home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
@home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

bdraco · 2024-05-08T20:03:26Z

Just to be certain, the intent is to NOT tag this for a patch release?

MatthewFlamm · 2024-05-08T20:06:07Z

I mean it could, but this one is quality of life upgrade only. Which is why I intended it to do in a new monthly release. If you feel it can go in a patch, Im okay with that too.

Edit: the others were definitely fixes to existing issues and were more critical. This one just adds some nice to have feature.

bdraco · 2024-05-08T20:09:41Z

I'm seeing unavailable after updating, but it may be because I've restarted so many times and I'm now rate limted

Will check it again later tonight

MatthewFlamm · 2024-05-08T20:16:18Z

Getting these myself, which shows the usefulness of this addition. I think I need to make this draft for now anyway as the errors could be handled better in the coordinator:

[traceback shortened]
 return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/python/lib/python3.12/concurrent/futures/_base.py", li
ne 401, in __get_result
    raise self._exception
  File "/mnt/data/home-assistant/venv/lib/python3.12/site-packages/
tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/home-assistant/venv/lib/python3.12/site-packages/
pynws/simple_nws.py", line 262, in update_observation
    raise NwsNoDataError("Observation received with no data.")
pynws.nws.NwsNoDataError: Observation received with no data.

bdraco · 2024-05-08T20:39:06Z

It did recover after a few minutes. The missing data for PHOG is not a regression.

MatthewFlamm · 2024-05-08T20:40:15Z

I see that is common to catch custom errors and wrap with UpdateFailed. Ironically, the most straightforward way to put this into nws integration is to get rid of the functools.partial usage.

MatthewFlamm · 2024-05-08T20:41:37Z

It did recover after a few minutes. The missing data for PHOG is not a regression.

Now that I have a potential addition for catching the error better, the server goes back up!

MatthewFlamm · 2024-05-08T23:47:52Z

Error in log is much better now:

(MainThread) [homeassistant.component
s.nws] Error fetching NWS observation station PAFM data: No data re
turned.

bdraco · 2024-05-09T12:44:09Z

After restart. Hourly is there, but all other data is unavailable

Will check it again in a bit

MatthewFlamm · 2024-05-09T19:51:43Z

Looks okay to me on my side...

bdraco · 2024-05-11T23:26:42Z

I've been running it for a few days now and getting drop outs. Checkout the logbook on the right of the below screenshots. Let me know if any logging I can enable would be helpful

MatthewFlamm · 2024-05-12T00:08:59Z

Hmmm. I did find a bug in pynws that would interact with this PR, but it is only for the forecasts and not the observations that you are showing. The bug only means that this PR would cause the caching of the forecast data to get overwritten when no data is returned by NWS. This would result in behavior similar to dev branch. I will fix this and bump here as we are already bumping it here. Will mark as draft.

This looks like NWS server flakiness to me, which tends to fluctuate over time. Some time periods it is solid, some time periods there are frequent drop outs. The only way to find out with the current logging is to monitor the API calls and data. Set pynws logger to debug and you will get all API calls and data returned. It is a lot of data you will log. I would look for 500 errors in the observation coordinator refresh, usually 502, and when you get [] returned (this PR catches this for restarts).

If these drop outs are for no data, then in the current dev branch, you would have had unknown instead of unavailable.

MatthewFlamm · 2024-05-13T19:07:12Z

pynws version bumped again to fix forecasts caching particularly with retries. Added PR release notes to description. This won't affect the lack of data in observations as noted above.

bdraco · 2024-05-16T23:09:08Z

I noticed the API worked in Chrome, but not via the lib.

The no data problem seems to have gone away if I change the headers
MatthewFlamm/pynws#201

Maybe I'm just getting lucky though...

bdraco · 2024-05-16T23:10:06Z

Will report back tomorrow and see if it drops out overnight as its going to be very obvious if there is a difference

MatthewFlamm · 2024-05-17T00:32:50Z

The no data problem seems to have gone away if I change the headers
MatthewFlamm/pynws#201

This will likely be a temporary fix until the server starts messing up the cache for this new user agent again. This has happened in the past on another end point on this server. See weather-gov/api#492. They explicitly request developers not to use cache-busting techniques however (I'm not suggesting that you are proposing this).

My plan is to gather some key raw data and open a ticket.

I have confirmed in the past that our request interval would be nowhere near frequent enough to cause a problem (in regards to use terms, but practically I'm not sure?).

MatthewFlamm · 2024-05-17T20:09:02Z

One thought for feedback... We keep track of the last successful update times (including whether data is returned) in the entry.runtime_data. Then we only raise UpdateFailed in the coordinator call when this time is >X minutes ago (say 60 minutes for now). This would prevent the shorter drop outs and still be faithful to the data source. This essentially returns us to what NWS did prior to the changes in the coordinator logic that we discussed over discord. That is, treat data as valid for X minutes even over multiple failed updates.

MatthewFlamm · 2024-05-20T19:07:09Z

Most significant changes in 8b99d0e include using a custom DataUpdateCoordinator for the observations. This enables us to internally raise an UpdateFailed only after a certain amount of time elapses from the last successful update. In my limited testing, it is much more robust to outages in NWS.

In #117533, it was observed that getting no data in HA correlated with getting 2+ hr old data from a simpler curl fetch. We are filtering out old data in NWS integration by using the start_time parameter, which is causing no data to be returned.

I'm opening this for review now. We could consider wrapping the forecast coordinators into this change too. Pending feedback, I opted to not include this since it would make this PR even larger and there is already caching done in weather.get_forecasts logic. We need to avoid double dipping. This might become important for #117254 and would be more appropriate to add to that PR IMO.

bdraco · 2024-05-20T19:52:19Z

If there are no breaking changes in the library bump, please do that in a separate PR first. Thanks!

MatthewFlamm · 2024-05-20T20:19:11Z

If there are no breaking changes in the library bump, please do that in a separate PR first. Thanks!

#117820

MatthewFlamm · 2024-05-20T20:22:00Z

One unresolved pain point in this implementation is that if the first update fails, it will wait a full 10 minutes to make another update request. To keep the PR less unwieldy as it is already getting very lengthy, I would propose this for a future PR.

bdraco · 2024-05-20T20:44:56Z

I'll put this on my production system in a bit

bdraco · 2024-05-20T23:11:30Z

Looks like its better but still seems to be as long as 90 minutes before new data is returned (maybe longer)

MatthewFlamm · 2024-05-20T23:56:51Z

Looks like its better but still seems to be as long as 90 minutes before new data is returned (maybe longer)

We can't fix the server not returning valid data for 90 minutes without cache busting, which is against the terms of use. This PR brings extra robustness in the HA integration while still marking the observations as invalid when applicable. The current state in the dev branch will continue to show data > 1 hr old, and users will not be able to know this.

kamiyo · 2024-05-21T04:23:51Z

This might be a separate PR, but should we add a sensor for the data being stale?

MatthewFlamm · 2024-05-21T12:15:55Z

This might be a separate PR, but should we add a sensor for the data being stale?

The observation data will obviously be stale in this PR when the data is marked as "unavailable" in the entity. A sensor that holds the time of the observation sounds fine to me in a separate PR.

Unfortunately the forecast data model isn't the best in HA in regards to this and I'm planning on opening an issue about it.

homeassistant/components/nws/sensor.py

bdraco · 2024-05-22T07:04:43Z

It seems to work pretty well now as long as I don't restart. Once I restart I have to wait a bit before the data is valid again

bdraco · 2024-05-22T07:06:56Z

tests/components/nws/test_weather.py

-    with patch("homeassistant.components.nws.utcnow") as mock_utc:
+    with patch("homeassistant.components.nws.coordinator.utcnow") as mock_utc:


Ideally we use FrozenDateTimeFactory instead of patching utcnow

I was trying to keep PR semi-manageable. This was changed to accommodate the change in location, not a change in functionality. I agree that these tests needs to be updated in the future.

bump pynws and use retry_no_data

a1fe518

home-assistant bot added cla-signed dependency integration: nws new-feature small-pr PRs with less than 30 lines. by-code-owner Quality Scale: platinum labels May 8, 2024

MatthewFlamm changed the title ~~NWS: retry when no data is returned in update~~ Retry when no data is returned in update for nws May 8, 2024

MatthewFlamm marked this pull request as draft May 8, 2024 20:23

use UpdateFailed

a12eba5

MatthewFlamm marked this pull request as ready for review May 8, 2024 23:48

test coverage

9bd32a8

MatthewFlamm marked this pull request as draft May 12, 2024 00:09

MatthewFlamm mentioned this pull request May 13, 2024

Make forecast caching match observation caching MatthewFlamm/pynws#200

Merged

bump pynws to 1.8.1

da8bdd4

MatthewFlamm mentioned this pull request May 17, 2024

NWS integration fails to populate 11 sensor entities. #117533

Closed

MatthewFlamm added 2 commits May 20, 2024 14:55

Use custom DataUpdateCoordinator to make data valid for 60 minutes

8b99d0e

Merge branch 'dev' into nws-retry-no-data

ee0ec78

MatthewFlamm marked this pull request as ready for review May 20, 2024 19:07

MatthewFlamm changed the title ~~Retry when no data is returned in update for nws~~ Retry when no data is returned in update and keep observation data valid for 60 min for nws May 20, 2024

MatthewFlamm changed the title ~~Retry when no data is returned in update and keep observation data valid for 60 min for nws~~ Keep observation data valid for 60 min and retry with no data for nws May 20, 2024

MatthewFlamm mentioned this pull request May 20, 2024

Bump pynws to 1.8.1 for nws #117820

Merged

20 tasks

Merge branch 'dev' into nws-retry-no-data

44dcc24

bdraco reviewed May 22, 2024

View reviewed changes

homeassistant/components/nws/sensor.py Show resolved Hide resolved

bdraco reviewed May 22, 2024

View reviewed changes

bdraco approved these changes May 22, 2024

View reviewed changes

bdraco added the smash Indicator this PR is close to finish for merging or closing label May 22, 2024

bdraco merged commit 52bb02b into home-assistant:dev May 22, 2024
24 checks passed

MatthewFlamm mentioned this pull request May 22, 2024

Add forecast service call for extra attributes for nws #117254

Open

20 tasks

github-actions bot locked and limited conversation to collaborators May 23, 2024

MatthewFlamm deleted the nws-retry-no-data branch May 24, 2024 15:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep observation data valid for 60 min and retry with no data for nws #117109

Keep observation data valid for 60 min and retry with no data for nws #117109

MatthewFlamm commented May 8, 2024 •

edited

home-assistant bot commented May 8, 2024

bdraco commented May 8, 2024

MatthewFlamm commented May 8, 2024 •

edited

bdraco commented May 8, 2024

MatthewFlamm commented May 8, 2024

bdraco commented May 8, 2024

MatthewFlamm commented May 8, 2024

MatthewFlamm commented May 8, 2024

MatthewFlamm commented May 8, 2024

bdraco commented May 9, 2024

MatthewFlamm commented May 9, 2024 •

edited

bdraco commented May 11, 2024

MatthewFlamm commented May 12, 2024

MatthewFlamm commented May 13, 2024

bdraco commented May 16, 2024

bdraco commented May 16, 2024

MatthewFlamm commented May 17, 2024 •

edited

MatthewFlamm commented May 17, 2024

MatthewFlamm commented May 20, 2024

bdraco commented May 20, 2024

MatthewFlamm commented May 20, 2024

MatthewFlamm commented May 20, 2024

bdraco commented May 20, 2024

bdraco commented May 20, 2024

MatthewFlamm commented May 20, 2024

kamiyo commented May 21, 2024

MatthewFlamm commented May 21, 2024

bdraco commented May 22, 2024

bdraco May 22, 2024

MatthewFlamm May 22, 2024

		with patch("homeassistant.components.nws.utcnow") as mock_utc:
		with patch("homeassistant.components.nws.coordinator.utcnow") as mock_utc:

Keep observation data valid for 60 min and retry with no data for nws #117109

Keep observation data valid for 60 min and retry with no data for nws #117109

Conversation

MatthewFlamm commented May 8, 2024 • edited

Proposed change

Type of change

Additional information

Checklist

home-assistant bot commented May 8, 2024

bdraco commented May 8, 2024

MatthewFlamm commented May 8, 2024 • edited

bdraco commented May 8, 2024

MatthewFlamm commented May 8, 2024

bdraco commented May 8, 2024

MatthewFlamm commented May 8, 2024

MatthewFlamm commented May 8, 2024

MatthewFlamm commented May 8, 2024

bdraco commented May 9, 2024

MatthewFlamm commented May 9, 2024 • edited

bdraco commented May 11, 2024

MatthewFlamm commented May 12, 2024

MatthewFlamm commented May 13, 2024

bdraco commented May 16, 2024

bdraco commented May 16, 2024

MatthewFlamm commented May 17, 2024 • edited

MatthewFlamm commented May 17, 2024

MatthewFlamm commented May 20, 2024

bdraco commented May 20, 2024

MatthewFlamm commented May 20, 2024

MatthewFlamm commented May 20, 2024

bdraco commented May 20, 2024

bdraco commented May 20, 2024

MatthewFlamm commented May 20, 2024

kamiyo commented May 21, 2024

MatthewFlamm commented May 21, 2024

bdraco commented May 22, 2024

bdraco May 22, 2024

Choose a reason for hiding this comment

MatthewFlamm May 22, 2024

Choose a reason for hiding this comment

MatthewFlamm commented May 8, 2024 •

edited

MatthewFlamm commented May 8, 2024 •

edited

MatthewFlamm commented May 9, 2024 •

edited

MatthewFlamm commented May 17, 2024 •

edited