Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sonoff Zigbee 3.0 Dongle Randomly Died and ZHA made fixing it worse. #124

Closed
Daniel-dev22 opened this issue Jan 2, 2022 · 17 comments
Closed

Comments

@Daniel-dev22
Copy link

As the title suggests my sonoff zigbee 3.0 dongle randomly froze up. I restarted the dongle by plugging it back in and it was very sluggish then I started getting a ton of errors.

I did not have ZigBee debug logging on. Here's the home assistant logs from the UI.

Logger: homeassistant.components.zha.core.channels.base
Source: components/zha/core/channels/base.py:428
Integration: Zigbee Home Automation (documentation, issues)
First occurred: January 1, 2022, 7:44:26 PM (381 occurrences)
Last logged: January 1, 2022, 8:18:02 PM

[0x4B23:1:0x0300]: async_initialize: all attempts have failed: [DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>'), DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>'), DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>'), DeliveryError('Request failed after 5 attempts: <Status.MAC_NO_ACK: 233>')]
[0x4B23:1:0x0008]: async_initialize: all attempts have failed: [DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>'), DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>'), DeliveryError('Request failed after 5 attempts: <Status.MAC_NO_ACK: 233>'), DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>')]
[0x4B23:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>'), DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>'), DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>'), DeliveryError('Request failed after 5 attempts: <Status.NWK_NO_ROUTE: 205>')]
[0x23E0:1:0x0001]: async_initialize: all attempts have failed: [DeliveryError('Request failed after 5 attempts: <Status.APS_NO_ACK: 183>'), DeliveryError('Request failed after 5 attempts: <Status.APS_NO_ACK: 183>'), DeliveryError('Request failed after 5 attempts: <Status.APS_NO_ACK: 183>'), DeliveryError('Request failed after 5 attempts: <Status.APS_NO_ACK: 183>')]
[0x23E0:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Request failed after 5 attempts: <Status.APS_NO_ACK: 183>'), DeliveryError('Request failed after 5 attempts: <Status.APS_NO_ACK: 183>'), DeliveryError('Request failed after 5 attempts: <Status.APS_NO_ACK: 183>'), DeliveryError('Request failed after 5 attempts: <Status.APS_NO_ACK: 183>')]
Logger: zigpy.zcl
Source: deps/lib/python3.9/site-packages/zigpy/util.py:293
First occurred: January 1, 2022, 7:47:20 PM (9 occurrences)
Last logged: January 1, 2022, 7:47:21 PM

[0xb812:1:0x0300] Traceback (most recent call last): File "/config/deps/lib/python3.9/site-packages/zigpy/device.py", line 265, in request result, msg = await self._application.request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 453, in request return await self._send_request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1483, in _send_request response = await self._send_request_raw( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1369, in _send_request_raw self._znp.request_callback_rsp( AttributeError: 'NoneType' object has no attribute 'request_callback_rsp'
[0xb8a5:1:0x0006] Traceback (most recent call last): File "/config/deps/lib/python3.9/site-packages/zigpy/device.py", line 265, in request result, msg = await self._application.request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 453, in request return await self._send_request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1483, in _send_request response = await self._send_request_raw( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1369, in _send_request_raw self._znp.request_callback_rsp( AttributeError: 'NoneType' object has no attribute 'request_callback_rsp'
[0xb8a5:1:0x0008] Traceback (most recent call last): File "/config/deps/lib/python3.9/site-packages/zigpy/device.py", line 265, in request result, msg = await self._application.request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 453, in request return await self._send_request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1483, in _send_request response = await self._send_request_raw( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1369, in _send_request_raw self._znp.request_callback_rsp( AttributeError: 'NoneType' object has no attribute 'request_callback_rsp'
[0xdd60:1:0x0300] Traceback (most recent call last): File "/config/deps/lib/python3.9/site-packages/zigpy/device.py", line 265, in request result, msg = await self._application.request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 453, in request return await self._send_request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1483, in _send_request response = await self._send_request_raw( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1369, in _send_request_raw self._znp.request_callback_rsp( AttributeError: 'NoneType' object has no attribute 'request_callback_rsp'
[0x7127:1:0x0300] Traceback (most recent call last): File "/config/deps/lib/python3.9/site-packages/zigpy/device.py", line 265, in request result, msg = await self._application.request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 453, in request return await self._send_request( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1483, in _send_request response = await self._send_request_raw( File "/config/deps/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1369, in _send_request_raw self._znp.request_callback_rsp( AttributeError: 'NoneType' object has no attribute 'request_callback_rsp'

ZHA or home assistant core made this worse because I have multiple dongles and I switched to another and changed the config entry. I have done this multiple times to troubleshoot except this time all 106 devices were "disabled by configuration entry" and there was no way to enable them even if there was a full reboot. The only way to enable it was to press on the entity go to device settings and hit disable then re-enable and update then it gave me an enable button and I enabled it.. I had to do that 106 times and that was awful. If this is a zha issue can we get that removed? Why does it matter if I change the config entry that doesn't mean make my life harder after my sonoff stick died lol.

I can't re-enable the coordinator at all as there's no entity to click on hopefully that doesn't cause issues.
One of them switched after doing disable and re-enable at the same time and hitting update to saying "disabled by user" even though it was enabled the entire time I never updated it with "disabled" I just clicked it twice but it never updated as disabled. Another kept the same message just now of "disabled by configuration entry". Needless to say not sure what happened to my sonoff dongle thankfully I have 4. But zha just made everything 10x more difficult.

Screenshot_20220102-050637_Home Assistant
Screenshot_20220102-050426_Home Assistant
Screenshot_20220102-050412_Home Assistant
Screenshot_20220101-194600_Home Assistant

@puddly
Copy link
Collaborator

puddly commented Jan 2, 2022

ZHA or home assistant core made this worse because I have multiple dongles and I switched to another and changed the config entry

Did you do a network backup/restore? ZHA doesn't keep track of network settings so switching hardware will leave you with a new network with random settings.

@Daniel-dev22
Copy link
Author

Daniel-dev22 commented Jan 2, 2022

Yes I did as I have multiple dongles. The issue is why did zha or home assistant disable all entities just because of a config entry change which is necessary to change the USB Id. Disabling all the entities makes it very time consuming if one has a lot of devices to re-enable. Whether that's because of a dongle isuse or because of a migration.

@puddly
Copy link
Collaborator

puddly commented Jan 2, 2022

The dongle lockup issue is something I can possibly take a look at in zigpy-znp. Can you make NVRAM backups of both a working and a broken dongle and email them to me/attach them here?

except this time all 106 devices were "disabled by configuration entry" and there was no way to enable them even if there was a full reboot. The only way to enable it was to press on the entity go to device settings and hit disable then re-enable and update then it gave me an enable button and I enabled it.. I had to do that 106 times and that was awful.

@Adminiuga @dmulcahey This is the second report of all entities being disabled by HA Core (#111) after disabling/re-enabling ZHA or editing a path in core.config_entries. Did something change in a recent update?

@Daniel-dev22
Copy link
Author

I will email the 2 Nvram backups.

Is there a reason I'm unable to backup the sonoff dongle with

pip install zigpy-znp 'git+https://github.com/puddly/bellows@puddly/open-coordinator-backup'

Screenshot_20220102-135956_Termius

Yes please undo whatever occured in an ha core update. If a configuration change occurred I don't see why it's important that entities are disabled at all.

@puddly
Copy link
Collaborator

puddly commented Jan 2, 2022

bellows is the wrong command. Take a look at the documentation: https://github.com/zigpy/zigpy-znp/blob/dev/TOOLS.md#backup-and-restore

@Daniel-dev22
Copy link
Author

I see I was using the command I used on the nortek.

@Daniel-dev22
Copy link
Author

Any update on what happened to the problem stick?

@puddly
Copy link
Collaborator

puddly commented Jan 11, 2022

I see no differences in the NVRAM between the two. Both produce nearly identical network backups when written to my stick, with the only two differences being:

  1. The network key frame counter is smaller in the "working" backup (expected, since you restored it from an old backup).
  2. The TCLK being different (expected, since they seemed to have been produced from a backup of an EZSP radio).

Everything else is structured normally.

To double check, I restored both NVRAM backups one by one to my ZZH! and tested sending commands. Wireshark shows that both NVRAM backups produce the "same" network from the perspective of a listening device.

Can you "swap" the NVRAM between the two? Do this offline with the current devices, the backup you took earlier may be too old:

$ WORKING_STICK=/dev/serial/by-id/...
$ BROKEN_STICK=/dev/serial/by-id/...

$ python -m zigpy_znp.tools.nvram_read $WORKING_STICK -o nvram.working.json
$ python -m zigpy_znp.tools.nvram_read $BROKEN_STICK -o nvram.broken.json

$ python -m zigpy_znp.tools.nvram_reset $WORKING_STICK
$ python -m zigpy_znp.tools.nvram_reset $BROKEN_STICK

$ python -m zigpy_znp.tools.nvram_write $WORKING_STICK -i nvram.broken.json
$ python -m zigpy_znp.tools.nvram_write $BROKEN_STICK -i nvram.working.json

After adjusting the serial port, the "broken" stick should start working. If not, it is a hardware problem.

@Daniel-dev22
Copy link
Author

I will look into swapping Nvrams. Any word on the disabled by configuration entry change made by home assistant core?

I just noticed it on my Roomba integration I disabled. At least that was 3 entities not 100+ to re-enable.

@puddly
Copy link
Collaborator

puddly commented Jan 13, 2022

Fixed by home-assistant/core#63944, seems like it will be released in HA Core 2021.12.10.

@Daniel-dev22
Copy link
Author

Daniel-dev22 commented Jan 13, 2022

Sounds good because enabling all the devices was awful especially when it's possible the sonoff stick has a problem. It can take a quick swap from quick to a super long time.

@Daniel-dev22
Copy link
Author

This is slightly unrelated but related to what happens when the coordinator dies. I inquired about whether it's possible for a failover coordinator and le_top who made a component for automatic backups said he found mention of it in a 2008 ZigBee paper.

Is this something that's possible with zha? It would be great to have a coordinator ready to take over immediately when one dies.

https://community.home-assistant.io/t/daily-znp-zigbee-coordinator-backup-blueprint/377149/17?u=danny2100

@puddly
Copy link
Collaborator

puddly commented Jan 14, 2022

I'm still not sure what "coordinator dies" means in your case. Did the dongle itself have a hardware glitch? Did some core router of your network have a firmware glitch and "lock up" but still pretend to be online (IKEA and other routers can do this until they're rebooted)? Was it a host hardware issue?

@Daniel-dev22
Copy link
Author

Daniel-dev22 commented Jan 14, 2022

I didn't try to use the other coordinator yet so I don't know if it works still. That's why I said it was slightly unrelated to what happened to me but it would be great if it ever happened again for another coordinator to immediately take over.

I was inquiring as to whether it's possible if the coordinator locks up for example and stops sending commands another coordinator can take over. My coordinator per the logs I attached did have an issue what the cause was I don't know but it was unresponsive to home assistant. I'm wondering if it's possible that can be detected and another coordinator could immediately take over.

If you see the link I attached le_top linked a few articles discussing this very possibility. That a coordinator could immediately take over if it detected the main coordinator was down for example.

https://community.home-assistant.io/t/daily-znp-zigbee-coordinator-backup-blueprint/377149/17?u=danny2100

@puddly
Copy link
Collaborator

puddly commented Jan 14, 2022

Without figuring out the root cause, there's no point in trying to fix an unknown problem with a complicated solution. The coordinator's firmware may have very well been running even if the stick's serial interface is down, so the coordinator appeared "alive" on the network while appearing "dead" to the host, preventing any sort of backup coordinator from ever detecting a problem.

It'd be a whole lot easier to use https://github.com/mvp/uhubctl and have two coordinators plugged into a supported USB hub. Power cycle the hub to physically reset the sticks, restore a backup onto the second coordinator, and change the serial port path to point to it. Restart HA and you're good to go.

Heck, since zigpy-znp auto-reconnects in case the radio becomes unresponsive, you could even do a hot swap without restarting HA if you create some /dev/zigbee symlink and update it to point to a specific serial device externally.

@Daniel-dev22
Copy link
Author

How would a hotswap be done? Is there a potential way to automate when zigpy-znp autoreconnects it would hot swap to another coordinator?

@puddly
Copy link
Collaborator

puddly commented Jan 16, 2022

Create a symlink and externally update it to point to the correct radio.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants