Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] neighbors not appearing on boot, disappearing over time #2074

Closed
4 of 7 tasks
kmwoley opened this issue Dec 17, 2021 · 15 comments
Closed
4 of 7 tasks

[bug] neighbors not appearing on boot, disappearing over time #2074

kmwoley opened this issue Dec 17, 2021 · 15 comments
Assignees
Labels
bug Something isn't working

Comments

@kmwoley
Copy link

kmwoley commented Dec 17, 2021

Version

Checklist:

  • I am not using Home Assistant. Or: a developer has told me to come here.
  • I have checked the troubleshooting section and my problem is not described there.
  • I have read the changelog and my problem is not mentioned there.

Build/Run method

  • Docker
  • PKG
  • Snap package
  • Manually built (git clone - yarn install - yarn run build )

Zwavejs2Mqtt version: 6.1.1
Z-Wave JS version: 8.9.0-beta.1

Describe the bug

Immediately after rebooting none of my zwave devices have neighbors. Including the controller. Sometimes, if I manually navigate to the network graph, a refresh of the neighbors will be triggered and then the neighbors will show up. But if I start Zwave2Mqtt and do not navigate to the network graph UI the neighbors are never populated. When I am able to get the neighbors to populate, they eventually disappear over time and outbound messages to my zwave devices stop working.

To Reproduce

Steps to reproduce the behavior:

  1. Restart the Docker container. Observe that the controller's neighbors, and all other device neighbors, are missing.
  2. Navigate to Network Graph UI. Observe that (sometimes) the neighbors are refreshed and populate. Sometimes they are not.
  3. Wait 1-3 days.
  4. Observe that neighbors disappear from the controller, and outbound messages to zwave devices stop working.

Expected behavior

Neighbors are available after reboot and do not disappear over time. Won't you be my neighbor?

Additional context

The result of this is that commands to my zwave devices eventually stop working. They typically work right after the Docker container has started, but somewhere between a few hours to a few days outbound commands to devices stop working.

Inbound data from zwave devices continue to work. I see updates from all of the devices.

Also, I've found that even if I am able to get the neighbors to populate, they do eventually disappear over time. So, even if I boot the server, go immediately into the UI, refresh the neighbors by going to the Network Graph, eventually a few hours/days later the neighbors disappear again.

Sometimes going to the Network Graph UI will trigger a neighbor refresh, but the neighbors do not update. This results in a "flat" network graph with no connections and no update to the node data.

Debugging steps I've attempted

  1. Rebooting the server that host the Docker containers
  2. Removing and re-inserting the zwave stick (Aeotec Gen5)
  3. Backing up and re-flashing the zwave stick
  4. Healing the zwave network

Log files

Below are the logs / node info from a reboot last night where I did not go to the control panel for 6+ hours after restart.
At the end of the logs I tried to go to the network graph which triggered an unsuccessful neighbor refresh.

zwavejs2mqtt kmwoley.zip

@kmwoley kmwoley added the bug Something isn't working label Dec 17, 2021
@kmwoley
Copy link
Author

kmwoley commented Dec 18, 2021

Update. After running for several hours, messages stopped being delivered to my zwave devices. It happened overnight around 10pm.

For example, triggering a binary switch from ZwaveJS2MQTT shows this in the ZwaveJS2MQTT log. Note that it never gets a return for the writeValue call. This is the last thing I see.

2021-12-18 06:01:07.714 DEBUG SOCKET: Event ZWAVE_API emitted to DNYdjCB8FT29jxMQAAAP
2021-12-18 06:01:07.715 INFO ZWAVE: Calling api writeValue with args: [
  {
    nodeId: 44,
    commandClass: 37,
    endpoint: 0,
    property: 'targetValue'
  },
  false,
  {},
  [length]: 3
]
2021-12-18 06:01:07.715 INFO ZWAVE: Writing false to 44-37-0-targetValue

However, there is nothing logged in the ZwaveJS logs corresponding to that action. It's like ZwaveJS never is called.

I dumped the node_dump.json - it looks like everyone has neighbors, but I noticed the controller has fewer neighbors now than in the dump I provided earlier.
nodes_dump - 6am before reboot.zip

After rebooting - I confirmed that the controller has no neighbors, navigated to the Network Graph to trigger the neighbor refresh, and then confirmed that the controller now shows neighbors.

I then confirmed that I can trigger the binary switch update.

ZWaveJS2MQTT Log:

2021-12-18 06:36:42.832 DEBUG SOCKET: Event ZWAVE_API emitted to Bd4egvrA7486cDL5AAAD
2021-12-18 06:36:42.833 INFO ZWAVE: Calling api writeValue with args: [
  {
    nodeId: 44,
    commandClass: 37,
    endpoint: 0,
    property: 'targetValue'
  },
  false,
  {},
  [length]: 3
]
2021-12-18 06:36:42.834 INFO ZWAVE: Writing false to 44-37-0-targetValue
2021-12-18 06:36:42.915 INFO ZWAVE: Node 44: value updated: 37-0-currentValue true => false
2021-12-18 06:36:42.917 INFO ZWAVE: Node 44: value updated: 37-0-targetValue true => false
2021-12-18 06:36:42.917 INFO ZWAVE: Success zwave api call writeValue true
2021-12-18 06:36:44.050 INFO ZWAVE: Node 44: value updated: 37-0-currentValue false => false

And the zwave log:

2021-12-18T14:36:42.857Z SERIAL » 0x010a00132c03250100251ad2                                          (12 bytes)
2021-12-18T14:36:42.858Z DRIVER » [Node 044] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      26
                                  └─[BinarySwitchCCSet]
                                      target value: false
2021-12-18T14:36:42.860Z SERIAL « [ACK]                                                                   (0x06)
2021-12-18T14:36:42.866Z SERIAL « 0x0104011301e8                                                       (6 bytes)
2021-12-18T14:36:42.867Z SERIAL » [ACK]                                                                   (0x06)
2021-12-18T14:36:42.868Z DRIVER « [RES] [SendData]
                                    was sent: true
2021-12-18T14:36:42.908Z SERIAL « 0x010700131a000005f4                                                 (9 bytes)
2021-12-18T14:36:42.909Z SERIAL » [ACK]                                                                   (0x06)
2021-12-18T14:36:42.910Z DRIVER « [REQ] [SendData]
                                    callback id:     26
                                    transmit status: OK
2021-12-18T14:36:42.914Z CNTRLR   [Node 044] [~] [Binary Switch] currentValue: true => false        [Endpoint 0]
2021-12-18T14:36:43.923Z SERIAL » 0x010900132c022502251bd2                                            (11 bytes)
2021-12-18T14:36:43.924Z DRIVER » [Node 044] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      27
                                  └─[BinarySwitchCCGet]
2021-12-18T14:36:43.926Z SERIAL « [ACK]                                                                   (0x06)
2021-12-18T14:36:43.932Z SERIAL « 0x0104011301e8                                                       (6 bytes)
2021-12-18T14:36:43.933Z SERIAL » [ACK]                                                                   (0x06)
2021-12-18T14:36:43.933Z DRIVER « [RES] [SendData]
                                    was sent: true
2021-12-18T14:36:43.974Z SERIAL « 0x010700131b000005f5                                                 (9 bytes)
2021-12-18T14:36:43.975Z SERIAL » [ACK]                                                                   (0x06)
2021-12-18T14:36:43.975Z DRIVER « [REQ] [SendData]
                                    callback id:     27
                                    transmit status: OK
2021-12-18T14:36:44.047Z SERIAL « 0x01090004002c03250300fb                                            (11 bytes)
2021-12-18T14:36:44.049Z CNTRLR   [Node 044] [~] [Binary Switch] currentValue: false => false       [Endpoint 0]
2021-12-18T14:36:44.051Z SERIAL » [ACK]                                                                   (0x06)
2021-12-18T14:36:44.052Z DRIVER « [Node 044] [REQ] [ApplicationCommand]
                                  └─[BinarySwitchCCReport]
                                      current value: false

The next thing I'm trying is to remove the most recent zwave device I added to my network to see if that changes anything.

@mphemma
Copy link

mphemma commented Dec 19, 2021

I have the exact behavior to. After a couple of days my network graph will look like this:
Zwavejs2Mqtt no neighbors
To get it to work again I need to restart the docker container or just go to ZWave Settings and press save. After aprox. one minute the network is up and running again.

@doug62
Copy link

doug62 commented Dec 20, 2021

I have EXACT same issues, I have Aeotec Gen 5, this started occurring around the time the below update might have occurred, other notes:

  1. I can most often see neighbors but at some point they go away.
  2. Works ok, perhaps slow after a server reboot - slowly degrades and looses touch w devices.
  3. I node will report a change to HA and HA will update it in the UI, However - HA can't control a device.
  4. Direct device updates from Z-wave JS DON'T update devices
  5. Devices don't respond to PINGS
  6. Yet devices report status to HA/UI/Z-Wave JS properly.

ll/Most z-wave functionality
zwavejs2mqtt: 6.1.1
zwave-js: 8.9.0-beta.1
home id: 3741021741
home hex: 0xdefb762d

@robertsLando
Copy link
Member

Does reverting to a previous version fixes the problem? Try with 6.0.3

@mphemma
Copy link

mphemma commented Dec 20, 2021

I reverted to 6.0.1 with same problem. Did that test in the end of November.

@robertsLando
Copy link
Member

no clue, I tried to check the nodes dump and I cannot find any issue, @AlCalzone ?

@AlCalzone
Copy link
Member

AlCalzone commented Dec 20, 2021

Might be worth a test with the latest beta.3
I fixed several communication issues there.

Neighbor information is reported by the stick though, so that problem might be hardware or firmware related (700 series sticks). See zwave-js/node-zwave-js#3906

@kmwoley
Copy link
Author

kmwoley commented Dec 20, 2021

One thing I've noticed is that I see a DRIVER » [REQ] [SendDataAbort] message around the time communication to the zwave devices fails. I ran two experiments where I repeatedly sent an on/off every 5 min to a binary switch and tested to see if it was received. Both times I saw that the switch stopped reacting to the commands after SendDataAbort.

Logs are attached for both of those experiments.

Experiment 1
Subject: Node 059
Failed after 2021-12-18T18:50 in the zwave log (10:50 in the zwavejs2mqqt)
2021-12-18T18:51:02.125Z DRIVER » [REQ] [SendDataAbort]
zwavejs2mqtt_2021-12-18-1235.log
zwavejs_2021-12-18-1235.log

Experiment 2
Subject: Node 059
Failed after 2021-12-19T03:01 in the zwave log (19:01 in the zwavejs2mqqt)
2021-12-19T03:08:12.528Z DRIVER » [REQ] [SendDataAbort]
zwavejs_2021-12-18-2124.log
zwavejs2mqtt_2021-12-18--2124.log

@AlCalzone - I'd be happy to try the latest beta.3. I see these instructions for installing from github. I'm running the zwavejs2mqqt docker container. Can you point me to instructions on how to get beta.3 installed into the docker container?

I'm on an Aeotec Gen5 stick - I don't think the 700 series issues apply.

@doug62
Copy link

doug62 commented Dec 21, 2021

Is this related to #2081 ?

@AlCalzone
Copy link
Member

Can you point me to instructions on how to get beta.3 installed into the docker container?

@robertsLando ?

@robertsLando
Copy link
Member

Use master tag instead of latest

@kmwoley
Copy link
Author

kmwoley commented Dec 21, 2021 via email

@robertsLando
Copy link
Member

Closing for now, if the issue persists feel free to reopen

@kmwoley
Copy link
Author

kmwoley commented Dec 23, 2021

@AlCalzone - it does not look like beta.3 has resolved this issue. It may have gotten better (I'm seeing longer times between failure) but the issues described in this bug still persist.

Logs attached. These logs show about 7 hours between the first boot of docker with beta.3 and 2021-12-21T19:01:41.659Z DRIVER » [REQ] [SendDataAbort], after which commands to the devices stopped responding.
zwavejs_2021-12-21.log.gz

I rebooted and have been running now for 36 hours without issue. I didn't change anything besides rebooting. I'll keep monitoring for failure, but given that I've seen it fail once already makes me think it's not truly fixed.

@robertsLando - I doesn't look like I have permissions to reopen this issue. Can you please do so? Or should this bug be filed on the node-zwave-js project?

edit: I just rebooted to update to the latest :master to pick up Zwavejs2Mqtt 6.2.0 and Z-Wave JS 8.9.1. Going to continue to let it run and see if it fails again.

@robertsLando robertsLando reopened this Dec 27, 2021
@kmwoley
Copy link
Author

kmwoley commented Dec 30, 2021

Hey folks - checking back in here. I ran for a full week without issues. Thanks for your efforts on all parts of this stack - your efforts are greatly appreciated @robertsLando & @AlCalzone !

@kmwoley kmwoley closed this as completed Dec 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants