🚧 META-Issue: Problems with 700 series (healing, delays, neighbors, ...) 🚧 #3906

Closed
AlCalzone opened this issue Dec 15, 2021 · 467 comments · Fixed by #4515
Comments

@AlCalzone
Member

AlCalzone commented Dec 15, 2021

It seems that 700-series sticks (including the currently latest firmware 7.17) have some problems which mostly appear on networks that:

  • are relatively busy (lots of unsolicited reports, e.g. from motion sensors or power meters)
  • are large and/or
  • contain some battery-powered devices

When lots of reports reach the controller in a short time, the 700-series sticks are not able to send any message.
It looks like the stick is somehow blocked and simply doesn’t send anything, maybe not even the protocol level acknowledgements for receiving the messages, causing end nodes to repeat their messages over and over, making the situation even worse.
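
To make the suspected feedback loop concrete, here is a toy model in TypeScript. The retry limit and the all-or-nothing acknowledgement behaviour are assumptions chosen for illustration; they are not values taken from the Z-Wave specification or from the stick's firmware.

```typescript
// Toy model only: counts how many frames end up on the air for a burst of reports.
// `maxRetries` is a hypothetical retry limit, not a documented Z-Wave constant.
function framesOnAir(
  reports: number,
  controllerAcks: boolean,
  maxRetries: number,
): number {
  // An acknowledged report is sent once. An unacknowledged report is sent once
  // and then repeated until the (assumed) retry limit is reached.
  const transmissionsPerReport = controllerAcks ? 1 : 1 + maxRetries;
  return reports * transmissionsPerReport;
}

// 50 reports arriving in a burst, with an assumed limit of 3 retries each.
const healthy = framesOnAir(50, true, 3);
const blocked = framesOnAir(50, false, 3);
```

With these assumed numbers, a blocked controller turns 50 reports into 200 frames, which is the kind of amplification that would keep a busy network busy.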

👷🏻‍♂️ EDIT: Fix available, see below for direct links to the updated firmware 7.17.2

🔥 Bug in NVM conversion routine, potentially causing connectivity issues. See below for details.

🗳 If you've updated, please take part in the survey so we can see if the update helps.


We believe the following symptoms are all caused by this:

  • Huge delays (up to 60 seconds) when sending messages
  • Failure to send anything (TransmitStatus: Fail)
  • Floods of incoming reports that are transmitted over and over
  • Incorrect neighbor information (e.g. controller doesn't list some/any nodes as neighbors, etc.)
  • Failure to heal the network or individual nodes, especially in busy situations

Additional background info:
https://forums.homeseer.com/forum/homeseer-products-services/homeseer-z-wave-products/smartstick/1483440-does-anyone-have-a-solid-working-g3-system-at-this-time/page4#post1510687


A workaround until this is fixed is migrating back to a 500 series stick, using the migration tool 700<->500 series. Description:
#3906 (comment)

@AlCalzone AlCalzone added the bug Something isn't working label Dec 15, 2021
@zwave-js-bot zwave-js-bot added this to Needs triage in Triage Dec 15, 2021
@AlCalzone AlCalzone pinned this issue Dec 15, 2021
@AlCalzone AlCalzone removed this from Needs triage in Triage Dec 15, 2021
@AlCalzone AlCalzone removed the bug Something isn't working label Dec 15, 2021
@AlCalzone AlCalzone changed the title from "META-Issue: Problems with 700 series (healing, delays, neighbors, ...)" to "🚧 META-Issue: Problems with 700 series (healing, delays, neighbors, ...) 🚧" Dec 15, 2021
@AlCalzone
Member Author

AlCalzone commented Dec 19, 2021

Edit: zwavejs2mqtt 6.3.0 has built-in support for this now.
Just restore a backup of the source stick onto the target stick and you're good to go.


If anyone wants to take a little risk and try the migration back to the 500 series (requires Node.js and npm to be installed):

  1. make an NVM backup of the current 700 series stick
  2. make an NVM backup of the target 500 series stick
  3. execute the convert command here: https://github.com/zwave-js/node-zwave-js/tree/master/packages/nvmedit#convert-one-nvm-to-be-compatible-with-another-one
  4. Restore the resulting NVM file on the target stick

❗❗❗ Disclaimer: I have quite a few unit tests ensuring the correct format but I haven't actually tested restoring the resulting files on a stick, so I'm not 100% certain if it will work (both the backup and the stick 😬). If it doesn't work, you should be able to hard-reset the target stick to get it working again, but I'm not making guarantees here. Try at your own risk!
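
The four steps above can be sketched as a small TypeScript routine. The `Stick` interface and the `Convert` function type are hypothetical placeholders for whatever tool performs the backup, conversion and restore; they are not the API of Z-Wave JS or of the nvmedit package.

```typescript
// Hypothetical interface: stands in for the tool that reads and writes a stick's NVM.
interface Stick {
  backupNVM(): Uint8Array;
  restoreNVM(data: Uint8Array): void;
}

// Hypothetical signature for the conversion step (step 3 in the list above).
type Convert = (sourceNVM: Uint8Array, targetNVM: Uint8Array) => Uint8Array;

interface Backups {
  sourceBackup: Uint8Array;
  targetBackup: Uint8Array;
}

function migrate(source: Stick, target: Stick, convert: Convert): Backups {
  const sourceBackup = source.backupNVM(); // step 1: back up the 700 series stick
  const targetBackup = target.backupNVM(); // step 2: back up the 500 series stick
  const converted = convert(sourceBackup, targetBackup); // step 3: convert
  target.restoreNVM(converted); // step 4: restore onto the target stick
  // Both untouched backups are returned so they can be kept for recovery.
  return { sourceBackup, targetBackup };
}
```

The sketch never writes to the source stick and hands back both original backups, so a failed restore leaves you with the files needed to try again.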

@darkbasic
Contributor

darkbasic commented May 30, 2022

@peng1can from my experience the controller's fw randomly picks routes without accounting for things like SNR, and the result is an unreliable mess for everything but devices in direct range with a good signal-to-noise ratio. I suggest you play with zwavejs2mqtt's Health Check; you will probably end up with results similar to mine: #4184 (comment)

@peng1can
Contributor

peng1can commented May 30, 2022

Thanks for the suggestions. My updated (7.17.2) Aeotec Z-Stick 7 is on a 6' USB 2 extension cable, and I've tried simplifying it to the point that the only devices I'm pairing to it after completely wiping ZwaveJS are Zooz ZSE41's and/or Ring Contact Sensors (both 700-series chips) located in the same room. Both types of devices pair unpredictably: I've tried SmartStart and DSK pairing; sometimes the devices pair without S2 auth, sometimes they don't fully interview; sometimes a re-interview helps, sometimes it causes duplicate devices; sometimes they report open/close status, but more often they report open and then hang for several cycles. I had tried a few mains-powered devices that seemed to pair more reliably, but they still sometimes pair without S2 auth for no obvious reason. At this point I'm thinking I just need to send the 700-series stuff back before it's too late to do so.

@mundschenk-at
Contributor

Do you have a 500 series stick for comparison, @peng1can? I've experienced very poor range with the Aeotec Z-Stick 7 (and my support ticket with them is going into its 3rd month now despite demonstrating the fault with a direct comparison; apparently they are not yet fluent with Z-Wave JS' logfile format).

@RyanWor

RyanWor commented May 31, 2022

What @peng1can describes is almost identical to my experience, except that I am using a ZST10-700 rather than an Aeotec Gen7. And to @mundschenk-at's point, I've also tried including/excluding my entire network with a 500 series stick (an Aeotec Gen5+), I've tried another brand new ZST10-700, and just to see, I tried other hubs like Hubitat (which will not include/exclude my Zooz 700 series devices at all, even right next to each other). I am starting to worry the issue is not just with the 700 controller but also with 700 devices, which is incredibly unfortunate if so, because I have almost 40 hardwired Zooz 700 devices already installed in my walls and another 30 or so battery devices, all of which have been essentially useless to me for the 3 months I have been fighting this issue. The cost and effort of replacing all of those, I don't even want to think about.

I have noted that when using the ZWave PC Controller software within Simplicity Studio things are quite a bit better; however, attempts to build a network there and port it over to ZWaveJS are pretty much useless. ZWaveJS will go on an interview frenzy when you move the stick over, and it will bring the network to its knees. But what you describe sounds very, very familiar to me. Here are the main issues I've observed while trying to set up my medium-sized network (at least per Zooz Support's definition, at ~70ish devices) since the first week of March:

-- **Devices further away from the hub have significant issues pairing with the hub securely.** This is despite the network showing a robust mesh of 15-16 powered devices from one end of the floor the controller is on (3rd) to the other. As soon as I start pairing things on another floor, things go to hell.

-- **Devices further away from the hub may not even pair insecurely at times.** Less common than the above, but I have multiple Zooz 700 powered devices that will just outright fail inclusion, even after manual excludes and factory resets. There are two different ways in which this occurs: one where I am prompted for the DSK PIN, enter it, but the device still pairs only as insecure; or other times I never get the DSK prompt at all and the device goes straight to insecure pairing.

-- **Incomplete interviews.** Like the above, this problem is much more noticeable the further away from the hub I get, despite a supposedly solid mesh (all devices can talk to one another per the ZWave PC Controller topology map). This will result in either a fully unusable device, a device that supports only basic on and off commands, a device that has Unknown manufacturer 0xXXXX / 0xXXXX / Unknown product 0xXXXX listed as its Manufacturer / Product / Product Code, a device that shows battery power with a question mark instead of mains power, a device that reports it does not support Beaming, a device that reports it does not support Z-Wave+, or some combination of the above symptoms. Re-interviews rarely if ever seem to help.

-- **Zooz 700 series powered devices (ZEN77/ZEN71) will not fully factory reset.** I have a number of switches that at times, when running through the factory reset process, will stay on red, presumably forever, rather than go back to green and return to the normal reset state. The switch becomes fully inoperable when this happens. The only way to recover it is to remove power from the switch, either via the pull tab or the breaker. Zooz Support tells me this happens when the device gets flooded with ZWave traffic and hangs during the reset.

Things I've seen in logs:

-- Lots of driver/controller timeout errors in zwavejs
-- Lots of S2 nonce errors in zwavejs
-- Lots of SPAN failure errors in zwavejs
-- Lots of CRC errors in my zniffer output, mostly sporadic, which I understand is likely normal if the zniffer is far away from the sending device, and other times massive floods of CRC errors, like entire screens worth in zniffer output

I've reset, included, excluded, tried different controllers, different hubs, different placement, different power cables, different USB extensions, different hosts, and different types of HAOS and zwavejs2mqtt deployments. I've tried pulling the pull tabs on every switch in an attempt to silence all but the necessary ZWave traffic for paired or pairing devices, and I've tried turning off nearly every breaker in my home and going one by one. I've done the pairing, exclusion, and factory reset processes on 70 devices 4-5 times now, had phone calls with Zooz techs, spent countless hours, walked who knows how many steps up and down three flights of stairs, and driven my girlfriend plain mad with downtimes and power outages.

I've got lots of logs from zwavejs2mqtt, Simplicity Studio, and zniffer. I am happy to share whatever with whomever if they are interested in digging into it further. I am just at my wits' end though. I've never had issues like this with ZWave in nearly 9 years, and I am nearly at the point of giving up and considering this $2500+ worth of Zooz equipment and three months of my time and energy a total loss. I really don't know what else to do here anymore.

For context, here is the equipment I have been trying to set up. Many of the battery-powered devices have yet to even be installed, as I can't even get that far:

1x Zooz ZST10-700 Controller
3x Zooz ZEN32 Scene Controllers
1x Zooz ZEN34 Remote Switch
6x Zooz ZEN71 On/Off Switches
1x Zooz ZEN73 On/Off Toggle Switch
4x Zooz ZEN76 On/Off Switches
30x Zooz ZEN77 Dimmer Switches
16x Zooz ZSE40 4-in-1 Motion Sensors
5x Zooz ZSE41 Open/Close Sensors
5x Zooz ZSE42 Water Leak Sensors
4x Zooz ZSE43 Shock/Tilt Sensors
1x Zooz ZAC36 Water Valve Actuator
4x Kwikset 620 Smart Locks
1x Ecolink 700 series siren/speaker
18x Aeotec Gen7 Recessed Door Sensors
~100 total nodes (77 of which are Zooz 700 series devices)

@AlCalzone
Member Author

AlCalzone commented May 31, 2022

am happy to share whatever with whomever if they are interested in digging into it further

I'll gladly take a look, but please open a separate issue for this, or ideally one issue per problem with targeted logs.

Also FYI, some Zooz devices spam the network for several seconds in response to two specific commands. These commands are each sent for each config parameter (there are quite a few) of each device (far in the double digits). We reported this to Zooz in the past but so far they don't seem to care.

EDIT: This seems to have slipped through the cracks. I have confirmation they are working on it now.

#4585 (comment) for some context. We'll soon enable the workaround for the 72,76,77, so basically the largest part of your network.

@RyanWor

RyanWor commented May 31, 2022

am happy to share whatever with whomever if they are interested in digging into it further

I'll gladly take a look, but please open a separate issue for this, or ideally one issue per problem with targeted logs.

Also FYI, some Zooz devices spam the network for several seconds in response to two specific commands. These commands are each sent for each config parameter (there are quite a few) of each device (far in the double digits). We reported this to Zooz in the past but so far they don't seem to care.

#4585 (comment) for some context. We'll soon enable the workaround for the 72,76,77, so basically the largest part of your network.

Sounds good. Thank you. I am actually having pretty decent luck with my network today after it largely sitting doing nothing over the last 2 days. Maybe time just made things start healing themselves. Out of all my ZEN77/ZEN71 devices it seems I am down to a single ZEN71 that is not working as expected and based on reading thru #4585 and some other ones I see related it seems like that might be the same issue I am seeing. For what it's worth my zwavejs2mqtt is running in docker on a Linux laptop I've been carrying around my house. I've found I have FAR greater chances of an interview succeeding if the controller is right next to the device being re-interviewed. I went ahead and created a new issue #4669 and attached logs. Thanks for your help!

@muddro1

muddro1 commented Jun 5, 2022

-- **Zooz 700 series powered devices (ZEN77/ZEN71) will not fully factory reset.** I have a number of switches that at times, when running through the factory reset process, will stay on red, presumably forever, rather than go back to green and return to the normal reset state. The switch becomes fully inoperable when this happens. The only way to recover it is to remove power from the switch, either via the pull tab or the breaker. Zooz Support tells me this happens when the device gets flooded with ZWave traffic and hangs during the reset.

Had this happen to me plenty with the Zooz ZEN77 while trying to get it working. I think for me it started with their 10.10 firmware. Anyway, I learned that for the factory reset to work, you need to exclude the node first; otherwise it won't work and will just hang with the red light. That partially defeats the whole purpose of a factory reset. I've reported this to Zooz and reproduced the error every single time on multiple switches, but they don't seem to care.

@Avd888

Avd888 commented Jun 9, 2022

@AlCalzone Unfortunately I'm still experiencing issues when moving from the Z-Stick Gen5 (non-plus) to the Z-Stick 7 using the NVM backup. Here are my steps so far:

  • Upgraded the firmware of the Gen5 stick to 1.2
  • Upgraded the firmware of the Gen7 stick to 7.17.2
  • Created a backup of the Gen5 stick with zwavejs2mqtt (6.11.0)
  • Converted the NVM backup according to the instructions (with the most recent version, from yesterday)
  • Restored the backup to the Gen7 stick via Z-Wave JS
  • This resulted in many dead nodes (only 5 of ~25 were alive); healing the network or re-interviewing did not help
  • On the 2nd try I restored the Gen5 bin file directly to the Gen7 stick, without converting it first; same issue. As I understand it, the conversion performed in the background is the same.
  • Checked the transmit settings via the PC Controller tool. What I noticed is that the transmit level is correct but the RF region is empty:
    2022-06-09 09_04_59-COM9 - Z-Wave PC Controller
  • The next step was to convert the NVM backup to a JSON file according to the instructions; however, this fails with the following message:
    2022-06-09 09_13_43-Windows PowerShell ISE
  • I changed the region via the PC Controller software but the problem persists. I'm not sure if this actually writes it to the stick.
  • Unplugging the stick etc. doesn't make a difference
  • After it didn't work with the Gen7 stick, I plugged the Gen5 stick back in and the issues were instantly resolved and all nodes are alive.
    So there still seems to be an issue somewhere. Please let me know if specific logs are needed.

Overview of devices:
9x FGR223
1x Qubino ZMNHVD Flush dimmer 0-10V
3x Aeotec Smart Switch 6 ZW096
2x Neo NAS-WR01Z wallplug
1x Aeotec ZWA009 temp sensor (battery)
2x Neo NAS DA01Z (battery)
2x Fibaro FGS224
2x Fibaro FGS223
1x Heltun HE-RS01
1x Qubino ZMNHCD

EDIT: I managed to convert the NVM to JSON with npx by using PowerShell Core instead of the .NET PowerShell. The RF region was set to 255 in the JSON file, so this appears to be a small bug in the conversion. I will check this evening whether this resolves my issue.

EDIT2: The issues persist after setting the region manually via the JSON conversion; many devices still show as dead. Checking the RF region now shows the correct region.
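
A check for the unset-region case described above could look like the following TypeScript sketch. It assumes only that the converted JSON contains a numeric field whose name is spelled like `rfRegion` (matched case-insensitively); where that field sits in the file is not assumed, and both function names are hypothetical.

```typescript
// Sketch only: searches an arbitrary JSON value for a numeric field spelled like
// "rfRegion" (case-insensitive). The location of the field is not assumed.
function findRfRegion(value: unknown): number | undefined {
  if (typeof value !== "object" || value === null) {
    return undefined;
  }
  const record = value as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    const child = record[key];
    if (key.toLowerCase() === "rfregion" && typeof child === "number") {
      return child;
    }
    // Descend into nested objects and arrays.
    const nested = findRfRegion(child);
    if (nested !== undefined) {
      return nested;
    }
  }
  return undefined;
}

// 255 is the value reported in the comment above for the broken conversion.
const UNSET_REGION = 255;

function regionLooksUnset(nvmJson: unknown): boolean {
  const region = findRfRegion(nvmJson);
  return region === undefined || region === UNSET_REGION;
}
```

Running something like this on the converted JSON before restoring would flag the 255 case. It says nothing about whether a region that is set is the right one for your country.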

For the nodes which are still alive I see massive delays. I also see quite a few of these errors for the S2 secured devices:
2022-06-09 20_04_20-ZWave To MQTT

@imsdigital

imsdigital commented Jun 9, 2022 via email

@kdober

kdober commented Jul 2, 2022

Hi.

Is this issue considered to be fixed?
I'm on the latest firmware for the Aeotec Z-Stick 7 and this is happening for some nodes in my network.
Pinging them brings them back immediately, but automations do not work while a node is unavailable, and neither do the dashboards (I have to ping the nodes manually to re-enable them).

@AlCalzone
Member Author

Yes, the main issue here is considered fixed. Please open a separate issue with driver logs if you want your problem investigated.

@AlCalzone AlCalzone unpinned this issue Aug 10, 2022
@macromarkman

macromarkman commented Sep 12, 2022

@AlCalzone v7.18.1 is out now and we're testing it... have you tried it? Seems to be performing much better than both 7.17 variants on my HomeSeer system. Just curious to know if HA users are seeing similar improvements.

@AlCalzone
Member Author

not yet, no.

@guineau

guineau commented Sep 13, 2022 via email

@blhoward2
Collaborator

No, they use different processes. OTA and OTW work entirely differently.

@RyanWor

RyanWor commented Sep 13, 2022

You will need to update the controller itself via a PC running Simplicity Studio, at least that's how I've done it in the past.

@fisch55

fisch55 commented Sep 13, 2022

I can only find 7.17.2 on GitHub. No 7.18 available…

@AG-Teammate

Just updated Zooz ZST10 with https://github.com/SiliconLabs/gecko_sdk/blob/gsdk_4.1/protocol/z-wave/Apps/bin/gbl/zwave_ncp_serial_api_controller_BRD4207A.gbl
So far no difference but I didn't have dead nodes before

@kars85

kars85 commented Sep 13, 2022

Just updated Zooz ZST10 with https://github.com/SiliconLabs/gecko_sdk/blob/gsdk_4.1/protocol/z-wave/Apps/bin/gbl/zwave_ncp_serial_api_controller_BRD4207A.gbl So far no difference but I didn't have dead nodes before

This is the 7.18.1 firmware? Their repo organization is like the wild wild west. Spent way too much time yesterday trying to find/understand their new github structure. lol

@AG-Teammate

Just updated Zooz ZST10 with https://github.com/SiliconLabs/gecko_sdk/blob/gsdk_4.1/protocol/z-wave/Apps/bin/gbl/zwave_ncp_serial_api_controller_BRD4207A.gbl So far no difference but I didn't have dead nodes before

This is the 7.18.1 firmware? Their repo organization is like the wild wild west. Spent way too much time yesterday trying to find/understand their new github structure. lol

Yes, for this particular stick. FW: v7.18.1 SDK: v7.18.1

@fisch55

fisch55 commented Sep 13, 2022

Just updated Zooz ZST10 with https://github.com/SiliconLabs/gecko_sdk/blob/gsdk_4.1/protocol/z-wave/Apps/bin/gbl/zwave_ncp_serial_api_controller_BRD4207A.gbl So far no difference but I didn't have dead nodes before

This is the 7.18.1 firmware? Their repo organization is like the wild wild west. Spent way too much time yesterday trying to find/understand their new github structure. lol

Yes, for this particular stick. FW: v7.18.1 SDK: v7.18.1

And for the Aeotec 7 stick? I'm really confused about the repo organization.
Before, I was able to find the fw files without any problem...

@alexruffell

I believe this "release notes" document shows what BRDXXXXX matches up to what ZWAVE chip.

https://www.silabs.com/documents/public/release-notes/SRN14889-7.18.0.0.pdf

For the file linked in the previous post, it says: "ZGM130S: ZW-LR, SiP & 14 dBm". Anyhow, I don't believe this is the right place for this topic. After all, this is a closed issue.

@macromarkman

Anyhow I don't believe this is the right place for this topic. After all, this is a closed issue.

My bad, sorry.

@RyanWor

RyanWor commented Sep 13, 2022

Just FYI, I spoke with Zooz support about the new firmware this morning. They just started looking at it themselves and told me they would have an update by next week. I think I'd rather wait for their official firmware than install the generic.

@blhoward2
Collaborator

They don't have their own firmware. The file they provide has the exact same hash as the one on GitHub. All 700 firmware is the same (one for each of the two chipsets), except Zwave.me's.

@fisch55

fisch55 commented Sep 14, 2022

Maybe someone can show me the right one for the Aeotec 7 Stick…

@Daniel-dev22

This needs to be locked. This thread potentially still has numerous people subscribed, and the original issue is resolved.

@blhoward2
Collaborator

I agree. New issues can be opened for specific issues. Locking…

@zwave-js zwave-js locked as resolved and limited conversation to collaborators Sep 14, 2022