Wifi connection timeout #377

Open
alka79 opened this issue May 19, 2024 · 6 comments

Comments

@alka79

alka79 commented May 19, 2024

Describe the new feature you'd like

My wifi access point sometimes has a life of its own. It randomly resets the 2.4G band every day or two!
There must be too many IoT devices on my wifi!
When this happens, 2.4G wifi is lost for about 40 seconds. All devices on my network recover almost immediately when wifi comes back, but ESPSomfy took several more minutes. That made me investigate.
It highlighted an ESPSomfy wifi reconnection behaviour which is not optimal IMHO, leading to the suggestions below.

The way it works, as far as I could understand:

  • At reset, if no SSID is present in the settings, the board starts in AP mode, launching the SoftAP (hotspot). If an SSID is present in the settings, as it normally is after initial setup, the board enters STA mode and tries to connect to the strongest AP advertising that SSID.
  • At any point in time, if the wifi connection is lost, the board tries to reconnect.
  • Wifi connection (or reconnection) has a 20 sec timeout. After that, if the reconnect failed, the SoftAP is started (why it is actually started in AP_STA mode is unclear to me).
  • The SoftAP also has a timeout. If no client connects within 3 minutes, the SoftAP is closed and the wifi connection process starts over.

So in the end, the board recovers from a temporary wifi failure but takes some extra time.
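
For illustration, the flow described above can be written as a small transition function. This is only a sketch in plain C++, not the ESPSomfy code: the names (`NetState`, `nextState`, the two constants) are hypothetical, and the timings are simply the ones quoted in this issue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; timings are the ones quoted in this issue.
enum class NetState { Connecting, Connected, SoftAp };

constexpr uint32_t kConnectTimeoutMs = 20'000;   // 20 s STA (re)connect window
constexpr uint32_t kSoftApTimeoutMs  = 180'000;  // 3 min SoftAP window

// Returns the next state, given how long the board has been in the current one.
NetState nextState(NetState s, uint32_t elapsedMs, bool linkUp, int apClients) {
    assert(apClients >= 0);
    switch (s) {
    case NetState::Connected:
        // A lost link sends the board back to (re)connecting.
        return linkUp ? NetState::Connected : NetState::Connecting;
    case NetState::Connecting:
        if (linkUp) return NetState::Connected;
        // No link within the timeout: open the SoftAP.
        return elapsedMs >= kConnectTimeoutMs ? NetState::SoftAp
                                              : NetState::Connecting;
    case NetState::SoftAp:
        // With no client attached, give up after the window and retry STA.
        if (apClients == 0 && elapsedMs >= kSoftApTimeoutMs)
            return NetState::Connecting;
        return NetState::SoftAp;
    }
    return s;
}
```

Written this way, the two timeouts are the only knobs, which makes it easy to see what changing either one does to the recovery time.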

IMHO this could be improved.

  • A 20 sec reconnection timeout is really short. An AP reboot takes longer.
  • Catching the 3-minute window while the board is in SoftAP mode could be challenging!

I have made some minimal changes to the code to accommodate this:

  • Set the wifi connection timeout to 2 minutes instead of 20 sec. That should cover most cases.
  • Use the blue built-in LED to give some visual indication of the status of the board: LED steady on during setup and when wifi is disconnected, LED blinking slowly when the board is in SoftAP mode.
    This could be improved with deeper changes to the code, like a different timeout for first connection (short) and reconnection (long).
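
The LED behaviour described above boils down to a pure function of board status and time. A minimal sketch, assuming a 2-second blink period for "slowly" and an LED that is off while connected (neither is stated above); `BoardStatus` and `ledOn` are hypothetical names, not the actual patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status values for illustration.
enum class BoardStatus { Setup, WifiDisconnected, SoftAp, Connected };

// Returns true when the LED should be lit at time nowMs.
// periodMs is the full blink cycle (assumed default: 1 s on, 1 s off).
bool ledOn(BoardStatus status, uint32_t nowMs, uint32_t periodMs = 2000) {
    assert(periodMs >= 2);
    switch (status) {
    case BoardStatus::Setup:
    case BoardStatus::WifiDisconnected:
        return true;                                  // steady on
    case BoardStatus::SoftAp:
        return (nowMs % periodMs) < periodMs / 2;     // slow blink
    case BoardStatus::Connected:
        return false;                                 // assumption: off when all is well
    }
    return false;
}
```

On real hardware the result would be written to the LED pin on every pass through `loop()`, with `millis()` as `nowMs`, so no blocking delay is needed for the blink.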

I understand the need to re-enter SoftAP mode in case the wifi settings must be changed. This fortunately occurs rarely and is predictable, so I don't mind waiting a few extra minutes when it happens.

BTW, as I understand it, an easy way to force a restart in SoftAP mode is to wipe the SSID in the settings and reset. The question was asked once, but I am not sure what the answer was at that time.

@rstrouse
Owner

Yeah I kinda wandered my way here. The timing I ended up with pretty much came from requests like these to accommodate some challenging environments. Originally, the software checked the boot button and would pop into AP mode and the LED would flash. This proved to be a no-go given the myriad of boards out there that use different pins for these things. I looked really hard at preserving this, but using the built-ins is a pipe dream without compiling for every board variant. Github would probably shut it down.

So perhaps it is time to revisit this again given the way wired connections work. If a connection is established with a wired connection, it immediately pops out of AP mode and establishes a link. I think I may be able to do this same thing with AP_STA mode, but it won't be as elegant as simply dropping the AP mode and connecting. Where this gets weird is that there is only one wifi radio, and I am unsure what it does when changing its modes.

I really have three conditions that need to be taken care of. First is a non-existent SSID, which is detected by checking whether one has been entered or whether that SSID shows up in a scan. The second is where the SSID exists but the board cannot establish a connection. This could be because the AP is publishing its SSID but is not ready to issue IPs yet, the passphrase has changed, or there is some general error in the link negotiation.
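
As a rough illustration of how the conditions spelled out above could be told apart, here is a sketch in plain C++. The enum and function names are hypothetical and the scan result is reduced to a list of SSID strings; this is not the project's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of why the station link is not up.
enum class ConnectProblem { None, NoSsidConfigured, SsidNotVisible, ConnectFailed };

ConnectProblem classify(const std::string& configuredSsid,
                        const std::vector<std::string>& scannedSsids,
                        bool connected) {
    if (connected) return ConnectProblem::None;
    if (configuredSsid.empty()) return ConnectProblem::NoSsidConfigured;
    for (const auto& s : scannedSsids) {
        // SSID is on the air but the link failed: AP not ready to issue IPs,
        // changed passphrase, or a negotiation error.
        if (s == configuredSsid) return ConnectProblem::ConnectFailed;
    }
    return ConnectProblem::SsidNotVisible;
}
```

Separating the cases like this would allow a different reaction to each one, for example going straight to the SoftAP when no SSID is configured but retrying for longer when the SSID is merely not visible.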

Currently, the SoftAP is opened when a connection cannot be made within 20 seconds. However, catching the SoftAP is not as hard as you might think. If a client connects to it, the board will not drop AP mode until there are 0 attached clients or the device is rebooted (at which time there will be 0 clients). This network shows up rather quickly as the ESP32 announces its AP.

@alka79
Author

alka79 commented May 21, 2024

We have two subjects in one thread: LED and wifi connection!

The Arduino folks introduced the LED_BUILTIN constant to accommodate the board variants. This constant is correct on my ESP32 board, but indeed it has not been properly maintained over the years for all the ESP boards. You would not have to create separate builds, just add a LedPin setting to override it in case LED_BUILTIN is missing or wrong.
The LED is a nice-to-have. It is useful when we start playing with the system. Once up and running, the board will be forgotten, except by some HA automation ;) I like the visual feedback and had some fun adding the LED (also a brief flash when the board sends a command). It is surely not a requirement for all users.
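
A minimal sketch of the LedPin override idea, in plain C++. `LED_BUILTIN` is the real Arduino macro and is used only if the build defines it; the `-1` "not set" convention and the names `kUseDefault` and `resolveLedPin` are assumptions made for illustration.

```cpp
#include <cassert>

// Compile-time default: the board's LED_BUILTIN if the build defines one,
// otherwise "no LED".
#ifdef LED_BUILTIN
constexpr int kDefaultLedPin = LED_BUILTIN;
#else
constexpr int kDefaultLedPin = -1;
#endif

// Hypothetical "not set" value for a user-editable LedPin setting.
constexpr int kUseDefault = -1;

// Returns the GPIO to drive, or -1 when there is no usable LED.
int resolveLedPin(int configuredPin) {
    assert(configuredPin >= -1);
    return configuredPin == kUseDefault ? kDefaultLedPin : configuredPin;
}
```

The compiled-in value only serves as a fallback here, so a single binary could still be pointed at the right pin at runtime through the setting.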

It really makes sense to go straight to SoftAP the first time, or when no SSID is in the settings.
After that, we rarely need to go to SoftAP again. Who regularly changes the SSID or passphrase? And if one plans to change them, one may simply modify the settings in the WebUI prior to resetting the board.

My point was that automatically switching to SoftAP after the STA-AP connection has been lost for 20 sec is too short IMHO.
It may happen that the AP becomes unreachable for a minute or so (most probably a reboot). In that case, the board switches to SoftAP after 20 sec and stays there for 3 minutes, waiting for a client. Then the SoftAP is closed and it tries to connect to the STA-AP again. And so on. I just tested an AP out of reach for 40 seconds: my other DIY board reconnects immediately but ESPSomfy is unavailable for a total of 4 minutes.
It may go unnoticed by most users, but it feels suboptimal. I would prefer a 2 or 3 minute tolerance before switching to SoftAP.
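
A back-of-the-envelope model of that sequence, as a sketch: it assumes the link comes up instantly whenever the AP is reachable during a connect window and that no client joins the SoftAP. The function name and parameters are hypothetical. Under these assumptions, a 40 s outage gives a lower bound of 200 s with a 20 s timeout and a 3 minute SoftAP window, and 40 s with a 2 minute timeout; real recovery takes somewhat longer because association and DHCP are not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Earliest time (in seconds after the link drops) at which the board can be
// back on wifi, for an AP outage lasting outageS seconds.
uint32_t recoveryTimeS(uint32_t outageS, uint32_t connectTimeoutS,
                       uint32_t softApWindowS) {
    assert(connectTimeoutS > 0);
    uint32_t t = 0;  // start of the current connect window
    // If the AP is still down when the connect window ends, the board sits
    // in SoftAP for the whole window before the next attempt.
    while (outageS >= t + connectTimeoutS) {
        t += connectTimeoutS + softApWindowS;
    }
    // Reconnect as soon as both the AP is back and a connect window is open.
    return outageS > t ? outageS : t;
}
```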

I am not familiar with the dual AP_STA mode. It seems that the ESP32 can handle both at the same time.
An elegant solution, as you suggest, might be to keep trying to connect to the STA-AP in the background and kill the SoftAP if the STA-AP connection becomes active again and no client is connected to the SoftAP.
Again, don't overthink this. It is so rarely needed that it does not deserve the additional work. In addition, testing this is painful! The behavior and timing should be properly described in the doc and we can live with it.
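
For what it is worth, the background-retry idea reduces to two small checks. This is a sketch in plain C++ with hypothetical helper names; how the ESP32 radio actually behaves while retrying in AP_STA mode is not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; not taken from the ESPSomfy source.

// True when it is time for another background STA attempt.
// Unsigned subtraction keeps this correct across a 32-bit millisecond
// counter wrapping around.
bool staRetryDue(uint32_t nowMs, uint32_t lastAttemptMs, uint32_t retryIntervalMs) {
    return nowMs - lastAttemptMs >= retryIntervalMs;
}

// The SoftAP can be dropped once the station link is back and nobody is
// attached to it.
bool canCloseSoftAp(bool staConnected, int softApClients) {
    assert(softApClients >= 0);
    return staConnected && softApClients == 0;
}
```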

@rstrouse
Owner

> Arduino folks introduced LED_BUILTIN constant to accommodate for the board variants.

Unfortunately that constant is a define that is used during the compile process. The board selection sets that value, and it is baked in at compile time.

> An elegant solution as you suggest might be to keep trying to connect to STA-AP in the background and kill softAP if STA-AP connection becomes active again and no client is connected to softAP.

I am going to do some research with this. If this can operate the way a wired connection does then it will be very smooth. If not then I definitely think a change is in order for when the connection is simply lost.

@alka79
Author

alka79 commented May 21, 2024

OK.

While you look at it: the APs are scanned in setup() just to print the list on the serial output. Is that useful? (It takes 4 to 6 seconds, doubling the setup time in my case.)

That brings up another curiosity question: why the change to connecting to wifi asynchronously rather than in setup() as in 2.4.2?

@rstrouse
Owner

> While you look at it, the APs are scanned in setup() just to print the list on the serial output. useful ? (it takes 4 to 6 seconds - doubling the setup time in my case)

This is an artifact of the original attempts at getting it to connect to the strongest AP in a mesh. The original docs indicated that simply setting the sort would make it connect properly, but that wasn't the entire truth. At this point it should be removed. The other thing that it did was make sure the buffers used to scan APs were not allocated in the middle of the heap.
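
To make the "strongest AP in a mesh" idea concrete, here is a sketch of picking the best BSSID for a given SSID out of a list of scan results. It is plain C++ with a hypothetical `ScanEntry` struct standing in for whatever the scan API returns, and it is not the code used in the project.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified scan result.
struct ScanEntry {
    std::string ssid;
    std::string bssid;  // MAC of the individual access point
    int rssi;           // dBm; closer to 0 is stronger
};

// Returns the index of the strongest entry advertising ssid, or -1 if none.
int strongestFor(const std::string& ssid, const std::vector<ScanEntry>& scan) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(scan.size()); ++i) {
        if (scan[i].ssid != ssid) continue;
        if (best < 0 || scan[i].rssi > scan[best].rssi) best = i;
    }
    return best;
}
```

In a mesh, several entries share one SSID and differ only in BSSID, so the connection then has to target the chosen BSSID rather than just the SSID.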

> There comes another curiosity question : why change to connect to wifi async and not in the setup as in 2.4.2 ?

This has to do with several things. First, there are delays in wired configurations. The PHY reports the link up long before it actually starts communicating. Not sure if that is a bug or if the expectation is that it should only be considered connected after an IP is issued.

Next, the watchdog needs to be fed in a standard way. Moving this to an async operation allows the dog to be fed in standard places. While this appears to be fundamentally different from v2.4.2, it really isn't, since the associated services are not started until a connection is established. It just gets triggered from the loop instead of creating a loop in the setup. This eliminates the need to add delays and yields for other resources to get a slice.

Finally, reconnecting to wifi, changing of AP, and even fallback from a wired connection can all be done in the same fashion, and there is no performance penalty for doing it this way. However, the changeAP code has yet to be turned over, partly because this operation is a sub-second connection directly to the MAC. It will be transitioned, though, and the code will then simply set the connection target and the connecting flag. This is the toughest thing to test, and operating the initial connection in a way that acts like the change allows the code to operate in a similar way with similar steps.
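
The loop-driven approach can be illustrated with a small non-blocking connector that is polled on every pass instead of spinning inside setup(). This is a sketch in plain C++ under stated assumptions: `Connector` and `ConnState` are hypothetical names, the clock is passed in, and "connected" is taken to mean "has an IP", which is one reading of the PHY remark above.

```cpp
#include <cassert>
#include <cstdint>

enum class ConnState { Idle, Connecting, Connected, TimedOut };

// Hypothetical connector; the caller supplies the current time and link status.
class Connector {
public:
    explicit Connector(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);
    }

    // Only sets the target state and start time, like setting a connecting flag.
    void begin(uint32_t nowMs) {
        startMs_ = nowMs;
        state_ = ConnState::Connecting;
    }

    // Called once per loop pass; returns immediately, so the watchdog and
    // other services keep getting their turn.
    ConnState poll(uint32_t nowMs, bool linkHasIp) {
        if (state_ == ConnState::Connecting) {
            if (linkHasIp) {
                state_ = ConnState::Connected;
            } else if (nowMs - startMs_ >= timeoutMs_) {
                state_ = ConnState::TimedOut;
            }
        }
        return state_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t startMs_ = 0;
    ConnState state_ = ConnState::Idle;
};
```

The same object would serve an initial connection, a reconnect, or a change of AP: each one just calls `begin()` again and lets the loop drive the rest.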

@alka79
Author

alka79 commented May 22, 2024

Thanks for the detailed explanation.


2 participants