Wifi authentication with WIFI_ALL_CHANNEL_SCAN and multiple APs with same SSID leads to memory leak if PWD is wrong (IDFGH-10106) #11381
Comments
Hi @RungeJan, I tried to reproduce this issue on our internal setup, but I don't see the free memory going down after every iteration. It actually becomes constant after some iterations. Can you please share the AP's config? Also, how many APs with the same SSID are present in the environment?
I (85591) wifi station: [up:85] Heap: free: 232180(internal: 232180) (largest block: 110592), min free: 229520 / 301568 max
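Whether the heap "becomes constant" or keeps shrinking can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the free-heap readings have already been collected into an array (on the device they could come from `esp_get_free_heap_size()`). The helper `heap_still_shrinking`, its window and its tolerance are hypothetical names for illustration, not part of ESP-IDF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an ESP-IDF API): decide whether a series of
 * free-heap readings is still shrinking or has levelled off.
 *
 * free_bytes       readings in chronological order
 * count            number of readings
 * window           how many of the most recent readings to compare
 * tolerance_bytes  drop within the window that is still treated as noise
 */
bool heap_still_shrinking(const size_t *free_bytes, size_t count,
                          size_t window, size_t tolerance_bytes)
{
    if (window < 2 || count < window) {
        return false;   /* not enough data to judge */
    }

    const size_t *recent = free_bytes + (count - window);
    size_t oldest = recent[0];
    size_t newest = recent[window - 1];

    /* Shrinking only if the newest reading is lower than the oldest one
     * in the window by more than the tolerance. */
    return oldest > newest && (oldest - newest) > tolerance_bytes;
}
```

A plateau like the one described above returns `false` once the window only covers the flat part of the curve. A steady decline keeps returning `true` however long the test runs. The window and tolerance have to be tuned to the logging interval.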
Hi, Code
Running this code provides the following log:
Startup
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6940
ho 0 tail 12 room 4
load:0x40078000,len:15500
load:0x40080400,len:3844
0x40080400: _init at ??:?
entry 0x4008064c
I (0) cpu_start: App cpu up.
I (887) wifi station: wifi_init_sta finished.
I (3407) wifi station: ssid: our_ssid (12:34:56:78:90:AC)
I (3437) wifi station: ssid: our_ssid (12:34:56:78:90:AD)
I (3467) wifi station: ssid: our_ssid (12:34:56:78:90:AE)
I (3497) wifi station: ssid: other1 (12:34:56:78:90:AF)
I (3527) wifi station: ssid: other2 (12:34:56:78:90:BA)
I (3557) wifi station: ssid: other3 (12:34:56:78:90:BB)
I (3587) wifi station: ssid: other4 (12:34:56:78:90:BC)
I (5637) wifi station: [up:5] Heap: free: 236412(internal: 236412) (largest block: 110592), min free: 233924 / 303432 max
As you can see, 4 APs for our network were found. One can also see that the state transitions differ between my logs and yours. Your Wi-Fi states stick to the following procedure:
or
I (89631) wifi:state: init -> auth (b0)
My states mostly only switch between auth and init. I hope this helps.
@kapilkedawat
Hi @RungeJan,
Your problem, and mine as well, is 95% likely related to WPA3 SAE, and it is still present in the latest Espressif version:
I also have access points with the same name, so it is possibly a combination of both, but I'm unsure about that. I ran some tests with heap tracing on ESP-IDF v4.4.7-197-g1b42fee4f0 and ESP-IDF v4.4.6-487-g2d60e58888; both give the same result. My Wi-Fi manager scans and picks the best match, so scans are being monitored here as well. My experiment: Code
Reconnect log
What I observed is that the following heap trace entry for lost memory appears more often the more often I disconnect (proportionally, I think), so this should be the leak you are searching for. Logs from ESP-IDF v4.4.7-197-g1b42fee4f0:
Full heap trace from v4.4.7-197-g1b42fee4f0 (current version): 3 manual disconnects, about 15 connection attempts from the ESP, 24 leaks on wpa3_build_sae_commit
Sadly, I did most tests with the older version, so here are 2 logs you can compare. They were generated with ESP-IDF v4.4.6-487-g2d60e58888: 3 AP disconnects, about 15 connection attempts, 28 leaks on wpa3_build_sae_commit
About 6 AP disconnects, about 30 connection attempts, 48 leaks on wpa3_build_sae_commit
With the new version, WPA3 disabled via menuconfig (AP unchanged, still in mixed mode)
After my tests are done, I have always lost more than 10 KiB of heap. The more often I try, the more memory is lost. @RungeJan Disabling WPA3 in your configuration may be a workaround, if your environment supports it. If possible, please verify my tests. @ espressif team: if my analysis is correct, this may be a huge problem for the ESP32 in a production environment. Please hurry up and fix this. Especially when the Wi-Fi has problems or a weak connection, the ESP32 will lose all its memory very fast.
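The "proportionally, I think" observation can be checked by normalizing the reported leak counts by the number of connection attempts. This is a minimal sketch using only the figures quoted in this comment; the attempt counts were given as "about", so the ratios are approximate. The struct and function names are hypothetical.

```c
/* One heap-trace run as reported in the comment above. */
struct trace_run {
    const char *idf_version;
    unsigned attempts;   /* approximate connection attempts */
    unsigned leaks;      /* leaked allocations in wpa3_build_sae_commit */
};

static const struct trace_run reported_runs[] = {
    { "v4.4.7-197-g1b42fee4f0", 15, 24 },
    { "v4.4.6-487-g2d60e58888", 15, 28 },
    { "v4.4.6-487-g2d60e58888", 30, 48 },
};

/* Leaked allocations per connection attempt; 0.0 if no attempts were made. */
double leaks_per_attempt(const struct trace_run *run)
{
    if (run->attempts == 0) {
        return 0.0;
    }
    return (double)run->leaks / (double)run->attempts;
}
```

The three runs come out at about 1.6, 1.87 and 1.6 leaked allocations per attempt. That is consistent with a leak that grows roughly linearly with the number of connection attempts.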
After running my devices with WPA3 disabled for some time, one of them lost a big chunk of memory again, about 100 KiB. I'm unsure where the problem comes from, but it looks very much like the original bug report in this ticket. I just got positive feedback on a feature that will help me trace it, and I will run a long-term test with heap tracing as soon as it is merged. In any case, it is possible or even likely that both problems exist: the WPA3 leak and the scan leak. Should I create a second issue with the heap trace and the possible WPA3 memory leak above? Best regards
Hi @masterxq, the SAE leak was fixed in IDF v5.1 onward, but we missed backporting it. It has been backported now and will be available in the next sync. For the scan leak, are you using the scan example or something else? We would appreciate it if you could share more details on how to reproduce it.
Yes, thank you, and I can confirm this, tested with: The new heap_trace_alloc_pause() feature was very helpful for the analysis. (0 leaks!) The other problem most likely still exists, but it is very hard to reproduce because it appears only rarely. It definitely appears more often when there are problems with the access points, maybe because scanning is done more frequently in that case. When it appears, big chunks of memory are lost. Sadly, I could not observe it on my test devices with heap tracing. I will keep observing for some days and try to provoke Wi-Fi issues on my access points to trigger it, but I may not have the time to complete an active analysis :/ Unfortunately, as long as the leak has not occurred on my test devices, I cannot even guarantee that it is not a leak in the user code. However, I always try to avoid allocating memory, and where I do, I check and test intensively. Perhaps there are not that many places in the code where larger, directly related chunks are reserved, so a static analysis of the code could be sufficient. Of course, I realize that the information base is limited and uncertain. I hope I can contribute more to the solution :)
Sorry, I did not completely answer your question: the memory leak I observed could be in the user code or due to incorrect use of the Espressif framework. But I really can't think of where else to look, because I've checked everything several times and compared it with the documentation.
Hi @masterxq, the user needs to free the scan memory after fetching the results. Please see https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/network/esp_wifi.html#_CPPv427esp_wifi_scan_get_ap_recordP16wifi_ap_record_t , I hope your code is doing that. Do let us know if you face the issue again in the future.
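For readers unsure what "free the memory for scan" means in practice, here is a minimal sketch of one common pattern. It uses `esp_wifi_scan_get_ap_num()`, `esp_wifi_scan_get_ap_records()` and `esp_wifi_clear_ap_list()`; check that your IDF version's `esp_wifi.h` provides all three. The wrapper `fetch_scan_results` is a hypothetical name. The `#else` branch contains host-side stand-ins so the sketch can be compiled off-target. They are assumptions that only mimic the documented behaviour, in which fetching the records releases the driver's internal list.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef ESP_PLATFORM
#include "esp_err.h"
#include "esp_wifi.h"
#else
/* Host-side stand-ins (assumptions, NOT the real driver). */
typedef int esp_err_t;
#define ESP_OK 0
#define ESP_ERR_NO_MEM 0x101
typedef struct { uint8_t bssid[6]; uint8_t ssid[33]; } wifi_ap_record_t;
static uint16_t fake_driver_list_len = 0;  /* entries the fake driver holds */

static esp_err_t esp_wifi_scan_get_ap_num(uint16_t *number)
{
    *number = fake_driver_list_len;
    return ESP_OK;
}

static esp_err_t esp_wifi_scan_get_ap_records(uint16_t *number,
                                              wifi_ap_record_t *records)
{
    if (*number > fake_driver_list_len) {
        *number = fake_driver_list_len;
    }
    memset(records, 0, (size_t)*number * sizeof(*records));
    fake_driver_list_len = 0;              /* driver list is released */
    return ESP_OK;
}

static esp_err_t esp_wifi_clear_ap_list(void)
{
    fake_driver_list_len = 0;
    return ESP_OK;
}
#endif

/* Hypothetical wrapper: copy the scan results into a caller-owned buffer.
 * On every path that does not fetch the records, the driver's list is
 * cleared explicitly so it cannot linger. An empty scan returns ESP_OK
 * with *out_records == NULL and *out_count == 0. */
esp_err_t fetch_scan_results(wifi_ap_record_t **out_records, uint16_t *out_count)
{
    *out_records = NULL;
    *out_count = 0;

    uint16_t count = 0;
    esp_err_t err = esp_wifi_scan_get_ap_num(&count);
    if (err != ESP_OK || count == 0) {
        esp_wifi_clear_ap_list();
        return err;
    }

    wifi_ap_record_t *records = calloc(count, sizeof(*records));
    if (records == NULL) {
        esp_wifi_clear_ap_list();
        return ESP_ERR_NO_MEM;
    }

    err = esp_wifi_scan_get_ap_records(&count, records);
    if (err != ESP_OK) {
        esp_wifi_clear_ap_list();
        free(records);
        return err;
    }

    *out_records = records;   /* caller must free() this buffer */
    *out_count = count;
    return ESP_OK;
}
```

The caller owns the returned buffer and has to `free()` it after use. Code that only needs to know a scan finished, and never reads the records, would call `esp_wifi_clear_ap_list()` instead.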
@kapilkedawat I gave a friend some test devices that also have the issue; maybe we can reproduce the problem there :)
Hi @masterxq, the SAE leak change is merged, feel free to try it. Can we close this issue? You can reopen it or open a new issue if you encounter the leak again.
Answers checklist.
IDF version.
v5.0.2
Operating System used.
Windows
How did you build your project?
VS Code IDE
If you are using Windows, please specify command line type.
PowerShell
Development Kit.
ESP32-WROVER-E on Custom board
Power Supply used.
External 3.3V
What is the expected behavior?
Setup
Multiple APs with the same SSID (Wi-Fi network with multiple repeaters, security WPA2/WPA3-Personal)
Trying (unlimited) reconnects with a Wi-Fi config containing an invalid password
wifi_config_t: sta.scan_method is set to WIFI_ALL_CHANNEL_SCAN
Expected behaviour
The program runs forever, trying to access the SSID with a wrong password, and the available heap does not decrease constantly.
Code
The following adjusted version of the wifi getting started example shows the problem:
What is the actual behavior?
Actual behaviour
While running, you can see the available heap decreasing constantly. Sometimes the program crashes once the heap is exhausted; sometimes it keeps running with a minimum of memory left, but the task watchdog is triggered periodically.
Steps to reproduce.
Debug Logs.
More Information.
The issue was already present on v5.0.1; I haven't tried earlier versions yet.