Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ESP-WIFI-MESH] memory leak on IDF>=5.1.4 (IDFGH-12810) #13786

Open
3 tasks done
habablaxe opened this issue May 12, 2024 · 45 comments
Open
3 tasks done

[ESP-WIFI-MESH] memory leak on IDF>=5.1.4 (IDFGH-12810) #13786

habablaxe opened this issue May 12, 2024 · 45 comments
Assignees
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally

Comments

@habablaxe
Copy link

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

MESH:
1 fixed root and 2 nodes.

code doesn't do anything, it just starts the mesh

if I compile code on IDF 5.1.2 - there are no problems and no memory leaks!
if I compile code on IDF 5.1.4, 5.2 or 5.2.1 - a problem appears, on nodes a memory leak occurs approximately every 9 seconds:
on node 1(ESP32) - 124 bytes
on node 2(ESP32S3) - 128 bytes

1 sec timer for debug log:

void mem_print_heap(TimerHandle_t t)
{
	uint32_t heap_size;
	uint32_t min_heap_size;
	static int cnt = 0;
	
#ifndef CONFIG_ESP32S3_SPIRAM_SUPPORT
	heap_size = esp_get_free_heap_size();
	min_heap_size = esp_get_minimum_free_heap_size();
	MDF_LOGW("Free heap, curr: %lu, min: %lu", heap_size, min_heap_size);
#else
	heap_size = heap_caps_get_free_size(MALLOC_CAP_8BIT | MALLOC_CAP_INTERNAL);
	min_heap_size = heap_caps_get_minimum_free_size(MALLOC_CAP_8BIT | MALLOC_CAP_INTERNAL);
	MDF_LOGW("Free heap, curr: %lu, min: %lu, total curr: %lu, min: %lu", heap_size, min_heap_size, esp_get_free_heap_size(), esp_get_minimum_free_heap_size());
#endif /**< CONFIG_SPIRAM_SUPPORT */
	cnt++;
	if (cnt > 4) {
		esp_wifi_statis_dump(0xffff); 
		cnt = 0;
	}
}

log for node 1:
log32-1.txt

after some time the memory returned (line 68) and the leak started again:
log32-2.txt

log without dump for node 2:
(S3 with SPIRAM), after a while I get the error:
W (1097142) wifi:m f null
logS3-1.txt

if at any time you turn off/on the power on the root(nodes are disconnected), then all memory returns on the all nodes:
logS3-2.txt
and the leakage process starts all over again.

sdkconfig32.txt
sdkconfigS3.txt

@espressif-bot espressif-bot added the Status: Opened Issue is new label May 12, 2024
@github-actions github-actions bot changed the title [ESP-WIFI-MESH] memory leak on IDF>=5.1.4 [ESP-WIFI-MESH] memory leak on IDF>=5.1.4 (IDFGH-12810) May 12, 2024
@zhangyanjiaoesp
Copy link
Collaborator

@habablaxe
Can this issue be reproduced using the IDF mesh example? Can you provide the log on IDF v5.1.2?

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels May 13, 2024
@habablaxe
Copy link
Author

@zhangyanjiaoesp

Can this issue be reproduced using the IDF mesh example?

I haven’t tested it on a mesh example ((

Can you provide the log on IDF v5.1.2?

log32_5.1.2.txt

I'm using MDF(components mwifi and mupgrade only) and my code works with IDF 4.4.5, 5.0.3, 5.1, 5.1.1, 5.1.2. with uptime of more than several months
As a new release of the IDF version, I switch to it.
The first problems arose on 5.2 and 5.2.1, and a few days ago 5.1.4 was released with the same problem

I don’t think the problem is in my code, it doesn’t do anything (start mesh and debug timer) or in MDF (it’s just a wrapper), because the memory is completely returned when a node disconnects/connects

@habablaxe
Copy link
Author

just tested on 5.1.3, same problem:

W (626851) [APPMAIN, 50]: Free heap, curr: 96748, min: 92080
W (627851) [APPMAIN, 50]: Free heap, curr: 96748, min: 92080
W (628851) [APPMAIN, 50]: Free heap, curr: 96748, min: 92080
W (629851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (630851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (631851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (632851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (633851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (634851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (635851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (636851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (637851) [APPMAIN, 50]: Free heap, curr: 96624, min: 92080
W (638851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (639851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (640851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (641851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (642851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (643851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (644851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (645851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (646851) [APPMAIN, 50]: Free heap, curr: 96500, min: 92080
W (647851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (648851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (649851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (650851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (651851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (652851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (653851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (654851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (655851) [APPMAIN, 50]: Free heap, curr: 96376, min: 92080
W (656851) [APPMAIN, 50]: Free heap, curr: 96252, min: 92080
W (657851) [APPMAIN, 50]: Free heap, curr: 96252, min: 92080
W (36808) [APPMAIN, 54]: Free heap, curr: 155515, min: 153967, total curr: 2130040, min: 2126572
W (37808) [APPMAIN, 54]: Free heap, curr: 155515, min: 153967, total curr: 2130040, min: 2126572
W (38808) [APPMAIN, 54]: Free heap, curr: 155515, min: 153967, total curr: 2130040, min: 2126572
W (39808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (40808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (41808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (42808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (43808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (44808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (45808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (46808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (47808) [APPMAIN, 54]: Free heap, curr: 155387, min: 153967, total curr: 2129912, min: 2126572
W (48808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (49808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (50808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (51808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (52808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (53808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (54808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (55808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (56808) [APPMAIN, 54]: Free heap, curr: 155259, min: 153967, total curr: 2129784, min: 2126572
W (57808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (58808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (59808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (60808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (61808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (62808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (63808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (64808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (65808) [APPMAIN, 54]: Free heap, curr: 155131, min: 153967, total curr: 2129656, min: 2126572
W (66808) [APPMAIN, 54]: Free heap, curr: 155003, min: 153967, total curr: 2129528, min: 2126572
W (67808) [APPMAIN, 54]: Free heap, curr: 155003, min: 153967, total curr: 2129528, min: 2126572
W (68808) [APPMAIN, 54]: Free heap, curr: 155003, min: 153967, total curr: 2129528, min: 2126572
W (69808) [APPMAIN, 54]: Free heap, curr: 155003, min: 153967, total curr: 2129528, min: 2126572
W (70808) [APPMAIN, 54]: Free heap, curr: 155003, min: 153967, total curr: 2129528, min: 2126572
W (71808) [APPMAIN, 54]: Free heap, curr: 155003, min: 153967, total curr: 2129528, min: 2126572
W (72808) [APPMAIN, 54]: Free heap, curr: 155003, min: 153967, total curr: 2129528, min: 2126572

@cbaechler
Copy link
Contributor

@habablaxe @zhangyanjiaoesp

I am currently observing a similar issue with our FW after updating to release/v5.2 branch (before we were still using release/v4.4). We are exclusively using ESP32-S3 devices in our project, no mix.

In our case, we observe the following:

  • The mesh works for some time, then suddenly starts to leak memory in pieces of 128 bytes (the actual allocation seems to be 124 bytes however). I did not find out what the trigger is that causes the start of the memory leak.
  • Allocation seems to happen in ~10 seconds intervals.
  • We only observe this on non-root devices.
  • After some point, we observe W (2662884) wifi:m f null which I assume is a side-effect because the wifi cannot allocate memory anymore.

I tried to locate where the allocation is done using esp_backtrace_print, here is the stack trace:

W (113924) alloc: 0x3fcc7438 allocated 124 bytes

Backtrace: 0x403780AB:0x3FCD9680 0x40376A85:0x3FCD96A0 0x40376AE4:0x3FCD96C0 0x40377F95:0x3FCD96E0 0x421878D1:0x3FCD9700 0x40391E39:0x3FCD9730 0x400398B5:0x3FCD9760 0x421329F1:0x3FCD9780 0x4212F131:0x3FCD97A0 |<-CONTINUES
0x403780ab: esp_heap_trace_alloc_hook at .../firmware/main/main.c:367
0x40376a85: heap_caps_malloc_base at .../esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376ae4: heap_caps_malloc at .../esp-idf/components/heap/heap_caps.c:202
0x40377f95: esp_coex_common_malloc_internal_wrapper at .../esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x421878d1: esf_buf_alloc_dynamic at ??:?
0x40391e39: esf_buf_alloc at ??:?
0x400398b5: ic_ebuf_alloc in ROM
0x421329f1: ieee80211_getmgtframe at ??:?
0x4212f131: ieee80211_encap_null_data at ??:?

I (08:15:18.446) heap: free_internal_heap: 47191 bytes, delta: -128 bytes

FYI @KonssnoK

@KonssnoK
Copy link
Contributor

KonssnoK commented May 14, 2024

@zhangyanjiaoesp could you tell us when the log wifi m f null is generated?

@Cirobar
Copy link

Cirobar commented May 14, 2024

I'm currently using the esp-mesh ip_internal_network example on ESP-IDF 5.2 and I actually have observed this same behaviour at the non root nodes from the mesh. I'm monitoring the heap memory and using wifi_statis_dump(0xffff) as well, the nodes oftenly leaks memory, don't know exactly why. It seems that from time to time this leaked memory, is recovered, until it is not recovered anymore, then it keeps dropping until this warning "wifi: m f null" appears without signaling wifi disconnection, which is very strange, especially because it kept publishing on MQTT for a while after that (about 1 hour or even more) and after this warning the heap memory kept very constant without recoverying or dropping.

@zhangyanjiaoesp
Copy link
Collaborator

@zhangyanjiaoesp could you tell us when the log wifi m f null is generated?

When the null frame malloc fail, it will return m f null. Its possible causes are: slow packet sending; memory leak. I will check it.

@cbaechler
Copy link
Contributor

cbaechler commented May 15, 2024

I also tried to reproduce with the ip_internal_network example on a ESP32-S3 devkit, and this

It seems that from time to time this leaked memory, is recovered, until it is not recovered anymore, then it keeps dropping until this warning "wifi: m f null" appears without signaling wifi disconnection

exactly describes what I also observed.

Based on the logs, my assumption would be that the allocation is somehow related to the moment where this log
mesh: 5170<active>parent layer:1(node), channel:11, rssi:-8, assoc:1(cnx rssi threshold:-120)my_assoc:0 is printed.

In the below log we also see multiple allocations of 128 bytes that seem to get released after some time all together, however, sometimes this releasing mechanism seems not to work (and then we end up seeing the wifi: m f null log).

I (1135659) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (1182749) heap: free_internal_heap: 218063 bytes, delta: 0 bytes
I (1212829) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:10790ms, retries:1<>
I (1223879) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-8, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1223879) alloc: 0x3fcc9258 allocated 124 bytes

Backtrace: 0x40377097:0x3FCB3B80 0x403760E2:0x3FCB3BA0 0x40376109:0x3FCB3BD0 0x40378ECC:0x3FCB3BF0 0x420AF655:0x3FCB3C10 0x40385929:0x3FCB3C40 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?

I (1224749) heap: free_internal_heap: 217935 bytes, delta: -128 bytes
W (1225399) alloc: 0x3fcc92d8 allocated 124 bytes

Backtrace: 0x40377097:0x3FCB3B30 0x403760E2:0x3FCB3B50 0x40376109:0x3FCB3B80 0x40378ECC:0x3FCB3BA0 0x420AF655:0x3FCB3BC0 0x40385929:0x3FCB3BF0 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?

I (1225409) mesh: 5969<scan>parent layer:1, rssi:-6, assoc:1(cnx rssi threshold:-120)
I (1225419) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][ae:8b:a9:39:4e:83]<weak>
I (1225429) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1225439) mesh: 716[monitor]no change, parent:7c:df:a1:e1:c9:e1, rssi:-6
I (1225439) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:7445ms, retries:2<>
I (1225749) heap: free_internal_heap: 217807 bytes, delta: -128 bytes
I (1233079) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-8, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1233079) alloc: 0x3fcc9358 allocated 124 bytes

Backtrace: 0x40377097:0x3FCB3B80 0x403760E2:0x3FCB3BA0 0x40376109:0x3FCB3BD0 0x40378ECC:0x3FCB3BF0 0x420AF655:0x3FCB3C10 0x40385929:0x3FCB3C40 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?

I (1233749) heap: free_internal_heap: 217679 bytes, delta: -128 bytes
W (1234599) alloc: 0x3fcc93d8 allocated 124 bytes

Backtrace: 0x40377097:0x3FCB3B30 0x403760E2:0x3FCB3B50 0x40376109:0x3FCB3B80 0x40378ECC:0x3FCB3BA0 0x420AF655:0x3FCB3BC0 0x40385929:0x3FCB3BF0 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?

I (1234609) mesh: 5969<scan>parent layer:1, rssi:-6, assoc:1(cnx rssi threshold:-120)
I (1234609) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][ae:8b:a9:39:4e:83]<weak>
I (1234619) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1234629) mesh: 716[monitor]no change, parent:7c:df:a1:e1:c9:e1, rssi:-6
I (1234639) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:9006ms, retries:3<>
I (1234749) heap: free_internal_heap: 217551 bytes, delta: -128 bytes
I (1243869) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-8, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1243869) alloc: 0x3fcc9458 allocated 124 bytes

Backtrace: 0x40377097:0x3FCB3B80 0x403760E2:0x3FCB3BA0 0x40376109:0x3FCB3BD0 0x40378ECC:0x3FCB3BF0 0x420AF655:0x3FCB3C10 0x40385929:0x3FCB3C40 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?

I (1244749) heap: free_internal_heap: 217423 bytes, delta: -128 bytes
W (1245389) alloc: 0x3fcc94d8 allocated 124 bytes

Backtrace: 0x40377097:0x3FCB3B30 0x403760E2:0x3FCB3B50 0x40376109:0x3FCB3B80 0x40378ECC:0x3FCB3BA0 0x420AF655:0x3FCB3BC0 0x40385929:0x3FCB3BF0 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?

I (1245399) mesh: 5969<scan>parent layer:1, rssi:-6, assoc:1(cnx rssi threshold:-120)
I (1245409) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][ae:8b:a9:39:4e:83]<weak>
I (1245419) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1245429) mesh: 716[monitor]no change, parent:7c:df:a1:e1:c9:e1, rssi:-6
I (1245429) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:409888ms, retries:3<>
I (1245749) heap: free_internal_heap: 218191 bytes, delta: 768 bytes
I (1260589) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>

@KonssnoK
Copy link
Contributor

@zhangyanjiaoesp could you tell us when the log wifi m f null is generated?

When the null frame malloc fail, it will return m f null. Its possible causes are: slow packet sending; memory leak. I will check it.

thanks, we wait for updates then :)

@zhangyanjiaoesp
Copy link
Collaborator

@cbaechler @KonssnoK @habablaxe
When the device connects to mesh network, the non-root node will passively scan the current channel from time to time, and then select a better parent (if there is). In this process, the node will malloc memory to store AP information. After the parent node switch process is complete (continue with the current parent, or switch to a better one), the memory is released. I have checked the related code, it seems no problem. I think the m f null issue is not related to this.

I have tested with the ip_internal_network mesh example, and I didn't reproduce the m f null issue. Can you provide a demo for me to reproduce this issue?

@cbaechler
Copy link
Contributor

Thanks for your comment.

the non-root node will passively scan the current channel from time to time

Does the timing match with what is reported in this ticket? Does the scanning happen every 9-10 seconds?

After the parent node switch process is complete (continue with the current parent, or switch to a better one), the memory is released.

I understand deallocation happens after some time based when the switch process is complete. For sure this works in all nominal cases. Still, do you see any possible chance that the free would not happen?

I have tested with the ip_internal_network mesh example, and I didn't reproduce the m f null issue. Can you provide a demo for me to reproduce this issue?

I think the m f null log might only be a side effect of what is happening, since basically it tells that the allocation failed. What I observed previously is the non-root device doing multiple allocations of 124 bytes (or 128 bytes) up to a point where the m f null appears due to exhausted heap memory.

I did not yet reproduce m f null in the ip_internal_network (but I did so with our own code, which uses more heap memory), however I did observe the multiple allocations behavior. I am still trying to reproduce it, one hope that have is to reproduce it by reducing the heap memory before even starting the mesh.

However, it seems @Cirobar has seen the m f null log in the ip_internal_network example (see #13786 (comment)). Maybe you can give more insights from your experience?

@zhangyanjiaoesp
Copy link
Collaborator

Does the timing match with what is reported in this ticket? Does the scanning happen every 9-10 seconds?

@cbaechler The scan interval has a short timer and a long timer, the short timer is random between 4.5s - 12.9s, and the long timer is random between 5min ~8min. Every three short intervals is followed by a long interval.

@KonssnoK
Copy link
Contributor

@zhangyanjiaoesp would have a way to do a diff between mesh implementation in 5.1.2 and 5.1.4 ?
The initial comment tells us that this issue was introduced at a specific point, so it would be quite easy to see if something related to memory management was changed for mesh libraries.

@Cirobar
Copy link

Cirobar commented May 21, 2024

However, it seems @Cirobar has seen the m f null log in the ip_internal_network example (see #13786 (comment)). Maybe you can give more insights from your experience?

Well, I've managed to retrieve a captured log, but unfortunately at the time I wasn't using wifi_statis_dump() or heap tracing. Moreover, there is additional tasks running besides the ip_internal_network example tasks, none of them causes any kind of leakage for sure. A cut of this log is attached below, since it is very extense. Anyway, the leakage can be seen as the tasks are printing the current heap memory. Also, as I've said before, the "wifi: m f null" warning happens often (not on every run of the app) and sometimes when it does actually happen, the application is not interrupted. One thing I've noticed is that when there is a very stable router or maybe the components are closer to the router, not sure yet of each one, the "wifi: m f null" may not happen at all.

esp32_nonroot_log.txt

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK I will check the difference between 5.1.2 and 5.1.4.
@cbaechler Can you show me your test code for the following log?I know you used ip_internal_network example, but please tell me what changes you added. Thanks.

I (1135659) mesh: 5130parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (1182749) heap: free_internal_heap: 218063 bytes, delta: 0 bytes
I (1212829) mesh: 2012parent monitor, my layer:2(cap:6)(node), interval:10790ms, retries:1<>
I (1223879) mesh: 5170parent layer:1(node), channel:11, rssi:-8, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1223879) alloc: 0x3fcc9258 allocated 124 bytes
Backtrace: 0x40377097:0x3FCB3B80 0x403760E2:0x3FCB3BA0 0x40376109:0x3FCB3BD0 0x40378ECC:0x3FCB3BF0 0x420AF655:0x3FCB3C10 0x40385929:0x3FCB3C40 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?
I (1224749) heap: free_internal_heap: 217935 bytes, delta: -128 bytes
W (1225399) alloc: 0x3fcc92d8 allocated 124 bytes
Backtrace: 0x40377097:0x3FCB3B30 0x403760E2:0x3FCB3B50 0x40376109:0x3FCB3B80 0x40378ECC:0x3FCB3BA0 0x420AF655:0x3FCB3BC0 0x40385929:0x3FCB3BF0 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?
I (1225409) mesh: 5969parent layer:1, rssi:-6, assoc:1(cnx rssi threshold:-120)
I (1225419) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][ae:8b:a9:39:4e:83]
I (1225429) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1225439) mesh: 716[monitor]no change, parent:7c:df:a1:e1:c9:e1, rssi:-6
I (1225439) mesh: 2012parent monitor, my layer:2(cap:6)(node), interval:7445ms, retries:2<>
I (1225749) heap: free_internal_heap: 217807 bytes, delta: -128 bytes
I (1233079) mesh: 5170parent layer:1(node), channel:11, rssi:-8, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1233079) alloc: 0x3fcc9358 allocated 124 bytes
Backtrace: 0x40377097:0x3FCB3B80 0x403760E2:0x3FCB3BA0 0x40376109:0x3FCB3BD0 0x40378ECC:0x3FCB3BF0 0x420AF655:0x3FCB3C10 0x40385929:0x3FCB3C40 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?
I (1233749) heap: free_internal_heap: 217679 bytes, delta: -128 bytes
W (1234599) alloc: 0x3fcc93d8 allocated 124 bytes
Backtrace: 0x40377097:0x3FCB3B30 0x403760E2:0x3FCB3B50 0x40376109:0x3FCB3B80 0x40378ECC:0x3FCB3BA0 0x420AF655:0x3FCB3BC0 0x40385929:0x3FCB3BF0 |<-CONTINUES
0x40377097: esp_heap_trace_alloc_hook at C:/repos/v3-5.2/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:469
0x403760e2: heap_caps_malloc_base at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376109: heap_caps_malloc at C:/repos/v3-5.2/esp-idf/components/heap/heap_caps.c:202
0x40378ecc: esp_coex_common_malloc_internal_wrapper at C:/repos/v3-5.2/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af655: esf_buf_alloc_dynamic at ??:?
0x40385929: esf_buf_alloc at ??:?

@cbaechler
Copy link
Contributor

Hi @zhangyanjiaoesp,

these are the changes I applied to examples/mesh/ip_internal_network/main/mesh_main.c:

diff --git a/examples/mesh/ip_internal_network/main/mesh_main.c b/examples/mesh/ip_internal_network/main/mesh_main.c
index efa4f185a8..bee1873031 100644
--- a/examples/mesh/ip_internal_network/main/mesh_main.c
+++ b/examples/mesh/ip_internal_network/main/mesh_main.c
@@ -17,6 +17,8 @@
 #include "mesh_netif.h"
 #include "driver/gpio.h"
 #include "freertos/semphr.h"
+#include "esp_timer.h"
+#include "esp_debug_helpers.h"
 
 /*******************************************************
  *                Macros
@@ -33,7 +35,7 @@
  *                Constants
  *******************************************************/
 static const char *MESH_TAG = "mesh_main";
-static const uint8_t MESH_ID[6] = { 0x77, 0x77, 0x77, 0x77, 0x77, 0x76};
+static const uint8_t MESH_ID[6] = { 0xCB, 0xCB, 0xCB, 0xCB, 0xCB, 0x75};
 
 /*******************************************************
  *                Variable Definitions
@@ -46,6 +48,8 @@ static mesh_addr_t s_route_table[CONFIG_MESH_ROUTE_TABLE_SIZE];
 static int s_route_table_size = 0;
 static SemaphoreHandle_t s_route_table_lock = NULL;
 static uint8_t s_mesh_tx_payload[CONFIG_MESH_ROUTE_TABLE_SIZE*6+1];
+static bool enable_hook = false;
+static esp_timer_handle_t periodic_timer;
 
 
 /*******************************************************
@@ -148,6 +152,9 @@ void esp_mesh_mqtt_task(void *arg)
     mesh_data_t data;
     esp_err_t err;
     mqtt_app_start();
+
+    enable_hook = true;
+
     while (is_running) {
         asprintf(&print, "layer:%d IP:" IPSTR, esp_mesh_get_layer(), IP2STR(&s_current_ip));
         ESP_LOGV(MESH_TAG, "Tried to publish %s", print);
@@ -381,6 +388,21 @@ void ip_event_handler(void *arg, esp_event_base_t event_base,
     esp_mesh_comm_mqtt_task_start();
 }
 
+// called 1x per second
+static void on_timer(void* arg)
+{
+    static int count_heap_log = 0;
+    static int last_free_internal_heap = 0;
+    int free_internal_heap = esp_get_free_internal_heap_size();
+
+    if ((last_free_internal_heap != free_internal_heap) || (count_heap_log > 50)) {
+        ESP_LOGI("heap", "free_internal_heap: %d bytes, delta: %d bytes", free_internal_heap, free_internal_heap - last_free_internal_heap);
+        last_free_internal_heap = free_internal_heap;
+        count_heap_log = 0;
+    } else {
+        count_heap_log++;
+    }
+}
 
 void app_main(void)
 {
@@ -425,4 +447,36 @@ void app_main(void)
     ESP_ERROR_CHECK(esp_mesh_start());
     ESP_LOGI(MESH_TAG, "mesh starts successfully, heap:%" PRId32 ", %s",  esp_get_free_heap_size(),
              esp_mesh_is_root_fixed() ? "root fixed" : "root not fixed");
+
+    ESP_LOGI("heap", "free_internal_heap: %ld bytes before drain loop", esp_get_free_internal_heap_size());
+    while (esp_get_free_internal_heap_size() > 70000) {
+        void* mem = heap_caps_malloc(2048, MALLOC_CAP_INTERNAL);
+        if (!mem) {
+            ESP_LOGE(MESH_TAG, "Failed to alloc heap");
+            break;
+        }
+    }
+    ESP_LOGI("heap", "free_internal_heap: %ld bytes after drain loop", esp_get_free_internal_heap_size());
+
+    /* create timer to track heap */
+    const esp_timer_create_args_t periodic_timer_args = {
+        .callback = &on_timer,
+        .name = "periodic",
+    };
+
+    ESP_ERROR_CHECK(esp_timer_create(&periodic_timer_args, &periodic_timer));
+    ESP_ERROR_CHECK(esp_timer_start_periodic(periodic_timer, 1000000));
+}
+
+void esp_heap_trace_alloc_hook(void* ptr, size_t size, uint32_t caps)
+{
+    if (enable_hook) {
+        if ((caps & MALLOC_CAP_INTERNAL) != 0) {
+            if ((size == 124) || (size == 128)) {
+                ESP_EARLY_LOGW("alloc", "0x%08x allocated %d bytes", ptr, size);
+                // ESP_LOGW("alloc", "0x%08x allocated %d bytes", ptr, size);
+                esp_backtrace_print(5);
+            }
+        }
+    }
 }

Changes in sdkconfig are as follows, I enabled SPIRAM and configured allocations for wifi and mbedtls to happen on the external RAM in order to reduce allocations in the internal RAM:

CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_COMPILER_OPTIMIZATION_SIZE=y
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
CONFIG_SPIRAM_SPEED_80M=y
CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=2048
CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP=y
CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM=10
CONFIG_ESP_WIFI_RX_BA_WIN=6
CONFIG_FREERTOS_ENABLE_BACKWARD_COMPATIBILITY=y
CONFIG_HEAP_USE_HOOKS=y
CONFIG_LWIP_L2_TO_L3_COPY=y
CONFIG_LWIP_IP_FORWARD=y
CONFIG_LWIP_IPV4_NAPT=y
CONFIG_LWIP_TCP_MSS=624
CONFIG_LWIP_TCP_OOSEQ_MAX_PBUFS=4
CONFIG_MBEDTLS_EXTERNAL_MEM_ALLOC=y

Thank you!

@zhangyanjiaoesp
Copy link
Collaborator

@cbaechler @KonssnoK @Cirobar @habablaxe
Have test with the mesh example again, I can confirm that the 124 bytes is for the NULL data. The NULL data will be sent before scan start and after scan stop, and the scanning time is 1500ms. And the heap will be freed after the NULL data is sent out.
See the following logs:

The log I (19448) mesh: 5157<active>parent layer:1(node), channel:5, rssi:-8, assoc:0(cnx rssi threshold:-120)my_assoc:0 shows the scan will start, then the NULL data alloc 124 bytes, the pointer is 0x3fceca50, see log W (19478) alloc: 0x3fceca50 allocated 124 bytes. After it is sent out, the pointer is freed, see log W (19658) free: 0x3fceca50 free.

The log I (21038) mesh: 5948<scan>parent layer:1, rssi:-8, assoc:1(cnx rssi threshold:-120) shows the scan is stop, before it, the NULL data alloc 124 bytes, the pointer is still 0x3fceca50, after it is sent out, the pointer is freed.

The scanning time is 21038-19478 = 1560ms.

Because the number of memory can be applied for NULL data is limited, and the length of TX queue is limited, if there are too many packets to be sent in the TX queue, and the surrounding environment is poor ,it's difficult to send packets, the applied memory can't be freed in time, then m f null will occur. That's why @Cirobar say One thing I've noticed is that when there is a very stable router or maybe the components are closer to the router, not sure yet of each one, the "wifi: m f null" may not happen at all. in #13786 (comment).

And there is no different between 5.1.2 and 5.1.4 in handling this.

I (18338) mesh_main: Received Routing table [0] 84:f7:03:80:01:88
I (18338) mesh_main: Received Routing table [1] 84:f7:03:80:01:40
I (18508) mesh_main: Tried to publish layer:2 IP:10.0.0.2
I (18518) mesh_mqtt: sent publish returned msg_id=26502
I (18798) mesh_mqtt: MQTT_EVENT_PUBLISHED, msg_id=26502
I (19448) mesh: 5157<active>parent layer:1(node), channel:5, rssi:-8, assoc:0(cnx rssi threshold:-120)my_assoc:0
>>>pm_off_channel,1317
>>>pm_send_nullfunc,997
>>>ieee80211_pm_tx_null_process,5444
>>>ieee80211_encap_null_data,5373
>>>esf_buf_alloc,274, type = 3, len = 24
>>>esf_buf_alloc_dynamic,134, type = 3, all = 124(96+24+0+4)
W (19478) alloc: 0x3fceca50 allocated 124 bytes


Backtrace: 0x403787D7:0x3FCB4F50 0x40376545:0x3FCB4F70 0x403765A5:0x3FCB4F90 0x40377295:0x3FCB4FB0 0x403773A9:0x3FCB5040 0x4037A201:0x3FCB5060 0x420BF316:0x3FCB5080 0x40385D0A:0x3FCB50C0 0x400398B5:0x3FCB50F0 |<-CORRUPTED
0x403787d7: esp_heap_trace_alloc_hook at /home/zhangyanjiao/work/code/idf1/examples/mesh/ip_internal_network/main/mesh_main.c:480 (discriminator 3)
0x40376545: heap_caps_malloc_base at /home/zhangyanjiao/work/code/idf1/components/heap/heap_caps.c:178 (discriminator 1)
0x403765a5: heap_caps_malloc at /home/zhangyanjiao/work/code/idf1/components/heap/heap_caps.c:197
0x40377295: trace_malloc at /home/zhangyanjiao/work/code/idf1/components/heap/include/heap_trace.inc:95
0x403773a9: __wrap_heap_caps_malloc at /home/zhangyanjiao/work/code/idf1/components/heap/include/heap_trace.inc:184
0x4037a201: esp_coex_common_malloc_internal_wrapper at /home/zhangyanjiao/work/code/idf1/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420bf316: esf_buf_alloc_dynamic at ??:?
0x40385d0a: esf_buf_alloc at ??:?
0x400398b5: ic_ebuf_alloc in ROM

W (19658) free: 0x3fceca50 free

Backtrace: 0x40378819:0x3FCB51D0 0x40376792:0x3FCB51F0 0x403772DB:0x3FCB5210 0x4037735D:0x3FCB5260 0x40385739:0x3FCB5280 0x420BFB61:0x3FCB52A0 0x40385381:0x3FCB52C0 0x403808D5:0x3FCB52F0
0x40378819: esp_heap_trace_free_hook at /home/zhangyanjiao/work/code/idf1/examples/mesh/ip_internal_network/main/mesh_main.c:492 (discriminator 3)
0x40376792: heap_caps_free at /home/zhangyanjiao/work/code/idf1/components/heap/heap_caps.c:390 (discriminator 1)
0x403772db: trace_free at /home/zhangyanjiao/work/code/idf1/components/heap/include/heap_trace.inc:119
0x4037735d: __wrap_free at /home/zhangyanjiao/work/code/idf1/components/heap/include/heap_trace.inc:164
0x40385739: esf_buf_recycle at ??:?
0x420bfb61: ppProcTxDone at ??:?
0x40385381: ppTask at ??:?
0x403808d5: vPortTaskWrapper at /home/zhangyanjiao/work/code/idf1/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162


I (19978) heap: free_internal_heap: 42411 bytes, delta: -36 bytes
I (20518) mesh_main: Tried to publish layer:2 IP:10.0.0.2
I (20518) mesh_main: Received Routing table [0] 84:f7:03:80:01:88
I (20518) mesh_main: Received Routing table [1] 84:f7:03:80:01:40
I (20518) mesh_mqtt: sent publish returned msg_id=2430
I (20788) mesh_mqtt: MQTT_EVENT_PUBLISHED, msg_id=2430
>>>pm_send_nullfunc,997
>>>ieee80211_pm_tx_null_process,5444
>>>ieee80211_encap_null_data,5373
>>>esf_buf_alloc,274, type = 3, len = 24
>>>esf_buf_alloc_dynamic,134, type = 3, all = 124(96+24+0+4)
W (21008) alloc: 0x3fceca50 allocated 124 bytes


Backtrace: 0x403787D7:0x3FCB4F30 0x40376545:0x3FCB4F50 0x403765A5:0x3FCB4F70 0x40377295:0x3FCB4F90 0x403773A9:0x3FCB5020 0x4037A201:0x3FCB5040 0x420BF316:0x3FCB5060 0x40385D0A:0x3FCB50A0 0x400398B5:0x3FCB50D0 |<-CORRUPTED
0x403787d7: esp_heap_trace_alloc_hook at /home/zhangyanjiao/work/code/idf1/examples/mesh/ip_internal_network/main/mesh_main.c:480 (discriminator 3)
0x40376545: heap_caps_malloc_base at /home/zhangyanjiao/work/code/idf1/components/heap/heap_caps.c:178 (discriminator 1)
0x403765a5: heap_caps_malloc at /home/zhangyanjiao/work/code/idf1/components/heap/heap_caps.c:197
0x40377295: trace_malloc at /home/zhangyanjiao/work/code/idf1/components/heap/include/heap_trace.inc:95
0x403773a9: __wrap_heap_caps_malloc at /home/zhangyanjiao/work/code/idf1/components/heap/include/heap_trace.inc:184
0x4037a201: esp_coex_common_malloc_internal_wrapper at /home/zhangyanjiao/work/code/idf1/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420bf316: esf_buf_alloc_dynamic at ??:?
0x40385d0a: esf_buf_alloc at ??:?
0x400398b5: ic_ebuf_alloc in ROM


I (21038) mesh: 5948<scan>parent layer:1, rssi:-8, assoc:1(cnx rssi threshold:-120)

W (21038) free: 0x3fceca50 free

Backtrace: 0x40378819:0x3FCB51D0 0x40376792:0x3FCB51F0 0x403772DB:0x3FCB5210 0x4037735D:0x3FCB5260 0x40385739:0x3FCB5280 0x420BFB61:0x3FCB52A0 0x40385381:0x3FCB52C0 0x403808D5:0x3FCB52F0
0x40378819: esp_heap_trace_free_hook at /home/zhangyanjiao/work/code/idf1/examples/mesh/ip_internal_network/main/mesh_main.c:492 (discriminator 3)
0x40376792: heap_caps_free at /home/zhangyanjiao/work/code/idf1/components/heap/heap_caps.c:390 (discriminator 1)
0x403772db: trace_free at /home/zhangyanjiao/work/code/idf1/components/heap/include/heap_trace.inc:119
0x4037735d: __wrap_free at /home/zhangyanjiao/work/code/idf1/components/heap/include/heap_trace.inc:164
0x40385739: esf_buf_recycle at ??:?
0x420bfb61: ppProcTxDone at ??:?
0x40385381: ppTask at ??:?
0x403808d5: vPortTaskWrapper at /home/zhangyanjiao/work/code/idf1/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

I (21068) mesh: [SCAN][ch:5]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][ac:84:c6:7e:31:2b]<weak>
I (21078) mesh: 7370[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (21088) mesh: 716[monitor]no change, parent:84:f7:03:80:01:89, rssi:-8
I (21088) mesh: 2004<arm>parent monitor, my layer:2(cap:6)(node), interval:7359ms, retries:2<>
I (22528) mesh_main: Tried to publish layer:2 IP:10.0.0.2
I (22548) mesh_main: Received Routing table [0] 84:f7:03:80:01:88
I (22548) mesh_main: Received Routing table [1] 84:f7:03:80:01:40
I (22578) mesh_mqtt: sent publish returned msg_id=16618
I (22848) mesh_mqtt: MQTT_EVENT_PUBLISHED, msg_id=16618
I (24568) mesh_main: Received Routing table [0] 84:f7:03:80:01:88
I (24568) mesh_main: Received Routing table [1] 84:f7:03:80:01:40
I (24578) mesh_main: Tried to publish layer:2 IP:10.0.0.2
I (24588) mesh_mqtt: sent publish returned msg_id=56594

@KonssnoK
Copy link
Contributor

hello @zhangyanjiaoesp , thanks for the answer.
The issue we see dries up the whole heap without releasing, we'll try to stress out the example more to replicate

@KonssnoK
Copy link
Contributor

@zhangyanjiaoesp i'm using the same code that @cbaechler gave.

you said that such allocations should be freed as soon as the packet is sent the second time, but in my logs i do not see such thing.
am i missing something?



I (1191289) mesh: 5969<scan>parent layer:1, rssi:-6, assoc:1(cnx rssi threshold:-120)
I (1191289) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][d2:0f:7d:8a:8b:29]<weak>
I (1191299) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1191309) mesh: 716[monitor]no change, parent:7c:df:a1:e2:a3:05, rssi:-6
I (1191319) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:317426ms, retries:3<>
I (1191539) heap: free_internal_heap: 47023 bytes, delta: -128 bytes
I (1243539) heap: free_internal_heap: 47023 bytes, delta: 0 bytes
I (1257479) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (1295539) heap: free_internal_heap: 47023 bytes, delta: 0 bytes
I (1347539) heap: free_internal_heap: 47023 bytes, delta: 0 bytes
I (1382399) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (1399539) heap: free_internal_heap: 47023 bytes, delta: 0 bytes
I (1451539) heap: free_internal_heap: 47023 bytes, delta: 0 bytes
I (1503539) heap: free_internal_heap: 47023 bytes, delta: 0 bytes
I (1506299) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (1516369) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:6082ms, retries:1<>
I (1522599) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-7, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1522599) alloc: 0x3fcaa530 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB3010 0x40375FEE:0x3FCB3030 0x40376015:0x3FCB3060 0x40378D88:0x3FCB3080 0x420AF6A1:0x3FCB30A0 0x40384EF1:0x3FCB30D0 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1523539) heap: free_internal_heap: 46895 bytes, delta: -128 bytes
W (1524119) alloc: 0x3fcaa5b0 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB2FC0 0x40375FEE:0x3FCB2FE0 0x40376015:0x3FCB3010 0x40378D88:0x3FCB3030 0x420AF6A1:0x3FCB3050 0x40384EF1:0x3FCB3080 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1524129) mesh: 5969<scan>parent layer:1, rssi:-8, assoc:1(cnx rssi threshold:-120)
I (1524139) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][d2:0f:7d:8a:8b:29]<weak>
I (1524149) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1524159) mesh: 716[monitor]no change, parent:7c:df:a1:e2:a3:05, rssi:-8
I (1524159) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:11346ms, retries:2<>
I (1524539) heap: free_internal_heap: 46767 bytes, delta: -128 bytes
I (1535789) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-7, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1535789) alloc: 0x3fcaa630 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB3010 0x40375FEE:0x3FCB3030 0x40376015:0x3FCB3060 0x40378D88:0x3FCB3080 0x420AF6A1:0x3FCB30A0 0x40384EF1:0x3FCB30D0 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1536539) heap: free_internal_heap: 46639 bytes, delta: -128 bytes
W (1537309) alloc: 0x3fcaa6b0 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB2FC0 0x40375FEE:0x3FCB2FE0 0x40376015:0x3FCB3010 0x40378D88:0x3FCB3030 0x420AF6A1:0x3FCB3050 0x40384EF1:0x3FCB3080 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1537319) mesh: 5969<scan>parent layer:1, rssi:-8, assoc:1(cnx rssi threshold:-120)
I (1537329) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][d2:0f:7d:8a:8b:29]<weak>
I (1537339) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1537349) mesh: 716[monitor]no change, parent:7c:df:a1:e2:a3:05, rssi:-8
I (1537349) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:6840ms, retries:3<>
I (1537539) heap: free_internal_heap: 46511 bytes, delta: -128 bytes
I (1544369) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-7, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1544369) alloc: 0x3fcaa730 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB3010 0x40375FEE:0x3FCB3030 0x40376015:0x3FCB3060 0x40378D88:0x3FCB3080 0x420AF6A1:0x3FCB30A0 0x40384EF1:0x3FCB30D0 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1544539) heap: free_internal_heap: 46383 bytes, delta: -128 bytes
W (1545889) alloc: 0x3fcaa7b0 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB2FC0 0x40375FEE:0x3FCB2FE0 0x40376015:0x3FCB3010 0x40378D88:0x3FCB3030 0x420AF6A1:0x3FCB3050 0x40384EF1:0x3FCB3080 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1545899) mesh: 5969<scan>parent layer:1, rssi:-8, assoc:1(cnx rssi threshold:-120)
I (1545899) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][d2:0f:7d:8a:8b:29]<weak>
I (1545919) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1545919) mesh: 716[monitor]no change, parent:7c:df:a1:e2:a3:05, rssi:-8
I (1545929) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:418063ms, retries:3<>
I (1546539) heap: free_internal_heap: 46255 bytes, delta: -128 bytes
I (1598539) heap: free_internal_heap: 46255 bytes, delta: 0 bytes
I (1631219) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (1650539) heap: free_internal_heap: 46255 bytes, delta: 0 bytes
I (1687539) heap: free_internal_heap: 46191 bytes, delta: -64 bytes
I (1688539) heap: free_internal_heap: 46255 bytes, delta: 64 bytes
I (1740539) heap: free_internal_heap: 46255 bytes, delta: 0 bytes
I (1756149) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (1792539) heap: free_internal_heap: 46255 bytes, delta: 0 bytes
I (1844539) heap: free_internal_heap: 46255 bytes, delta: 0 bytes
I (1881069) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (1896539) heap: free_internal_heap: 46255 bytes, delta: 0 bytes
I (1948539) heap: free_internal_heap: 46255 bytes, delta: 0 bytes
I (1974039) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:11011ms, retries:1<>
I (1985309) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-7, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1985309) alloc: 0x3fcaa830 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB3010 0x40375FEE:0x3FCB3030 0x40376015:0x3FCB3060 0x40378D88:0x3FCB3080 0x420AF6A1:0x3FCB30A0 0x40384EF1:0x3FCB30D0 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1985539) heap: free_internal_heap: 46127 bytes, delta: -128 bytes
W (1986829) alloc: 0x3fcaa8b0 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB2FC0 0x40375FEE:0x3FCB2FE0 0x40376015:0x3FCB3010 0x40378D88:0x3FCB3030 0x420AF6A1:0x3FCB3050 0x40384EF1:0x3FCB3080 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1986839) mesh: 5969<scan>parent layer:1, rssi:-8, assoc:1(cnx rssi threshold:-120)
I (1986849) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][d2:0f:7d:8a:8b:29]<weak>
I (1986859) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1986869) mesh: 716[monitor]no change, parent:7c:df:a1:e2:a3:05, rssi:-8
I (1986869) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:6182ms, retries:2<>
I (1987539) heap: free_internal_heap: 45999 bytes, delta: -128 bytes
I (1993209) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-7, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (1993219) alloc: 0x3fcaa930 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB3010 0x40375FEE:0x3FCB3030 0x40376015:0x3FCB3060 0x40378D88:0x3FCB3080 0x420AF6A1:0x3FCB30A0 0x40384EF1:0x3FCB30D0 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1993539) heap: free_internal_heap: 45871 bytes, delta: -128 bytes
W (1994729) alloc: 0x3fcaa9b0 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB2FC0 0x40375FEE:0x3FCB2FE0 0x40376015:0x3FCB3010 0x40378D88:0x3FCB3030 0x420AF6A1:0x3FCB3050 0x40384EF1:0x3FCB3080 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (1994739) mesh: 5969<scan>parent layer:1, rssi:-8, assoc:1(cnx rssi threshold:-120)
I (1994749) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][d2:0f:7d:8a:8b:29]<weak>
I (1994759) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (1994769) mesh: 716[monitor]no change, parent:7c:df:a1:e2:a3:05, rssi:-8
I (1994779) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:8874ms, retries:3<>
I (1995539) heap: free_internal_heap: 45743 bytes, delta: -128 bytes
I (2003869) mesh: 5170<active>parent layer:1(node), channel:11, rssi:-7, assoc:1(cnx rssi threshold:-120)my_assoc:0
W (2003869) alloc: 0x3fcaaa30 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB3010 0x40375FEE:0x3FCB3030 0x40376015:0x3FCB3060 0x40378D88:0x3FCB3080 0x420AF6A1:0x3FCB30A0 0x40384EF1:0x3FCB30D0 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (2004539) heap: free_internal_heap: 45615 bytes, delta: -128 bytes
W (2005389) alloc: 0x3fcaaab0 allocated 124 bytes


Backtrace: 0x40376FA3:0x3FCB2FC0 0x40375FEE:0x3FCB2FE0 0x40376015:0x3FCB3010 0x40378D88:0x3FCB3030 0x420AF6A1:0x3FCB3050 0x40384EF1:0x3FCB3080 |<-CONTINUES
0x40376fa3: esp_heap_trace_alloc_hook at C:/src/esp-idf/examples/mesh/ip_internal_network/main/mesh_main.c:478
0x40375fee: heap_caps_malloc_base at C:/src/esp-idf/components/heap/heap_caps.c:183 (discriminator 1)
0x40376015: heap_caps_malloc at C:/src/esp-idf/components/heap/heap_caps.c:202
0x40378d88: esp_coex_common_malloc_internal_wrapper at C:/src/esp-idf/components/esp_coex/esp32s3/esp_coex_adapter.c:128
0x420af6a1: esf_buf_alloc_dynamic at ??:?
0x40384ef1: esf_buf_alloc at ??:?


I (2005399) mesh: 5969<scan>parent layer:1, rssi:-8, assoc:1(cnx rssi threshold:-120)
I (2005409) mesh: [SCAN][ch:11]AP:1, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:2,i:2][d2:0f:7d:8a:8b:29]<weak>
I (2005419) mesh: 7391[weak]try rssi_threshold:-120, backoff times:0, max:5<-78,-82,-85>
I (2005429) mesh: 716[monitor]no change, parent:7c:df:a1:e2:a3:05, rssi:-8
I (2005429) mesh: 2012<arm>parent monitor, my layer:2(cap:6)(node), interval:301974ms, retries:3<>
I (2005539) heap: free_internal_heap: 45487 bytes, delta: -128 bytes
I (2005989) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (2057539) heap: free_internal_heap: 45487 bytes, delta: 0 bytes
I (2109539) heap: free_internal_heap: 45487 bytes, delta: 0 bytes
I (2129899) mesh: 5130<assoc>parent layer:1, channel:11, rssi:-5, assoc:1, rssi threshold<-78,-82,-85>
I (2161539) heap: free_internal_heap: 45487 bytes, delta: 0 bytes

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK you need to add a esp_heap_trace_free_hook() function to show the free trace. Here is my code, just for test.

static uint32_t *alloc_ptr = NULL;
void esp_heap_trace_alloc_hook(void* ptr, size_t size, uint32_t caps)
{
    if (enable_hook) {
        if ((caps & MALLOC_CAP_INTERNAL) != 0) {
            if ((size == 124) || (size == 128)) {
                ESP_EARLY_LOGW("alloc", "0x%08x allocated %d bytes", ptr, size);
                // ESP_LOGW("alloc", "0x%08x allocated %d bytes", ptr, size);
                alloc_ptr = ptr;
                esp_backtrace_print(15);
            }
        }
    }
}

void esp_heap_trace_free_hook(void* ptr)
{
    if (enable_hook) {
        if (alloc_ptr == ptr) {
            ESP_EARLY_LOGW("free", "0x%08x free", ptr);
            alloc_ptr = NULL;
            esp_backtrace_print(15);
        }
    }
}

@KonssnoK
Copy link
Contributor

KonssnoK commented May 23, 2024

@zhangyanjiaoesp i do have the free hook, it's just not called.
Or better, it is called after a very long time for all the previous allocations, which is strange.

the log above starts from 49+ kB of free memory, then the 128B start consuming the memory, until at 42kB of heap 6k are released


I (1987539) heap: free_internal_heap: 45999 bytes, delta: -128 bytes

I (3778539) heap: free_internal_heap: 42159 bytes, delta: -128 bytes

the thing is you have at least 3785539 - 1987539 = 180000 (ms?) between them

I (3785539) heap: free_internal_heap: 49071 bytes, delta: 6912 bytes
W (3786519) alloc: 0x3fca9d30 allocated 124 bytes

6912 = 128*54

@KonssnoK
Copy link
Contributor

@zhangyanjiaoesp could you confirm that you also see this delay in the resources release? What is it related to?
@cbaechler won't be available after the end of this week so we would like to move forward as much as we can.

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK I didn't see so long delay on my side, it may related to the test environment, do you have many WiFi packets in your testing environment?And it may related to the busy state of the wifi task (When the data is sent out, hardware will post event to wifi task, wifi task also need to handle TX/RX event). You can capture packets to see if the NULL data has been sent out.

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK which IDF commit do you use? we want to let mesh not send NULL data during scan, and I want to provide a wifi lib for you to test.

@cbaechler
Copy link
Contributor

cbaechler commented May 29, 2024

Current tests are based on commit f8319f0.

Today I tried to change the test environment, but I can still reproduce. I used the phone as a WIFI hotspot with only those two devkits connected to it. Also, we are able to reproduce the behavior in two geographically distinct locations (multiple 100 km apart), for this reason I think it is unrelated to the environment.

@cbaechler
Copy link
Contributor

I let the system run for ~2 hours to collect some data. The following plot shows the memory consumption over time:

image

We can observe the following:

  • Big junk of allocations happen every 5-8 minutes, which seems to be in-line with the scan interval long timer (see [ESP-WIFI-MESH] memory leak on IDF>=5.1.4 (IDFGH-12810) #13786 (comment))
  • Around t=1300 sec, the system manages to release some previously allocated memory
  • After this, it continues to allocate, but it does not release until t=7200 sec, where it releases all the memory at once
  • Finally, after this big release it did not start to drain again

For reference, I also attach the raw monitor output:
240529_GH_13786_log.txt

Also here, the setup is built with 2x ESP32-S3 DevKit-C connected to the phone hotspot (no other devices connected to the same access point).

@zhangyanjiaoesp
Copy link
Collaborator

@cbaechler @KonssnoK please test using the following wifi lib.

wifi_lib_esp32s3_5.2.zip
wifi firmware version: 770b98c

@KonssnoK
Copy link
Contributor

@cbaechler @KonssnoK please test using the following wifi lib.

wifi_lib_esp32s3_5.2.zip wifi firmware version: 770b98c

Hi @zhangyanjiaoesp , i think we'll be able to test tomorrow. We'll update asap.

@cbaechler
Copy link
Contributor

@zhangyanjiaoesp thanks for the wifi lib. I gave it a quick try and was not able to reproduce the issue up to now.

Actually, I do no longer see any of the 124 bytes allocations, thus I assume you are allocating this memory statically? Can you describe in more detail what the changes in the lib are?

Thank you!

@zhangyanjiaoesp
Copy link
Collaborator

@cbaechler We cancelled the sending of the null data.

@KonssnoK
Copy link
Contributor

@cbaechler We cancelled the sending of the null data.

so we can confirm that the temporary memory consumption issue is created by that data 🤔
how do we proceed?

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK I think the debug log in #13786 (comment) has confirm the 124 bytes allocation is the NULL data. And after modifying the wifi code, the 124 bytes allocation on longer be seen (#13786 (comment)). Is this evidence not sufficient enough?

@KonssnoK
Copy link
Contributor

@KonssnoK I think the debug log in #13786 (comment) has confirm the 124 bytes allocation is the NULL data. And after modifying the wifi code, the 124 bytes allocation on longer be seen (#13786 (comment)). Is this evidence not sufficient enough?

yes sure it's enough to know that is the allocation. My question was more about : what do we do next? :)
What is this NULL data used for? Is it necessary?
If yes, does it need to be reintroduced and changed so that the free is called in a timely manner 🤔 ?
If not, will it be removed from the main branch?

Thanks!

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK The NULL data is not necessary for mesh, and we will remove it from the main branch and other release branches.

@KonssnoK
Copy link
Contributor

KonssnoK commented Jun 3, 2024

@zhangyanjiaoesp ok thanks!

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: In Progress Work is in progress labels Jun 4, 2024
@habablaxe
Copy link
Author

@KonssnoK The NULL data is not necessary for mesh, and we will remove it from the main branch and other release branches.

will this already be fixed in release 5.2.2?

@zhangyanjiaoesp
Copy link
Collaborator

@habablaxe The fix will be merged in release/v5.2, the stable version should be v5.2.3.

@AxelLin
Copy link
Contributor

AxelLin commented Jun 6, 2024

@cbaechler We cancelled the sending of the null data.

@zhangyanjiaoesp
Since v5.1.2 was fine, there must be some reason to add sending the null data in newer esp-idf versions.
I'm curious what's the purpose of sending the null data?

@zhangyanjiaoesp
Copy link
Collaborator

@AxelLin The v5.1.2 also will send the null data, it's not a recent addition.

@AxelLin
Copy link
Contributor

AxelLin commented Jun 6, 2024

@AxelLin The v5.1.2 also will send the null data, it's not a recent addition.

I don't get it.
Form you description, stop sending NULL data fixed the issue.
Then why it was working fine in v5.1.2?

@zhangyanjiaoesp
Copy link
Collaborator

@AxelLin @habablaxe
In my test, there is no difference in v5.1.2 and v5.1.4, and the alloc buffer can be freed in time. ( #13786 (comment)). You can provide a demo for me to test.

@AxelLin
Copy link
Contributor

AxelLin commented Jun 6, 2024

@AxelLin @habablaxe In my test, there is no difference in v5.1.2 and v5.1.4, and the alloc buffer can be freed in time. ( #13786 (comment)). You can provide a demo for me to test.

So you just confirmed the memory is allocated by sending NULL data but still don't know why the memory is not freed in time in #13786 (comment).
And your fix is stop sending NULL data ?

(It's probably ok if null data is not necessary for MESH. But I feel you didn't figure out the root cause for free memory in time.)

@habablaxe
Copy link
Author

@zhangyanjiaoesp

no difference in v5.1.2 and v5.1.4,

this was the reason why I opened this issue.
on versions <=5.1.2 everything is fine,
on versions >=5.1.3 there is a memory leak

@AxelLin
Copy link
Contributor

AxelLin commented Jun 6, 2024

@zhangyanjiaoesp

no difference in v5.1.2 and v5.1.4,

this was the reason why I opened this issue. on versions <=5.1.2 everything is fine, on versions >=5.1.3 there is a memory leak

i think the point is @zhangyanjiaoesp cannot reproduce this issue in his environment (for any versions).

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Reviewing Issue is being reviewed labels Jun 6, 2024
@zhangyanjiaoesp
Copy link
Collaborator

@AxelLin @habablaxe In my test, there is no difference in v5.1.2 and v5.1.4, and the alloc buffer can be freed in time. ( #13786 (comment)). You can provide a demo for me to test.

So you just confirmed the memory is allocated by sending NULL data but still don't know why the memory is not freed in time in #13786 (comment). And your fix is stop sending NULL data ?

(It's probably ok if null data is not necessary for MESH. But I feel you didn't figure out the root cause for free memory in time.)

When the scan starts, if the device switches to another channel, it will send NULL data to the AP. But in this mesh case, the device only scan on the current channel, so NULL data is not necessary. So my fix is to stop sending NULL data. It can be seen as a workaround.
If you can provide a demo for me to reproduce the memory leak issue, I will continue to debugging on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

7 participants