[RESEARCH] High memory consumption after frang configuration #2098

Closed
ykargin opened this issue Apr 8, 2024 · 7 comments · Fixed by #2127
Labels: bug, question (Questions and support tasks)
Milestone: 0.8 - Beta

@ykargin (Contributor) commented Apr 8, 2024

Motivation

After the frang configuration in PR 598, tests started to fail with:

 [ 6570.228871] ksoftirqd/0: page allocation failure: order:9, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
 [ 6570.229960] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G           OE     5.10.35.tfw-04d37a1 #1
 [ 6570.230476] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
 [ 6570.231459] Call Trace:
 [ 6570.231964]  dump_stack+0x74/0x92
 [ 6570.232458]  warn_alloc.cold+0x7b/0xdf

Research needed

Tests to reproduce: tls/test_tls_integrity.ManyClients and tls/test_tls_integrity.ManyClientsH2 with the -T 1 option and a body > 16KB (8GB of memory).

@krizhanovsky (Contributor)

Looks like there is not enough memory.

@krizhanovsky krizhanovsky added the question Questions and support tasks label Apr 8, 2024
@krizhanovsky krizhanovsky added this to the 0.8 - Beta milestone Apr 8, 2024
@RomanBelozerov (Contributor)

I receive a lot of Warning: cannot alloc memory for TLS encryption. messages and the following traceback:

[26347.797820] CPU: 6 PID: 50 Comm: ksoftirqd/6 Kdump: loaded Tainted: P        W  OE     5.10.35.tfw-04d37a1 #1
[26347.797821] Hardware name: Micro-Star International Co., Ltd. GF63 Thin 11UC/MS-16R6, BIOS E16R6IMS.10D 06/23/2022
[26347.797822] Call Trace:
[26347.797829]  dump_stack+0x74/0x92
[26347.797831]  warn_alloc.cold+0x7b/0xdf
[26347.797834]  __alloc_pages_slowpath.constprop.0+0xd2e/0xd60
[26347.797835]  ? prep_new_page+0xcd/0x120
[26347.797837]  __alloc_pages_nodemask+0x2cf/0x330
[26347.797839]  alloc_pages_current+0x87/0xe0
[26347.797841]  kmalloc_order+0x2c/0x100
[26347.797842]  kmalloc_order_trace+0x1d/0x80
[26347.797843]  __kmalloc+0x3e9/0x470
[26347.797857]  tfw_tls_encrypt+0x7a2/0x820 [tempesta_fw]
[26347.797860]  ? memcpy_fast+0xe/0x10 [tempesta_lib]
[26347.797867]  ? tfw_strcpy+0x1ae/0x2b0 [tempesta_fw]
[26347.797870]  ? irq_exit_rcu+0x42/0xb0
[26347.797872]  ? sysvec_apic_timer_interrupt+0x48/0x90
[26347.797873]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[26347.797880]  ? tfw_h2_make_frames+0x1da/0x370 [tempesta_fw]
[26347.797886]  ? tfw_h2_make_data_frames+0x19/0x20 [tempesta_fw]
[26347.797892]  ? tfw_sk_prepare_xmit+0x69c/0x7b0 [tempesta_fw]
[26347.797898]  tfw_sk_write_xmit+0x6a/0xc0 [tempesta_fw]
[26347.797900]  tcp_tfw_sk_write_xmit+0x36/0x80
[26347.797902]  tcp_write_xmit+0x2a9/0x1210
[26347.797903]  __tcp_push_pending_frames+0x37/0x100
[26347.797904]  tcp_push+0xfc/0x100
[26347.797910]  ss_tx_action+0x492/0x670 [tempesta_fw]
[26347.797912]  net_tx_action+0x9c/0x250
[26347.797914]  __do_softirq+0xd9/0x291
[26347.797915]  run_ksoftirqd+0x2b/0x40
[26347.797916]  smpboot_thread_fn+0xd0/0x170
[26347.797918]  kthread+0x114/0x150
[26347.797918]  ? sort_range+0x30/0x30
[26347.797919]  ? kthread_park+0x90/0x90
[26347.797921]  ret_from_fork+0x1f/0x30
[26347.797923] Mem-Info:
[26347.797925] active_anon:132045 inactive_anon:1833119 isolated_anon:0
                active_file:492217 inactive_file:119308 isolated_file:0
                unevictable:199 dirty:23 writeback:0
                slab_reclaimable:45118 slab_unreclaimable:41418
                mapped:244887 shmem:205996 pagetables:15978 bounce:0
                free:758589 free_pcp:3043 free_cma:0

@RomanBelozerov (Contributor)

I get a memory leak for these tests with Tempesta commit 10b38e0. I used a remote setup (Tempesta on a separate VM) and the command ./run_tests.py -T 1 tls/test_tls_integrity.ManyClientsH2 with MTU 80. I ran this test with 16KB, 64KB, and 200KB bodies and saw all available memory being used (6GB on my Tempesta VM) and a memory leak of ~1GB after the test.

It looks like the leak is fixed in #2105. I cannot reproduce the memory leak with that PR, but I still see all available memory being used. I think Tempesta uses an unexpectedly large amount of memory in these tests: for 10 clients with a 64KB request/response body, Python uses ~400MB but Tempesta uses ~5GB. Why?

@biathlon3 (Contributor)

Here is the situation for the 64KB test.

In this test, Tempesta FW receives a 65536-byte request body from each of 10 clients, routes the requests to a server, gets the responses from the server, and sends them back to the clients.
With the -T 1 option, each request and response is split byte by byte.
The key point is that even if Tempesta FW receives only one byte, it still consumes a full skb (about 900 bytes).

Tempesta FW receives at least 655,360 skbs from the clients, which is 655,360 * 900 = 589,824,000 bytes.
Tempesta FW makes copies of all of these skbs in ss_skb_unroll() because they are all marked as cloned. Since the original skbs are marked as SKB_FCLONE_CLONE, they are not freed by consume_skb() right at this point.
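The copy-when-cloned pattern described here boils down to something like the following (a minimal sketch using standard kernel skb helpers, not the actual ss_skb_unroll() code; unclone_for_edit() is a hypothetical name used only for illustration):

#include <linux/skbuff.h>

/*
 * Sketch of the pattern described above: a cloned skb shares its data
 * area with the original, so the data must be copied before it can be
 * modified in place.
 */
static struct sk_buff *unclone_for_edit(struct sk_buff *skb)
{
	struct sk_buff *copy;

	if (!skb_cloned(skb))
		return skb;			/* data not shared, edit in place */

	copy = skb_copy(skb, GFP_ATOMIC);	/* copies headers and data */
	if (!copy)
		return NULL;

	/*
	 * Dropping our reference does not return the memory right away:
	 * an SKB_FCLONE_CLONE original stays allocated until its companion
	 * clone is freed too, which is why memory keeps piling up here.
	 */
	consume_skb(skb);
	return copy;
}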

Next, before routing these skbs to the server, Tempesta FW clones them in ss_send() so that they can be resent if something goes wrong.

After the server has responded, Tempesta FW receives the same number of skbs as it did from the clients.
And since all of these skbs are also marked as cloned, it makes copies of them too.

At this point we have allocated at least 589,824,000 * 5 = 2,949,120,000 bytes, and only once Tempesta FW starts sending responses to the clients does it start freeing skbs.
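For reference, the accounting above can be reproduced with a trivial calculation (a sketch using only the figures quoted in this thread, not measured values):

#include <stdio.h>

/*
 * Back-of-the-envelope estimate for the 64KB / -T 1 case: 10 clients,
 * 65536-byte bodies split byte by byte, ~900 bytes of skb overhead per
 * 1-byte segment, and ~5 copies held at once (client skbs, their
 * ss_skb_unroll() copies, ss_send() clones, server response skbs and
 * their copies) before responses start going out.
 */
int main(void)
{
	unsigned long clients    = 10;
	unsigned long body_bytes = 65536;
	unsigned long per_skb    = 900;
	unsigned long copies     = 5;

	unsigned long skbs     = clients * body_bytes;	/* 655,360 skbs */
	unsigned long one_pass = skbs * per_skb;	/* 589,824,000 bytes */

	printf("skbs=%lu one_pass=%lu bytes peak=%lu bytes\n",
	       skbs, one_pass, one_pass * copies);	/* ~2.95 GB */
	return 0;
}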

@krizhanovsky (Contributor)

@biathlon3 thank you for the detailed analysis! I still have a couple of questions and would appreciate your elaboration on them:

  1. skb_cloned() in ss_skb_unroll() comes under unlikely(), and IIRC this is because modern HW NICs form skbs with data in pages only (unfortunately, I don't remember why clones appear otherwise). So please research why clones appear in the network stack. Does moving to a different virtual adapter (e.g. virtio-net or SR-IOV) help to avoid the clones? Please see https://tempesta-tech.com/knowledge-base/Hardware-virtualization-performance/ . Since virtual environments aren't rare, we probably need to remove the unlikely(), add comments to the code explaining why clones appear, and rework our wiki recommendations for virtual environments.
  2. What does an sk_buff spend 900 bytes on? Could you please write down how much memory each part of the skb consumes and which Linux kernel compilation options may reduce the memory footprint? This can probably be documented in our wiki.

@biathlon3 (Contributor)

What does an sk_buff spend 900 bytes on? Could you please write down how much memory each part of the skb consumes

An empty skb immediately after ss_skb_alloc(0), or a received skb:
sizeof(struct sk_buff) = 232
hdr_len = 320
sizeof(struct skb_shared_info) = 320

232 + 320 + 320 = 872
Actually a little bit more: the smallest truesize is 896.
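The gap between 872 and 896 comes from cache-line alignment in the kernel's truesize accounting. Below is a minimal sketch of that calculation, assuming the SKB_TRUESIZE()/SKB_DATA_ALIGN() behaviour of v5.10-era include/linux/skbuff.h and 64-byte cache lines; the size constants are the ones quoted above, not read from a running kernel:

#include <stdio.h>

/*
 * Mimics SKB_TRUESIZE(X) = X + SKB_DATA_ALIGN(sizeof(struct sk_buff))
 *                            + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 * where SKB_DATA_ALIGN() rounds up to SMP_CACHE_BYTES (64 on x86-64).
 */
#define SMP_CACHE_BYTES		64UL
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define SKB_DATA_ALIGN(x)	ALIGN_UP((x), SMP_CACHE_BYTES)

int main(void)
{
	unsigned long sk_buff_sz     = 232;	/* sizeof(struct sk_buff) */
	unsigned long shared_info_sz = 320;	/* sizeof(struct skb_shared_info) */
	unsigned long hdr_len        = 320;	/* linear data area of an empty skb */

	/* 320 + 256 + 320 = 896, matching the smallest observed truesize */
	unsigned long truesize = hdr_len + SKB_DATA_ALIGN(sk_buff_sz)
					 + SKB_DATA_ALIGN(shared_info_sz);

	printf("truesize = %lu\n", truesize);
	return 0;
}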

@biathlon3 (Contributor)

So please research why clones appear in the network stack. Does moving to a different virtual adapter (e.g. virtio-net or SR-IOV) help to avoid the clones?

Skbs are marked as cloned when the test is started on the same virtual machine as Tempesta FW; it is not related to the type of virtual adapter.

If the test runs on a separate VM, Tempesta FW receives uncloned skbs with the data collected in pages, and during parsing Tempesta FW calls ss_skb_split() for each portion of data. In any case, this variant is not as memory-demanding as the first one.
But in tls.test_tls_integrity.ManyClientsH2, Tempesta FW additionally has to translate requests to HTTP/1 and responses back to HTTP/2, which also costs extra memory.

@biathlon3 biathlon3 linked a pull request May 29, 2024 that will close this issue