[RESEARCH] High memory consumption after frang configuration #2098

Closed
ykargin opened this issue Apr 8, 2024 · 7 comments · Fixed by #2127
Labels: bug, question (Questions and support tasks)
Milestone: 0.8 - Beta

@ykargin (Contributor) commented Apr 8, 2024

Motivation

After the frang configuration in PR 598, tests started to fail with:

 [ 6570.228871] ksoftirqd/0: page allocation failure: order:9, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
 [ 6570.229960] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G           OE     5.10.35.tfw-04d37a1 #1
 [ 6570.230476] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
 [ 6570.231459] Call Trace:
 [ 6570.231964]  dump_stack+0x74/0x92
 [ 6570.232458]  warn_alloc.cold+0x7b/0xdf

Research needed

Tests to reproduce: tls/test_tls_integrity.ManyClients and tls/test_tls_integrity.ManyClientsH2 with the -T 1 option and a body > 16KB (8GB of memory).

@krizhanovsky (Contributor)

Looks like there is not enough memory.

@krizhanovsky krizhanovsky added the question Questions and support tasks label Apr 8, 2024
@krizhanovsky krizhanovsky added this to the 0.8 - Beta milestone Apr 8, 2024
@RomanBelozerov (Contributor)

I receive a lot of Warning: cannot alloc memory for TLS encryption. messages and the following traceback:

[26347.797820] CPU: 6 PID: 50 Comm: ksoftirqd/6 Kdump: loaded Tainted: P        W  OE     5.10.35.tfw-04d37a1 #1
[26347.797821] Hardware name: Micro-Star International Co., Ltd. GF63 Thin 11UC/MS-16R6, BIOS E16R6IMS.10D 06/23/2022
[26347.797822] Call Trace:
[26347.797829]  dump_stack+0x74/0x92
[26347.797831]  warn_alloc.cold+0x7b/0xdf
[26347.797834]  __alloc_pages_slowpath.constprop.0+0xd2e/0xd60
[26347.797835]  ? prep_new_page+0xcd/0x120
[26347.797837]  __alloc_pages_nodemask+0x2cf/0x330
[26347.797839]  alloc_pages_current+0x87/0xe0
[26347.797841]  kmalloc_order+0x2c/0x100
[26347.797842]  kmalloc_order_trace+0x1d/0x80
[26347.797843]  __kmalloc+0x3e9/0x470
[26347.797857]  tfw_tls_encrypt+0x7a2/0x820 [tempesta_fw]
[26347.797860]  ? memcpy_fast+0xe/0x10 [tempesta_lib]
[26347.797867]  ? tfw_strcpy+0x1ae/0x2b0 [tempesta_fw]
[26347.797870]  ? irq_exit_rcu+0x42/0xb0
[26347.797872]  ? sysvec_apic_timer_interrupt+0x48/0x90
[26347.797873]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[26347.797880]  ? tfw_h2_make_frames+0x1da/0x370 [tempesta_fw]
[26347.797886]  ? tfw_h2_make_data_frames+0x19/0x20 [tempesta_fw]
[26347.797892]  ? tfw_sk_prepare_xmit+0x69c/0x7b0 [tempesta_fw]
[26347.797898]  tfw_sk_write_xmit+0x6a/0xc0 [tempesta_fw]
[26347.797900]  tcp_tfw_sk_write_xmit+0x36/0x80
[26347.797902]  tcp_write_xmit+0x2a9/0x1210
[26347.797903]  __tcp_push_pending_frames+0x37/0x100
[26347.797904]  tcp_push+0xfc/0x100
[26347.797910]  ss_tx_action+0x492/0x670 [tempesta_fw]
[26347.797912]  net_tx_action+0x9c/0x250
[26347.797914]  __do_softirq+0xd9/0x291
[26347.797915]  run_ksoftirqd+0x2b/0x40
[26347.797916]  smpboot_thread_fn+0xd0/0x170
[26347.797918]  kthread+0x114/0x150
[26347.797918]  ? sort_range+0x30/0x30
[26347.797919]  ? kthread_park+0x90/0x90
[26347.797921]  ret_from_fork+0x1f/0x30
[26347.797923] Mem-Info:
[26347.797925] active_anon:132045 inactive_anon:1833119 isolated_anon:0
                active_file:492217 inactive_file:119308 isolated_file:0
                unevictable:199 dirty:23 writeback:0
                slab_reclaimable:45118 slab_unreclaimable:41418
                mapped:244887 shmem:205996 pagetables:15978 bounce:0
                free:758589 free_pcp:3043 free_cma:0

@RomanBelozerov (Contributor)

I get a memory leak for these tests with Tempesta commit 10b38e0. I used a remote setup (Tempesta on a separate VM) and the command ./run_tests.py -T 1 tls/test_tls_integrity.ManyClientsH2 with MTU 80. I ran this test with 16KB, 64KB, and 200KB bodies and saw all available memory being used (6GB on my Tempesta VM) and a memory leak of ~1GB after the test.

It looks like the leak is fixed in #2105. I cannot reproduce the memory leak with that PR, but I still see all available memory being used. I think Tempesta uses an unexpectedly large amount of memory in these tests: for 10 clients with a 64KB request/response body, Python uses ~400MB but Tempesta uses ~5GB. Why?

@biathlon3 (Contributor)

Here is the situation for the 64KB test.

In this test, Tempesta FW receives a 65536-byte request body from each of 10 clients, routes the requests to a server, gets the responses from the server, and sends them back to the clients.
With the -T 1 option, each request and response is split byte by byte.
The key point is that even if Tempesta FW receives only one byte, it still consumes a full skb (about 900 bytes).

Tempesta FW receives at least 655,360 skbs from the clients, which is 655,360 * 900 = 589,824,000 bytes.
Tempesta FW makes copies of all of these skbs in ss_skb_unroll() because they are all marked as cloned. Since the original skbs are marked as SKB_FCLONE_CLONE, they are not freed by consume_skb() right at this point.
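The copy-when-cloned pattern described here boils down to something like the following (a minimal sketch using standard kernel skb helpers, not the actual ss_skb_unroll() code; unclone_for_edit() is a hypothetical name used only for illustration):

#include <linux/skbuff.h>

/*
 * Sketch of the pattern described above: a cloned skb shares its data
 * area with the original, so the data must be copied before it can be
 * modified in place.
 */
static struct sk_buff *unclone_for_edit(struct sk_buff *skb)
{
	struct sk_buff *copy;

	if (!skb_cloned(skb))
		return skb;			/* data not shared, edit in place */

	copy = skb_copy(skb, GFP_ATOMIC);	/* copies headers and data */
	if (!copy)
		return NULL;

	/*
	 * Dropping our reference does not return the memory right away:
	 * an SKB_FCLONE_CLONE original stays allocated until its companion
	 * clone is freed too, which is why memory keeps piling up here.
	 */
	consume_skb(skb);
	return copy;
}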

Next, before routing these skbs to the server, Tempesta FW clones them in ss_send() so that they can be resent if something goes wrong.

After the server has responded, Tempesta FW receives the same number of skbs as it did from the clients.
And since all of these skbs are also marked as cloned, it makes copies of them too.

At this point we have allocated at least 589,824,000 * 5 = 2,949,120,000 bytes, and only once Tempesta FW starts sending responses to the clients does it start freeing skbs.
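For reference, the accounting above can be reproduced with a trivial calculation (a sketch using only the figures quoted in this thread, not measured values):

#include <stdio.h>

/*
 * Back-of-the-envelope estimate for the 64KB / -T 1 case: 10 clients,
 * 65536-byte bodies split byte by byte, ~900 bytes of skb overhead per
 * 1-byte segment, and ~5 copies held at once (client skbs, their
 * ss_skb_unroll() copies, ss_send() clones, server response skbs and
 * their copies) before responses start going out.
 */
int main(void)
{
	unsigned long clients    = 10;
	unsigned long body_bytes = 65536;
	unsigned long per_skb    = 900;
	unsigned long copies     = 5;

	unsigned long skbs     = clients * body_bytes;	/* 655,360 skbs */
	unsigned long one_pass = skbs * per_skb;	/* 589,824,000 bytes */

	printf("skbs=%lu one_pass=%lu bytes peak=%lu bytes\n",
	       skbs, one_pass, one_pass * copies);	/* ~2.95 GB */
	return 0;
}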

@krizhanovsky (Contributor)

@biathlon3 thank you for the detailed analysis! I still have a couple of questions and would appreciate your elaboration on them:

  1. skb_cloned() in ss_skb_unroll() comes under unlikely(), and IIRC this is because modern HW NICs form skbs with data in pages only (unfortunately, I don't remember why clones appear otherwise). So please research why clones appear in the network stack. Does moving to a different virtual adapter (e.g. virtio-net or SR-IOV) help to avoid the clones? Please see https://tempesta-tech.com/knowledge-base/Hardware-virtualization-performance/ . Since virtual environments aren't rare, we probably need to remove the unlikely(), add comments to the code explaining why clones appear, and rework our wiki recommendations for virtual environments.
  2. What does an sk_buff spend 900 bytes on? Could you please write down how much memory each part of the skb consumes and which Linux kernel compilation options may reduce the memory footprint? This can probably be documented in our wiki.

@biathlon3 (Contributor)

What does an sk_buff spend 900 bytes on? Could you please write down how much memory each part of the skb consumes

An empty skb immediately after ss_skb_alloc(0), or a received skb:
sizeof(struct sk_buff) = 232
hdr_len = 320
sizeof(struct skb_shared_info) = 320

232 + 320 + 320 = 872
Actually a little bit more: the smallest truesize is 896.
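The gap between 872 and 896 comes from cache-line alignment in the kernel's truesize accounting. Below is a minimal sketch of that calculation, assuming the SKB_TRUESIZE()/SKB_DATA_ALIGN() behaviour of v5.10-era include/linux/skbuff.h and 64-byte cache lines; the size constants are the ones quoted above, not read from a running kernel:

#include <stdio.h>

/*
 * Mimics SKB_TRUESIZE(X) = X + SKB_DATA_ALIGN(sizeof(struct sk_buff))
 *                            + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 * where SKB_DATA_ALIGN() rounds up to SMP_CACHE_BYTES (64 on x86-64).
 */
#define SMP_CACHE_BYTES		64UL
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define SKB_DATA_ALIGN(x)	ALIGN_UP((x), SMP_CACHE_BYTES)

int main(void)
{
	unsigned long sk_buff_sz     = 232;	/* sizeof(struct sk_buff) */
	unsigned long shared_info_sz = 320;	/* sizeof(struct skb_shared_info) */
	unsigned long hdr_len        = 320;	/* linear data area of an empty skb */

	/* 320 + 256 + 320 = 896, matching the smallest observed truesize */
	unsigned long truesize = hdr_len + SKB_DATA_ALIGN(sk_buff_sz)
					 + SKB_DATA_ALIGN(shared_info_sz);

	printf("truesize = %lu\n", truesize);
	return 0;
}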

@biathlon3 (Contributor)

So please research why clones appear in the network stack. Does moving to a different virtual adapter (e.g. virtio-net or SR-IOV) help to avoid the clones?

Skbs are marked as cloned when the test is started on the same virtual machine as Tempesta FW; it is not related to the type of virtual adapter.

If the test runs on a separate VM, Tempesta FW receives uncloned skbs with the data collected in pages, and during parsing Tempesta FW calls ss_skb_split() for each portion of data. In any case, this variant is not as memory-demanding as the first one.
But in tls.test_tls_integrity.ManyClientsH2, Tempesta FW additionally has to translate requests to HTTP/1 and responses back to HTTP/2, which also costs extra memory.

@biathlon3 biathlon3 linked a pull request May 29, 2024 that will close this issue