UCX use a large amount of SYSV HugePage memory #9830
Comments
assuming UCX allocates huge pages with sysv transport you could try:
else if it is related to internal buffers there is also:
you could try removing md:sysv/thp/huge. also, |
I tried starting the program with the following configuration and found that SYSV huge pages are still being used. In the UCX log, I discovered that the huge pages are being used by
I speculate that the TCP server in UCX is using this mpool. |
@yosefe, shall we allow non-huge pages allocation for |
I still have a question: why does using TCP transports require so many |
Hi @littleneko, is it an option to upgrade the UCX version, or to check whether 7ca49c4 is part of the version in use? |
Thank you. |
Messages are not lost; it means that the receiver had not yet called ucp_tag_recv_nbx() when the message arrived. |
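To illustrate the last comment, here is a toy model of tag matching in which a message that arrives before the matching receive is posted gets parked in an "unexpected" list rather than dropped. This is only a conceptual sketch: the class and method names (ToyTagMatcher, on_message, post_recv) are hypothetical and are not UCX APIs, and it says nothing about how UCX actually stores such messages.

```python
class ToyTagMatcher:
    """Conceptual model only; not how UCX is implemented."""

    def __init__(self):
        self.posted = []      # (tag, sink) receives waiting for a message
        self.unexpected = []  # (tag, payload) messages waiting for a receive

    def on_message(self, tag, payload):
        # A message arrives from the network.
        for i, (rtag, sink) in enumerate(self.posted):
            if rtag == tag:
                self.posted.pop(i)
                sink.append(payload)
                return "matched"
        # No receive was posted yet: keep the data until one is.
        self.unexpected.append((tag, payload))
        return "unexpected"

    def post_recv(self, tag, sink):
        # The application posts a receive for a given tag.
        for i, (mtag, payload) in enumerate(self.unexpected):
            if mtag == tag:
                self.unexpected.pop(i)
                sink.append(payload)
                return "completed"
        self.posted.append((tag, sink))
        return "pending"
```

In this model, the memory held by the unexpected list grows for as long as senders outpace the receiver's posted receives, and shrinks again once the receives are posted.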
Describe the bug
When we use UCX, we found that the system's huge pages are heavily utilized, with a total of over 4000 huge pages (2 MB each) occupied. In some scenarios, a large amount of memory is requested within a short period (around 1 minute), totaling around 2 GB of huge page memory.
By examining the information in /proc/[pid]/numa_maps, we found that SYSV huge pages are heavily utilized. After analyzing our code, we found that only UCX uses SYSV to allocate large page memory, so we suspect that UCX is abnormally occupying these pages. What could be occupying this SYSV huge page memory? What scenarios might trigger this problem? Are there any methods to avoid it?
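For anyone who wants to repeat this measurement, the following is a minimal sketch of how the SYSV huge page usage could be totaled from numa_maps. It assumes the usual line layout (address, policy, then tokens such as file=/SYSV..., huge, N<node>=<pages>, and optionally kernelpagesize_kB=<size>). Older kernels may not print kernelpagesize_kB, so the function falls back to 2048 kB, matching the 2 MB pages reported above. The function name and its default are illustrative, not part of any tool.

```python
import re

NODE_RE = re.compile(r"N\d+=(\d+)")
PAGESIZE_RE = re.compile(r"kernelpagesize_kB=(\d+)")

def sysv_huge_usage(numa_maps_text, default_huge_kb=2048):
    """Return (pages, kB) held by SYSV huge page mappings.

    default_huge_kb is an assumption used when a line carries no
    kernelpagesize_kB field.
    """
    total_pages = 0
    total_kb = 0
    for line in numa_maps_text.splitlines():
        fields = line.split()
        # Only hugetlb mappings backed by a SYSV shared memory segment.
        if "huge" not in fields:
            continue
        if not any(f.startswith("file=/SYSV") for f in fields):
            continue
        pages = 0
        page_kb = default_huge_kb
        for f in fields:
            m = NODE_RE.fullmatch(f)
            if m:
                pages += int(m.group(1))  # pages resident on this NUMA node
                continue
            m = PAGESIZE_RE.fullmatch(f)
            if m:
                page_kb = int(m.group(1))
        total_pages += pages
        total_kb += pages * page_kb
    return total_pages, total_kb
```

To use it on a live process, read the file with open("/proc/<pid>/numa_maps").read(), substituting the real pid, and pass the text to sysv_huge_usage. At 2048 kB per page, 4000 pages come to 8,192,000 kB, or roughly 7.8 GiB.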
Our program uses the UCX ucp_tag_send_nb and ucp_tag_recv_nb interfaces. Internally, there is a UCX server that receives external requests, with multiple clients establishing connections with it over either rc_x or tcp. Additionally, there are several UCX clients establishing connections with other services, using only rc_x. A schematic diagram is shown below:
The SYSV hugepage in /proc/[pid]/numa_maps:
Steps to Reproduce
Setup and versions
CentOS Linux release 7.2.1511 (Core)
3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
ibstat or ibv_devinfo -vv command
Additional information (depending on the issue)
ucx_info -d to show transports and devices recognized by UCX