Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large OS pages support #485

Open
ghost opened this issue Mar 20, 2022 · 4 comments
Open

Large OS pages support #485

ghost opened this issue Mar 20, 2022 · 4 comments

Comments

@ghost
Copy link

ghost commented Mar 20, 2022

Hi,
I'm trying to make snmalloc use large OS pages in my application. My use case has been mentioned in #484 (comment). My application is targeting Windows. It is memory bottlenecked and I want to reduce page faults. On Linux, 4KB pages can be merged into 2MB large pages implicitly and automatically, but on Windows large page allocation is explicit.

I have tried to modify the PAL to use large pages. The key modifications are:

static SNMALLOC_CONSTINIT_STATIC size_t minimum_alloc_size = 0x200000;    // 2MB
static constexpr size_t page_size = 0x200000;                             // 2MB

// in `notify_using`
void* r = VirtualAlloc(p, size, MEM_COMMIT | MEM_LARGE_PAGES, PAGE_READWRITE);

This PAL works well with snmalloc1, except the need of some small patches (alignas(OS_PAGE_SIZE) not possible, and need to check rsize > OS_PAGE_SIZE before rsize - OS_PAGE_SIZE). I don't know there are flaws or not but it can just run.

snmalloc2 cannot work with this PAL. It crashes at:

SNMALLOC_ASSERT(!meta->is_unused());

I'm wondering whether I can use PAL to enable large os page support and expect snmalloc1 to work perfectly, and why this don't work with snmalloc2.


Sorry, may be duplicate with #222

@mjp41
Copy link
Member

mjp41 commented Mar 20, 2022

Would you be able to post the precise commit you used on snmalloc2? I would have expected snmalloc2 to work better with Large Pages than snmalloc1. A lot of the redesign was specifically for that, and getting this working is on my list.

@ghost
Copy link
Author

ghost commented Mar 21, 2022

Here is the PAL I used with snmalloc2

diff --git a/src/pal/pal_windows.h b/src/pal/pal_windows.h
index 6d64a25..98f9759 100644
--- a/src/pal/pal_windows.h
+++ b/src/pal/pal_windows.h
@@ -15,13 +15,6 @@
 #  include <windows.h>
 #  pragma comment(lib, "bcrypt.lib")
 #  include <bcrypt.h>
-// VirtualAlloc2 is exposed in RS5 headers.
-#  ifdef NTDDI_WIN10_RS5
-#    if (NTDDI_VERSION >= NTDDI_WIN10_RS5) && \
-      (WINVER >= _WIN32_WINNT_WIN10) && !defined(USE_SYSTEMATIC_TESTING)
-#      define PLATFORM_HAS_VIRTUALALLOC2
-#    endif
-#  endif
 
 #  include <chrono>
 
@@ -55,16 +48,11 @@ namespace snmalloc
      * Bitmap of PalFeatures flags indicating the optional features that this
      * PAL supports.  This PAL supports low-memory notifications.
      */
-    static constexpr uint64_t pal_features = LowMemoryNotification | Entropy |
-      Time
-#  if defined(PLATFORM_HAS_VIRTUALALLOC2) && !defined(USE_SYSTEMATIC_TESTING)
-      | AlignedAllocation
-#  endif
-      ;
+    static constexpr uint64_t pal_features = LowMemoryNotification | Entropy | Time | AlignedAllocation;
 
-    static SNMALLOC_CONSTINIT_STATIC size_t minimum_alloc_size = 0x10000;
+    static SNMALLOC_CONSTINIT_STATIC size_t minimum_alloc_size = 0x200000;
 
-    static constexpr size_t page_size = 0x1000;
+    static constexpr size_t page_size = 0x200000;
 
     /**
      * Windows always inherits its underlying architecture's full address range.
@@ -144,10 +132,26 @@ namespace snmalloc
     template<ZeroMem zero_mem>
     static void notify_using(void* p, size_t size) noexcept
     {
+      static struct PrivilegeBegger {
+        PrivilegeBegger() {
+          HANDLE hToken;
+          TOKEN_PRIVILEGES tp;
+          BOOL status;
+          OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken);
+          LookupPrivilegeValueW(nullptr, L"SeLockMemoryPrivilege", &tp.Privileges[0].Luid);
+          tp.PrivilegeCount = 1;
+          tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
+          status = AdjustTokenPrivileges(hToken, FALSE, &tp, 0, (PTOKEN_PRIVILEGES)nullptr, 0);
+          if (!status)
+            error("Failed to gain privilege");
+          CloseHandle(hToken);
+        }
+      } begger [[maybe_unused]];
+
       SNMALLOC_ASSERT(
         is_aligned_block<page_size>(p, size) || (zero_mem == NoZero));
 
-      void* r = VirtualAlloc(p, size, MEM_COMMIT, PAGE_READWRITE);
+      void* r = VirtualAlloc(p, size, MEM_COMMIT | MEM_LARGE_PAGES, PAGE_READWRITE);
 
       if (r == nullptr)
         report_fatal_error(
@@ -158,31 +162,21 @@ namespace snmalloc
     template<bool page_aligned = false>
     static void zero(void* p, size_t size) noexcept
     {
-      if (page_aligned || is_aligned_block<page_size>(p, size))
-      {
-        SNMALLOC_ASSERT(is_aligned_block<page_size>(p, size));
-        notify_not_using(p, size);
-        notify_using<YesZero>(p, size);
-      }
-      else
-        ::memset(p, 0, size);
+      ::memset(p, 0, size);
     }
 
-#  ifdef PLATFORM_HAS_VIRTUALALLOC2
     template<bool state_using>
-    static void* reserve_aligned(size_t size) noexcept
+    static void* reserve_aligned(size_t size, size_t alignment = 0) noexcept
     {
       SNMALLOC_ASSERT(bits::is_pow2(size));
       SNMALLOC_ASSERT(size >= minimum_alloc_size);
 
+      if (alignment == 0) alignment = size;
+
       DWORD flags = MEM_RESERVE;
 
-      if (state_using)
-        flags |= MEM_COMMIT;
+      static_assert(!state_using, "not implemented");
 
-      // If we're on Windows 10 or newer, we can use the VirtualAlloc2
-      // function.  The FromApp variant is useable by UWP applications and
-      // cannot allocate executable memory.
       MEM_ADDRESS_REQUIREMENTS addressReqs = {NULL, NULL, size};
 
       MEM_EXTENDED_PARAMETER param = {
@@ -191,20 +185,16 @@ namespace snmalloc
       // initialisation list.
       param.Pointer = &addressReqs;
 
-      void* ret = VirtualAlloc2FromApp(
+      void* ret = VirtualAlloc2(
         nullptr, nullptr, size, flags, PAGE_READWRITE, &param, 1);
       if (ret == nullptr)
         errno = ENOMEM;
       return ret;
     }
-#  endif
 
     static void* reserve(size_t size) noexcept
     {
-      void* ret = VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_READWRITE);
-      if (ret == nullptr)
-        errno = ENOMEM;
-      return ret;
+      return reserve_aligned<false>(size, page_size);
     }
 
     /**

@mjp41
Copy link
Member

mjp41 commented Mar 21, 2022

Thanks, I have pushed a branch inspired by your code change: mjp41@13ddd50 .

This passes the test harness. What is the workload that was causing your crash?

The code needs proper fall back paths for failing to initialize large pages, and also for handling when there are no more large pages.

@ghost
Copy link
Author

ghost commented Mar 21, 2022

It is related to page_size. I have modified both minimum_alloc_size and page_size to 2MB and it crashes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant