Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large stacks in ZeroMQ threads lead to inflated crashpad minidumps on Windows #4682

Open
zokrovi opened this issue May 3, 2024 · 0 comments

Comments

@zokrovi
Copy link

zokrovi commented May 3, 2024

Issue description

On Windows only, after crashing in an application that uses ZeroMQ 4.3.5 and crashpad, the resulting minidump file may contain several threads with excessively large stack frames (num_threads x 4MB). This leads to inflated minidump sizes which may in some circumstances prevent users from uploading their large crash reports to rate limited endpoints such as Sentry with clientside sentry-native.

The reason this occurs is because of ZeroMQ's use of _beginthreadex which does not use STACK_SIZE_PARAM_IS_A_RESERVATION.
This causes the stack size reported to crashpad (and any other crash handler for that matter relying on VirtualQuery ) to be the full size and not representative of the used size.

Environment

  • libzmq version: 4.3.5
  • OS: Windows 10 / 11 64 Bit

Steps to reproduce the issue

  1. Run any application that uses many threads in ZeroMQ and also has crashpad as its crash handler.
  2. Crash the application
  3. Inspect the minidump and look for large stack frames 4MB in size.

Possible fix

My suggested fix is to pass the STACK_SIZE_PARAM_IS_A_RESERVATION flag to _beginthreadex (replacing the 0 here in the 5th argument). This will mean the requested 4MB of pages for the stack will not be committed by the OS immediately, but reserved as virtual memory.

https://github.com/zeromq/libzmq/blob/aa885c5a154256612108636b0fb22f44ae0e247a/src/thread.cpp#L54C1-L55C60

Here's an example of what this looks like in another project: Intel's TBB:

https://github.com/oneapi-src/oneTBB/blob/377e6c3b1719f8cc7b68f9d0939e652d7e3bf776/src/tbb/co_context.h#L210

In practice, this should not make much difference to performance anyway, as (I believe) even without this flag Windows does not commit to physical memory immediately until those pages are in use. This behaviour is very similar to how stack frames in pthreads work on Linux. However, in a near OOM scenario this will impact performance as those threads will not have a guarantee of physical pages from the OS. Also another downside from a robustness point of view: unlike UNIX system where you'd see a bus error, on Windows OOM may not lead to an exception if using STACK_SIZE_PARAM_IS_A_RESERVATION.

Therefore, contributors should consider whether ZeroMQ really needs the pages committed upfront (as is the case today) or if it's worth adding the STACK_SIZE_PARAM_IS_A_RESERVATION flag to help consumers that have crash reporting, and stack memory capture features etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant