New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add hooks to debug OpenSSL memory #101626
base: main
Are you sure you want to change the base?
Conversation
It looks like the build is failing because we are trying to build agains OpenSSL 1.0 that is EOS since 2019.
I was tempted to simply disable the debug feature for that version as the very old OpenSSL has different prototype. Any thoughts @bartonjs on moving the build to at least 1.1.1 that is EOS only since last year and some distributions we support are still using it? There are probably different ways how to solve the build problems but I feel it is perhaps finally time to ditch 1.0. |
We build against Ubuntu 16.04 headers and libs currently #83428 . It is likely going to stay that way for .NET 9. |
What is reason for it @jkotas? It seems like even 8.0 does not support 16.04: https://github.com/dotnet/core/blob/main/release-notes/8.0/supported-os.md |
It is same deal as Windows 7. It is not supported, but we avoid intentionally breaking it to help some important customers. |
private static HashSet<IntPtr>? _allocations; | ||
private static HashSet<IntPtr>? _allocationsDiff; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use ConcurrentDictionery<IntPtr, byte>
to remove need for locking?
{ | ||
if (!_allocations!.Remove(entryPtr)) | ||
{ | ||
Environment.FailFast($"Failed to find OpenSSL memory 0x{(IntPtr)ptr:x}"); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | |
if (!_allocations!.Remove(entryPtr)) | |
{ | |
Environment.FailFast($"Failed to find OpenSSL memory 0x{(IntPtr)ptr:x}"); | |
} | |
} | |
if (!_allocations!.Remove(entryPtr)) | |
{ | |
Environment.FailFast($"Failed to find OpenSSL memory 0x{(IntPtr)ptr:x}"); | |
} |
if (int.TryParse(value, CultureInfo.InvariantCulture, out int enabled)) | ||
{ | ||
if (enabled == 2) | ||
{ | ||
EnableTracking(); | ||
} | ||
else if (enabled == 3) | ||
{ | ||
GetIncrementalAllocations(); | ||
} | ||
|
||
return enabled == 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like these magic constants documented at the top of the file.
Also, the logic seems weird to me, the function returns true only if enabled == 1
, so using values 2 or 3 will not lead to setting custom memory alloc routines in OpenSSL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really there just for trimming. I'll need to find better way how to do it ir put big comment there. Without it the code would be stripped and I spent some time bewating my head on why the reflection lookup was failing ;(
internal struct MemoryEntry | ||
{ | ||
if (EnsureOpenSslInitialized() != 0) | ||
public char* File; | ||
public int Size; | ||
public int Line; | ||
} | ||
|
||
private static readonly bool DebugMemory = GetMemoryDebug("DOTNET_SYSTEM_NET_SECURITY_OPENSSL_MEMORY_DEBUG"); | ||
private static readonly bool ValidateMemory = GetMemoryDebug("DOTNET_SYSTEM_NET_SECURITY_OPENSSL_MEMORY_VALIDATE"); | ||
private static readonly IntPtr Offset = sizeof(MemoryEntry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offset... of what?
After reading the code I understand how this works, but it may be too complicated for average dev. Would it make sense to move all memory-debug related functionality to separate file with a big comment on the top explaining how to use it and how it works (that we prefix allocations with a custom header to store allocation metadata)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private fields (except consts) use camelCase per repo coding convention
OpenSSL version support is not as simple as the OpenSSL support policy. Distros will continue to use EOL versions of OpenSSL but backport fixes under their own LTS support policy. I don't think .NET actually "officially" supports any Linux distros with 1.0.2 anymore. However 1.0.2k/g has played an important role far past its 2019 EOL and there are a number of Linux distros that still support it today. |
That said, since this is a diagnostic feature, I don't know that it makes sense to go through any particular lengths to get it working with 1.0. |
@@ -49,6 +49,6 @@ internal static void Initialize() | |||
} | |||
|
|||
[LibraryImport(Libraries.AndroidCryptoNative, EntryPoint = "CryptoNative_EnsureOpenSslInitialized")] | |||
private static partial int EnsureOpenSslInitialized(); | |||
private static partial int EnsureOpenSslInitialized(void* mallocFunction, void* reallocFunction, void* freeFunction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This p/invoke signature no longer matches Android's function. I don't think it will lead to stack red zone or shadow space problems, but might as well make sure the parameters match.
int32_t CryptoNative_EnsureOpenSslInitialized(void) |
malloc/free can be used in more places than it is ok to run managed code. For example, you can use malloc/free in thread detach callback, but running managed code in thread detach callback is not safe/reliable. What's our confidence level that OpenSSL only ever calls malloc/free in places where it is safe to run managed code? At minimum, this should get a full CI run with this instrumentation unconditionally enabled to see whether it is going to hit any crashes. |
|
||
private static readonly bool DebugMemory = GetMemoryDebug("DOTNET_SYSTEM_NET_SECURITY_OPENSSL_MEMORY_DEBUG"); | ||
private static readonly bool ValidateMemory = GetMemoryDebug("DOTNET_SYSTEM_NET_SECURITY_OPENSSL_MEMORY_VALIDATE"); | ||
private static readonly IntPtr Offset = sizeof(MemoryEntry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be 12 on 32-bit systems. It may mess up required alignment. Does OpenSSL have any alignment requirements on the allocator?
I can enable it for |
If this does not work, an alternative can be an unmanaged library that one preloads using LD_PRELOAD. There are number of memory leak / instrumentation libraries like that out there. https://github.com/WuBingzheng/libleak showed up at the top of my search results. It may be harder to deploy, but it can be used for any leak diagnostics, not specific to OpenSSL. |
I see OpenSSL installing thread-detach callbacks and freeing memory inside them here: https://github.com/openssl/openssl/blob/6a4a714045415be6720f4165c4d70a0ff229a26a/crypto/initthread.c#L358 . This won't be triggered synchronously from managed code. |
That is new in 3.0x and related to providers ... like FIPS. I can explore more. Is it not safe to use just the allocation functions or any managed code @jkotas? And I used generic tools before. They are difficult to use as it is difficult to focus only one particular part ... like SSL ... and tracking all runtime allocations is expensive. And my intention goes beyond just leaks. We have tight integration with OpenSSL and it would be good to havr more insight to what is happening inside. And the available hooks provide additional and useful information. I was originally thinking about make it available only in debug builds. That would certainly limit the scope of troubles. |
Any managed code. |
I think the logging is simple enough to be written in C and placed into the openssl PAL. That should make it easy to get rid of the concerns raised here. We can possibly make it even simpler, just writing the log entries into a log file and creating a simple tool to analyze it instead of processing the information in the runtime. |
internal static void EnableTracking() | ||
{ | ||
_allocationsDiff!.Clear(); | ||
_trackIncrementalAllocations = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will throw NRE if the required environment variables were not set beforehand. Perhaps a better exception would be useful?
Agreed. That said, I have a distilled version of the same concept (https://gist.github.com/filipnavara/7bf3791fb795266fe46f4383a075423c) deployed on a .NET 8 app (doesn't care about OpenSSL versioning since it's a one trick pony for diagnosing specific issue in a specific environment). It's likely possible to have that in separate library but it would come with additional complexities if the allocation callbacks needed to be written in C and compiled for multiple platforms. Having the C code in runtime takes care of this tricky part.
I'm not sure that would cut it. The app where we would use this diagnostic is operating on a scale where the allocations are multiple gigabytes at any given time and the allocation/free events are frequent, so summary information is what we need. Logging the calls [even in a compressed formats] would produce way too much information. |
Thank you for sharing this example. I like the flexibility of including it as source in the app. For example, you can monitor the total crypto memory consumption via performance counter using your existing monitoring solution as this example shows; you can add logging of stacktraces and only for certain allocation if you need that to diagnose them problem; etc. |
Since it is managed code the callbacks will have same safety problem, right @jkotas? I was thinking about similar solution @janvorli suggested - get the basic counters and hooks in c to make them safe. And find better way how to expose the details - either as event source, plain file writes as @janvorli suggested or whatever we agree to. |
one more note that the hooks are sensitive to the fact that it needs to be done before any allocations. That is not problem for the simple console app but it may be difficult for Kestrel where many things may happen IMHO before user code runs. |
Yes - if the app usage pattern results into OpenSSL calling malloc/free in places where it is not safe to run managed code.
It can be problem even for a console app - if some code in the app (e.g. 3rd party library) ends initializing OpenSSL before we get a chance to initialize it. |
Correct. We didn't encounter any issue in production with the managed callbacks. That said, we definitely use only subset of the OpenSSL functionality (a couple of crypto ciphers, Kestrel, SslStream, HttpClient).
We didn't have any problem injecting it early enough in the startup in the app that uses Kestrel.
Initial results from our experiment show that we don't see any OpenSSL related leaks. We still observe working set growing over time even if it's unaccounted for in the GC heap or other .NET heaps (eg. compiled code). Notably these graphs don't make the original issue entirely obvious. They just show that OpenSSL memory seems to stay at stable levels. Each instance has a HTTP endpoint for registration and then runs a large number of pollers that check for new email messages over variety of protocols (IMAP, Exchange Web Services, etc.) on numerous servers. We have a mechanism that rebalances all the polling registrations to a different instance so we should end up basically in the same "empty" state as on startup. When we do this and force a GC we see the managed heap going down but the working set never returns to the initial levels and it exceeds it by gigabyte(s). |
Sorry for off-topic
@filipnavara recently I have seen high memory usage due to lots of memory buffers being cached by malloc internally, see #101552 |
We had several cases when users complained about large memory use. For than native it is quite difficult to figure out where the memory goes. This PR aims to make that somewhat easier.
OpenSSL provides hooks for memory function so this PR adds switch to optimally hook into that.
The only one caveat that the
CRYPTO_set_mem_functions
works only if called before any allocations e.g. it needs to be done very early in the process. So I end up putting into initialization process .... even if I originally envisioned it somewhere else.The simple use pattern is something like
That provides insight how much memory is actually used by OpenSSL.
It allocates little bit more memory to store extra info but it should be reasonably cheap.
If somebody cares about more details they can do something like
this would provide something like
dumping large allocation data set is slow and expensive. It is done under local so it blocks all other OpenSSL allocations. I feel this is ok for now but it should be used with caution. I also feel that access through Reflection is OK since this is only last resort debug hook e.g. it does not need stable API and convenient access.