Memory consumption of Frama-C in OCaml 5 beta1 is about 10x larger #11662
Sorry, I meant to create this issue on https://github.com/ocaml/ocaml. Is there a way to migrate it, or should I just close it here and copy the contents there?
I don't think there is, unfortunately. You have to copy it there.
Apparently there is a way, since both are owned by the same organization, but it requires someone with write access to both repositories.
Oh. Interesting. Ping @kit-ty-kate
This looks like a memory leak in OCaml 5's frametable handling during native Dynlink (i.e. plugin loading), which happily looks easy to fix. The frame table stores the information about the stack and register contents needed by the GC, kept in a large hashtable (about 2MB here). The frametable is completely rebuilt every time a new module is loaded. When the frame table is rebuilt, the old ones are not immediately freed, as they may still be in use by GC happening on other domains. Instead, they are added to a list to be freed at the next GC cycle, when we can be certain they are no longer in use. Here's the bug: when the next GC cycle happens, only the first old frametable gets freed, even though the frametable may have been rebuilt many times in a single cycle.
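To make the bug pattern concrete, here is a tiny OCaml model (purely illustrative; the real logic lives in the runtime's C code, and all names here are hypothetical): retired frametables queue up until the next GC cycle, but a cleanup that only drops the head of the queue leaks everything queued behind it.

```ocaml
(* Illustrative model of the deferred-free bug: retired frametables are
   queued until the next GC cycle, but the buggy cleanup frees only the
   head of the queue. Strings stand in for the old frametables. *)

let retired : string list ref = ref []

let rebuild_frametable name =
  (* the old table is queued to be freed at the next GC cycle *)
  retired := name :: !retired

(* Buggy cleanup: frees only the first retired table. *)
let cleanup_buggy () =
  match !retired with
  | [] -> ()
  | _ :: rest -> retired := rest

(* Fixed cleanup: frees every retired table. *)
let cleanup_fixed () = retired := []

let () =
  (* three module loads before the next GC cycle *)
  List.iter rebuild_frametable ["m1"; "m2"; "m3"];
  cleanup_buggy ();
  Printf.printf "leaked after buggy cleanup: %d\n" (List.length !retired);
  cleanup_fixed ();
  Printf.printf "leaked after fixed cleanup: %d\n" (List.length !retired)
```

With many modules loaded per cycle, the leftover tables accumulate, which matches the observed growth when loading all of Frama-C's plug-ins.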
The memory leak has been fixed by #11673. There is still a regression in time complexity when loading plugins with many modules; however, this should be less noticeable in user code, and a hypothetical fix can wait for 5.1.
The quadratic-time behavior of plugin loading has been fixed by #11980, which is now merged in trunk. Updated performance results for Frama-C are available at #11980 (comment). Some large slowdowns, in particular those affecting short-running programs (the dynlink issue was only a large startup cost), are now gone: for example, basic-cwe-examples was 75% slower in 5.0 than in 4.14.1, and is now 12% faster in trunk than in 4.14.1. There remain performance slowdowns of up to 20% for longer-running examples. Those would require new performance analysis to suggest the cause of the slowdown. I would be willing to hazard the guess that ephemerons / weak pointers may be responsible -- the 5.0 implementation is trickier and probably slower. But there may be other sources of slowdown that we haven't discovered yet.
Thank you very much for these improvements! Concerning the slowdowns, as suggested by @bobot, I performed some tests using Frama-C's deterministic mode, which selects between weak and non-weak hash-consing tables:

```ocaml
module Hashconsing_tbl =
  (val if Cmdline.deterministic
       then (module Hashconsing_tbl_not_weak : Hashconsing_tbl)
       else (module Hashconsing_tbl_weak : Hashconsing_tbl))
```

These tables are notably used by the Eva plug-in, in a performance-critical part of the analysis. I tried several configurations: 4.14.1 and 5.0.1, with and without the deterministic option.

Here's what I got as results (I only compared some long-running analyses, which seemed the most relevant for the remaining slowdown). Some results are repeated several times, just to put them next to each other for relative comparisons.

If we look at the last line, we have the "current" version we are using (4.14.1, without the deterministic option). We see that, overall, the analysis time is equivalent to our "status quo", or even faster. However, memory consumption is very high, more than triple in one case.

However, the extra memory consumption is still a bit high in some cases, and can be problematic for some of our industrial partners who have larger code bases (and analyses that run much longer). So, for now, we still think remaining in 4.14.1 and not using the deterministic mode is the best option.

(With the caveat that I may have mis-configured something when evaluating; if these numbers seem illogical, I can try to redo the analyses using the current trunk.)
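To make the toggle above concrete, here is a minimal, self-contained sketch of the two strategies (hypothetical names, not Frama-C's actual tables): the weak variant lets the GC reclaim hash-consed terms that are otherwise unreachable, while the deterministic variant pins every term in a regular Hashtbl, trading memory for reproducibility.

```ocaml
(* Minimal hash-consing sketch; Frama-C's real tables are more involved. *)
type term = { node : string; id : int }

(* Weak variant: the table does not keep its terms alive by itself,
   so unused terms can be collected by the GC. *)
module WeakTbl = Weak.Make (struct
  type t = term
  let equal a b = a.node = b.node
  let hash a = Hashtbl.hash a.node
end)

let counter = ref 0
let weak_tbl = WeakTbl.create 16
let strong_tbl : (string, term) Hashtbl.t = Hashtbl.create 16

(* Weak hash-consing: reuse an existing term if it is still alive. *)
let hashcons_weak node =
  let probe = { node; id = 0 } in
  match WeakTbl.find_opt weak_tbl probe with
  | Some t -> t
  | None ->
    incr counter;
    let t = { node; id = !counter } in
    WeakTbl.add weak_tbl t;
    t

(* Deterministic variant: a plain Hashtbl pins every term in memory,
   but allocation order (hence ids) is reproducible across runs. *)
let hashcons_strong node =
  match Hashtbl.find_opt strong_tbl node with
  | Some t -> t
  | None ->
    incr counter;
    let t = { node; id = !counter } in
    Hashtbl.add strong_tbl node t;
    t

let () =
  let a = hashcons_strong "x" in
  let b = hashcons_strong "x" in
  (* hash-consing guarantees physical sharing of equal terms *)
  Printf.printf "shared: %b\n" (a == b)
```

The memory trade-off in the numbers above follows directly from this design: the non-weak table retains every term ever built for the whole run.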
Thanks for the numbers, very interesting! I think that the numbers confirm our intuition that weak hash tables / ephemerons dominate the performance differences between 4.x and 5.x. This would be the single most important point of improvement to recover Frama-C performance in the non-deterministic variant. I don't know much about weak/ephemerons; my understanding was that #10737 proposed an API that could be implemented in a more efficient way later, but I don't know if this is still relevant and whether there is any work in this direction (@kayceesrk would know). Another very interesting comparison in your numbers is to compare "deterministic 4.14.1" with "deterministic 5.0.1". If I understand correctly, this gives an accurate picture of the performance impact of OCaml 5 when weak hashtables are out of the picture. (Completely? Or do you still use them in other parts of the code, just not for hash-consing which is their main usage?) From the point of view of "finding performance regressions in OCaml 5 that we can fix promptly", the lower-hanging fruits, if there are any, are probably there.
It would be interesting to profile these specific examples to understand where the regressions come from. For memory it is tricky, as we don't have good tools for OCaml 5 memory profiling, but for time it may be that just comparing profiles of the two versions would already tell us something.
The work proposed in #10737 is still relevant. While the current API for ephemerons makes them immutable, they are still implemented over a mutable low-level representation that is also used by weak references. The goal would be to have separate low-level representations for weak arrays and ephemerons, with ephemerons implemented on top of the weak-array implementation. (I am hazy on the details but) I believe that the separation can help implement optimisations for weak arrays and hash sets on OCaml 5, which wasn't possible before because the low-level implementation was shared with ephemerons; we had to assume that everything is potentially an ephemeron at runtime, which forbids some optimisations. I should say that this is not currently being actively looked at by anyone at Tarides.
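For reference, the immutable ephemeron API introduced by #10737 (available since OCaml 5.0) looks like this in use; a minimal sketch of the happy path where the key is still reachable:

```ocaml
(* Sketch (requires OCaml >= 5.0): the immutable ephemeron API.
   The data stays retrievable only while the key is alive. *)
let () =
  let key = ref 42 in
  let e = Ephemeron.K1.make key "payload" in
  (* key is still reachable here, so query returns the data *)
  (match Ephemeron.K1.query e key with
   | Some data -> Printf.printf "data: %s\n" data
   | None -> print_endline "collected");
  (* keep the key alive until this point *)
  ignore (Sys.opaque_identity key)
```

Once the key becomes unreachable and is collected, the GC may drop the data as well, which is what makes ephemerons suitable for caches and hash-consing.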
My colleague just noticed that, when running Frama-C in a switch with OCaml 5.0.0~beta1, memory consumption of Frama-C is about 10x larger when loading dynamic modules (i.e., any of Frama-C's plug-ins).
I created a Dockerfile to test it on an isolated environment and obtained the same results.
When comparing the same code compiled under OCaml 4.14.0 and OCaml 5.0.0~beta1, I obtained the following differences:
The two first lines are the most important ones; the others are provided only as possible clues for further investigation.
Memory usage is obtained by running `/usr/bin/time -f "%M"` (it reports the maximum resident set size). Note that this is not just a theoretical issue (e.g. memory reserved but unused in practice): my colleague had his PC start swapping heavily when running a few dozen parallel instances of Frama-C, which doesn't happen with OCaml <= 4.14. This is when we decided to measure it.

I compiled and installed Frama-C, then ran it under the following configurations:
- `frama-c -no-autoload-plugins`
- `frama-c`
- `frama-c -no-autoload-plugins -load-module <plugin>`
We notice that, without any modules, memory usage is the same in both versions. But when all modules are loaded, it is almost 1GB in OCaml 5, versus less than 100 MB in OCaml 4.14.
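As a complementary, in-process measurement (a sketch, not what we used: the numbers above all come from `/usr/bin/time`), OCaml's Gc module can report the size of the OCaml heap itself, which helps distinguish OCaml-heap growth from other contributions to resident memory such as code segments or C-side allocations:

```ocaml
(* Sketch: report OCaml-heap statistics from inside the process.
   Gc.quick_stat measures only the OCaml heap, unlike /usr/bin/time's
   maximum RSS, which also counts code, C allocations, and mapped files. *)
let report label =
  let s = Gc.quick_stat () in
  Printf.printf "%s: heap = %d words, minor collections = %d\n"
    label s.Gc.heap_words s.Gc.minor_collections

let () =
  report "before";
  (* allocate something sizeable to make the heap grow *)
  let data = Array.init 1_000_000 (fun i -> string_of_int i) in
  report "after";
  ignore (Sys.opaque_identity data)
```

Comparing such in-process numbers between 4.14 and 5.0 around plugin loading could tell whether the extra resident memory is OCaml-heap data or runtime-internal structures like the frametables.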
Here's a hackish Dockerfile to help reproduce it:
In this Dockerfile, all we do is (1) install a new opam switch, (2) install Frama-C's dependencies, then (3) clone and install Frama-C itself (currently, it is not possible to run `opam install frama-c` on a 5.0 switch due to the lack of a compatible Why3). Finally, (4) run Frama-C and measure memory usage.

I simply replaced `5.0.0~beta1` with `4.14.0` in the first RUN to compare it with OCaml 4.14.

If you have any clues on what could be causing it and how to further investigate, please tell us. We'll try to take a look into it ourselves.