Time/memory regression in Frama-C with OCaml 5.0 #11733
Thanks @maroneze. I'll start to investigate some of these. Could you share the instructions for running the set of analyses which produced the table above? In the table, there are two sets of results with the same set of benchmarks. Could you tell us the difference between the two?
Ok, here's a set of commands for OCaml 4.14:
(Note that OSCS has some written and unwritten dependencies, such as system-specific ones.) After the analyses are run, in each target directory (all but frama-c: 2048, basic-cwe-examples, etc.), there will be a stats.txt file. Then I installed a 5.0.0+trunk switch, and had to manually compile Frama-C (since some of its opam dependencies are missing in OCaml 5). The following should work:
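As a rough sketch (the switch name and the local clone path are assumptions, not the exact commands used), setting up such a switch could look like:

```shell
# Hypothetical sketch: create an OCaml trunk switch and prepare a manual
# Frama-C build in it. The switch name and ./frama-c path are assumptions.
opam switch create 5.0.0+trunk
eval $(opam env)
# With some dependencies unavailable on OCaml 5, pinning a local clone and
# installing only what is available may be necessary before building by hand:
opam pin add --no-action frama-c ./frama-c
opam install --deps-only frama-c
```

From there, Frama-C itself would be built from the clone following its own build instructions.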
You should then be able to enter the open-source-case-studies directory and run the analyses. If you want to run only a specific case study, e.g. debie1, you can restrict the run to that target.
Each target is split in 2 calls to Frama-C: the first one performs only parsing. I then simply grepped all the stats.txt files with the following script:
printf "dir;target;Time (4.14);Memory (4.14);Time (5.0);Memory (5.0)\n"
cd oscs-ocaml-4.14
for d in 2048 basic-cwe-examples bench-moerman2018 cerberus chrony c-testsuite debie1 genann gzip124 hiredis icpc ioccc itc-benchmarks jsmn kgflags khash kilo libmodbus libspng libyaml microstrain mini-gmp miniz monocypher papabench polarssl qlz safestringlib semver solitaire tsvc tweetnacl-usable verisec x509-parser
do
cd $d
cd .frama-c
for t in *.{parse,eva}
do
cd $t
time=$(grep "^user_time" stats.txt | cut -d= -f2)
mem=$(grep "^memory" stats.txt | cut -d= -f2)
stats5=/tmp/compare/oscs-ocaml-5/$d/.frama-c/$t/stats.txt
time5=$(grep "^user_time" $stats5 | cut -d= -f2)
mem5=$(grep "^memory" $stats5 | cut -d= -f2)
target_no_suffix=${t%.eva}
target_no_suffix=${target_no_suffix%.parse}
printf "%s;%s;%s;%s;%s;%s\n" "$d" "$t" "$time" "$mem" "$time5" "$mem5"
cd ..
done
cd ../..
done

For more details, only in case you need to tweak the commands executed by OSCS, see the individual case study directories. In any case, just tell me if you want me to prepare some source code or command lines to test Frama-C.
Thanks for the instructions. I was able to build and run the case studies.
I ran the benchmarks. On 4.14:

On 5.0:
Does frama-c make use of ephemerons heavily? Ephemerons are hard to implement (or even to specify) in a multicore setting (see #10737). OCaml 5.0 avoids some of the complexity by not freeing data during the minor GC. For example, see https://github.com/ocaml/ocaml/blob/trunk/runtime/minor_gc.c#L414. I shall investigate more tomorrow.
Out of curiosity, does this also apply to good old weak arrays/tables?
Weak arrays/tables are well specified under parallelism. Ephemerons as they are defined now, where mutations are used to initialise and modify keys and data, are problematic. In OCaml 5.0, weak arrays are implemented as ephemerons with empty data fields, as in OCaml 4.14. Going forward, the plan is to implement weak arrays separately from ephemerons, with a simpler API.
Frama-C Eva, which is the plugin being tested in Andre's benchmark, uses weak hash tables in an absolutely essential way. A very large part of the data structures being used are hash-consed. If weak hash sets have been reimplemented on top of ephemerons for 5.00, this is probably the issue. Note, however, that those sets are used in a purely monotone way: the only one modifying a key is the GC, to remove a dead one. And IIRC, keys are not reused. Perhaps this makes this use-case easier to support?
To be clear, weak arrays have always been implemented on top of ephemerons in OCaml -- a weak array is just an ephemeron without the data field. 5.00 does not change this. I'm continuing to investigate this to confirm whether weak arrays are the cause of the regression.
I have opened #11743 for improvements to weak arrays that significantly improve the performance on these benchmarks.
What is the status of this issue? Were the performance issues fixed by #11743?
Sorry, I didn't have much time to check it. Which specific version of the compiler should I try?
Right now the answer is "neither": #12131 has not been merged at all. I will try to merge it today so that it appears in trunk (but maybe not in the opam switch; I don't know how up to date that is). It is not part of alpha1.
Ok, thanks. I believe the numbers in #11662 should still be relevant, but if you believe there have been other changes that might impact them, I can try re-running the benchmarks. The conclusions from February were that there was still a time/memory regression, minimized when enabling our -deterministic option. So, it seemed that overall, the performance issues were not entirely fixed by #11743, even if things definitely improved. From our side, keeping this issue open would still be relevant (we are still staying on 4.14 for the time being), but if you prefer to close it, I can try re-running the benchmarks when OCaml 5.1 is released, and create a new issue if necessary. Whatever best suits you in terms of managing issues is fine with me.
#12131 is now merged in trunk (and the in-development 5.1 branch, so it should be part of the next beta release). This is a relatively small change that should improve the time performance of code using weak pointers (so, no change in deterministic mode) and made a 5-10% improvement on a weak-using benchmark (so, it may not explain all the time difference between normal-4.14 and normal-5.x).
Since the 5.1 alpha versions have been released, I would be curious to see the updated results. FWIW, the beta version should be released sometime this week.
Sorry for the delay. Here are some tests done with 5.2~beta2, compared to 4.14.2. The raw data is here: stats.ods. The yellow lines (".eva" tasks) are the important ones, since they take the longest time and consume the most memory.
(I would be interested in a geometric average of the memory ratio (new/old) and of the time ratio (new/old).) Skimming the results suggests that the obvious time regressions have been solved (5.2 is in fact a few percent faster than 4.14 on average), but that the memory usage remains much higher: probably around 60% higher on average, and up to 100% higher (double the memory usage) in some cases. I have no idea what would explain the higher memory usage; I guess one should look deeper at GC statistics to get an idea. This might simply be a difference in GC pacing (it spends less time collecting memory); maybe it is possible to tune the GC parameters.
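As an illustration of computing that geometric average, the semicolon-separated output of the collection script earlier in the thread can be fed to a small awk program. The column layout and the two data rows below are assumptions, made up purely so the example runs on its own:

```shell
# Sketch: geometric mean of the time and memory ratios (5.x / 4.14).
# Assumed column layout: dir;target;time4.14;mem4.14;time5;mem5
# The two data rows are made-up figures, just to make the example runnable.
printf '%s\n' \
  'dir;target;time4.14;mem4.14;time5;mem5' \
  '2048;2048.eva;2.0;100;2.0;160' \
  'debie1;debie1.eva;4.0;200;3.6;320' > stats.csv

awk -F';' 'NR > 1 {
  t += log($5 / $3)   # log of the time ratio (new / old)
  m += log($6 / $4)   # log of the memory ratio (new / old)
  n++
} END {
  if (n > 0)
    printf "time ratio (geo mean): %.3f\nmemory ratio (geo mean): %.3f\n",
           exp(t / n), exp(m / n)
}' stats.csv
```

Summing logarithms and exponentiating the mean gives each benchmark equal weight regardless of its absolute runtime, which is why a geometric rather than arithmetic mean is the natural summary for ratios.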
Just in case, there's some context in #11662. In particular, there's the -deterministic option, which I did not use in these runs.
It would be useful to check whether GC compactions are triggered in 4.14 on the benchmarks using a lot of memory. In 5.2, GC compaction support has been added, but it needs to be triggered manually with Gc.compact.
So, this means patching Frama-C to call it explicitly?
There is an OCaml setting to have it print some GC statistics at the end, using a runtime variable; no need to change the code.
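For reference, this can be done through the OCAMLRUNPARAM environment variable: bit 0x400 of the v (verbose) flag asks the runtime to print GC statistics when the program exits. A sketch, where the frama-c invocation and the target file are placeholders:

```shell
# Print GC statistics at program exit with no code change:
# bit 0x400 of the v (verbose) runtime flag enables the report.
# "frama-c -eva file.c" is a placeholder command for illustration.
OCAMLRUNPARAM='v=0x400' frama-c -eva file.c
```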
Thanks! I just tried it with 4.14, and it does print them.
Useful to know. Thank you.
A colleague mentioned this issue to me, and suggested I try tuning the GC settings. In the worst case, instead of 100%, it's now 30%; on the other hand, the 5% speedup practically disappeared.
Yes, feel free to play with the GC settings. My optimistic feeling from your numbers is that we are looking at what is probably a "case closed": the numbers are now good enough to run Frama-C in production with OCaml 5, and we could consider closing the issue. (The 30% and 11% examples are outliers in your delta graph; maybe there is still something interesting happening there, in the sense of a regression that could be fixed in 5.x to improve all programs? Unfortunately, it is much harder to profile for memory overheads than for time overheads, so I wouldn't know where to look to find a root cause if it exists.)
I am not sure if I have followed developments closely enough, so apologies if this is just noise. My understanding is that 4.14 included a number of optimizations to prevent weak pointers and ephemerons from delaying, by 2-3 GC cycles, the collection of values that would ideally already be considered dead. If I understand correctly, some but not all of these have been reimplemented for 5.x (one concrete example of the sort of thing I have in mind is mentioned in this comment by @bobot), and some independent optimizations have also been implemented for 5.x (e.g. @kayceesrk's #11743 and @NickBarnes's #12131). Is that correct? Would there be some value in tracking any remaining missing optimizations? I worry that some of the value of the past work exploring the design space and finding worthwhile optimizations might be lost, even though the bottom-line benchmark numbers have currently caught up, perhaps for not quite the same reasons.
This is a follow-up to #11662.
Patch #11673 indeed fixes the most critical memory issue, but when running several tests in parallel, we are still having some issues with high memory usage. These are not critical for OCaml 5.0, but we hope they might be fixed for 5.1.
When running our set of analyses, we observed the following differences between OCaml 4.14 and OCaml 5.0.0+trunk (ocaml --version gives 5.0.0+dev8-2022-10-12; note that we were unable to test with 5.1.0+trunk since the ppxlib opam package is currently incompatible with it):

Overall, the startup time for Frama-C is about 1.2s longer, which, for short benchmarks, can increase execution time by 300% or 400%. This is mentioned in #11662. Even when only a subset of dynamic modules is loaded, there is still an increase of about 100%-200% in execution time. For large benchmarks such as SATE 6, which consist of >40k runs of Frama-C with a minimal set of plug-ins, this can double or triple total execution time. It was not entirely clear to us whether the regression in time complexity mentioned in #11662 accounts for this difference. Otherwise, we'd like to help investigate further if there are other possible causes.
The main issue I'd like to mention here, however, is memory consumption: as seen in the table above, large case studies still show a significant increase in memory usage, leading to some heavy swapping (before #11662, swapping happened almost instantly; now it takes several minutes, but still happens).
I tried to obtain a minimal reproducible example that was very robust, but I was unable to, since memory usage is not entirely deterministic, at least in OCaml 5. In OCaml <= 4.14, we obtain a consistent value for maximum memory usage across different runs, but in OCaml 5, sometimes simply changing whitespace in the source file, or re-running Frama-C, can result in as much as 3x memory usage.
The file attached below (which is a C file, but GitHub does not recognize it, so I renamed it to .txt) is currently the best minimal example I could come up with. It is based on debie1, which had a 934% increase in memory usage in the table: a-v5.txt
The command-line I used with it is the following:
In this "reduced" version, I get less than 200 MB of memory usage in OCaml <= 4.14, but > 1 GB in OCaml 5.
There is a "loop unroll" annotation in the code (line 2603) which can be modified to increase the memory usage difference between versions of the compiler. If we set it to 100, for instance, the OCaml 4 version uses only 100 MB, while the OCaml 5 version uses 150 MB. If I set it to 30000, the OCaml 4 version uses 392 MB while the OCaml 5 version uses 3.7 GB. Thus, this annotation could possibly help as a parameter during debugging.
My colleagues pointed out that there are possibly multiple issues; for instance, we know that analyses with -eva make heavy use of caches based on hash-consed values. These have been optimized over the years to perform as efficiently as possible, and it is likely that some of the heuristics used up to OCaml 4.14 may require changes. There are some Frama-C options, such as -deterministic, which influence the behavior of such values. Playing with them might provide some insight.

In any case, please tell me if there are other tests that I could perform, or different setups to help understand the causes of the regression.