Switch the default allocation policy to best-fit. #10188
Conversation
I fully support that. We (LexiFi) have also been happily using the new strategy in production for more than 6 months now.
runtime/caml/config.h
Outdated
```diff
@@ -240,7 +240,7 @@ typedef uint64_t uintnat;
 /* Default speed setting for the major GC. The heap will grow until
    the dead objects and the free list represent this percentage of the
    total size of live objects. */
-#define Percent_free_def 80
+#define Percent_free_def 100
```
Just as a data point, in the systems I have tested, 120 has been notably faster without a measurably significant increase in memory usage.
To concur: Coq sets 200 with best-fit, and when I asked they seemed happy with it, so 120 is not shocking.
At 200, memory use can increase significantly, but 120 seems like a good trade-off.
According to some measurements of the two (see https://blog.janestreet.com/memory-allocator-showdown/), best-fit at 100 has similar performance to next-fit at 80 while using less memory. This means that the gains from switching to best-fit are entirely allocated to reducing memory use, while at 120 the gains are split between reducing memory use and reducing time spent.
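To make the knob under discussion concrete (this example is mine, not from the thread): `space_overhead` corresponds to `Percent_free` in the runtime and can be changed from OCaml code via the `Gc` module. A minimal sketch, using the 120 value debated above:

```ocaml
(* Sketch: set the GC speed/space trade-off to the value discussed
   above. A higher value means less GC work but more memory used. *)
let () =
  Gc.set { (Gc.get ()) with Gc.space_overhead = 120 };
  Printf.printf "space_overhead = %d\n" (Gc.get ()).Gc.space_overhead
```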
(Personally I think that the setting of 200 for Coq was a bit too aggressive. It is a very memory-hungry program, and while everyone is able to wait 10% longer to get their proof to complete, your machine may not necessarily be able to give 10% more memory to the process. I think this hard-limit behavior of memory would suggest a more conservative default, but I guess heavy Coq-users don't have the same perspective as they bought large-memory machines anyway.)
I agree with @gasche: let's have default values that don't waste too much memory, because running out of memory and starting to swap is very costly indeed.
It would be good to hear your story concerning the "100" default value for Percent_free_def, since opinions differ.
Regardless, this is a fine PR and I'm confident it's going to improve performance on many OCaml programs.
All of the Solo5 MirageOS unikernels we can find are basically faster and better using best-fit, so I'm fully supportive of this change. Keeping the default at 100, for roughly the same memory utilisation as next-fit at 80, is helpful for us (as we don't have swap, and we compact on running out of the physmem freelist).
@avsm This is interesting, because Jane Street has reported about the same timing and less memory with the setting at 100. I'll change the default to 120, as it seems more popular as a general-purpose setting.

Also, if memory utilisation is very important, I would suggest switching the heap increases back to a fixed amount (instead of the current 15% exponential increase). The performance problem that occurs with next-fit and fixed increases doesn't exist with best-fit. In fact, what do people think of changing the default heap increase in this PR along with the other settings?
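(Not part of the PR; for readers who want to experiment, the heap increase being discussed is the `major_heap_increment` control. If I recall the runtime's convention correctly, values up to 1000 are interpreted as a percentage of the current heap size, and larger values as a fixed number of words. A sketch:)

```ocaml
(* Sketch: replace percentage-based growth (e.g. 15 = +15% per
   expansion) with a fixed increment; values > 1000 are read as a
   word count. The 124 * 1024 figure is purely illustrative. *)
let () =
  Gc.set { (Gc.get ()) with Gc.major_heap_increment = 124 * 1024 }
```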
Has the switch-back received any testing? If not yet, I would suggest asking the Coq people to switch to a fixed increase (can one do that using …)
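(The question above appears to be cut off; assuming it asks whether these settings can be changed without recompiling: the runtime reads the `OCAMLRUNPARAM` environment variable, where, as I understand the conventions, `a` selects the allocation policy, `o` the space overhead, and `i` the major heap increment. A sketch, with a placeholder program name:)

```sh
# Sketch: best-fit policy (a=2), overhead 120 (o=120), and a fixed
# major-heap increment of 126976 words (values > 1000 mean words).
# "./my_program" is a placeholder, not a real binary from this PR.
OCAMLRUNPARAM='a=2,o=120,i=126976' ./my_program
```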
I'm not enthusiastic about this one. There are a small number of operations in the runtime that are linear in the number of heap chunks. In particular, there's a linear scan to find where to place the new chunk in sweep order when the heap expands. (This could be made sublinear with skiplists, but right now it's fine because the list is short.)
My reaction too. It's nice that the number of heap chunks grows slowly. Plus, the more parameters we change at once, the harder it is to interpret feedback from the field. Why don't we make one big change (to the best-fit allocator), see what happens, and then do further changes?
```diff
@@ -108,7 +108,7 @@ type control =
    percentage of the memory used for live data.
    The GC will work more (use more CPU time and collect
    blocks more eagerly) if [space_overhead] is smaller.
-   Default: 80. *)
+   Default: 100. *)
```
Should this also be changed from 100 to 120? @damiendoligez?
    harder, unless you relax it by increasing [space_overhead].
    Note: If you change to next-fit, you may need to reduce
    the [space_overhead] setting, for example using [80] instead
    of the default [100] which is tuned for best-fit. Otherwise,
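(As an aside, not part of the PR: the switch back to next-fit described in this doc comment can be sketched with the `Gc` module, where, in the OCaml versions being discussed, `allocation_policy` 0 selects next-fit:)

```ocaml
(* Sketch: revert to next-fit and lower space_overhead accordingly.
   allocation_policy: 0 = next-fit, 1 = first-fit, 2 = best-fit. *)
let () =
  Gc.set { (Gc.get ()) with
           Gc.allocation_policy = 0;  (* next-fit *)
           space_overhead = 80 }      (* the pre-best-fit default *)
```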
And this
@Bannerets you're right. I've pushed a fix directly to trunk (5661bbf).
Setting the default policy is not enough, we also have to set the initial value of the freelist function pointers...
* switch the default allocation policy to best-fit and adjust the overhead parameter
* change the default overhead to 120
Switching to best-fit avoids most compactions and makes the worst-case behavior much better than next-fit in terms of time and memory.
The code is now sufficiently well-tested (at Jane Street and by Coq, see coq/coq#13040) so we can make the change with confidence.