Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improve] Change default value for CompactionMaxTimeMillis. #4354

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thetumbled
Copy link
Contributor

@thetumbled thetumbled commented May 9, 2024

Motivation

The default value of minorCompactionMaxTimeMillis is -1 to let the compaction in no limit time, which will significantly reduces the ability of the compaction mechanism.

Quantify compaction capability

Firstly, we need to conduct a simple modeling of the compaction to obtain a series of quantitative relationships. Obviously, we have:

Dcompact = Vcompact * Tcompact

Dcompact represents the amount of compacted data, i.e. the size of reclaimed storage space, Vcompact represents the compaction speed, and Tcompact represents the compaction time.

As the entry log file contains both valid data and dirty data, we can define the notion -- valid data ratio for entry log file.

valid data ratio = valid data size / total data size

And we need to introduce another variable Dscan, before that, we need to be clear about the compaction procedure.
When a compactor try to compact a entry log file, it will read all of the valid data from this entry log file, and copy them to a new entry log file, and finally delete the old entry log file. So the compactor need to consume the I/O ability of valid data size to compact a entry log file.
Therefore, the size of valid data to read by compactor depends on the valid data ratio of this entry log file. Let the size of valid data to read by compactor to Dscan, we have:

entry log size = Dscan / p
Dcompact = entry log size * (1-p) = Dscan / p * (1-p)
  • Dscan is the cost when compactor compact an entry log file.
  • Dcompact is the gain when compactor compact an entry log file.
    We need to point out that Dcompact is strongly related to the valid data ratio of the entry log file it compact. See as the following graph:
    image

why infinite compaction time will reduces the ability of the compaction mechanism?

When the compactor start a compaction routine, it will aggregate the valid data ratio of all entry log file belonging to this bookie(get a map), and try to compact the entry log file with the smallest valid data ratio, and then compact the next smallest one...
After aggerate the valid data ratio map for all entry log file, this map will not update dynamically, but create a new one in the next compaction routine. As time goes by, the valid data ratio of entry log file will gradually decrease, some entry log file that is new created even not exit in the map, so if we unlimit the compaction time, the compactor will very likely try to compact an entry log file of 90% valid data raito, but ignore those entry log files of 10%, 20%, and so on.
Compacting the entry log file of 90% valid data raito is a very hard job, and gains almost nothing.

This is the reason why we can't let the compaction time to be so large that it try to do the hard work, but ignore the easy one. But we can't set it too small too, as the aggregation routine take times too.
One hour is a good value.

Changes

Change the default value of minorCompactionMaxTimeMillis and majorCompactionMaxTimeMillis to 1h.

Copy link
Member

@shoothzj shoothzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest not to modify the default value now, these variable were introduced in #2670 . I think we can discuss change it in the next major version. cc @eolivelli @hangc0276

@shoothzj
Copy link
Member

shoothzj commented May 9, 2024

I think we should add it to the history docs as well.

@shoothzj
Copy link
Member

shoothzj commented May 9, 2024

I think we should add it to the history docs as well.

Sorry, that's comment for #4355

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants