Optionally allow many repair jobs per host #3790

Open
Tracked by #3791
Michal-Leszczynski opened this issue Apr 15, 2024 · 0 comments
Labels: enhancement (New feature or request), repair

Comments

@Michal-Leszczynski
Collaborator

This is a duplicate of #3611, but since that issue had a long history of comments and had gone stale, I decided to create a new one with updated argumentation.

One of the more important changes in SM 3.2 repair was sticking to the one-job-per-host rule. I believe that for a bigger cluster this might kill any parallelism on the node level. Let's analyze a big cluster: 2 DCs with 30 nodes each, where all nodes have max_repair_ranges_in_parallel = 7. By default, each keyspace in such a cluster would consist of 60 * 256 = 15360 token ranges. Assuming that the keyspace has replication {'dc1': 3, 'dc2': 3}, we have (30!/(3! * 27!))^2 = 4060^2 = 16,483,600 possible replica sets. Assuming that token ranges are distributed uniformly across all possible replica sets, it is rather unlikely that a single repaired replica set owns more than 1 token range. Combined with the fact that SM sends repair jobs only for a single replica set, this results in SM sending only a single token range per repair job, despite max_repair_ranges_in_parallel = 7.
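The arithmetic above is easy to verify with a short Python snippet (all numbers are taken directly from the example in this issue):

```python
from math import comb

nodes_per_dc = 30       # 2 DCs, 30 nodes each
vnodes_per_node = 256   # default number of vnodes per node
rf_per_dc = 3           # replication {'dc1': 3, 'dc2': 3}

# Total token ranges in the keyspace: 60 nodes * 256 vnodes.
token_ranges = 2 * nodes_per_dc * vnodes_per_node

# A replica set picks RF nodes in each DC independently:
# C(30, 3)^2 = 4060^2 = 16,483,600 possible replica sets.
replica_sets = comb(nodes_per_dc, rf_per_dc) ** 2

# Expected token ranges per replica set under a uniform distribution.
ranges_per_replica_set = token_ranges / replica_sets

print(token_ranges)            # 15360
print(replica_sets)            # 16483600
print(ranges_per_replica_set)  # ~0.00093, i.e. almost never more than 1
```

Since the expected number of token ranges per replica set is far below 1, a repair job for a single replica set will almost always carry exactly one range, leaving max_repair_ranges_in_parallel unused.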

This behavior could be controlled by an additional flag or repair config option in scylla-manager.yaml.
In terms of testing, it would be good to see a performance improvement on a big cluster, e.g.: 2 DCs with 15 nodes each, a keyspace with RF 3 in each DC, and a setup in which the repair indeed has to do some work (missing rows on some nodes). This bigger setup would definitely require help from QA.
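As a purely hypothetical sketch of the config option mentioned above (the option name and its placement are assumptions for illustration, not an existing SM setting), the repair section of scylla-manager.yaml could gain something like:

```yaml
# Hypothetical sketch -- this option does not exist in Scylla Manager;
# the name "multiple_jobs_per_host" is assumed for illustration only.
repair:
  # Allow scheduling more than one repair job per host at a time, so that
  # nodes with max_repair_ranges_in_parallel > 1 stay saturated even when
  # each replica set owns only a single token range.
  multiple_jobs_per_host: true
```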
