Ensure Top State Evenness When Capacity Keys Not Defined #2760
Issues
When no resource or instance capacity keys are defined for a WAGED cluster, WAGED does not try to ensure top-state evenness. In a cluster where the number of participants equals the number of replicas, every instance ends up with the same assignment.
If each resource has multiple partitions, some degree of top-state evenness is still achieved, but it could be improved.
If each resource has only one partition and the number of participants equals the number of replicas, a single instance is assigned the leader replica for every resource, because ties are broken by the participant's logical ID.
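To illustrate the single-partition case, here is a minimal sketch of the tiebreak, assuming a simplified model rather than Helix's actual code: every node is a candidate for each resource's one top-state replica, the soft-constraint score is a constant 0, and candidates are scanned in ascending logical-ID order so the first (smallest) ID wins ties. The class, method, and node names (`TieBreakDemo`, `node-0`, ...) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Simplified model (hypothetical, not Helix code): each resource has one
// partition, so exactly one top-state replica per resource must be placed.
final class TieBreakDemo {

  // Returns the highest-scoring node. Candidates are scanned in ascending
  // logical-ID order and only a strictly higher score replaces the current
  // best, so on a tie the smallest logical ID wins.
  static String pickNode(List<String> sortedLogicalIds, ToDoubleFunction<String> score) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (String id : sortedLogicalIds) {
      double s = score.applyAsDouble(id);
      if (s > bestScore) {
        best = id;
        bestScore = s;
      }
    }
    return best;
  }

  // Places one top-state replica per resource using a score that is always 0,
  // mimicking the constraint when no capacity keys are defined.
  static Map<String, Integer> leaderCounts(List<String> sortedLogicalIds, int resources) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String id : sortedLogicalIds) {
      counts.put(id, 0);
    }
    for (int r = 0; r < resources; r++) {
      String winner = pickNode(sortedLogicalIds, id -> 0.0);
      counts.merge(winner, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> nodes = List.of("node-0", "node-1", "node-2");
    System.out.println(leaderCounts(nodes, 20)); // {node-0=20, node-1=0, node-2=0}
  }
}
```

Because the score never distinguishes the nodes, every leader lands on the same instance no matter how many resources there are.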
Description
Customers should define resource weights and instance capacities to get the full benefit of WAGED. However, for customers who do not, WAGED should still consider top-state evenness in its assignment calculations. This PR changes the "TopStateMaxCapacityUsageInstanceConstraint" soft constraint. Currently, the constraint prefers assigning top-state replicas to nodes with fewer top states, as measured by the replica's weight. If no weight is defined for the resource and the node, the score is always 0 and the tiebreak is determined by the node's logical ID.
With this change, when neither the resource nor the instance defines any capacity keys, the soft constraint calculates its score from the number of top states instead.
This PR also includes one very minor change: one test method's access modifier is changed from private to public. The CI logs show that TestNG has been running this method successfully, but the convention is to make test methods public.
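The reason the score collapses to a constant can be seen in a simplified, weight-based model. The sketch below assumes (it is not Helix's implementation) that a node's projected top-state utilization is the highest ratio, across capacity keys, of current top-state usage plus the replica's weight over the node's capacity, and that lower projected utilization is preferred. The class name and the `CU` capacity key are hypothetical.

```java
import java.util.Map;

// Simplified model (hypothetical, not Helix code) of a weight-based
// top-state usage measure.
final class WeightBasedUsageSketch {

  // Highest projected utilization across capacity keys if the replica's
  // weights were added to the node's current top-state usage. With no
  // capacity keys the loop never runs, so every node reports 0.0.
  static double projectedTopStateUtilization(Map<String, Integer> nodeCapacity,
      Map<String, Integer> topStateUsage, Map<String, Integer> replicaWeight) {
    double highest = 0.0;
    for (Map.Entry<String, Integer> e : nodeCapacity.entrySet()) {
      String key = e.getKey();
      int capacity = e.getValue();
      if (capacity <= 0) {
        continue; // skip keys without usable capacity
      }
      int projected = topStateUsage.getOrDefault(key, 0) + replicaWeight.getOrDefault(key, 0);
      highest = Math.max(highest, (double) projected / capacity);
    }
    return highest;
  }

  public static void main(String[] args) {
    // With a capacity key, a busy node and an idle node are distinguishable.
    System.out.println(projectedTopStateUtilization(
        Map.of("CU", 100), Map.of("CU", 50), Map.of("CU", 10))); // 0.6
    System.out.println(projectedTopStateUtilization(
        Map.of("CU", 100), Map.of(), Map.of("CU", 10))); // 0.1
    // Without capacity keys, every node looks identical.
    System.out.println(projectedTopStateUtilization(Map.of(), Map.of(), Map.of())); // 0.0
  }
}
```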
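The fallback can be sketched under the same simplified model: use the count-based score only when neither side has capacity keys, and prefer nodes holding fewer top states. This is an illustration, not the PR's actual code; `TopStateCountFallbackSketch`, `useCountFallback`, and `countScore` are hypothetical names, and ties are still broken by the smallest logical ID.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (hypothetical, not Helix code) of the count-based fallback.
final class TopStateCountFallbackSketch {

  // The fallback applies only when neither the replica nor the node defines
  // any capacity keys.
  static boolean useCountFallback(Map<String, Integer> replicaWeight,
      Map<String, Integer> nodeCapacity) {
    return replicaWeight.isEmpty() && nodeCapacity.isEmpty();
  }

  // Higher is better: a node holding fewer top states scores higher.
  static double countScore(int assignedTopStates) {
    return -assignedTopStates;
  }

  // Places each top-state replica on the best-scoring node, scanning nodes
  // in ascending logical-ID order so ties go to the smallest ID.
  static Map<String, Integer> assignTopStates(List<String> sortedLogicalIds, int topStateReplicas) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String id : sortedLogicalIds) {
      counts.put(id, 0);
    }
    for (int i = 0; i < topStateReplicas; i++) {
      String best = null;
      double bestScore = Double.NEGATIVE_INFINITY;
      for (String id : sortedLogicalIds) {
        double s = countScore(counts.get(id));
        if (s > bestScore) {
          best = id;
          bestScore = s;
        }
      }
      counts.merge(best, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(useCountFallback(Map.of(), Map.of())); // true
    List<String> nodes = List.of("node-0", "node-1", "node-2");
    // 20 resources x 1 partition -> 20 top-state replicas.
    System.out.println(assignTopStates(nodes, 20)); // {node-0=7, node-1=7, node-2=6}
  }
}
```

The logical-ID tiebreak still exists, but it now only decides between nodes that hold the same number of top states, so leaders rotate across instances instead of piling onto one.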
Tests
TestTopStateMaxCapacityUsageInstanceConstraint
Manually tested the distribution before and after the change:
No capacity keys defined for either the resources or the participants
Test 1
3 Participants, 20 WAGED Resources, 1 Partition Each.
Test 2
3 Participants, 20 WAGED Resources, 10 Partitions Each.
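For reading the before/after distributions in these tests, the ideal target is easy to compute: an even spread of T top-state replicas over N participants gives each participant either floor(T/N) or ceil(T/N). The sketch below assumes exactly one top-state replica per partition (as in a LeaderStandby-style state model, which the tests above do not state explicitly); `EvenSpreadBounds` is a hypothetical helper.

```java
// Worked arithmetic (hypothetical helper, not Helix code): bounds for an
// even spread of top-state replicas across participants.
final class EvenSpreadBounds {

  // floor(topStates / participants)
  static int lower(int topStates, int participants) {
    return topStates / participants;
  }

  // ceil(topStates / participants) using integer arithmetic
  static int upper(int topStates, int participants) {
    return (topStates + participants - 1) / participants;
  }

  public static void main(String[] args) {
    // Test 1: 20 resources x 1 partition, one top state per partition.
    System.out.println(lower(20, 3) + ".." + upper(20, 3)); // 6..7
    // Test 2: 20 resources x 10 partitions = 200 top-state replicas.
    System.out.println(lower(200, 3) + ".." + upper(200, 3)); // 66..67
  }
}
```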
Ran the PR CI against my personal fork; it failed due to the flaky test testEvacuationWithOfflineInstancesInCluster (#2721):
https://github.com/GrantPSpencer/helix/actions/runs/7935782011/job/21669654755?pr=6
Changes that Break Backward Compatibility (Optional)
N/A
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)