Changing resource partition count via Helix Rest does not work reliably #2793

Open
wmorgan6796 opened this issue Apr 17, 2024 · 5 comments
Labels: bug (Something isn't working)


@wmorgan6796 (Contributor)

Describe the bug

When updating the resource configuration and ideal state via the REST API, specifically the number of partitions for a resource, Helix does not reliably create and assign the new partitions; working around this requires fully re-creating the resource (with the same name as the outgoing resource). In addition, when scaling down the number of partitions by recreating the resource this way, the new resource correctly shows the reduced partition count, but Helix still attempts to assign the original, larger number of partitions even though the resource was completely recreated.

The Helix configuration for the cluster and resource is attached.

To Reproduce

  1. Create a Helix cluster with at least 3 instances and create a resource within it. Let the cluster assign everything out.
  2. Delete the resource.
  3. Recreate the same resource with the same name, configuration, etc., except with fewer partitions (in our case we went from 2520 to 2048); see the sketch after this list.
  4. Observe that the participants are still trying to move the now-non-existent partitions to the Master/Slave states.
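
For reference, the same drop/recreate sequence expressed against the Java HelixAdmin client (we actually did this through helix-rest; the ZK address, cluster/resource names, state model, and replica count below are placeholders):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class RecreateResourceRepro {
  public static void main(String[] args) {
    // Placeholders: the real cluster/resource names and ZK address differ.
    String zkAddr = "localhost:2181";
    String cluster = "MyCluster";
    String resource = "MyResource";

    HelixAdmin admin = new ZKHelixAdmin(zkAddr);

    // Step 2: delete the resource.
    admin.dropResource(cluster, resource);

    // Step 3: recreate it with the same name but fewer partitions (2520 -> 2048),
    // using the same MasterSlave state model as before.
    admin.addResource(cluster, resource, 2048, "MasterSlave");
    admin.rebalance(cluster, resource, 3); // 3 replicas is a placeholder

    // Step 4: despite the smaller ideal state, participants still receive
    // state transitions for the old, higher-numbered partitions.
  }
}
```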

Expected behavior

When I edit the resource configuration, Helix should automatically handle removing the dropped partitions from the participants and remove them entirely from the cluster. Also, if I recreate a resource that is exactly the same as one I just deleted, just with a smaller number of partitions, the cluster should assign the correct (smaller) number of partitions, not the older, incorrect number.

Additional context

Configuration:
Helix-Config.txt

@wmorgan6796 added the bug label on Apr 17, 2024
@junkaixue (Contributor)

@wmorgan6796 did you use the API to create/delete the resource?

For partition placement, changing the ResourceConfig does not work; placement is driven by the IdealState. You should update the IdealState with the right partition number.
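
Concretely, something like the following, which changes the partition count in the IdealState and re-runs the rebalance (a sketch using the Java HelixAdmin client; the ZK address, cluster/resource names, and replica count are placeholders, and the equivalent IdealState update can be made through helix-rest):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class ResizeResource {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // placeholder ZK address

    // Read the current IdealState, change the partition count, and write it back.
    IdealState idealState = admin.getResourceIdealState("MyCluster", "MyResource");
    idealState.setNumPartitions(2048);
    admin.setResourceIdealState("MyCluster", "MyResource", idealState);

    // For SEMI_AUTO/FULL_AUTO resources, re-run the rebalance so the changed
    // partition set is actually assigned (3 replicas is a placeholder).
    admin.rebalance("MyCluster", "MyResource", 3);
  }
}
```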

@wmorgan6796 (Contributor, Author)

I changed both the ideal state and the resource config

@junkaixue (Contributor)

@wmorgan6796 Is this cluster in a normal state?

That means:
  1) it has a live controller
  2) the resource is not disabled
  3) the cluster is not in maintenance mode
  4) for the WAGED rebalancer, there is enough capacity in the cluster
  ...

There are multiple cases that can prevent the partition count from changing. Pinning this down requires some understanding of, and debugging against, your controller log.
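
If it helps, a few of these conditions can be checked quickly from the admin client (a sketch; it assumes a Helix version where HelixAdmin exposes isInMaintenanceMode, and the ZK address and names are placeholders):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class ClusterSanityCheck {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // placeholder ZK address

    // 3) Is the cluster in maintenance mode?
    System.out.println("maintenance mode: " + admin.isInMaintenanceMode("MyCluster"));

    // 2) Is the resource enabled, and what partition count does the controller see?
    IdealState idealState = admin.getResourceIdealState("MyCluster", "MyResource");
    System.out.println("resource enabled: " + idealState.isEnabled());
    System.out.println("NUM_PARTITIONS in IdealState: " + idealState.getNumPartitions());

    // 1) Live controller: confirm the /<cluster>/CONTROLLER/LEADER znode exists,
    //    or check the controller log for an active leader.
  }
}
```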

@junkaixue (Contributor)

Any update on this? @wmorgan6796

@wmorgan6796 (Contributor, Author)

Sorry I’ve been on leave for a bit and haven’t had time to come back to this.

But to answer the questions:

  1. The cluster was working.
  2. The cluster was not disabled, though we had disabled it before and after making the change.
  3. Maintenance mode was not on in the cluster.
  4. There was plenty of capacity in the cluster.
