Changing resource partition count via Helix Rest does not work reliably #2793

Open
wmorgan6796 opened this issue Apr 17, 2024 · 5 comments
Labels: bug (Something isn't working)


@wmorgan6796 (Contributor)

Describe the bug

When updating the resource configuration and ideal state via the REST API, specifically the number of partitions for a resource, Helix does not reliably create and assign the new partitions; working around this requires fully re-creating the resource (with the same name as the outgoing resource). In addition, when scaling down the number of partitions by recreating the resource this way, the new resource correctly shows the reduced partition count, but Helix still attempts to assign the original, larger number of partitions even though the resource was completely recreated.

The Helix configuration for the cluster and resource is attached.

To Reproduce

  1. Create a Helix cluster with at least 3 instances and create a resource within it. Let the cluster assign everything out.
  2. Delete the resource.
  3. Recreate the same resource with the same name, configuration, etc., except with fewer partitions (in our case we went from 2520 to 2048); see the sketch after this list.
  4. Observe that the participants are still trying to move the now-non-existent partitions to the Master/Slave states.
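
For reference, the same drop/recreate sequence expressed against the Java HelixAdmin client (we actually did this through helix-rest; the ZK address, cluster/resource names, state model, and replica count below are placeholders):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class RecreateResourceRepro {
  public static void main(String[] args) {
    // Placeholders: the real cluster/resource names and ZK address differ.
    String zkAddr = "localhost:2181";
    String cluster = "MyCluster";
    String resource = "MyResource";

    HelixAdmin admin = new ZKHelixAdmin(zkAddr);

    // Step 2: delete the resource.
    admin.dropResource(cluster, resource);

    // Step 3: recreate it with the same name but fewer partitions (2520 -> 2048),
    // using the same MasterSlave state model as before.
    admin.addResource(cluster, resource, 2048, "MasterSlave");
    admin.rebalance(cluster, resource, 3); // 3 replicas is a placeholder

    // Step 4: despite the smaller ideal state, participants still receive
    // state transitions for the old, higher-numbered partitions.
  }
}
```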

Expected behavior

When I edit the resource configuration, Helix should automatically handle removing the dropped partitions from the participants and remove them entirely from the cluster. Also, if I recreate a resource that is exactly the same as one I just deleted, just with a smaller number of partitions, the cluster should assign the correct (smaller) number of partitions, not the older, incorrect number.

Additional context

Configuration:
Helix-Config.txt

@wmorgan6796 added the bug label on Apr 17, 2024
@junkaixue (Contributor)

@wmorgan6796 did you use the API to create/delete the resource?

For partition placement, changing the ResourceConfig does not work; placement is driven by the IdealState. You should update the IdealState with the right partition number.
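
Concretely, something like the following, which changes the partition count in the IdealState and re-runs the rebalance (a sketch using the Java HelixAdmin client; the ZK address, cluster/resource names, and replica count are placeholders, and the equivalent IdealState update can be made through helix-rest):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class ResizeResource {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // placeholder ZK address

    // Read the current IdealState, change the partition count, and write it back.
    IdealState idealState = admin.getResourceIdealState("MyCluster", "MyResource");
    idealState.setNumPartitions(2048);
    admin.setResourceIdealState("MyCluster", "MyResource", idealState);

    // For SEMI_AUTO/FULL_AUTO resources, re-run the rebalance so the changed
    // partition set is actually assigned (3 replicas is a placeholder).
    admin.rebalance("MyCluster", "MyResource", 3);
  }
}
```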

@wmorgan6796 (Contributor, Author)

I changed both the ideal state and the resource config

@junkaixue (Contributor)

@wmorgan6796 Is this cluster in a normal state?

That means:
  1) it has a live controller
  2) the resource is not disabled
  3) the cluster is not in maintenance mode
  4) for the WAGED rebalancer, there is enough capacity in the cluster
  ...

There are multiple cases that can prevent the partition count from changing. Pinning this down requires some understanding of, and debugging against, your controller log.
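
If it helps, a few of these conditions can be checked quickly from the admin client (a sketch; it assumes a Helix version where HelixAdmin exposes isInMaintenanceMode, and the ZK address and names are placeholders):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class ClusterSanityCheck {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181"); // placeholder ZK address

    // 3) Is the cluster in maintenance mode?
    System.out.println("maintenance mode: " + admin.isInMaintenanceMode("MyCluster"));

    // 2) Is the resource enabled, and what partition count does the controller see?
    IdealState idealState = admin.getResourceIdealState("MyCluster", "MyResource");
    System.out.println("resource enabled: " + idealState.isEnabled());
    System.out.println("NUM_PARTITIONS in IdealState: " + idealState.getNumPartitions());

    // 1) Live controller: confirm the /<cluster>/CONTROLLER/LEADER znode exists,
    //    or check the controller log for an active leader.
  }
}
```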

@junkaixue (Contributor)

Any update on this? @wmorgan6796

@wmorgan6796 (Contributor, Author)

Sorry I’ve been on leave for a bit and haven’t had time to come back to this.

But to answer the questions:

  1. The cluster was working.
  2. The cluster was not disabled, though we had disabled it before and after making the change.
  3. Maintenance mode was not on in the cluster.
  4. There was plenty of capacity in the cluster.
