fix(magnum): enable upgrades with cluster_template_id changes #1598
Conversation
@mnaser has this been tested on your env? We don't have CI for Magnum, unfortunately.
Hi @nikParasyr -- I'm working with a few people here who will be testing and reporting how it works for them. :)
Hi @mnaser, we tested the change and got this error from the API:
Oh! I think we have to bump the microversion for Magnum to support this operation: https://docs.openstack.org/magnum/latest/contributor/api-microversion.html @schlakob, do you think you might have some time to look at this? Maybe @nikParasyr ran into a similar issue with this.
Just to understand it correctly: you mean that we need to bump this version in our OpenStack infrastructure, am I right?
Nope! Inside the Terraform provider! |
@schlakob can you try again with this new updated commit? |
Yes, I will try it this afternoon.
@mnaser - I ran a quick test with the recent changes and these are my observations:
To validate this, I am currently recreating the environment and will test again.
That's a good start!
OK, this may actually be a bug inside of the Cluster API driver for Magnum. I filed vexxhost/magnum-cluster-api#176
Can you confirm if the cluster template has changed in the API itself (aka is it a bug with the Terraform state not being updated, or is it the API cluster template in Magnum that was not updated?)
Excellent.
So I ran another test and observed about the same. Regarding the template not changing: it does not change in Magnum either. So in Nova it is still the old template name and ID, but the cluster is in status "UPDATE_COMPLETE". One additional note: the cluster I tested with has 1 CP and 2 workers. It updates 1 worker, then 1 min later the CP, and then about 5 min later the second worker. After 10 minutes, all nodes are on the new version. So far so good, but another 3 min after that, the first worker was recreated again.
Can you do me a favour and try doing the upgrade via the CLI and see if the template changes? If it doesn't, I think you might have found another Cluster API driver bug here. The extra rollout does seem to be incorrect, I wonder if that's health checks failing because of the single control plane. |
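For reference, the CLI path being asked about would look roughly like this (the cluster and template names are placeholders, not from this thread):

```shell
# Placeholders: substitute your real cluster and the new template's name or ID.
openstack coe cluster upgrade my-cluster new-template

# Afterwards, check whether the template actually changed server-side:
openstack coe cluster show my-cluster -c cluster_template_id
```

Comparing this output against the Terraform state helps separate a provider bug (state not refreshed) from a driver bug (template never updated in Magnum).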
Another test with CLI:
Maybe this is related? kubernetes-sigs/cluster-api#8628 (fix released in CAPI 1.5.0). In any case, these seem to be more cluster-api/magnum-cluster-api-driver related issues. @mnaser, let me know when the capi (magnum-capi) issues are fixed so I can review the tpo changes. Thanks!
This could be it! We are still running 1.4, I believe. We can roll that out to 1.5.
Correct, but I'd like a reliable way to validate that a successful upgrade happens.
Will do. We are almost there.
I have tested these changes on Magnum versions Ussuri and Wallaby; they work fine.
@pawcykca: thank you so much for testing this! How do you suggest writing acceptance tests for this?
@nikParasyr I've tested this locally a few times as well, along with others who have; are we good to merge this as is, or how do you suggest we proceed?
@mnaser sorry for the delay, but I was off for some personal stuff.
Since this has been tested, I can approve it. (I need to find time to really figure out the CI for Magnum.)
Thanks for the work!
This should hopefully stop the cluster from being recreated; instead, it will trigger an upgrade if the `cluster_template_id` is changed.
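In Terraform terms, the behaviour this PR targets can be illustrated with a hypothetical config: repointing `cluster_template_id` at a new template should now plan as an in-place update rather than a destroy-and-recreate (the resource names and counts below are made up):

```hcl
resource "openstack_containerinfra_cluster_v1" "k8s" {
  name = "k8s-cluster"

  # Switching this reference to a new template previously forced the whole
  # cluster to be recreated; with this fix it should trigger a rolling
  # upgrade instead.
  cluster_template_id = openstack_containerinfra_clustertemplate_v1.new.id

  master_count = 1
  node_count   = 2
}
```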