Azure Kubernetes Service: 2020-06 updates #7233

Merged
55 commits merged on Jun 10, 2020

Conversation

@tombuildsstuff (Member) commented Jun 5, 2020

There are a number of Pull Requests open for Azure Kubernetes Service - so that we can merge these without causing conflicts in every Pull Request, this PR consolidates those changes into a single branch.

To achieve this, these branches have been pulled locally, squashed, had any fixes applied and then cherry-picked onto this "combined" branch - so the original authors' credit should still apply.

At the same time this PR adds support for a number of Feature Requests - specific details are below:

Whilst pulling this many changes into a single Pull Request isn't ideal (and we'd generally avoid it), this allows us to merge the changes above whilst avoiding merge conflicts in all of the open PRs, and so appeared to be the most pragmatic approach to move forward.


One thing in particular to call out is updating Node Pools: a Kubernetes Cluster/Control Plane must be updated before its Node Pools can be updated. As such we've added a check during the Apply to ensure that the version of Kubernetes used for the Node Pool is supported by the Control Plane (or to flag that the Control Plane needs to be upgraded first). An example of that error:

Error:
The Kubernetes/Orchestrator Version "1.16.9" is not available for Node Pool "default".

Please confirm that this version is supported by the Kubernetes Cluster "tom-dev-aks"
(Resource Group "tom-dev-rg") - which may need to be upgraded first.

The Kubernetes Cluster is running version "1.16.8".

The supported Orchestrator Versions for this Node Pool/supported by this Kubernetes Cluster are:
 * 1.14.7
 * 1.16.8
 * 1.15.10
 * 1.14.8
 * 1.15.11

Node Pools cannot use a version of Kubernetes that is not supported on the Control Plane. More
details can be found at https://aka.ms/version-skew-policy.
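
To illustrate where this check applies, a configuration along these lines would hit it. This is a sketch only, with placeholder region, sizes and versions, assuming the Node Pool version is exposed via the orchestrator_version argument:

resource "azurerm_kubernetes_cluster" "example" {
  name                = "tom-dev-aks"
  location            = "West Europe"
  resource_group_name = "tom-dev-rg"
  dns_prefix          = "tomdevaks"

  # The Control Plane version.
  kubernetes_version = "1.16.8"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_DS2_v2"

    # Requesting 1.16.9 whilst the Control Plane runs 1.16.8 triggers the
    # error shown above during the Apply - the cluster must be upgraded first.
    orchestrator_version = "1.16.9"
  }

  identity {
    type = "SystemAssigned"
  }
}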

Fixes #1915
Fixes #4327
Fixes #5134
Fixes #5487
Fixes #5541
Fixes #5646
Fixes #6058
Fixes #6086
Fixes #6462
Fixes #6464
Fixes #6612
Fixes #6702
Fixes #6912
Fixes #6994
Fixes #7092
Fixes #7136
Fixes #7198

@mbfrahry (Member) left a comment


LGTM with some minor additional checks

@tombuildsstuff self-assigned this Jun 5, 2020
@pbrit commented Jun 5, 2020

@tombuildsstuff Thank you for your work.

However, I'm concerned about your approach. Yes, it's an effective way to add more functionality, but it's equally effective at introducing more bugs.

Even after a short review session, I see many issues with the way SpotPriority was re-implemented - and that's just one chunk; there are 10+ more which I'm not familiar with.

I'm going to take this branch for a ride and will provide feedback.
Also, I'm wondering what the rest of the community thinks about it.

tombuildsstuff and others added 17 commits June 8, 2020 10:27
Porting over the changes from #5824

X-Committed-With: neil-yechenwei
… prior to deployment

This raises an error with more information about which Kubernetes Versions are supported
by the Kubernetes Cluster - and prompts a user to upgrade the Kubernetes Cluster first if
necessary.

This commit also adds acceptance tests to confirm the upgrade scenarios
This allows for setting `mode` to `System` (or `User`) for defining secondary System node pools.

Fixes #6058
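
For illustration, a secondary System Node Pool configured via that `mode` setting might look roughly like this - a sketch with placeholder names, sizes and a placeholder cluster reference, not taken from this PR's tests:

resource "azurerm_kubernetes_cluster_node_pool" "system2" {
  name                  = "system2"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 1

  # Marks this as an additional System Node Pool; omitting this (or setting
  # it to "User") defines a regular User Node Pool.
  mode = "System"
}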
@tombuildsstuff (Member, Author) commented

@pbrit

Due to the nature of AKS, unfortunately we're faced with breaking changes (either behavioural or code changes) within AKS every few months. As such, over time we've built up an extensive test suite for AKS which covers all kinds of configurations, functionality and bugs - and which we're continually adding to as new functionality gets added and bugs get fixed.

We'd had a heads-up of some potentially breaking changes coming to the AKS APIs in the near future - as such we've been trying to find out more information about the specific details. In the interim, a number of PRs have been opened (including yours) which look to add new functionality/fix bugs in this resource.

We've recently got more details about these changes and it appears that they won't break us in the way we'd been expecting - as such we're good to proceed with these PRs. However, having a number of open PRs puts us in an unfortunate position where merging one causes merge conflicts in the others.

Having spent some time reviewing these PRs and determining how best to proceed, we came to the conclusion that consolidating these (7 PRs) together was the most pragmatic way forward. Since a number of these PRs also required (and vendored) a newer API version, the best way to do this was to squash the commits down, remove any vendoring changes and then address comments from PR review.

Whilst generally speaking I'd agree with you regarding a single large PR vs multiple small PRs - in this instance we're fortunate to have sufficient test coverage of the AKS resources that we're able to detect bugs both in our code and in the AKS APIs themselves. To give one specific example: the test suite detected a breaking behavioural change in the AKS API when updating a Load Balancer in a given configuration (which I'd also spotted during development) - and we'll be pushing a fix for that shortly.

Whilst we appreciate there are some downsides to this approach - and we'd have preferred to have gone through and merged those PRs individually - after digging into it, consolidating these appeared to be the more pragmatic solution. That said, generally speaking we'd tend to prefer and merge more, smaller PRs where possible - consolidating here is only viable due to the large test coverage we've got for AKS.


> Even after a short review session, I see many issues with the way SpotPriority was re-implemented. It's just one chunk, there are 10+ more which I'm not familiar with.

To answer your specific questions around Spot Max Price - unfortunately different Azure Teams can refer to the same functionality using different terminology. As such, whilst we could seek to use a consistent name here - in this instance we're following the name used by the AKS Service, rather than the Compute Service (since this is exposed as "Spot Max Price" in the AKS API rather than "Spot Bid Price" in the VMSS API).
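
For context on the naming above, a Spot Node Pool would be configured roughly as follows - a sketch only, with placeholder names, sizes and a placeholder cluster reference, assuming the arguments involved are priority, eviction_policy and spot_max_price:

resource "azurerm_kubernetes_cluster_node_pool" "spot" {
  name                  = "spot"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 1

  priority        = "Spot"
  eviction_policy = "Delete"

  # Follows the AKS API's "Spot Max Price" terminology rather than the VMSS
  # API's "Spot Bid Price" - a value of -1 means "cap at the on-demand price".
  spot_max_price = -1
}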

If you've noticed specific issues/bugs in the PR then please feel free to comment on the PR review and we can take a look and work through those :)

…_profile` block within the `network_profile` block
@aristosvo (Collaborator) commented Jun 8, 2020

@tombuildsstuff Thanks for refactoring 'my' code regarding the delta-updates for the load_balancer_profile. I'm still a bit afraid #6534 might resurface, as it's not clear from the code and the tests how this works out.

I probably won't have time available to write the test for it, but it comes down to the following scenario:

  • if idle_timeout_in_minutes is changed, and
  • outbound_ip_address_ids was configured, but not changed,
    is the outbound_ip_address picked up, or will it trigger an error?
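
For illustration, the scenario boils down to something like the following block within the azurerm_kubernetes_cluster resource - a sketch with placeholder values and a placeholder public IP reference; only the relevant block is shown:

network_profile {
  network_plugin    = "kubenet"
  load_balancer_sku = "standard"

  load_balancer_profile {
    # Changed between applies (e.g. 30 -> 45)...
    idle_timeout_in_minutes = 30

    # ...whilst this stays exactly as originally configured - the question is
    # whether the existing value is preserved or an error is raised.
    outbound_ip_address_ids = [azurerm_public_ip.outbound.id]
  }
}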

@tombuildsstuff (Member, Author) commented

@aristosvo thanks for looking - for context there's a breaking change in the new API version where the field idle_timeout_in_minutes now always needs to be specified (which caused the tests TestAccAzureRMKubernetesCluster_addAgent and TestAccAzureRMKubernetesCluster_removeAgent to fail) - which is why this is being updated.

From what I can tell, the scenario for #6534 is captured in the test TestAccAzureRMKubernetesCluster_changingLoadBalancerProfile (which is currently failing and which I'm working through fixing at the moment) - but we can add an additional one if that's not sufficient; we'll see shortly :)

@jackofallops (Member) left a comment


LGTM 👍 Thanks Tom!

…count`, `outbound_ip_address_ids` and `outbound_ip_prefix_ids` fields"
…e_policy`

In a change from last week - the API now defaults to v2. Since v1 is deprecated, this
commit removes support for it.
@tombuildsstuff (Member, Author) commented

Running the test suite for this, the tests look good:

[Screenshot: acceptance test suite results, 2020-06-10 15:27]

Of the failures:

So this should be good 👍

@tombuildsstuff (Member, Author) commented

Data Source test passes after updating the CheckDestroy helper:

$ TF_PROVIDER_SPLIT_COMBINED_TESTS=1 TF_ACC=1 go test -v ./azurerm/internal/services/containers/tests/ -timeout=60m -run=TestAccAzureRMKubernetesClusterNodePoolDataSource_basic
=== RUN   TestAccAzureRMKubernetesClusterNodePoolDataSource_basic
=== PAUSE TestAccAzureRMKubernetesClusterNodePoolDataSource_basic
=== CONT  TestAccAzureRMKubernetesClusterNodePoolDataSource_basic
--- PASS: TestAccAzureRMKubernetesClusterNodePoolDataSource_basic (1428.48s)
PASS
ok  	github.com/terraform-providers/terraform-provider-azurerm/azurerm/internal/services/containers/tests	1428.528s

@ghost commented Jun 11, 2020

This has been released in version 2.14.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.14.0"
}
# ... other configuration ...

@ghost commented Jul 11, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@hashicorp hashicorp locked and limited conversation to collaborators Jul 11, 2020