Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NatGateway integration – step 2 #54

Merged
merged 1 commit into from Jan 27, 2021
Merged

NatGateway integration – step 2 #54

merged 1 commit into from Jan 27, 2021

Conversation

dkistner
Copy link
Member

@dkistner dkistner commented Mar 18, 2020

Allow users to attach their own zonal public ip addresses to the NatGateway of a Shoot cluster.
Allow also to decide in which zone the NatGateway should be deployed to. User provided public ips needs to be zonal and in the same zone as the NatGateway.

Attaching Public IP Prefixes to the NatGateway will currently not be supported as there is no association resource to attach IP Prefixes to the NatGateway in the azurerm Terraform provider. Attaching IP Prefixes is currently only possible inline on the NatGateway resource, but as we were forced to write complicated migration logic for public ips when we moved to azurerm v2 Terraform provider, we don't want to repeat that for IP Prefixes.

❌ Do not merge until hashicorp/terraform-provider-azurerm#6052 is fixed and a new Terraformer is integrated that contain a version of the azurerm provider which incorporate the fix for the mentioned issue.

What this PR does / why we need it:
Part 2 of #43

Which issue(s) this PR fixes:
Fixes partly #43

Release note:

It is now possible to attach user provided public ips to the NatGateway of an Azure Shoot and to specify to which zone the NatGateway of the Azure Shoot should be deployed to. Only zonal public ips can be attached to a NatGateway and they need to be in the same zone like the NatGateway. You can find more information [here](https://github.com/gardener/gardener-extension-provider-azure/blob/master/docs/usage-as-end-user.md#infrastructureconfig).
Changing the zone NatGateway via `.spec.provider.infrastructureConfig.networks.natGateway.zone` while using a managed NatGateway public ip will lead to a recreation of the public ip and you will most likely end up with a different public ip. That mean egress connections will use then a different public ip as before. You can find more information [here](https://github.com/gardener/gardener-extension-provider-azure/blob/master/docs/usage-as-end-user.md#infrastructureconfig).

@dkistner dkistner added the reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies label Mar 18, 2020
@dkistner dkistner requested review from a team as code owners March 18, 2020 10:51
@dkistner dkistner self-assigned this Mar 18, 2020
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 18, 2020
@dkistner dkistner mentioned this pull request Mar 18, 2020
3 tasks
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few minor suggestions. :)

pkg/apis/azure/types_infrastructure.go Outdated Show resolved Hide resolved
pkg/apis/azure/v1alpha1/types_infrastructure.go Outdated Show resolved Hide resolved
pkg/apis/azure/v1alpha1/types_infrastructure.go Outdated Show resolved Hide resolved
pkg/apis/azure/validation/infrastructure.go Outdated Show resolved Hide resolved
pkg/apis/azure/validation/infrastructure.go Outdated Show resolved Hide resolved
pkg/internal/infrastructure/terraform.go Outdated Show resolved Hide resolved
pkg/internal/infrastructure/terraform.go Outdated Show resolved Hide resolved
pkg/internal/infrastructure/terraform.go Outdated Show resolved Hide resolved
pkg/internal/infrastructure/terraform.go Outdated Show resolved Hide resolved
pkg/internal/infrastructure/terraform.go Outdated Show resolved Hide resolved
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Apr 2, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Apr 2, 2020
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Apr 2, 2020
@rfranzke rfranzke marked this pull request as draft April 9, 2020 08:04
@rfranzke
Copy link
Member

rfranzke commented Apr 9, 2020

Converted to a draft PR as its dependencies are not yet resolved. Please mark as 'ready for review' once they are resolved.

@dkistner dkistner changed the title [Do-Not-Merge] NatGateway integration – step 2 NatGateway integration – step 2 Apr 27, 2020
@rfranzke
Copy link
Member

@dkistner
Copy link
Member Author

Nice! Let's wait until there is a new release of the azurerm terraform provider. I will try it out then and update the pr accordingly.

@dkistner
Copy link
Member Author

Now there is an release which contain the fix https://github.com/terraform-providers/terraform-provider-azurerm/blob/v2.12.0/CHANGELOG.md#2120-may-28-2020
I will try out.

@dkistner
Copy link
Member Author

So have tried it out. It does not work without introducing a new association resource for the public ip to nat gateway instead of using the internal public ip association of the nat gateway resource.

The migration from the internal association to association resource does not work, see here: hashicorp/terraform-provider-azurerm#7167
Seems it can be solved by importing the public ip resource to the Terraform state.

@ialidzhikov
Copy link
Member

ialidzhikov commented Jul 27, 2020

The migration from the internal association to association resource does not work, see here: terraform-providers/terraform-provider-azurerm#7167
Seems it can be solved by importing the public ip resource to the Terraform state.

I checked hashicorp/terraform-provider-azurerm#7167 and it is quite unfortunate.
The thing I am wondering is whether we can have

resource "azurerm_nat_gateway" "nat" {
  name                    = "{{ required "clusterName is required" .Values.clusterName }}-nat-gateway"
  location                = "{{ required "azure.region is required" .Values.azure.region }}"
  {{ if .Values.create.resourceGroup -}}
  resource_group_name     = azurerm_resource_group.rg.name
  {{- else -}}
  resource_group_name     = data.azurerm_resource_group.rg.name
  {{- end }}
  sku_name                = "Standard"
  {{ if .Values.natGateway.migrateToPublicIPAssociation -}}
  public_ip_address_ids   = []
  {{- end }}
}

resource "azurerm_nat_gateway_public_ip_association" "nat-ip-association" {
   nat_gateway_id       = "${azurerm_nat_gateway.nat.id}"
   public_ip_address_id = "${azurerm_public_ip.natip.id}"
}

I guess we could have in the Infrastructure status field like migratedToPublicIPAssociation that indicates whether the infra migrated to azurerm_nat_gateway_public_ip_association.
In the first apply run, the terraformer would run with migrateToPublicIPAssociation: true which will remove for a short the public IP addresses from the nat gateway (because of public_ip_address_ids = []) but then it will add them once again because of the azurerm_nat_gateway_public_ip_association resource. After successful run, we could set the status.migratedToPublicIPAssociation=true and then pass in the subsequent runs migrateToPublicIPAssociation=false to the terraformer chart.
I don't know exactly what is the complexity of the terraform import way but this approach sounds easier to implement? WDYT?

@dkistner
Copy link
Member Author

@ialidzhikov I had the same idea and already implemented that almost exactly like you proposed. The issue here is that there will be a time during the first and the second reconciliation where no public ip is assigned to the NatGateway. In this period of time there is no ip assigned to the NatGateway and it can't be used to route egress traffic which require a stable ip. For some workloads this would mean a serious disruption.

Can we somehow influence that an infrastructure resource get reconciled multiple times even after the previous reconcile succeed? I didn't find a way to archive that. Therefore I rather thought in the direction of import support.

I think having a generic way to support Terraform imports for the Gardener extensions could also help in the future with other migration scenarios.

@ialidzhikov
Copy link
Member

ialidzhikov commented Jul 27, 2020

@ialidzhikov I had the same idea and already implemented that almost exactly like you proposed. The issue here is that there will be a time during the first and the second reconciliation where no public ip is assigned to the NatGateway. In this period of time there is no ip assigned to the NatGateway and it can't be used to route egress traffic which require a stable ip. For some workloads this would mean a serious disruption.

Hmm, why there would be 2 reconciliations. Setting public_ip_address_ids to [] and introduction of azurerm_nat_gateway_public_ip_association happen in the same infra reconciliation - one after other in the terraform execution. So the public ip should be removed and then associated again after few seconds.

  1. This is what we have now:
resource "azurerm_public_ip" "natip" {
  name                = "nat-ip"
  location            = "westeurope"
  resource_group_name = "${azurerm_resource_group.rg.name}"
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_nat_gateway" "nat" {
  name                    = "nat-gateway"
  location                = "westeurope"
  resource_group_name     = "${azurerm_resource_group.rg.name}"
  sku_name                = "Standard"
  public_ip_address_ids   = ["${azurerm_public_ip.natip.id}"]
}
  1. This is how migration run looks like
resource "azurerm_public_ip" "natip" {
  name                = "nat-ip"
  location            = "westeurope"
  resource_group_name = "${azurerm_resource_group.rg.name}"
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_nat_gateway" "nat" {
  name                    = "nat-gateway"
  location                = "westeurope"
  resource_group_name     = "${azurerm_resource_group.rg.name}"
  sku_name                = "Standard"
- public_ip_address_ids   = ["${azurerm_public_ip.natip.id}"]
+ public_ip_address_ids   = []
}

+resource "azurerm_nat_gateway_public_ip_association" "nat-ip-association" {
+  nat_gateway_id       = "${azurerm_nat_gateway.nat.id}"
+  public_ip_address_id = "${azurerm_public_ip.natip.id}"
+}
  1. This is how a normal run after the migration looks like:
resource "azurerm_nat_gateway" "nat" {
  name                    = "nat-gateway"
  location                = "westeurope"
  resource_group_name     = "${azurerm_resource_group.rg.name}"
  sku_name                = "Standard"
- public_ip_address_ids   = []
}

resource "azurerm_nat_gateway_public_ip_association" "nat-ip-association" {
   nat_gateway_id       = "${azurerm_nat_gateway.nat.id}"
   public_ip_address_id = "${azurerm_public_ip.natip.id}"
}

@CLAassistant
Copy link

CLAassistant commented Sep 16, 2020

CLA assistant check
All committers have signed the CLA.

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Nov 20, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 21, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 21, 2021
pkg/apis/azure/v1alpha1/types_infrastructure.go Outdated Show resolved Hide resolved
pkg/apis/azure/validation/infrastructure.go Outdated Show resolved Hide resolved
docs/usage-as-end-user.md Outdated Show resolved Hide resolved
docs/usage-as-end-user.md Outdated Show resolved Hide resolved
pkg/apis/azure/v1alpha1/types_infrastructure.go Outdated Show resolved Hide resolved
pkg/apis/azure/v1alpha1/types_infrastructure.go Outdated Show resolved Hide resolved
pkg/apis/azure/validation/infrastructure.go Outdated Show resolved Hide resolved
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 26, 2021
@dkistner
Copy link
Member Author

Thanks @ialidzhikov
I have addressed the requested changes.

@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 26, 2021
@ialidzhikov
Copy link
Member

/test

@testmachinery
Copy link

testmachinery bot commented Jan 27, 2021

Testrun: e2e-wsmbr
Workflow: e2e-wsmbr-wf
Phase: Succeeded

+---------------------+---------------------+-----------+----------+
|        NAME         |        STEP         |   PHASE   | DURATION |
+---------------------+---------------------+-----------+----------+
| infrastructure-test | infrastructure-test | Succeeded | 25m19s   |
+---------------------+---------------------+-----------+----------+

ialidzhikov
ialidzhikov previously approved these changes Jan 27, 2021
Copy link
Member

@ialidzhikov ialidzhikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/needs second-opinion

@gardener-robot gardener-robot removed needs/review Needs review needs/second-opinion Needs second review by someone else labels Jan 27, 2021
@ialidzhikov ialidzhikov added the needs/second-opinion Needs second review by someone else label Jan 27, 2021
@gardener-robot gardener-robot added the needs/review Needs review label Jan 27, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 27, 2021
@dkistner
Copy link
Member Author

@kon-angelo Thanks for the review. I have addressed the requested changes.

@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 27, 2021
Copy link
Contributor

@kon-angelo kon-angelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review needs/second-opinion Needs second review by someone else labels Jan 27, 2021
@dkistner dkistner merged commit 86a20eb into master Jan 27, 2021
@dkistner dkistner deleted the az-nat-step2 branch January 27, 2021 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change API change with impact on API users kind/test Test needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants