
Defer to lazy client initialization to prevent dialing localhost #1078

Open
wants to merge 1 commit into main

Conversation

@pfault commented Nov 29, 2020

Description

Defers initialization of the actual client until first use, to prevent dialing http://localhost when credentials are only configured late.
Fixes cases where parameters such as config_path that are generated during the apply would otherwise lead to an empty default config being used.
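
As a minimal illustration of the affected pattern (hypothetical resource and module names), where config_path only becomes known during the same apply:

# Kubeconfig written by another resource during the same apply,
# so the file does not exist yet when the provider is first configured.
resource "local_file" "kubeconfig" {
  filename = "${path.module}/kubeconfig.yaml"
  content  = module.cluster.kubeconfig   # hypothetical module output generated during apply
}

provider "kubernetes" {
  config_path = local_file.kubeconfig.filename
}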

Acceptance tests

  • Have you added an acceptance test for the functionality being added?
  • Have you run the acceptance tests on this branch?

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestProvider'

==> Checking that code complies with gofmt requirements...
rm -rf /home/pfault/git/terraform-provider-kubernetes/kubernetes/test-infra/external-providers/.terraform /home/pfault/git/terraform-provider-kubernetes/kubernetes/test-infra/external-providers/.terraform.lock.hcl || true
mkdir /home/pfault/git/terraform-provider-kubernetes/kubernetes/test-infra/external-providers/.terraform
mkdir -p /tmp/.terraform.d/localhost/test/kubernetes/9.9.9/linux_amd64 || true
go clean -cache
go build -o /tmp/.terraform.d/localhost/test/kubernetes/9.9.9/linux_amd64/terraform-provider-kubernetes_9.9.9_linux_amd64
cd /home/pfault/git/terraform-provider-kubernetes/kubernetes/test-infra/external-providers && TF_CLI_CONFIG_FILE=/home/pfault/git/terraform-provider-kubernetes/kubernetes/test-infra/external-providers/.terraformrc TF_PLUGIN_CACHE_DIR=/home/pfault/git/terraform-provider-kubernetes/kubernetes/test-infra/external-providers/.terraform terraform init -upgrade

Initializing the backend...

Initializing provider plugins...
- Finding localhost/test/kubernetes versions matching "9.9.9"...
- Installing localhost/test/kubernetes v9.9.9...
- Installed localhost/test/kubernetes v9.9.9 (unauthenticated)

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
TF_CLI_CONFIG_FILE=/home/pfault/git/terraform-provider-kubernetes/kubernetes/test-infra/external-providers/.terraformrc TF_PLUGIN_CACHE_DIR=/home/pfault/git/terraform-provider-kubernetes/kubernetes/test-infra/external-providers/.terraform TF_ACC=1 go test "/home/pfault/git/terraform-provider-kubernetes/kubernetes" -v -run=TestProvider -timeout 120m
=== RUN   TestProvider
--- PASS: TestProvider (0.03s)
=== RUN   TestProvider_impl
--- PASS: TestProvider_impl (0.00s)
=== RUN   TestProvider_configure
2020/11/29 16:10:27 [DEBUG] Trying to load configuration from file
2020/11/29 16:10:27 [DEBUG] Configuration file is: test-fixtures/kube-config.yaml
2020/11/29 16:10:27 [DEBUG] Using custom current context: "gcp"
2020/11/29 16:10:27 [DEBUG] Using overidden context: api.Context{LocationOfOrigin:"", Cluster:"", AuthInfo:"", Namespace:"", Extensions:map[string]runtime.Object(nil)}
2020/11/29 16:10:27 [INFO] Successfully initialized config
--- PASS: TestProvider_configure (0.00s)
PASS
ok  	github.com/hashicorp/terraform-provider-kubernetes/kubernetes	0.067s

Release Note

Release note for CHANGELOG:

NONE

References

#758
#1028

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

@ghost added the size/M label on Nov 29, 2020
@hashicorp-cla commented Nov 29, 2020

CLA assistant check
All committers have signed the CLA.

@alexsomesan (Member)

Thanks for taking a shot at this issue! The solution looks good and seems to follow the same pattern we applied to the Kubernetes alpha provider.

I'll be testing this a bit to see if I can come up with an automated test scenario.

@alexsomesan (Member)

This looks good.

We're going to hold off merging for a bit. Some wider ranging credentials changes are in flight here: #1052. Once those merge, we can go ahead and merge this too. I think it's easier in that order since this is smaller and easier to rebase if it runs into conflicts.

@mroche89

Any movement on this? #1052 was merged a month ago

@pduchnovsky commented Mar 3, 2021

Is this going to be merged soon? We're waiting on this fix in order to actually use Terraform for Kubernetes deployments with GPUs on GKE (the older provider version doesn't work with the "nvidia.com/gpu" taint).

@alexsomesan

Base automatically changed from master to main March 23, 2021 15:53
@petewilcock

@alexsomesan +1 request to see this merged after @pfault can resolve merge conflicts.

This bug appears to surface in any workflow that leads to cluster recreation, so it is particularly painful for anyone iterating on development in this way (per: #1028 (comment)).

@Karthik-13

Hey guys, we are facing this problem as well.
Any ETA on when this will be merged into the provider?

@pduchnovsky

Please, for the love of terraform, merge this already.

@jw-maynard

@pfault Do you have time to fix the merge conflicts?

@ermik commented Aug 27, 2021

Happy to resolve conflicts for this, just let me know how to collaborate. As a simple route, I can create a new PR by forking the PR branch. cc @flavioab

@pfault (Author) commented Aug 31, 2021

Rebased onto main, fixing the merge conflict.

@jw-maynard

@alexsomesan @flavioab can we get this merged?

@RichiCoder1

Not to bump, but bump. Currently stuck on v1 due to the way this blocks migration and iteration.

@DutchEllie

What the other person above me said. I don't wanna bump, but uh... Bump.

This thing needs to get merged.

@ermik commented Oct 5, 2021

Pinging @alexsomesan and @jrhouston 👋🏻

@dambros commented Nov 4, 2021

Please merge this. It has been driving me nuts on a daily basis

@raven-wing

I'm pretty sure there is a reason why it is still not merged, but...
Could anyone share what that reason is?

@Errox commented Nov 25, 2021

When can we expect the merge?

@alexsomesan (Member)

I think the effect of this change is focused around the removal of this part:

	if cfg == nil {
		// This is a TEMPORARY measure to work around https://github.com/hashicorp/terraform/issues/24055
		// IMPORTANT: this will NOT enable a workaround of issue: https://github.com/hashicorp/terraform/issues/4149
		// IMPORTANT: if the supplied configuration is incomplete or invalid
		///IMPORTANT: provider operations will fail or attempt to connect to localhost endpoints
		cfg = &restclient.Config{}
	}

The rest of the code shuffle looks like it wouldn't make a difference in behaviour.

However, as those comments note, that part was added as a workaround. It's likely that removing it will cause side effects which are hard to predict now.

To move this conversation forward, we would need some solid reproduction configurations. If anyone is willing to contribute some that would demonstrate how this change makes an impact, we can move from there.

@raven-wing

I'm afraid it would not be easy to provide a solid reproduction configuration, since this error only happens "sometimes"...

Nonetheless, our configuration is copied from the HashiCorp example:

provider "kubernetes" {
host = data.azurerm_kubernetes_cluster.default.kube_config.0.host
client_certificate = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_certificate)
client_key = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.client_key)
cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.default.kube_config.0.cluster_ca_certificate)
}

The point is that the cluster is applied from a different root directory. We use AKS.

What "helps" to trigger this error is running the azurerm_kubernetes_cluster apply and the cluster setup from different machines.

@dskecse commented Dec 24, 2021

Hey @alexsomesan @flavioab, any ETA here? Could this be merged any time soon, please?

@ebauman commented Jan 18, 2022

Could we please get this merged sometime soon?

@yellowmegaman commented Feb 1, 2022

@apparentlymart @radeksimko sorry for bothering you, but you guys are awesome, and this renders the kubernetes provider unusable for multi-cluster setups when you change some cluster property (e.g. disk_size for a node).

Maybe you can somehow influence this ? 🥲

@apparentlymart (Member)

I don't have sufficient knowledge about Kubernetes or this provider to weigh in on when and how this could be merged, I'm afraid.

Seeing this does make me think a bit about the solution though: Terraform Core should be sending an unknown value to the provider in the situation where the configuration is derived from unknown values from elsewhere, which can in principle allow a provider to differentiate between totally unset vs. not yet known.

However, I believe the old SDK hides that detail when you access the configuration through its API, making it seem like the unknown values are not set at all. If backward compatibility is a concern for this change (I'm not sure) then it might be worth considering how the SDK could potentially expose information about whether particular arguments are known and then have this special new behavior occur in the unknown case, leaving the unset case behaving however it currently does. (Sounds like it currently defaults to localhost in that case.)

@ScratZa commented Feb 2, 2022

bump

@yellowmegaman

(quoting @apparentlymart's reply above)

Thank you for your input! Too bad the root cause seems to be a lot deeper.

For anyone who stumbles upon this issue, please note that this code alone can break the kubernetes provider right now, at version 2.7.1:

provider "google" {
  project = "yourproject"
  region  = "us-central1-a"
}

data "google_client_config" "default" {}

resource "google_container_cluster" "hello" {
  name               = "hello"
  location           = "us-central1-a"
  initial_node_count = 1
}

provider "kubernetes" {
  host                   = "https://${google_container_cluster.hello.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.hello.master_auth[0].cluster_ca_certificate)
}

resource "kubernetes_namespace" "traefik" {
  metadata {
    name = "traefik"
  }
}

Just change the initial_node_count parameter and it fails with the message about localhost.
(Same thing happens with a data source in the middle.)

At this point I don't think the kubernetes provider can be used without some extraordinary operational detours (e.g. separate management of the cluster and its k8s resources, separate states, and a remote state data source).

@orgads commented Feb 2, 2022

In my case, if I run (adapted to your example) terraform apply -target google_container_cluster.hello, and then terraform apply, it works as expected.

Can you check if it works for you?

@yellowmegaman

-target google_container_cluster.hello

Yes, it does work.
I was also considering this option, but in that case I'll need an external tf state checker mechanism and an additional pipeline step for container cluster changes.

@orgads commented Feb 2, 2022

Well, it's clearly a workaround, not a solution.

@apparentlymart (Member)

FWIW, in other cases where folks have employed that workaround it's typically only been needed for initial creation and not for subsequent updates, because the argument in question stays stable after the object has been created. Again, I don't know this particular situation well enough to say for certain, but I'd guess that the hostname of a cluster would stay constant for the life of that cluster and so the extra run with -target would only be needed on the first iteration to get that cluster created, and then after that you can use a normal workflow for ongoing maintenance.

@yellowmegaman commented Feb 3, 2022

(quoting @apparentlymart above)

Well, actually, in the example above the first run isn't a problem at all; on the contrary, the subsequent runs are the problem if you change any cluster parameter that triggers its recreation.

A depends_on parameter for providers (hashicorp/terraform#2430) could also help here.

If we create outputs for the example configuration above, exposing the cluster endpoint and CA cert, this is what I get after the initial creation and after the update with -target specified.
Outputs:

output "hello_endpoint" {
  value = google_container_cluster.hello.endpoint
}
output "hello_ca" {
  value = google_container_cluster.hello.master_auth[0].cluster_ca_certificate
}

First run:

hello_ca = "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUVMVENDQXBXZ0F3SUJBZ0lSQUpIWTFUOVZVbVA1TE5VN3ZBckxtbDB3RFFZSktvWklodmNOQVFFTEJRQXcKTHpFdE1Dc0dBMVVFQXhNa09ESmtPVEUwWVdRdE56RmxOeTAwWW1FeUxUaGpabUV0TW1RMVl6Z3lOelExTURBMQpNQ0FYRFRJeU1ESXdNekEzTVRFd01sb1lEekl3TlRJd01USTNNRGd4TVRBeVdqQXZNUzB3S3dZRFZRUURFeVE0Ck1tUTVNVFJoWkMwM01XVTNMVFJpWVRJdE9HTm1ZUzB5WkRWak9ESTNORFV3TURVd2dnR2lNQTBHQ1NxR1NJYjMKRFFFQkFRVUFBNElCandBd2dnR0tBb0lCZ1FEYzBlZW1QMms2YVM5Mi9kLzh2dkhTLzZlVUR4T0djRjRoMUJiVApnb09FSjZhWjlLVFB4Zkdldms5eFBFcEVxMGhIMlVtYVVSKzk2ZG9UTWgzMzZGSTBmeUYrWk5ka2lGcTNjazJwCkhlTEtsUGlvcmFPZWF4MEYya204d1ZmQTJuTE5JZzk0SGF2d3RZRHQ5L0hEVnFVNjNKWmpPVzZPYVBQNEppU3gKTGNUbmo4TzZVZnExTnJTTU80bU5mV1M0aCs5RE1ZSXNDZlM0TXczS0lKK3E4R2hGVGpjVHZiTEdyU01NKzRiSQoybm5oMGxyMFk2Vzh4YS9KeU55WEljNWtoVEYwb3pqUzByUUZ3WFhqeXYxRmQxZWRaOUlRR243N0F2MHJqWVQ2CmZzTWZBSTlFOE5NanRJVjByN1NpYVI3eVhtaE9GUEhTcmNOQ0EzdUJCMG5LdjRpN0hPUXo4WElWQUlOK0oyVUYKYys2dW5yZmxtQlhHc1dWZko3dUNFWW41VmNzN2t3Z21JRnRMa0dZS2syclBzN2x2MFhNZ3d6MXFrdzVLUEIweApHTlprTll6dzA4dS9oYkw4d0loellmclpFZ2RHQS9zM2J1N0ZoWlh3OHpBc3JCSGd0R0UxTlZHaWZ0cEQzQ3JPClBxTzBkUXVQcFJIUlkveUNuMTl5eXQxMElaVUNBd0VBQWFOQ01FQXdEZ1lEVlIwUEFRSC9CQVFEQWdJRU1BOEcKQTFVZEV3RUIvd1FGTUFNQkFmOHdIUVlEVlIwT0JCWUVGTldJL21rQ3djbS9NaVIzVkZuQmxtZU5aM1pDTUEwRwpDU3FHU0liM0RRRUJDd1VBQTRJQmdRQ1Fqc2pnNUdtN1hsOXg4RXR6bmpWRlFYaTRCY1FaQnQyTXowdzV4bWhlCktLaDZaNjRJQ2UxSEt5bFlLbXVrbDJhMVRZNEdqd2xsanNLT2lqUXpnWFZVTng5bjF2UFB1WktFQ1p3OVY1L3kKRTBQWHEzaE5paTd1QVJXbkR3TUlMaFl1TTVZaGtRa0VRUlNJdFdrc0p0Mko4ajNsNHRoQnRzUzFENVlySjczVwpmdHMxU241czlSWngvQ1JDeUhFaWZMUTZKcDZ1czBtTmY5SHFUWXkrZngyY1E5Z0RXMEFod2FZbStCVFZzckdzCkY5YWNaZEFOWllxTDdPL3QvWFJmR1hQWERUSFFsV2NMdVNCc2FOdjdQeEtKckExcjUzVEZpNDViZVZud0pIa2MKQXhXNHlseCtUNzk1TFBKell4NUY5T3EyTjBialZ2R01WWHViSE8wSXFkZkZSU1kxMStvWDQyNFVzN2ZzU3hRNQpUR2ZFSXE2eDJycjVyTmNac2VkeHMwOC8zbHcrRnZnS3ZUU3MzbXVTbDVWZlN0ZW4waHl1M0hkWU5TVm94S1ltCk54NWpEMU85YlJ0MktHN1ZzNXFjYm1aL2YyWEhtTk5UVC9sTmlBUzY0R1U0MFRIaWJmQ3NQQ2RDM0M2OVRXNEUKUUVZa3g1SFptS2lGRy9CYS8vaFVxaFE9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K"
hello_endpoint = "34.67.118.138"

Changed initial_node_count to 2 and specified -target=google_container_cluster.hello:

hello_ca = "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUVMVENDQXBXZ0F3SUJBZ0lSQUo0cmEwZTEwa2s3dEF3SEpIN01VMzB3RFFZSktvWklodmNOQVFFTEJRQXcKTHpFdE1Dc0dBMVVFQXhNa01XUmpZbUk1WmpndFptTTJOUzAwTWpNMkxUZ3hOR1F0TURVNE9XUXhPR1ZoTmpaaQpNQ0FYRFRJeU1ESXdNekEzTVRnd01sb1lEekl3TlRJd01USTNNRGd4T0RBeVdqQXZNUzB3S3dZRFZRUURFeVF4ClpHTmlZamxtT0MxbVl6WTFMVFF5TXpZdE9ERTBaQzB3TlRnNVpERTRaV0UyTm1Jd2dnR2lNQTBHQ1NxR1NJYjMKRFFFQkFRVUFBNElCandBd2dnR0tBb0lCZ1FETFREUmhIdW1VUkhIU3hQUlFjcVZBek41NElSeGMzNUpEV1VOMwpna1ExS1cweXBZZVhnc1R6bmxsVjBvQnNFcXlkTGRnVjJ2RWZha0FCQTU4TWMvZ2VpSWpMcVIrVHppM2ZMWFFjCmZBanV4dTMrcGdYUHNNR2VIdlNsWmppbGdTa3VFMEN0VC9Ca3lnUFhwTjlJL2l4OEx0UVZ4Q0sxOHgwc1Vxc2EKMzQ3Rmc0dUxla0VJV09PUkthOWVlZmpkZXhrV3Fhdi9zZjJ6d2pMb0VnSUZXUEd0QkhJRmdTOHhGMVlhcGdJcQpsWElLWkNzZDc3RWhkaXNvREVBUEY5VmRPRVY2d3pjZHFxMmNtOUhYc29oWlJuYzYvVW1PTEJuSVRDRGlJQmZVCnNwYkNKK2Z5YkhNU3pLbmJualppLytnSjEreXIvQnNGaTdyRFJzb2UwTThQOUJmeXNRVHJTMjJVWUQ1WUlWMUgKbEFuT1lxanoyOVp6VHNvdzlxWG5NclNOa2J4MzA2RDlPdStTOFRJRzdQVC9PQWFWekJLdnJuVnJFYkJ5OEgvagprK0JUd3BBbVh6d1ZoRUg3amlSRU9vUURrcmFOQmxNbDJrYmJjVGhXY2g5elJlNnJKVFh4N2wzRVhvZUZ5Wk0wCjh1R3d2VEdHVkZ2VUFFTDdmbGVXMzBRTk5tOENBd0VBQWFOQ01FQXdEZ1lEVlIwUEFRSC9CQVFEQWdJRU1BOEcKQTFVZEV3RUIvd1FGTUFNQkFmOHdIUVlEVlIwT0JCWUVGQ3pIYW8wSjZEbHh4OHFuQWxoRThqUDVRKzY0TUEwRwpDU3FHU0liM0RRRUJDd1VBQTRJQmdRQkZMY1J4UThJeTI5dml6TEZoaUp5SzhseVovUU5sR0d0U2VwbStSczVZClI3dUc2RDdYbk1yUXpxRmlwbldQY3JpUGRjeWdSV3A1YUlPdWFlL1k0QmlobnJZc1U2OUNkWXo4dCs1a2owVGIKSHpJb0F2WWQrc01RSjVRSFZOLythS042dGZ4U0dkNzFzcVZOY2JONms2cWZpbGt4UHZuMXlJbnlmb0FHczh5dApudnJzdVdFUXBQUkFQVDJ3LzkvZVo4by9BbDRLL2JDT0ppUGdUWXJnbFh1NGZVWTJabmxGSm9Dd2JKak95dHBLCmh2WmVuOXdNU3U2QS9va1hKZ1JCVWhpMFEvWThNU3B2M05uVGtLTjVOWnRkdjgzem1vSXNub1pKTHdxcG95OHkKcXlKYTBmYk9YbzdEa3dja2Y4Y3BpWXE3c3hVY2JvNE0wMmxSVHExM3VvTFkxL0x5bFoxSSs1VlBkN2xaKzdpbQpoQzUyeDVHMzl3NEU0RG1BRTV5dEE0U0UvTGFzSWoxZXh5NDRkQmtEeXdjSWpySVdURUZNQlV4RkZEK1h1WUJuCklscFpkMkVUWkJnNFRIME5JbVQxWmM1MUd2azJDYWRRSTRwd243WmVrOEhONW5lRHlnYm0xd3hlUTczV0l0KzAKbW5NMzBQZUJYTGFsNGtqM1ZzYmdwdzA9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K"
hello_endpoint = "35.188.61.80"

So the kubernetes cluster was re-created.
At initial creation the provider had no trouble waiting for the new cluster to appear and using its endpoint and CA to authenticate, but at re-creation time it doesn't wait and defaults to localhost instead. @pfault do you think the PR in question can help with exactly this?

I was also hoping to circumvent this by using exec authentication for the kubernetes provider, but I guess this won't help, since even if it worked (#1176) it would only get the old endpoint and CA at cluster recreation time.

@jw-maynard

@apparentlymart As @yellowmegaman points out, creation is fine; it's anything that reconfigures the cluster that seems to cause the problem. I use the word reconfigures instead of recreates because in my case the cluster is not actually recreated.

We use EKS for our k8s clusters, and if I try to update the cluster version this problem appears. I'm not sure what black magic AWS pulls behind the curtains when you upgrade an EKS cluster, but it seems to trigger this dialing of localhost instead of using the cluster credentials.

As a workaround I go into the EKS console in AWS and upgrade my cluster from there, then I go to terraform and change the EKS version number in our terraform code after the fact. If I do this I circumvent the problem in my use case. It's obviously not a scalable or convenient solution, but we have few enough EKS clusters that we can manage.

@yellowmegaman

Just noticed that the terraform plan -target=xxx approach also has some enormous drawbacks. If you filter out all kubernetes-provider-related resources, terraform plan won't pick up new ones.

Even if we split the CI/CD pipeline into "everything" vs. kubernetes-based lists of targets, it won't be enough to create new infrastructure. It won't be IaC anymore, because of the number of manual actions required.

Almost the same goes for the terraform_remote_state data source: you'll need separate repos to push (in the correct order) the infra and k8s workload configurations to avoid conflicts. This way, adding a new cluster with a terraform-managed k8s workload also becomes a great hassle.
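
For reference, a minimal sketch of what that remote-state pattern would look like in the cluster-consuming configuration (assuming a GCS backend; the bucket and output names are hypothetical):

data "google_client_config" "default" {}

data "terraform_remote_state" "cluster" {
  backend = "gcs"            # assumed backend; any backend works
  config = {
    bucket = "my-tf-state"   # hypothetical bucket
    prefix = "cluster"
  }
}

provider "kubernetes" {
  host                   = "https://${data.terraform_remote_state.cluster.outputs.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(data.terraform_remote_state.cluster.outputs.cluster_ca_certificate)
}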

At this point null_resources don't seem like a last resort anymore.

Running out of ideas on how to use this provider with any resources subject to change.

@alexsomesan @radeksimko @dak1n1 @jrhouston do you have bandwidth to review/merge this PR please?

@DanielHabenicht commented Feb 4, 2022

Running out of ideas on how to use this provider with any resources subject to change.

We have swallowed the bitter pill of splitting the configuration into two deployments and then using data sources to get the modified clusters, effectively decoupling them. (One terraform run for setting/updating the clusters and one for configuring the clusters with this provider)
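
A minimal sketch of the provider setup in the second deployment, looking the cluster up via a data source instead of a resource (GKE as in the earlier example; the cluster name and location are hypothetical):

data "google_client_config" "default" {}

data "google_container_cluster" "hello" {
  name     = "hello"           # hypothetical cluster created by the first deployment
  location = "us-central1-a"
}

provider "kubernetes" {
  host                   = "https://${data.google_container_cluster.hello.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(data.google_container_cluster.hello.master_auth[0].cluster_ca_certificate)
}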

@ajostergaard

@DanielHabenicht we use the strategy described here and at the moment it works well and we have no reason to split as you have done.

However, we are worried that we are missing something that will trap us later. I assume you tried the pattern above so, if you don't mind, what was it exactly that you couldn't achieve with that pattern that caused you to split into two deployments? Thanks in advance!

@DanielHabenicht

(quoting @ajostergaard above)

It was just a hassle to manually set all the targets (as Kubernetes is not the only thing we deploy), so we opted for a clean split.

@FalconerTC

Any update here?

@@ -226,7 +226,7 @@ type KubeClientsets interface {
 }
 
 type kubeClientsets struct {
-	config *restclient.Config
+	config *clientcmd.ClientConfig

This should not be a pointer because ClientConfig is an interface.

@mrwonko commented Mar 25, 2022

I've been debugging this and noticed that for me, the provider logs

[WARN] Invalid provider configuration was supplied. Provider operations likely to fail: invalid configuration: more than one authentication method found for ; found [token basicAuth], only one is allowed

You can only see this with TF_LOG=DEBUG or above.

Apparently the provider silently recovers from this (basically ignoring the error) by using a default configuration (which connects to localhost).

Further investigation revealed that I had set the KUBE_USER environment variable in addition to having an auth token configured. Unsetting KUBE_USER resolved the issue for me.

But I think this PR intended to address a different root cause.

@DorukAkinci

This is a very important bug that needs to be addressed. Is there any update on this issue?

@danielllek commented Nov 24, 2022

Just hit the bug and lost plenty of time to find the cause :(

@xRDHS commented Jan 12, 2023

Same @danielllek

@streamnsight

I hit the bug with TF 1.0, but it seems to be gone in v1.2.x.
The reason for this issue is the refresh stage:

  • In the first plan, refresh happens but the k8s resources do not exist, so nothing happens.
  • On the first apply, same thing: nothing happens on refresh, so tf moves on.
  • On a change to any config, running plan or apply will attempt a refresh first, but the DATASOURCES are not updated until later, so any k8s provider that makes use of dynamic cluster info (id, cert, exec command) will fail there. The fix at this point is to use -refresh=false.

In tf v1.2.x, it seems like the datasource refresh now happens before provider configuration, so the data is available.
Using kubernetes 2.18.1 and tf 1.2.0, I have not seen this issue anymore.

I looked at the code in depth, and this PR would not help, because the provider is initialized by calling the ConfigureContextFunc anyway, so delaying the config does not make any difference if the datasource data is not there.

@RichiCoder1

This is maybe a quirk of the module in question, but this issue is still being reported with v1.3.9 and the latest providers: terraform-aws-modules/terraform-aws-eks#2501

@streamnsight

@RichiCoder1 See my reply on the issue: the failing example code does not use a datasource to configure the provider, so it may indeed still fail. Using the data source lookup pattern, it should work with tf 1.2.

@chamini2 commented Mar 14, 2023

@streamnsight, I create the Kubernetes cluster in the same module and then access it and provision kubernetes manifests in it.

Can you help me understand how I would change to use data instead of resource in that case so I can avoid this issue?

@streamnsight

@chamini2 just follow the examples
https://github.com/hashicorp/terraform-provider-kubernetes/tree/main/_examples

@jw-maynard

@streamnsight this solution won't work if the cluster doesn't already exist in AWS.

@chamini2 commented Mar 14, 2023

(quoting @streamnsight above)

The only difference I see from my solution is the explicit depends_on in

depends_on = [module.gke-cluster]

I will add that and see if I don't get any more errors.
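
For context, a minimal sketch of how that depends_on could be combined with a data source lookup (hypothetical names; not necessarily exactly what the linked examples do):

data "google_container_cluster" "default" {
  name       = "my-gke-cluster"      # hypothetical name and location
  location   = "us-central1-a"
  depends_on = [module.gke-cluster]  # ensure the cluster module is applied before this lookup
}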
