
Inappropriate value for attribute "guest_accelerator": incorrect list element type: attribute "gpu_sharing_config" is required. #12817

Closed
joe-a-t opened this issue Oct 17, 2022 · 12 comments

Comments


joe-a-t commented Oct 17, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.3.2

  • provider registry.terraform.io/hashicorp/google-beta v4.41.0

Affected Resource(s)

  • google_container_node_pool

Terraform Configuration Files

resource "google_service_account" "default" {
  account_id   = "service-account-id"
  display_name = "Service Account"
}

resource "google_container_cluster" "primary" {
  name     = "my-gke-cluster"
  location = "us-central1"

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
  provider = google-beta

  name       = "my-node-pool"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-medium"

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = google_service_account.default.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    guest_accelerator = []
  }
}

Debug Output

Panic Output

Expected Behavior

Per #12733, which says the attribute is optional, and the lack of any breaking-change note in the release notes, my config that worked with 4.40.0 should also have worked with 4.41.0.

Actual Behavior

│ Error: Incorrect attribute value type
│ 
│   on .terraform/modules/foo/main.tf line 65, in resource "google_container_node_pool" "main":
│   65:     guest_accelerator = var.guest_accelerators
│     ├────────────────
│     │ var.guest_accelerators is a list of object
│ 
│ Inappropriate value for attribute "guest_accelerator": incorrect list
│ element type: attribute "gpu_sharing_config" is required.

Steps to Reproduce

  1. pin the google-beta provider to 4.40.0 and create the cluster with terraform apply
  2. change the google-beta provider to 4.41.0 and try to plan

Important Factoids

This issue does not appear on plans if the cluster does not exist yet. It's unclear what happens if the cluster was created with 4.41.0; it's a bit harder for me to create a new cluster to test with, but we are seeing this issue with clusters that were created with pre-4.41.0 versions of the provider (they were probably originally created with a 3.x.x version). Pinning the provider version back to 4.40.0 makes the error go away.

References

  • #0000
@joe-a-t added the bug label on Oct 17, 2022
@bharathkkb

We have also seen our tests starting to fail with 4.41.0.

@rileykarson
Collaborator

This used ConfigMode: schema.SchemaConfigModeAttr, an advanced setting with some negative side effects. I'm not sure why, as it wasn't mentioned in the review at all. Fixing this is probably just a matter of removing that setting.

@rileykarson
Collaborator

Interesting, I don't seem to be able to reproduce this. With 4.41.0 of the provider, I've tried Terraform 1.1.7 and 1.3.2:

$ terraform -v
Terraform v1.3.2
on darwin_arm64
+ provider registry.terraform.io/hashicorp/google v4.41.0
+ provider registry.terraform.io/hashicorp/google-beta v4.41.0
$ terraform apply
google_container_cluster.primary: Refreshing state... [id=projects/graphite-test-rileykarson/locations/us-central1/clusters/my-gke-cluster5435]
google_container_node_pool.primary_preemptible_nodes: Refreshing state... [id=projects/graphite-test-rileykarson/locations/us-central1/clusters/my-gke-cluster5435/nodePools/my-node-pool]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_container_node_pool.primary_preemptible_nodes will be created
  + resource "google_container_node_pool" "primary_preemptible_nodes" {
      + cluster                     = "projects/graphite-test-rileykarson/locations/us-central1/clusters/my-gke-cluster5435"
...


joe-a-t commented Oct 18, 2022

What if you create the node pool with v4.40.0 or v3.x of the provider before upgrading to v4.41.0? I wonder if something changed around the schema of guest_accelerator at that time: it currently looks like it should be a block according to the docs, but the config that works for us on v4.40.0 sets guest_accelerator with an = (attribute syntax) instead of a block.
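For illustration, a minimal sketch of the two syntaxes in question (the block form matches the current docs; the = attribute form is what our v4.40.0 config uses):

```hcl
node_config {
  # Attribute syntax: a list expression assigned with "=".
  # This is what works for us on v4.40.0 (explicitly empty here).
  guest_accelerator = []
}

node_config {
  # Block syntax, as the documentation currently shows it:
  guest_accelerator {
    type  = "nvidia-tesla-t4"
    count = 1
  }
}
```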


@rileykarson
Collaborator

I've tried some more and can't reproduce the exact error. @bharathkkb is receiving a different error, one that I was able to reproduce with the following:

provider "google-beta" {
  version = "~> 4.41.0"
}

resource "google_container_cluster" "primary" {
  name     = "my-gke-cluster5435"
  location = "us-central1"

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
  provider = google-beta

  name           = "my-node-pool"
  cluster        = google_container_cluster.primary.id
  node_count     = 1
  node_locations = ["us-central1-f"]

  node_config {
    preemptible  = true
    machine_type = "n1-standard-2"

    guest_accelerator = [
      {
        type  = "nvidia-tesla-t4"
        count = 1
      },
    ]
  }
}
│ Error: Incorrect attribute value type
│
│   on main.tf line 38, in resource "google_container_node_pool" "primary_preemptible_nodes":
│   38:     guest_accelerator = [
│   39:       {
│   40:         type  = "nvidia-tesla-t4"
│   41:         count = 1
│   42:       },
│   43:     ]
│
│ Inappropriate value for attribute "guest_accelerator": element 0: attributes "gpu_partition_size" and "gpu_sharing_config" are required.

That error's somewhat expected due to https://www.terraform.io/language/attr-as-blocks#arbitrary-expressions-with-argument-syntax (and you can see an older field, gpu_partition_size, triggering it too, because my config doesn't include it).
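Per those attr-as-blocks docs, argument syntax type-checks the expression against the full object type, so every nested attribute must be present; the documented workaround is to set unwanted ones to null. A hedged sketch of that workaround for the error above (gpu_partition_size and gpu_sharing_config are the attribute names from the error message; the exact set may vary by provider version):

```hcl
node_config {
  guest_accelerator = [
    {
      # Attributes we actually want:
      type  = "nvidia-tesla-t4"
      count = 1
      # Attributes the object type requires but we don't use,
      # explicitly set to null per the attr-as-blocks docs:
      gpu_partition_size = null
      gpu_sharing_config = null
    },
  ]
}
```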

@rileykarson
Collaborator

@joe-a-t: I noticed the repro config and error message provided don't match up. Can you confirm that the repro and your config match exactly w/ respect to things like variables and the guest_accelerator block structure? Minute differences are likely to be the cause of the error here.

@rileykarson
Collaborator

Alright, it turns out Terraform forces the schema.ConfigModeAttr setting on us. For @bharathkkb and the error I reproduced above, that one's (unfortunately) expected and not related to the schema.ConfigModeAttr setting on the new field; any new field does that.

Filed #12824 to remove the setting where possible in 5.0.0. Closing this as wontfix/dupe, I'll mark a known issue on 4.41.0 for the part I've been able to repro.

@rileykarson closed this as not planned (won't fix, can't repro, duplicate, stale) on Oct 18, 2022

joe-a-t commented Oct 18, 2022

It's unfortunately not exactly the same; we are using some nested modules here, so I was trying to provide something more sanitized and simpler to share publicly. The guest_accelerator = [] line is correct, though. I can have our Google TAM reach out to schedule a time to debug more privately if that would be an option?

@rileykarson
Collaborator

I've updated the changelogs and GH releases to note the known issue.

I can have our Google TAM reach out to schedule a time to debug more privately if that would be an option?

I'm not sure that there's much we can do about this field, unfortunately, even with more information- it's an upstream Terraform issue poking its head, and #12824 is the only real path to resolving it, which must be done in a major version.

I know why @bharathkkb and I are getting the error we're getting, but I'm not confident there's anyone at Google with deep enough knowledge of how Core and the provider SDK interact here to figure out your issue. Given you've just got guest_accelerator = [], this shouldn't be happening, unless that was in tfjson or the CDK or something.


joe-a-t commented Oct 19, 2022

We also had to make terraform-google-modules/terraform-google-kubernetes-engine#1428, switching to a dynamic block instead of using the = syntax, but we are fixed now after making that change.
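For anyone hitting the same thing, a sketch of the dynamic-block form we switched to (the variable name var.guest_accelerators is taken from the error output above; the attribute set inside content is illustrative):

```hcl
node_config {
  # One guest_accelerator block is generated per list element,
  # so an empty var.guest_accelerators emits no blocks at all.
  dynamic "guest_accelerator" {
    for_each = var.guest_accelerators
    content {
      type  = guest_accelerator.value.type
      count = guest_accelerator.value.count
    }
  }
}
```

Because omitted nested blocks simply aren't generated, this avoids the "attribute is required" type check that argument syntax triggers.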

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on Nov 19, 2022