Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issues creating a google_compute_security_policy with rules created in dynamic blocks #14178

Closed
abby-embree-airship opened this issue Apr 3, 2023 · 25 comments
Assignees
Labels

Comments

@abby-embree-airship
Copy link

abby-embree-airship commented Apr 3, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.4.0
on linux_amd64
+ provider registry.terraform.io/hashicorp/google v4.59.0
+ provider registry.terraform.io/hashicorp/google-beta v4.59.0

Affected Resource(s)

  • google_compute_security_policy

Terraform Configuration Files

creating a security policy with rules generated via dynamic block does not work as expected; in some cases (i think relating to importing an existing rule then trying to change it vs creating a new one via a terraform apply) i get an error, other times the terraform plan simply shows no changes even when the list the dynamic block is iterating over via for_each changes - create the resource, add an entry to the list used to create rule blocks dynamically, then run a new plan, and you'll see no changes even though you should get a plan that includes adding the new rule block. example tf code:

      resource "google_compute_security_policy" "this" {                                                                                     
        name  = "test"                                                                                                          
                                                                                                                                                  
        dynamic "rule" {                                                                                                                          
          for_each = local.block                                                                                                    
          iterator = block                                                                                                                        
                                                                                                                                                  
          content {                                                                                                                               
            action   = "deny(502)"                                                                                                                
            priority = sum([block.key, 2])                                                                                                        
            match {                                                                                                                               
              versioned_expr = "SRC_IPS_V1"                                                                                                       
              config {                                                                                                                            
                src_ip_ranges = block.value                                                                                             
              }                                                                                                                                   
            }                                                                                                                                     
            description = "test"                                                      
          }                                                                                                                                       
        }                                                                                                                                         
                                                                                                                                                  
        rule {                                                                                                                                    
          action   = "allow"                                                                                                                      
          priority = "2147483647"                                                                                                                 
          match {                                                                                                                                 
            versioned_expr = "SRC_IPS_V1"                                                                                                         
            config {                                                                                                                              
              src_ip_ranges = ["*"]                                                                                                               
            }                                                                                                                                     
          }                                                                                                                                       
          description = "default allow"                                                                                                           
        }                                                                                                                                         
      }

Error Output

the errors i encountered trying to change the number of rules on an imported resource:

╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.temporary-cloudarmor-rules.google_compute_security_policy.temporary to include new values learned so far
│ during apply, provider "registry.terraform.io/hashicorp/google" produced an invalid new value for .rule: planned set element
│ cty.ObjectVal(map[string]cty.Value{"action":cty.StringVal("allow"), "description":cty.StringVal("Default rule, higher priority overrides it"),
│ "header_action":cty.ListValEmpty(cty.Object(map[string]cty.Type{"request_headers_to_adds":cty.List(cty.Object(map[string]cty.Type{"header_name":cty.String,
│ "header_value":cty.String}))})),
│ "match":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"config":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"src_ip_ranges":cty.SetVal([]cty.Value{cty.StringVal("*")})})}),
│ "expr":cty.ListValEmpty(cty.Object(map[string]cty.Type{"expression":cty.String})), "versioned_expr":cty.StringVal("SRC_IPS_V1")})}),
│ "preview":cty.False, "priority":cty.NumberIntVal(2.147483647e+09),
│ "rate_limit_options":cty.ListValEmpty(cty.Object(map[string]cty.Type{"ban_duration_sec":cty.Number,
│ "ban_threshold":cty.List(cty.Object(map[string]cty.Type{"count":cty.Number, "interval_sec":cty.Number})), "conform_action":cty.String,
│ "enforce_on_key":cty.String, "enforce_on_key_name":cty.String, "exceed_action":cty.String,
│ "exceed_redirect_options":cty.List(cty.Object(map[string]cty.Type{"target":cty.String, "type":cty.String})),
│ "rate_limit_threshold":cty.List(cty.Object(map[string]cty.Type{"count":cty.Number, "interval_sec":cty.Number}))})),
│ "redirect_options":cty.ListValEmpty(cty.Object(map[string]cty.Type{"target":cty.String, "type":cty.String}))}) does not correlate with any
│ element in actual.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.temporary-cloudarmor-rules.google_compute_security_policy.temporary to include new values learned so far
│ during apply, provider "registry.terraform.io/hashicorp/google" produced an invalid new value for .rule: planned set element
│ cty.ObjectVal(map[string]cty.Value{"action":cty.StringVal("deny(403)"), "description":cty.StringVal(""),
│ "header_action":cty.ListValEmpty(cty.Object(map[string]cty.Type{"request_headers_to_adds":cty.List(cty.Object(map[string]cty.Type{"header_name":cty.String,
│ "header_value":cty.String}))})),
│ "match":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"config":cty.ListValEmpty(cty.Object(map[string]cty.Type{"src_ip_ranges":cty.Set(cty.String)})),
│ "expr":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"expression":cty.StringVal("has(request.headers['user-agent']) &&
│ request.headers['user-agent'].contains('Nuclei') \n")})}), "versioned_expr":cty.StringVal("")})}), "preview":cty.False,
│ "priority":cty.NumberIntVal(1), "rate_limit_options":cty.ListValEmpty(cty.Object(map[string]cty.Type{"ban_duration_sec":cty.Number,
│ "ban_threshold":cty.List(cty.Object(map[string]cty.Type{"count":cty.Number, "interval_sec":cty.Number})), "conform_action":cty.String,
│ "enforce_on_key":cty.String, "enforce_on_key_name":cty.String, "exceed_action":cty.String,
│ "exceed_redirect_options":cty.List(cty.Object(map[string]cty.Type{"target":cty.String, "type":cty.String})),
│ "rate_limit_threshold":cty.List(cty.Object(map[string]cty.Type{"count":cty.Number, "interval_sec":cty.Number}))})),
│ "redirect_options":cty.ListValEmpty(cty.Object(map[string]cty.Type{"target":cty.String, "type":cty.String}))}) does not correlate with any
│ element in actual.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.temporary-cloudarmor-rules.google_compute_security_policy.temporary to include new values learned so far
│ during apply, provider "registry.terraform.io/hashicorp/google" produced an invalid new value for .rule: planned set element
│ cty.ObjectVal(map[string]cty.Value{"action":cty.StringVal("deny(403)"), "description":cty.StringVal(""),
│ "header_action":cty.ListValEmpty(cty.Object(map[string]cty.Type{"request_headers_to_adds":cty.List(cty.Object(map[string]cty.Type{"header_name":cty.String,
│ "header_value":cty.String}))})),
│ "match":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"config":cty.ListValEmpty(cty.Object(map[string]cty.Type{"src_ip_ranges":cty.Set(cty.String)})),
│ "expr":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"expression":cty.StringVal("has(request.headers['user-agent']) &&
│ request.headers['user-agent'].contains('nuclei')")})}), "versioned_expr":cty.StringVal("")})}), "preview":cty.False,
│ "priority":cty.NumberIntVal(2), "rate_limit_options":cty.ListValEmpty(cty.Object(map[string]cty.Type{"ban_duration_sec":cty.Number,
│ "ban_threshold":cty.List(cty.Object(map[string]cty.Type{"count":cty.Number, "interval_sec":cty.Number})), "conform_action":cty.String,
│ "enforce_on_key":cty.String, "enforce_on_key_name":cty.String, "exceed_action":cty.String,
│ "exceed_redirect_options":cty.List(cty.Object(map[string]cty.Type{"target":cty.String, "type":cty.String})),
│ "rate_limit_threshold":cty.List(cty.Object(map[string]cty.Type{"count":cty.Number, "interval_sec":cty.Number}))})),
│ "redirect_options":cty.ListValEmpty(cty.Object(map[string]cty.Type{"target":cty.String, "type":cty.String}))}) does not correlate with any
│ element in actual.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵

Expected Behavior

changing the content of the list being for_each'ed in the dynamic block should result in planned changes to the resource

Actual Behavior

either nothing happens (erroneously empty plan), or an error like the one above occurs

@edwardmedia edwardmedia self-assigned this Apr 3, 2023
@edwardmedia
Copy link
Contributor

@abby-embree-airship dynamic is a Terraform feature and I can't really commit much on that. But before you apply any content changes, did you see any diffs in the plan after the import? If yes, you probably need to fix that first before moving to the next step. Maybe to start without dynamic?

@abby-embree-airship
Copy link
Author

@edwardmedia importing an existing resource then running a terraform plan with a rule created via a dynamic block produced the provider errors I included in the original post

@edwardmedia
Copy link
Contributor

@abby-embree-airship before you adding the dynamic code, do you see an issue with its plan?

@abby-embree-airship
Copy link
Author

no, but that doesn't mean that the error is indicative of an issue with dynamic blocks rather than the provider - see hashicorp/terraform#28340

@edwardmedia
Copy link
Contributor

edwardmedia commented Apr 3, 2023

@abby-embree-airship using below config, I tested apply -> import -> apply on the imported config. No issue is found. What should I do in order for me to repro the issue?

From the link you provided, it seems a discussion in the terraform core, not the provider itself.

resource "google_compute_security_policy" "policy" {

  name = "my-policy"

  rule {
    action   = "deny(403)"
    priority = "1000"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["9.9.9.0/24"]
      }
    }
    description = "Deny access to IPs in 9.9.9.0/24"
  }

  rule {
    action   = "allow"
    priority = "2147483647"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
    description = "default rule"
  }
}

@abby-embree-airship
Copy link
Author

the config you posted won't reproduce the issue because it doesn't create rules via dynamic blocks. the issue i linked indicates that problems with creating dynamic blocks within certain resource types may be indicative of a provider-level issue rather than an hcl-level issue because of how different resource types handle dynamic blocks. I'm working on a complete example to demonstrate this issue, will post when finished

@edwardmedia
Copy link
Contributor

@rileykarson what do you think about this issue?

@edwardmedia
Copy link
Contributor

edwardmedia commented Apr 5, 2023

@abby-embree-airship after import, I am able to use below config with a dynamic block to map its state. Do you see anything wrong?

locals {
    rules = [
        {action = "deny(403)", priority = "1000", versioned_expr = "SRC_IPS_V1", src_ip_ranges = ["9.9.9.0/24"], description = "Deny access to IPs in 9.9.9.0/24"},
        { action   = "allow", priority = "2147483647", versioned_expr = "SRC_IPS_V1", src_ip_ranges = ["*"], description = "default rule"}
    ]
}

resource "google_compute_security_policy" "policy" {
    name = "issue14178-dynamic"

    dynamic "rule" {
        for_each = local.rules
        content {

            action   = rule.value.action
            priority = rule.value.priority
            match {
                versioned_expr = rule.value.versioned_expr
                config {
                    src_ip_ranges = rule.value.src_ip_ranges
                }
            }
            description = rule.value.description

        }
    }
}

@abby-embree-airship
Copy link
Author

abby-embree-airship commented Apr 10, 2023

@edwardmedia here's some example code that demonstrates this issue - a less complicated conditional in constructing valid_blocks won't reproduce the problem:

locals {
  blocklist = [
    {
      ip          = "192.0.0.8"
      description = "test 1"
      block_until = "1970-01-01T00:00:00Z"
      # block_until = "2024-01-01T00:00:00Z"
    },
    {
      ip          = "8.8.8.8"
      description = "test 2"
      # block_until = "1970-01-01T00:00:00Z"
      block_until = "2024-01-01T00:00:00Z"
    }
  ]
  valid_blocks = [
    for block in local.blocklist : {
      ip          = block.ip
      description = block.description
    }
    if timecmp(timestamp(), block.block_until) == -1
  ]
}

resource "google_compute_security_policy" "this" {
  name  = "test"

  dynamic "rule" {
    for_each = local.valid_blocks
    iterator = block

    content {
      action   = "deny(502)"
      priority = block.key
      match {
        versioned_expr = "SRC_IPS_V1"
        config {
          src_ip_ranges = [block.value.ip]
        }
      }
      description = block.value.description
    }
  }

  rule {
    action   = "allow"
    priority = "2147483647"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
    description = "default allow"
  }
}

do the following:

  • apply the above terraform code
  • check the cloudarmor rule created in GCP, you should see the rule with 1 entry with a description like "test 2"
  • swap the dates by commenting lines 6 and 13 and uncommenting lines 7 and 12
  • run a terraform plan - this will show no planned changes even though the contents of local.valid_blocks has just changed
  • destroy the resource
  • terraform apply again
  • check the cloudarmor rule in GCP again, description of the rule should now be "test 1"

I think this has something to do with the unusually vague plan output seen below:

  # google_compute_security_policy.this will be created
  + resource "google_compute_security_policy" "this" {
      + fingerprint = (known after apply)
      + id          = (known after apply)
      + name        = "test"
      + project     = (known after apply)
      + self_link   = (known after apply)
      + type        = (known after apply)
    }

again, I think the root of this vagueness has to do with the problem described in hashicorp/terraform#28340 . also, I haven't tested a function call other than timecmp in the for loop creating local.valid_blocks, but i know that having a boolean compare instead of comparing the output of a function doesn't reproduce the issue. the above code below for example works fine:

  blocklist = [
	{
	  ip           = "192.0.0.8"
	  should_block = true
	},
	{
	  ip           = "8.8.8.8"
	  should_block = false
	}
  ]
  valid_blocks = [
	for block in local.blocklist : block.ip
	if block.should_block
  ]

@abby-embree-airship
Copy link
Author

as for the provider error i posted, i don't remember the exact circumstances that produced that error, but i will work on making it happen again. it's related to this issue somehow, but is tangential to the main problem addressed above

@edwardmedia
Copy link
Contributor

@abby-embree-airship tried the config from here, I am receiving below error. Does it work for you?

╷
│ Error: Call to unknown function
│ 
│   on main.tf line 21, in locals:
│   21:     if timecmp(timestamp(), block.block_until) == -1
│     ├────────────────
│     │ block.block_until will be known only after apply
│ 
│ There is no function named "timecmp".

@abby-embree-airship
Copy link
Author

@edwardmedia update your terraform version, the timecmp function only exists in terraform 1.3 and later. I used 1.4.0 when writing up this issue. see https://developer.hashicorp.com/terraform/language/functions/timecmp

@edwardmedia
Copy link
Contributor

edwardmedia commented Apr 17, 2023

@abby-embree-airship I realized the version. After updating it to 1.4.5, it works now.

Mind reminding me again about the issue? Applying the config works. After that, plan does not show a diff. What do you want me to look?

@abby-embree-airship
Copy link
Author

try making the change i mentioned - switch the block_until dates, then re-run the plan. the expected results would be that there would be changes in the plan, the fact that there aren't is a bug.

@edwardmedia
Copy link
Contributor

edwardmedia commented Apr 17, 2023

@abby-embree-airship I see what you meant now. But if we ignore valid_blocks and use the blocklist directly, any changes in the locals like description trigger a diff in the plan which seems correct. To me, this is still a question or issue in the terraform core. I do not see a problem in the provider. @SarahFrench what do you think?

locals {
  blocklist = [
    {
      ip          = "192.0.0.8"
      description = "test 1"
      # block_until = "1970-01-01T00:00:00Z"
      block_until = "2024-01-01T00:00:00Z"
    },
    {
      ip          = "8.8.8.8"
      description = "test 2"
      block_until = "1970-01-01T00:00:00Z"
      # block_until = "2024-01-01T00:00:00Z"
    }
  ]
  valid_blocks = [
    for block in local.blocklist : {
      ip          = block.ip
      description = block.description
    }
    if timecmp(timestamp(), block.block_until) == -1
  ]
}

resource "google_compute_security_policy" "this" {
  name  = "issue14178-user"

  dynamic "rule" {
    //for_each = local.valid_blocks
    for_each = local.blocklist. <------ loop blocklist directly
    iterator = block

    content {
      action   = "deny(502)"
      priority = block.key
      match {
        versioned_expr = "SRC_IPS_V1"
        config {
          src_ip_ranges = [block.value.ip]
        }
      }
      description = block.value.description
    }
  }

  rule {
    action   = "allow"
    priority = "2147483647"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
    description = "default allow"
  }
}

@abby-embree-airship
Copy link
Author

i believe this to be a provider issue because of what's discussed in hashicorp/terraform#28340, but also because we can test whether this behavior is replicable across different resource types. consider this resource:

resource "null_resource" "this" {
  triggers = {
    test = jsonencode(local.valid_blocks)
  }
}

when i create it with the blocklist above, then switch around the commented lines and run a plan, i get this:

  # null_resource.this must be replaced
-/+ resource "null_resource" "this" {
      ~ id       = "8952275810646135394" -> (known after apply)
      ~ triggers = { # forces replacement
          ~ "test" = jsonencode(
                [
                  - {
                      - description = "test 2"
                      - ip          = "8.8.8.8"
                    },
                ]
            ) -> (known after apply)
        }
    }

this tells me the problem isn't with the timecmp function or with the code that's creating valid_blocks. I'm still working on finding a resource to use as a test where i can create dynamic blocks that will accept content from valid_blocks, will post an update once i have found something, but i expect it will behave correctly for other resource types and not for this specific resource type.

@abby-embree-airship
Copy link
Author

abby-embree-airship commented Apr 17, 2023

@edwardmedia okay, here's an example of this exact approach working as expected with a different resource type. hopefully this should demonstrate that this issue is specific to this provider and resource type rather than an issue with core terraform:

resource "google_sql_database_instance" "db" {
  name             = "test-deleteme"
  database_version = "POSTGRES_11"
  region           = "us-east1"

  settings {
    tier = "db-f1-micro"

    ip_configuration {
      ipv4_enabled = true
      require_ssl  = true

      dynamic "authorized_networks" {
        for_each = local.valid_blocks
        iterator = block

        content {
          name  = block.value.description
          value = "${block.value.ip}/32"
        }
      }
    }
  }
}

apply this, then switch the commented out dates, then plan - you will see proposed changes. apply it and you'll see that the authorized networks for the db changes from 192.0.0.8 to 8.8.8.8. this replicates the same dynamic block functionality as i used for the security policy, but with a different resource type

@abby-embree-airship
Copy link
Author

also, i just tested this with a condition other than testing the output of timecmp - if length(regexall("/24/", block.block_until)) > 0 replicates this same behavior

@SarahFrench
Copy link
Member

SarahFrench commented Apr 18, 2023

Hi @abby-embree-airship 👋 - thanks for your reproduction steps in your previous comment! It helped me familiar with this issue quickly.

Problem

I think I've found the underlying cause of the issue - the timestamp() built-in function in Terraform core. This function is not called during plan steps and is only called during apply steps. For more info please see this, quite hard to spot, note in the documentation.

You can explore this by altering the locals block in your reproduction instructions to use this new local.current_time value and un/comment different values.

locals {
  blocklist = [
    {
      ip          = "192.0.0.8"
      description = "test 1"
      block_until = "2024-01-01T00:00:00Z"
    },
    {
      ip          = "8.8.8.8"
      description = "test 2"
      block_until = "1970-01-01T00:00:00Z"
    }
  ]
  current_time = timestamp() # Use when initially applying the config
  # current_time = "2023-04-18T00:00:00Z" # 1 in valid_blocks - Uncomment when playing with plans after initial apply
  # current_time = "2050-01-01T00:00:00Z" # 0 in valid_blocks - Uncomment when playing with plans after initial apply
  # current_time = "1950-01-01T00:00:00Z" # 2 in valid_blocks - Uncomment when playing with plans after initial apply
  valid_blocks = [
    for block in local.blocklist : {
      ip          = block.ip
      description = block.description
    }
    if timecmp(local.current_time, block.block_until) == -1
  ]
}

In plan steps (following the initial apply to create the google_compute_security_policy resource) I believe Terraform is unable to detect that a change is needed because it doesn't have access to timestamp()'s value, therefore doesn't know what local.valid_blocks's value is, so doesn't know how to make a diff between old and new values of local.valid_blocks.


Ideas for next steps

I can see your use case is depending on Terraform to generate a plan to update the google_compute_security_policy resource whenever a block_until time swaps from being in the future to being in the past. Currently the timestamp() issue blocks this completely I'm afraid 😬

First, I think it'd be useful if you opened an issue in the core Terraform repo as I think your use case is totally valid.

Edit: There is currently a PR in Terraform core to add a specific plan time timestamp! : hashicorp/terraform#32980

In terms of helping you achieve a policy that updates correctly as time passes, one crude option if you are using Terraform OSS is to run CLI commands with a -replace flag (see here in docs). This is not the ideal solution, but if you force Terraform to replace the resource it'll reassess the contents of local.valid_blocks and recreate the policy in a way that respects the current time.


Phew, sorry for writing so much! Please let me know if you have any further questions - I'm happy to discuss this more if needed, but I think we'll eventually close this issue as the problem lies with TF core and not the Google provider

@abby-embree-airship
Copy link
Author

I tried to work around this by using replace_triggered_by and a null resource trigger to force-recreate the resource every time the current_blocks list changes, but that's not really practical for this use case since other resources depend on this cloudarmor resource. using a templating tool or scripting to generate the dates before terraform executes is another option to work around this.

there are a couple things that i think remain unaddressed:

first, this explanation would make sense to me if it only affected the results of a terraform plan, but running a terraform apply also produces no changes - the resource has to be destroyed and recreated to reflect the change made in the local variable, which isn't consistent with the plan vs apply time behavior discussed in the documentation provided. also, the documentation says that a plan that is dependent on the output of that function should get an unknown value that is reflected in the plan, rather than no changes.

second, the example i provided in this comment definitively shows that the exact same steps to reproduce, when tried against a different resource type with the same dynamic block pattern, do not result in the behavior demonstrated in the original post. to me this indicates that it's got something to do with that specific resource type, and not a problem with terraform core

@abby-embree-airship
Copy link
Author

abby-embree-airship commented Apr 19, 2023

so i was looking through the code a little and it seems like the root of the issue may be this check, which compares the old and new key values provided by https://pkg.go.dev/github.com/hashicorp/terraform/helper/schema#ResourceData.GetChange - a call that happens a few lines above at https://github.com/hashicorp/terraform-provider-google/blob/main/google/resource_compute_security_policy.go#L631 . the issue with doing this comparison is that i believe that oSet and nSet might not contain anything at all, or could be in a state that does not reflect the actual state of the proposed delta, due to something mentioned in the link you posted previously, which indicates that the use of these functions produces a 'known after apply' state AKA an unknown value. this is mentioned in other parts of the documentation as being problematic; customizeDiff, for example, doesn't support known after apply values: 'NOTE: CustomizeDiff does not currently support computed/"known after apply" values from other resource attributes.' (taken from https://developer.hashicorp.com/terraform/plugin/sdkv2/resources/customizing-differences). i found this issue hashicorp/terraform-plugin-sdk#1040 discussing how an unknown value coming from something like our dynamic block might behave that has the line 'The map value doesn't include the "Unknown" element at all' in it, and i'm definitely not an expert in the internal workings of terraform and don't actually know go at all but i think based on all this information that something like this might be happening in the code (taken from resourceComputeSecurityPolicyUpdate https://github.com/hashicorp/terraform-provider-google/blob/main/google/resource_compute_security_policy.go#L572 comments added by me):

func resourceComputeSecurityPolicyUpdate(d *schema.ResourceData, meta interface{}) error {
	config := meta.(*Config)

        //...

	if d.HasChange("rule") {
		o, n := d.GetChange("rule") // getchange returns a map containing no rules due to SDK defect 1040
		oSet := o.(*schema.Set)
		nSet := n.(*schema.Set)

		oPriorities := map[int64]bool{}
		nPriorities := map[int64]bool{}
		for _, rule := range oSet.List() {
			oPriorities[int64(rule.(map[string]interface{})["priority"].(int))] = true
		}

               // i think if you logged oPriorities and nPriorities here you would maybe see something interesting, including maybe two empty maps

		for _, rule := range nSet.List() { // no iterations of this for loop run
			priority := int64(rule.(map[string]interface{})["priority"].(int))
			nPriorities[priority] = true
			if !oPriorities[priority] {
				client := config.NewComputeClient(userAgent)

				op, err := client.SecurityPolicies.AddRule(project, sp, expandSecurityPolicyRule(rule)).Do() // this is why this behavior doesn't result in existing rules in the remote object being deleted, because rule changes are additive instead of building a list of desired rules, wiping out all rules in the remote object, then pushing a new set of rules determined in the previous step

				if err != nil {
					return errwrap.Wrapf(fmt.Sprintf("Error updating SecurityPolicy %q: {{err}}", sp), err)
				}

				err = ComputeOperationWaitTime(config, op, project, fmt.Sprintf("Updating SecurityPolicy %q", sp), userAgent, d.Timeout(schema.TimeoutUpdate))
				if err != nil {
					return err
				}
			} else if !oSet.Contains(rule) {
				// ...
			}
		}

		for _, rule := range oSet.List() { // doesn't trigger if oSet is empty
                        // ...
		}
	}

	return resourceComputeSecurityPolicyRead(d, meta)
}

i'm not super confident about all this, but a quick way to check whether i'm on to something or not would be to log the contents of oSet and nSet and see if there is anything strange going on there. like i said i don't know go or any provider development stuff so i can't really test it myself. if it turns out that i'm correct about the source of this issue, then the solution would be to migrate to V2 of the plugin SDK mentioned here. that page's subheading named ''null-unknown-and-known-values" doesn't actually exist anywhere in the doc, i assume they forgot to add it, but i'm guessing that the fact that it was supposed to be included means that the V2 SDK at least includes hints in the schema data model that indicate when there are null or unknown values so that they can be handled properly or at least more gracefully than the V1 model where those values are just silently absent.

@edwardmedia @SarahFrench

@SarahFrench
Copy link
Member

Hey, thanks for looking into this. I was doing some similar hunting through the code for a smoking gun yesterday and I found something that might interest you.

First I want to acknowledge some of the points you raised above:

so i was looking through the code a little and it seems like the root of the issue may be this check

Unfortunately the Update/Create/Delete functions are only invoked after a plan has been made identifying that those actions need to be taken on a resource. During the plan stage only Read is invoked, when the state is refreshed. The things relevant to whether a plan is made or not are Terraform core, the SDK, the schema of the resource, and the Read function.

solution would be to migrate to V2 of the plugin SDK mentioned here.

There is currently a project starting to migrate this provider to the plugin framework and there are lot of benefits I'm looking forward to! The SDK is known to have a lot of quirks that the DevEx team has learned from when making this framework.


Now, thinking about the different behaviours of dynamic blocks in google_compute_security_policy versus google_sql_database_instance. I found that the reason why we can generate plans for the DB instance while using cynamic blocks is that the authorized_networks field does not contain Computed: true. The rules block in the security policy resource does have Computed: true in the schema and this prevents the a diff being identified and a plan being generated. I found that if I built the Google provider locally with the resource_compute_security_policy.go#L67 line removed then I could get the google_compute_security_policy resource to behave like google_sql_database_instance when generating plans for dynamic blocks. Similarly, if I made the authorized_networks field Computed: true I can make it no longer able to make plans for that block.

The reason for this is a quirk of the SDK - it can conflate Computed and Unknown values. I agree that a 'more correct' behaviour for the google_compute_security_policy resource would for the plan to be generated each time. However we can't make the rule block not Computed as that would be a breaking change for other users. There doesn't seem to be a correct way to make the resource that caters for everyone; the SDK's problems can block certain things being done.

So the ways to solve this issue seem to be:

  • Somehow make the current timestamp a known value, instead of unknown
    • plantimestamp will help here. The docs in that PR say it'll be released in Terraform 1.5
  • Resolve the SDK issues by migrating to the plugin framework.
    • Migration work is currently underway - the provider currently combines the framework and the old SDK
    • I'm afraid I don't have a timeline I can share with you for when this resource will be migrated

I appreciate this is annoying, but I don't think there are any bug fixes that can be made quickly to address this issue.


One last thing, a possible way to "make the current timestamp a known value" without waiting on a HashiCorp release is to use a provider that returns the current time as a data source. Data sources are refreshed during plan generation (assuming there are no dependencies with other resources) and are known values. I have toyed with making one on my laptop and I could share it with you if you wish?

@abby-embree-airship
Copy link
Author

One last thing, a possible way to "make the current timestamp a known value" without waiting on a HashiCorp release is to use a provider that returns the current time as a data source. Data sources are refreshed during plan generation (assuming there are no dependencies with other resources) and are known values. I have toyed with making one on my laptop and I could share it with you if you wish?

i like this workaround a lot; it doesn't require an external templating tool, and is trivial to set up. the following gets me a nice-looking plan with no known-after-apply values, and catches changes to blocklist:

locals {
  # ...
  valid_blocks = [
    for block in local.blocklist : {
      ip          = block.ip
      description = block.description
    }
    if timecmp(data.external.date.result["date"], block.block_until) == -1
  ]
}

data "external" "date" {
  program = ["bash", "${path.module}/date.sh"]
}

and this in date.sh:

#!/bin/bash
cat << EOF
{
    "date": "$(date +%FT%TZ)"
}
EOF
exit 0

this is a perfectly usable solution until the function you mentioned comes out in 1.5. thanks a ton for your help on this issue @SarahFrench

@SarahFrench
Copy link
Member

Ah the bash script is much simpler than a whole provider, nice! I'm glad we could get your use case working 😄

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants