Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enabled parameter to avoid logical statements in count #21953

Open
romachalm opened this issue Jul 2, 2019 · 147 comments
Open

enabled parameter to avoid logical statements in count #21953

romachalm opened this issue Jul 2, 2019 · 147 comments

Comments

@romachalm
Copy link

Terraform v0.12.2

Use-cases

We often use count to disable a resource (above all in modules) from a boolean var, with statements like count = var.enabled ? 1 : 0 or count = var.enabled ? 1 : length([some list of resources or datasources])

Here are among others some code snippets from terraform-aws-vpc module :

resource "aws_subnet" "private" {
  count = var.create_vpc && length(var.private_subnets) > 0 ? length(var.private_subnets) : 0

  vpc_id            = local.vpc_id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = element(var.azs, count.index)

  tags = merge(
    {
      "Name" = format(
        "%s-${var.private_subnet_suffix}-%s",
        var.name,
        element(var.azs, count.index),
      )
    },
    var.tags,
    var.private_subnet_tags,
  )
}

resource "aws_network_acl" "public" {
  count = var.create_vpc && var.public_dedicated_network_acl && length(var.public_subnets) > 0 ? 1 : 0

  vpc_id     = element(concat(aws_vpc.this.*.id, [""]), 0)
  subnet_ids = aws_subnet.public.*.id

  tags = merge(
    {
      "Name" = format("%s-${var.public_subnet_suffix}", var.name)
    },
    var.tags,
    var.public_acl_tags,
  )
}

resource "aws_network_acl_rule" "public_inbound" {
  count = var.create_vpc && var.public_dedicated_network_acl && length(var.public_subnets) > 0 ? length(var.public_inbound_acl_rules) : 0

  network_acl_id = aws_network_acl.public[0].id

  egress      = false
  rule_number = var.public_inbound_acl_rules[count.index]["rule_number"]
  rule_action = var.public_inbound_acl_rules[count.index]["rule_action"]
  from_port   = var.public_inbound_acl_rules[count.index]["from_port"]
  to_port     = var.public_inbound_acl_rules[count.index]["to_port"]
  protocol    = var.public_inbound_acl_rules[count.index]["protocol"]
  cidr_block  = var.public_inbound_acl_rules[count.index]["cidr_block"]
}

I have also witnessed some int variable used as count = var.enabled * length([some list of resources or datasources]).

It makes the triggers more complex to understand, as count actually embeds both a boolean expression to enable the provisionning, together with the number of instances of the resource. It sounds like a twist of the initial expected count usage.

Attempted Solutions

I have thought initially about having count accepting booleans by automatically converting true->1 and false->0.
I had a quick discussion with go-cty developer that explained to me the intentional decision not to convert bool to number to avoid confusions, and indeed, I am aligned that the code should stay readable by segregating the scopes.
Therefore, I attempt this proposal below.

Proposal

Introducing a new parameter enabled in resources accepting bool values or logical statements to provision the resource (default) or not, independently from count value (which can be 0 by the way).
It will increase the readibility of the code by avoiding sometines complex conditional operators ?:

The previous examples would become :

resource "aws_subnet" "private" {
  enabled = var.create_vpc
  count   = length(var.private_subnets)

  vpc_id            = local.vpc_id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = element(var.azs, count.index)

  tags = merge(
    {
      "Name" = format(
        "%s-${var.private_subnet_suffix}-%s",
        var.name,
        element(var.azs, count.index),
      )
    },
    var.tags,
    var.private_subnet_tags,
  )
}

resource "aws_network_acl" "public" {
  enabled = var.create_vpc && var.public_dedicated_network_acl && length(var.public_subnets) > 0

  vpc_id     = element(concat(aws_vpc.this.*.id, [""]), 0)
  subnet_ids = aws_subnet.public.*.id

  tags = merge(
    {
      "Name" = format("%s-${var.public_subnet_suffix}", var.name)
    },
    var.tags,
    var.public_acl_tags,
  )
}

resource "aws_network_acl_rule" "public_inbound" {
  enabled = var.create_vpc && var.public_dedicated_network_acl && length(var.public_subnets) > 0 
  count   = length(var.public_inbound_acl_rules)

  network_acl_id = aws_network_acl.public[0].id

  egress      = false
  rule_number = var.public_inbound_acl_rules[count.index]["rule_number"]
  rule_action = var.public_inbound_acl_rules[count.index]["rule_action"]
  from_port   = var.public_inbound_acl_rules[count.index]["from_port"]
  to_port     = var.public_inbound_acl_rules[count.index]["to_port"]
  protocol    = var.public_inbound_acl_rules[count.index]["protocol"]
  cidr_block  = var.public_inbound_acl_rules[count.index]["cidr_block"]
}

Another use case I often see is to to provision some resources only for prod configurations (audit logs buckets, special routes or firewall rules):

enabled = var.env == "production" || var.env == "qa"

enabled shall preced the interpretation of count interpretation.

IMO, it will make the code more readable by explicitly describing what may prevent the provisionning of count resources. It will also allow to clearly distinguish bool from int variables.

References

I haven't found any ticket on that subject. Sorry if I have missed one.

@PCatinean
Copy link

This is in my opinion such a huge problem with what appears a "simple" solution. Relying on count to disable a module has a lot of bad effects from how you retrieve the output to having warnings about unsupported "count" from submodules that define providers themselves. What is the technical/functional obstacle of having a simple boolean switch on modules and/or resources?

One can neatly separate module calls by using terraform workspace name and much more, avoiding tons of nasty hacks that are filling up the entire web when searching for "terraform re-usable modules" or any similar dry concepts.

@Tbohunek
Copy link

@PCatinean The argument was that there are no clear use-cases. I fully support yours and @romachalm and here's mine:

I deploy Landing Zone for 200 Products from my landing-zone module, which actually consists of different modules such as network, key-vault, rbac, des and so on.
In some cases, deployment of a feature requires other inputs. e.g. to deploy disk encryption set des, I need first the Security team to deploy the HSM key into a vault that I deploy via key-vault of the landing-zone in the first place.
Yes, I can deploy the des module extra outside the landing-zone module, but this means to add 200 instances of des module call with many variables passed into. This is compared to a deploy_des variable flag.

However, when using count, the id like .[0] is added to the resource path in Terraform graph, leading to potential issues in case there sometimes appeared a .[1].
Moreover, turning a "hard-deploy" module into "counted" module means you need to at least terraform state mv all resources so that they contain the .[0] even though you never intent to deploy more than 1 instance of this resource. This may be costly and risky operation, and it may discourage teams from even trying to make their code more flexible.

What do you think of this use-case?

@jbardin
Copy link
Member

jbardin commented Mar 1, 2021

Adding additional context @schollii provided in #27936


[As recommended by @apparentlymart in https://github.com/hashicorp/hcl/issues/450, I'm moving this improvement request here.]

The notion of "nullness" or "existence" must be first class in a language, and the language should minimize how much "how" the user needs to express.

Currently in order to make resources and modules optional, we have to resort to using count = 0 or 1. Examples:

resource "provider_type" "name" {
  count = var.name_enabled ? 1 : 0
  ...
} 

The problem is that in current HCL the resource provider_type.name doesn't exist OR it is a list of one item, which is not only counter-intuitive (because a resource that exists is semantically very different from a list of 1 resource, and a resource that does not exist is fundamentally different from an empty list). Code which, prior to making the resource optional, was working, should have need modification to use list syntax instead. For example before making a resource optional,

resource "provider_type_1" "name1" {
  some_var = "abc"
} 
resource "provider_type_2" "name2" {
  some_other_var = provider_type_1.name
1.some_attribute ? "a" : "b"
}

Now make name1 optional, you have to change name2 to use list item 0:

resource "provider_type_1" "name1" {
  count = var.name1_enabled ? 1 : 0
  some_var = "abc"
} 
resource "provider_type_2" "name2" {
  some_other_var = provider_type_1.name1[0].some_attribute ? "a" : "b"
}

The name1[0] is actually anti-declarative: you are forcing the HCL code programmer to tell the HCL interpreter how to access the resource, whereas the fact that the resource is stored internally in a list of 1 item is an implementation detail that is of no concern to the user. Declarative implies to say what you want, and the program figures out how to do it. Here, it should be fine to write provider_type_1.name1.some_attribute ? "a" : "b" and HCL interpreter should know that obviously it is referring to the only resource provider_type_1.name1 that exists.

Similarly if I define an item as a list, it really is a list: a list of no items, 1 item, or N>1 items, all are lists and the code should reflect that. So if I write

resource "provider_type" "name" {
  count = var.number_of_name_items
  some_var = "abc"
} 

clearly the semantics of provider_type.name is that of a list, even if there is 1 item, or no items. And so all code that refers to provider_type.name should have to use list methods, even when there is only 1 or 0 item. This contrasts to the first example where count is being used as a boolean. Both uses are fundamentally different, and this should be reflected in the language.

Since enabling/disabling resources and modules is fundamentally different from creating and manipulating lists of items, it should have its own meta variable. I should be able to write

resource "provider_type_1" "name1" {
  enabled = var.name1_enabled
  some_var = "abc"
} 
resource "provider_type_2" "name2" {
  some_other_var = provider_type_1.name1.some_attribute ? "a" : "b"
}

In addition, the language should make it easy to handle the situation where code refers to a resource that has been disabled. It is probably reasonable for the above expression provider_type_1.name1.some_attribute ? "a" : "b" to fail if name1 disabled because for HCL interpreter, "if name1 exists then here is how you determine the value of provider_type_2.name2.some_other_var, but I (HCL) don't know what to do if provider_type_1.name1 does not exist". The failure is a good flag to raise to the user, so they have to think about what some_other_var should be if name1 does not exist. But in most cases, a reasonable default should be easy to write. One option would be to support a quarternary operator where the third option is used if any objects in the expression are nil. Eg

provider_type_1.name1.some_attribute ? "a" : "b" : "use_this_value_if_nil"

The third value (the 4th argument of the quaternary operator) would be used if the expression on the LHS (the 1st argument of quaternay operator) cannot be evaluated because some objects are nil. Syntax errors and calling an non-existent method on a non-nil object should still fail. An expression on LHS of ? like concat(a.b, c.d.e, f.g) would use the 3rd option if any of a, c, f, or c.d are nil.

The notion of "nullness" or "existence" should be first class in a language, and the language should minimize how much "how" the user needs to express.

@Tbohunek
Copy link

Tbohunek commented Mar 22, 2021

Thank you @jbardin for taking the time to write that up. The sheer length of the post forced me to also take some time to read it and understand it, and you summarized it very well.

How does it look now? Did the chances of this going through increase? It would help many Terraform users build better, nicer, cleaner and more flexible code, and adopt enabling "as they go" with no code modification effort (like adding [0] indices).

On the point of chained resources, I would say it doesn't require explicit handling. If parent resource is enabled=false, all anyhow dependent resources would also automatically enabled=false. E.g. network_security_group nsg is false, then all nsg_rules depending on nsg.id should also false.
If it required explicit handling like ? : :, it would only lead to provisioning failures painful to debug.

How does that sound?

@jbardin
Copy link
Member

jbardin commented Mar 22, 2021

Hi @Tbohunek, don't thank me, thank @schollii, who originally did that nice writeup as feature request in the hcl repository ;)

I agree it is a compelling feature, but there are many competing compelling features to consider for terraform, and we cannot prioritize them all. I cannot say when this might be considered, but it is not a minor feature and will take considerable planning to handle in a future version of terraform.

@Tbohunek
Copy link

Thanks @schollii ! :)
@jbardin if count could co-exist with this enabled, the change should be transparent to current users. If count would change then obviously the sooner the better. More users stuck with count makes such change harder up to a point where it won't matter anymore.

@nikolay
Copy link

nikolay commented Apr 10, 2021

@jbardin This looks like a minor and backward-compatible effort though. While waiting for the big features to be prioritized, we keep struggling daily and produce hard-to-maintain unreadable code!

@damon-atkins
Copy link

damon-atkins commented Aug 7, 2021

Consider Meta-Argument condition or include_if so people stop using count= 0 or 1 or for_each empty or single item
And content which allows passing a data structure instead of DSL

When choosing a Meta-Argument consider classes with existing resources.

locals {
   something = {
      prevent_destroy = true
    }
}
dynamic "lifecycle" {
    condition = can(var.ENVIRONMENT) # include_if = can(var.ENVIRONMENT)
    content = something  # as long as the map is what the resource expects in this block. i.e. its passing prevent_destroy
}

@schollii
Copy link

schollii commented Aug 7, 2021

@damon-atkins both condition and include_if are fine by me, but I'm not sure I agree with content. Can you provide an example?

I think count and enabled / condition / include_if would have to be mutually exclusive. Count means you want an array, possibly of 0 size, and the language should make it simple to handle the case where count is 0, a bit like one() in does.

Whereas enabled / condition is never an array, the resource is either there or null, and the DSL should make it easy to deal with null / disabled resource. Eg in an output that refers to resource that has enabled false, the output should be omitted without error instead of having to jump through hoops as we do now.

@tculp
Copy link

tculp commented Dec 10, 2021

I just ran into this when trying to create OpenStack resources, optionally using a keypair or creating one depending on a value. Unfortunately it is extremely ugly to make everything consider the keypair to be a list.

@damon-atkins
Copy link

damon-atkins commented Dec 13, 2021

content as show allow you to pass a data structure representing the HCL

locals {
   something = {
      prevent_destroy = true
      etc....
    }
}
dynamic "lifecycle" {
    include_if= can(var.ENVIRONMENT) # condition = can(var.ENVIRONMENT)
    content = something  # as long as the map is what the resource expects in this block. i.e. its passing prevent_destroy
}

vs

dynamic "lifecycle" {
    include_if = can(var.ENVIRONMENT) # condition = can(var.ENVIRONMENT)
    prevent_destroy = true
    etc.....
}

@damon-atkins
Copy link

@jbardin what is the next step in the process, for this idea to be approved in one shape or form.

@apparentlymart
Copy link
Member

apparentlymart commented Aug 16, 2022

Hi all!

I think there are at least four remaining design questions to be solved before something like this could be implemented:

  • The usual reason given to motivate this addition is the desire to be able to disable something without having to change all of the references to it. Currently if I add count to aws_instance.foo then every other reference to it must become aws_instance.foo[count.index] where count.index is 0, or must be a carefully-guarded hard-coded reference to aws_instance.foo[0] to avoid trying to access element zero of an empty list when there are no instances.

    However, just adding enabled = true or enabled = false alone doesn't clearly solve that problem. We need to decide what type of value aws_instance.foo would be both when it's enabled and when it's disabled, and that decision will in turn decide what references to that value would look like.

    One idea someone raised in a similar discussion elsewhere was to say that aws_instance.foo would be null if enabled = false. If we do that without changing any other parts of the language then it suggests that aws_instance.foo[count.index].id would need to become something like aws_instance.foo != null ? aws_instance.foo.id : null, which is significantly more verbose than what it replaced.

    Any design which aims to solve this problem must describe the declaration syntax and what affect it has on references to the resource that is disabled.

  • I expect we will need to also figure out what the equivalent idiom to "chaining with for_each" would be, when we have a set of resources that need to always be enabled and disabled together. The most direct translation of count = length(aws_instance.foo) or for_each = aws_instance.foo (assuming that aws_instance.foo itself uses for_each) would be enabled = aws_instance.foo != null, which again isn't obviously better than what it replaced.

    These chaining patterns emerged because it's pretty rare for single resources to be enabled/disabled in isolation... often we need to reliably correlate instances of one resource with instances of another to ensure correct behavior, and the chaining patterns have been successful in achieving that. I think a successful design for boolean enabling ought to describe its equivalent of these chaining patterns.

  • Since we're now subject to the Terraform v1.0 Compatibility Promises, any design proposal must explain how the new feature interacts with those promises.

    The most recently-discussed design above calls for adding a new meta-argument called enabled to resource, data, and module blocks. That is a compatibility hazard for any existing resource type which has an argument named enabled or any module which has a variable "enabled" declaration. It isn't yet clear to me how we would make the transition to this new feature without potentially breaking existing modules and existing uses of providers.

  • Implementation-wise, we need to figure out how Terraform Core will track a possibly-disabled resource. Today we have the idea of an instance key which serves to differentiate an instance of a resource from the resource itself in any situation where there isn't exactly one instance of the resource.

    enabled leads to the situation where there can be zero or one instances of a resource but without an obvious tracking key to differentiate the instance from the resource. It's not clear to me yet that this is super important, but a design which aims to solve this problem will need to define how Terraform ought to track these conditional instances in both plan and state, what Terraform ought to do if an author switches from count to enabled or vice-versa, and just generally how on earth this should all work.

I'd ask you all to recognize that the count argument has been with us long enough that it's easy to take for granted all of the existing design patterns and supporting language features that make it usable in practice. It's not sufficient to just name the new argument, and instead it's important to understand the equivalents of the various other patterns module authors use in relation to count. We had a similar challenge when introducing for_each, and even despite a significant amount of research and revision of the design we still missed the rough edge that people seem to expect the [*] operator to work with resources that use for_each in the same way as it does for count, just to name one example of a situation where we failed to consider an existing commonly-used pattern.

We are still open to discussing this direction but in order to move the discussion forward we need to work together to answer the remaining design challenges. There's still plenty of design work left to do here before it would be time to move on to implementation, so I'd love to hear ideas for how to address these, and indeed any other questions we might identify along the way in the process of answering the ones I've identified above.

@nikolay
Copy link

nikolay commented Aug 16, 2022

@apparentlymart I think enabled = <condition> could work just like count = <condition> ? 1 : 0 and people would have to use one(aws_instance.foo[*]). Basically, enbaled would be syntactic sugar and nothing more and thus fully backward-compatible. I've also seen people writing for_each = <condition> ? [1] : [], which could also be replaced with enabled = <condition>. Lastly, HCL could be extended with the comprehension-like resource ... {} if <condition> instead of making this an attribute, but this could be a bit more complicated for the whole toolchain.

@Tbohunek
Copy link

Tbohunek commented Aug 16, 2022

The best sugar of it all would be to not have to add unpredictable .[0] into every reference of the counted resource, and conditions to outputs (i.e. output would default to null or [] respectively if the resources get deployed or not).

@nikolay
Copy link

nikolay commented Aug 16, 2022

@Tbohunek I wish one(<reference>[*]) could be sugared somehow. Talking about outputs - I wish they had for_each, count, and enabled, too!

@damon-atkins
Copy link

damon-atkins commented Aug 16, 2022

count has order issues, for_each does not, hence people have change code to use for_each.

include_if or enable could be the same as commenting out the section of code. e.g. this does not exist do not include it, or I do not want a setting to be set in this account/resource. Thats why I suggest name include_if

Sometime people create two tf files which are the same with parts of one deleted, because a flag like this does not exist.

count=0 and for_each=[] still create an empty item.

@Tbohunek
Copy link

To actually react on @apparentlymart's post

  • enabled = aws_instance.foo != null

How about enabled = aws_instance.foo.enabled?

For output so far I defaulted all of them into null because it doesn't really matter.

@damon-atkins
Copy link

damon-atkins commented Aug 16, 2022

The following could be the same, i.e. if include_if is false treat the following as if it is commented out.

dynamic "lifecycle" {
    include_if= can(var.DESTROY) # condition = can(var.DESTROY)    Evaluated to TRUE, included lines
    prevent_destroy = var.DESTROY
}
# dynamic "lifecycle" {
#    include_if= can(var.DESTROY) # condition = can(var.DESTROY)   Evaluated to FALSE,  exclude lines
#    prevent_destroy = var.DESTROY
# }

@apparentlymart
Copy link
Member

We cannot add any new synthetic attributes to an expression like aws_instance.foo, because that's already defined to be an object of a type defined by the provider's schema. If we wanted to expose metadata about a resource block in addition to the actual data for that object then I expect we'd need to do it under a different namespace such as meta.aws_instance.foo, although I've not deeply considered the implications of doing that.

@Tbohunek
Copy link

Wouldn't the chosen attribute like .enabled become part of the schema for all objects?
The caveat here is rather to make .enabled accessible on an object that doesn't actually exist.

@apparentlymart
Copy link
Member

This comes back to my comment about defining what exactly aws_instance.foo evaluates to when we have a resource with enabled = true or enabled = false set. It's true that Terraform could add a synthetic additional attribute to the resource type's schema so that .enabled could be true in the case where it's enabled, but as you noted <null object>.enabled would fail with an error, so I don't think this design is quite right yet.

(Note that this is also in the vicinity of the other point I made about distinguishing the resource from the resource instance. For count and for_each we can unambigously say that aws_instance.foo refers to the resource as a whole while aws_instance.foo[0] or aws_instance.foo["example"] refer to instances of that resource; the problems with aws_instance.foo.enabled are a good example of the confusion that arises when there isn't some way to distinguish the resource as a whole from the instances of it.)

I also want to clarify that my intent with the previous comment wasn't to suggest that we can puzzle out all of those details individually over the course of many comments. Instead, I was imagining somebody writing a design proposal document that covers the whole problem space and proposed solutions all together, so that we can see how the different parts interact and make sure it all adds up to something that actually works as part of the broader language.

@schollii
Copy link

schollii commented Aug 17, 2022

In some languages (eg Python), null is an object ie it has a type and attributes just like any other (it is called None in Python but same thing). Could something like that work? So there would be a built-in resource that represents null and has enabled = false. Every other object would have enabled = true.

Also, let's not forget that the real goal is to not have to litter the code with [0] and code that checks for empty list. Groovy's approach to handling null should be considered : something like .? means null is returned if the object before it is null. In other words if object is null and the attribute referenced on it does not exist then null is returned.

@schollii
Copy link

schollii commented Feb 16, 2023

@Tbohunek Ah sorry I missed that, so here it is, looks like the answer to the following question is still no:

If resource.a was valid reference inside try() than it must be valid outside as well. Am I right?

No because this:

With

resource random_pet "counted" {
  count = 0
}

a try() makes reference to non-existent object pass, ie the following fails:

output "counted_res_3" {
  value = random_pet.counted[0]
}

whereas wrapping it in try() succeeds plan:

output "counted_res_4" {
  value = try(random_pet.counted[0], "something")
}

Here random_pet.counted[0] is a non-existent resource, it is an valid reference inside a try() but not outside. Terraform handles it differently (and appropriately, IMO).

So the "magic" is already there, and for optional resource we would have

resource random_pet "optional" {
  enabled = false
}

and the following would fail:

output "optional" {
  value = random_pet.optional
}

but wrapping it in try() would succeed:

output "optional" {
  value = try(random_pet.optional, {})
}

because terraform knows that optional is an optional resource, and if it is disabled it uses the same magic as it already uses with random_pet.counted[0].

Some things regarding the ?:

  • The ?. operator is "reference attribute on optional resource" and is separate from a ? operator which is "reference to optional resource"
  • Supporting ? (or for attributes, ?. or .?) is not an absolute requirement towards adding support for optional resources in the language, since we see from the above few comments that try() covers more use-cases than the operators could cover (since try can return any value, whereas the operator could return only null)
  • So I'm in favor of postponing the part of the discussion related to ?, to gain speed. Meanwhile I will look for examples of counted resources in public git repos and see if these operators could really make code more concise if they were available.

@schollii
Copy link

schollii commented Feb 16, 2023

Summary so far

(terminology clarification: Resources are specified via arguments and each resource instance has a type with specific attributes; the type is always available to terraform even if there are no instances of a resource)

Singleton resource:

(note: For lack of a better term I have named resources that do not have count or for_each as "singleton" resources, but if there is something official or better let me know)

  • no meta argument (like count etc)
  • n=1 only

Behavior in expressions:

expression abort plan when result when success
resource_type.name.attribute name does not exist for that resource type OR attribute does not exist value of attribute on the resource instance
try(resource_type.name.attribute, null) same as above (!! despite the try !!) value of attribute on resource instance
resource_type.name name does not exist for that resource type map of attributes of the resource
try(resource_type.name, null) name does not exist for that resource type (despite 'try`!) map of attributes of the resource (null never used)

Counted resource:

  • meta argument count = expression resulting in number
  • n=0 to some large number
  • the resource instance is a list of resources, where elements of the list have the type of the resource specifying the count

Behavior in expressions:

expression abort plan when result when success
resource_type.name[i].attribute name does not exist for that resource type OR attribute does not exist on list element type OR 'i>=n' value of attribute on instance i of resource list
try(resource_type.name[i].attribute, null) name does not exist for that resource type OR attribute does not exist on list element type value of attribute on instance i of resource list if i<n, but null if i>=n (try works despite no element at i!)
resource_type.name name does not exist for that resource type, even if n>0 a list of n objects, where each object is a map of the attributes of the resource at index i of the list (list may be empty)
resource_type.name[*].attribute name does not exist for that resource type, even if n=0 list of values for attribute on each element of the list
try(resource_type.name[*].attribute, null) name does not exist for that resource type OR attribute does not exist on list element type list of values for attribute on each resource instance of the list (null never used)
try(resource_type.name, null) name does not exist for that resource type a list of n objects, where each object is a map of the attributes of the resource at index i of the list (list may be empty; null never used)

The backward compatible change would be to make every singleton resource optional, via a new meta argument similar to count. Here, I call it enabled but this is just a placeholder so we can discuss; it could end up being any of the other variations mentioned in this thread:

Optional singleton resource:

  • meta argument enabled = expression resulting in a bool
  • enabled defaults to true if not specified
  • n=0 or 1
  • when disabled, the type exists, but the instance does not

Behavior in expressions:

expression abort plan when result when success
resource_type.name.attribute name does not exist for that resource type OR attribute does not exist on list element type OR 'enabled == false' value of attribute on instance of resource
try(resource_type.name.attribute, null) name does not exist for that resource type OR attribute does not exist on list element type value of attribute on resource if enabled == true; null if enabled == false (despite resource non-existing)
resource_type.name name does not exist for that resource type, OR enabled == false a map of the attributes of the resource
try(resource_type.name, null) name does not exist for that resource type null if resource has enabled false

There doesn't seem to be a need to talk about resource nullness at this level. It remains so far an implementation detail.

@omeid
Copy link

omeid commented Feb 16, 2023

@schollii I am afraid the elephant in the room is being overlooked.

In both examples, namely try(resource_type.name.attribute, null) and try(resource_type.name, null) the properties are invalid in the case of null resources and will result in error. Just because you pass them to try doesn't mean the property doesn't have to be valid.

If the languages is to be uplifted to allow this, then the notion of using try becomes pointless. Assuming try isn't going to be a magical function with it's own mini language.

@schollii
Copy link

schollii commented Feb 17, 2023

@omeid

the properties are invalid in the case of null resources and will result in error

I think insisting on resource being null when it is disabled is the elephant. Let's say that that is not what we do. Eg in Python I can define a class with a data member flag enabled and I can override the __getattr__ method to make it raise an exception when that enabled flag is false. Then any attribute access that is not allowed by my override will cause exception. This is very different from using None as the Python object to signify "disabled". The Python object is not None, it is something that has properties and behavior, but I use the language's support for intercepting attribute access to make that behavior dependent on the value of that flag (it could be anything such as logging access, retry on failure, whatever).

Maybe that could be done in HCL2, but I'm sure there are other approaches if that one does not work for HCL2. Whatever the approach, it must not use null for the resource.

@omeid
Copy link

omeid commented Feb 17, 2023

@schollii That is a fair way to put it.

But introducing Exception Handling and a more complex notion of a resource availability beyond exists or null is too much of a tall order IMO.

I am not convinced that the notion of "nullable" resource should be solved otherwise, but what it ought to be: nullable resources with syntax-sugar to make them ergonomic. This is a solvable problem in HCL in a backwards compatible way.

Anything else that would solve this by proxy demands unwanted complexity.

@Tbohunek
Copy link

Tbohunek commented Feb 17, 2023

@schollii Nice summary! 🚀

For a count=0 resource, why does this not abort plan (result is [])

try(resource_type.name, null)

but this does?

try(resource_type.name.id, null)

│ Error: Missing resource instance key
│ on main.tf line 44, in output "test":
│ 44: value = try(azurerm_container_group.test.name, null)
│ Because azurerm_container_group.test has "count" set, its attributes must be accessed on specific instances.
│ For example, to correlate with indices of a referring resource, use:
│ azurerm_container_group.test[count.index]

I get how it work technically, but to me it's inconsistent and I would make it abort in both cases.
And as you see from this long thread, it's quite counter-intuitive, which is always bad.
I didn't even know about it, which is why I always did this (in rare cases when I passed the entire resource):

try(resource_type.name[0], null)

This however explains why these two approaches that you propose:

try(resource_type.name, null) # access disabled resource
try(resource_type.name?.id, null) # access disabled resource's attribute

And now my question would be, if it would hurt to have only this instead:

try(resource_type.name?, null) # access disabled resource
try(resource_type.name?.id, null) # access disabled resource's attribute

IMO it would be a touch more predictable not to have to differ between accessing a disabled resource vs accessing its attributes.
I would avoid exception handling and inconsistent behavior whenever possible.

Nevertheless, the latter (accessing attributes of count = 1 resource) is my most common current use-case.
Would you mind adding it to your summary? Did you intentionally exclude this scenario? 😜

@Tbohunek
Copy link

However @schollii I have to come back to notion of nullable over

schollii: override the __getattr__ method to make it raise an exception.

If to make resource enabled we would always have to turn all of this

value = resource_type.name
value = resource_type.name.id

into bunch of this

value = try(resource_type.name, null)
value = try(resource_type.name?.id, null)

...It probably makes static analysts happier, however it makes all terraform users unhappier.
It's just pointless work that indeed I want the computer to do for me.
Else we could just go back to writing full-fledged json structure for each resource.

I'm not a huge fan of yaml, but I do like how they not waste a single letter:
10 lines of code can launch even your Space Shuttle navigation system. 🚀

@zachwhaley
Copy link
Contributor

zachwhaley commented Feb 17, 2023

At the risk of adding more noise, I'd like to reiterate the problem trying to be solved here and showcase how @schollii solution does a good job of addressing that.

Take this example:

resource "null_resource" "this" {
  triggers = {
    pet_one = random_pet.one.id
    pet_two = random_pet.two.id
  }
}

resource "random_pet" "one" {
}

resource "random_pet" "two" {
}

output "triggers" {
  value = null_resource.this.triggers
}

If we were to need to "disable" one of or all of these resources, those resources would have to add a count metavar to their resource and every reference of the resource has to add a [0] index to itself, or some other form of try/default mechanism: try, one, [*], etc. That creates a lot of code churn and can make for a very large and error prone diff.

Example diff:

 resource "null_resource" "this" {
+  count = var.enabled ? 1 : 0
+
   triggers = {
-    pet_one = random_pet.one.id
-    pet_two = random_pet.two.id
+    pet_one = random_pet.one[0].id
+    pet_two = random_pet.two[0].id
   }
 }

 resource "random_pet" "one" {
+  count = var.enabled ? 1 : 0
 }

 resource "random_pet" "two" {
+  count = var.enabled ? 1 : 0
 }

 output "triggers" {
-  value = null_resource.this.triggers
+  value = try(null_resource.this[0].triggers, "none")
 }

The other way to avoid this is to always create resources with an explicit count = 1, and always reference resources with a [0] index. But that is clearly not best practice.

If however, there were a way to "disable" a resource and that disabled resource would ignore any also disabled resources referenced in its block, much like the way the above doesn't care that random_pet.two[0].id doesn't exist when var.enabled is false, then less code churn would be created with smaller less error prone diffs.

Example:

 resource "null_resource" "this" {
+  enabled = var.enabled
+
   triggers = {
     pet_one = random_pet.one.id
     pet_two = random_pet.two.id
@@ -6,11 +8,13 @@ resource "null_resource" "this" {
 }

 resource "random_pet" "one" {
+  enabled = var.enabled
 }

 resource "random_pet" "two" {
+  enabled = var.enabled
 }

 output "triggers" {
-  value = null_resource.this.triggers
+  value = try(null_resource.this.triggers, "none")
 }

Notice that the use of try hasn't changed much at all, but the use of references to disabled resources inside disabled resources does not need any code change. Obviously a disabled resource reference in a non disabled resource would need to use try to fetch the value of the disabled resource reference or default to some other value, just like currently a non-counted resource has to use try, one or [*] to do the same with a possibly counted resource reference in its block.

To reiterate, the problem trying to be solved here is the code churn and error prone diffs that result from Terraform's currently lack of a easier and less verbose way of disabling resources conditionally.

@Tbohunek
Copy link

@zachwhaley you refer to @schollii solution, however you didn't follow it as you couldn't do:

pet_one = random_pet.one.id

You'd have to do:

pet_one = random_pet.one?.id

This is important detail, and makes the churn grow again:

resource "null_resource" "this" {
+  enabled = var.enabled
+
   triggers = {
-    pet_one = random_pet.one.id
-    pet_two = random_pet.two.id
+    pet_one = random_pet.one?.id
+    pet_two = random_pet.two?.id
#+   pet_one = random_pet.one[0].id
#+   pet_one = random_pet.two[0].id

(How did you do that awesome highlight?)

@schollii
Copy link

schollii commented Feb 17, 2023

@Tbohunek

You'd have to do:.. pet_one = random_pet.one?.id

I don't think you would have to, because as @zachwhaley indicated, once a resource is disabled (like the null_resource in his example), any expressions within can be ignored by terraform, so the fact that it references an attribute on a disabled resource is irrelevant when it is in a disabled state. Then when it is re-enabled, those expressions will be processed again, but that time, these dependent resources will also be enabled so plan will succeed.

@schollii
Copy link

@Tbohunek

For a count=0 resource, why does this not abort plan (result is [])
try(resource_type.name, null)
but this does?
try(resource_type.name.id, null)
...to me it's inconsistent...

It is totally consistent:

  • Your tf specifies resource_type "name" as a counted resource (it has a count meta-argument), therefore the expression resource_type.name is a list of objects (one per resource instance in the counted resource; each object is a map of the attributes of the object). As I show in one of my tables, that expression always evaluates properly, even if count = 0: it is a list so if count = 0 the list is empty, that's all. Therefore wrapping it in a try is useless as it will never return the second arg of the try (null in your example).
  • Since that expression is a list, and a list does not have an attribute called id, and terraform try() does not catch invalid attribute access, then try(resource_type.name.id, null) must abort plan, despite the try.

Same for try(resource_type.name[*].typo, null): this will always abort plan because although here terraform would attempt to get typo attribute on all elements of the counted resource list, but terraform knows that typo is not an attribute on the element type, so it aborts. So again in that case the try is useless.

OTOH try(resource_type.name[*].id, null) works, because id is an attribute of elements of the list of resources, but again it is redundant because if the list is empty, resource_type.name[*].id is just an empty list.

I think that's another good reason to introduce enabled / disabled: thinking in terms of lists can be rather tricky, and leads to nasty code like element(concat(aws_vpc.this.*.id, [""]), 0) and concat(aws_security_group.this.*.id, aws_security_group.this_name_prefix.*.id, [""])[0] (see #21953 (comment)).

@schollii
Copy link

schollii commented Feb 17, 2023

If to make resource enabled we would always have to turn all of this

value = resource_type.name
value = resource_type.name.id

into bunch of this

value = try(resource_type.name, null)
value = try(resource_type.name?.id, null)

That would definitely not be the case:

  • the whole point of the ? operator was to not have to wrap with a try call: so you could just write value = resource_type.name?.id and this would return value of id if resource_type.name is enabled, and null otherwise.
  • the observation by @zachwhaley means neither try nor that operator would be necessary in a whole bunch other use cases

So given the following use case:

  • Context: name is a resource of resource_type and type defines an attribute id, and has enabled=true (explicitly or implicitly by omission of enabled)
  • Action: user switches enabled to false on that resource
  • Consequence/Outcome:
    • all expressions in locals, outputs and "active" resources (ie enabled, in non-0 count list, and in non-empty for-each map) that use resource_type.name.id without a try/can will cause plan to abort; this is desirable because each expression that fails this way implies it is an invalid expression when the resource becomes disabled, and it is up to you to decide how to fix that expression (should the value be null, an empty string, a non-empty string that can be grepped in output, do you need try() or can() or can you get away with that ? operator, something else altogether).
    • no disabled resources that use that expression need changing

That sounds pretty sweet to me.

This is not the only interesting use case, another one is selecting between different implementations that a module can provide, such as allowing postgres vs mysql, or aurora vs postgres, or allowing the user to select whether they want their security group to have fixed name or just a prefixed-name (and AWS generates the portion after prefix). In that latter case, with enabled it looks something like this:

resource_type prefixed {
  enabled = var.prefixed && var.user_specifiied_id == null
}
resource_type not_prefixed {
  enabled = ! var.prefixed && var.user_specifiied_id == null
}
locals {
  id = try(resource_type.prefixed.id, resource_type.not_prefixed.id, var.user_specifiied_id)
}

because the following will happen:

  • if you have chosen prefixed, then prefixed resource is enabled and not_prefixed is not, so id will be that of prefixed
  • if you have chosen not prefixed, then prefixed is disabled and not_prefixed is enabled, so id will be that of not_prefixed
  • if you have chosen to create the resource elsewhere so you are just providing this id via var.user_specified_id, then both prefixed and not_prefixed are disabled so id will be the value of var.user_specified_id

@schollii
Copy link

@Tbohunek

Nevertheless, the latter (accessing attributes of count = 1 resource) is my most common current use-case.
Would you mind adding it to your summary? Did you intentionally exclude this scenario?

No intention. It is difficult to keep track of the many ideas, questions, and technical details due to typos or just forgetting or just misunderstanding someone else's statement. My apologies if something I wrote comes across as intentionally dropping something you suggested.

So I'm happy to add it, but I'm not sure what I should be adding! Can you clarify, like write out the table row or cell as you think it should be.

@Tbohunek
Copy link

Tbohunek commented Feb 17, 2023

Those damn lists! Thanks for your explanations @schollii, I hope these will be useful to more than just me. 👋

I originally didn't notice that you haven't included the ?. in your summary, but now thanks to @zachwhaley's example it's obvious that we don't need it at all anymore.
However, this is something you previously insisted on. Now we'd only have to modify references in non-disabled resources, which is still annoying for me, but much more feasible.

It also makes clear the IMO most important part: tfstate path to null_resource.this would remain unchanged. 👍

So now I think would be the time for Hashicorp to again look at your proposal and see if that's achievable or not.

@Tbohunek
Copy link

shollii: So I'm happy to add it, but I'm not sure what I should be adding!

It would have been just to explicitly point out try(resource_type.name.id, null) and also without try() would always abort on counted, for completeness, but that's all.

@mastertheknife
Copy link

For a single resource , i would also love to see enabled being implemented. The count just feels like a hack and creates more trouble than it's worth. Terraform is amazing and new cool features have been added (e.g. optional type constraints) but this simple yet highly desired functionality is missing.

@cpboyd
Copy link

cpboyd commented May 3, 2023

count doesn't even seem to work properly for conditionally executing certain data: hashicorp/terraform-provider-vault#1838

@AlecTaggart
Copy link

count is a hack. The behavior should not be like this. A great example of the struggles here is when creating an aws load balancer + target group/target group attachments in a module. If you want to target an interface endpoint for example, you need to use the 'data "aws_network_interface" "data-name" {}' block to pull in the endpoints interface parameters (the private ips etc). You can then pass this to the load balancer module and use them as attachments. The problem here is is you want to conditionally create the load balancer it wont work... there is no way to avoid the "value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created." error message (none I have found). depends_on should fix this problem but it does not work. Makes me want to move away from terraform all together

@Tbohunek

This comment was marked as off-topic.

@vanmash

This comment was marked as off-topic.

@Tbohunek
Copy link

Tbohunek commented Feb 4, 2024

Well, the question about official triage feedback was not off-topic.
It's been a year since @schollii proposed a viable design of how this could work.
All we want is for the code to do by itself what we do manually today.
But Hashicorp keeps quiet, forcing thousands of its users to write coding abominations.

@nitoxys
Copy link

nitoxys commented Feb 8, 2024

Enabled seems like a similar solution to Ansible's when statement.

Similar solution:
when = var.firewall == "enabled"
when = var.firewall == "enabled" || var.nsg == "nsg-firewall"

@ikegentz
Copy link

Whatever the solution is, can it just be added to the next major TF version (2.x.y+)? I'd happily go refactor my code once to get it onto a major version if it gets this feature implemented sooner, instead of the debate raging on for 3 more years and nothing ever being done, meanwhile being stuck with count and moving resources contstantly to use the index reference

@inaun
Copy link

inaun commented Apr 26, 2024

I have never been able to understand why Terraform doesn't include a simple conditional, as suggested. Be it "if var.value = true", "enabled = var.value", or someting -- anything -- other than having to rely on hacking the looping parameters to create conditional resources.

This is a normal feature we find in most other languages. Specifically for Terraform, it has been requested in many places multiple times. People have blogged about how terrible Terraform is because a simple construct like this is not included. Yet for some reason a deaf ear is turned toward including this feature.

Please don't make excuses anymore about why the "if" construct is unnecessary. Instead, please listen to the voices of the people working in Terraform day in and day out. This seems like a simple feature to implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests