Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate duplicate provider local names in required_providers #31218

Merged
merged 3 commits into from Jun 15, 2022

Conversation

jbardin
Copy link
Member

@jbardin jbardin commented Jun 9, 2022

Adding multiple local names for the same provider type in required_providers was not prevented, which can lead to ambiguous behavior in Terraform. Providers are always located via the provider's fully qualified name (even using the full name as a map key in many places), so duplicate local names cannot be differentiated. This can lead terraform to non-deterministically configure the incorrect provider in some cases.

Because it is currently possible that a configuration with providers which themselves do not need explicit configuration may have been working with duplicates, we cannot turn these into errors. Adding multiple entries for the same provider now results in a warning like so:

│ Warning: Duplicate required provider
...
│
│ Provider hashicorp/null with the local name "dupe" was previously required as "null". A provider
│ can only be required once within required_providers.

We can also check for the situation where a user required a provider by a name different from the default, but attempted to configure that provider via the default local name. This can help prevent the case where a provider configuration may apparently not always be taking effect within a module. The warning here is worded slightly differently:

│ Warning: Duplicate required provider
...
│
│ Provider hashicorp/null with the local name "dupe" was implicitly required via a configuration
│ block as "null". Make sure the provider configuration block name matches the name used in
│ required_providers.

fixes #31196

@jbardin jbardin added the 1.2-backport If you add this label to a PR before merging, backport-assistant will open a new PR once merged label Jun 9, 2022
@jbardin jbardin requested a review from a team June 9, 2022 20:32
@jbardin jbardin force-pushed the jbardin/validate-provider-local-names branch from f6ddbc8 to ce5758f Compare June 9, 2022 20:36
Copy link
Member

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a nice improvement to me that should give some better feedback about a situation that was already invalid but checked and reported in a confusing way.

I do have a few questions about the overall impact of this change:

  • Were you able to verify that there weren't any situations before where someone might have been able to "get away with" declaring multiple provider local names in this way without hitting some sort of error?

    I cannot imagine any case that should've worked, because declaring two providers with different local names and then configuring against them both means exactly the same thing as declaring two provider configurations with the same local name and alias. But given that this was unambiguous, I do wonder if there's a possibility that something might have crept through that could be arguably covered by the v1.0 compatibility promises. 😬

  • How does this interact with the various implied provider requirements Terraform infers for backward compatibility? For example:

    terraform {
      required_providers {
        notaws = {
          source = "hashicorp/aws"
        }
      }
    }
    
    provider "notaws" {
      # ...
    }
    
    
    provider "aws" {
      # ...
    }

    I think your TestProviderConfigTransformer_duplicateLocalName test is asserting that this is allowed, but that seems counter-intuitive to me since it seems like it's still effectively declaring the hashicorp/aws provider under two local names, with one of them just being implied by our back-compat rules rather than written out explicitly by the user.

    Might we need an additional rule in ProviderConfigTransformer to treat the above as invalid, or does that already get taken care of by some logic I can't see clearly in this diff?

@jbardin
Copy link
Member Author

jbardin commented Jun 10, 2022

Hmm, actually that test example came from my first iteration where this was warning, but I found that because there were multiple points where providers were located only by their full name, you couldn't configure them without possibly running into errors. But now I see the problem here, it's technically possible to have a completely working configuration if you're not configuring the provider at all (or maybe not even using the provider!). There would be no reason to write such a config in the first place, but maybe a module could end up there through some refactoring? 🤔

I think I'm going to loosen this back up to a warning to be on the safe side.

Adding multiple local names for the same provider type in
required_providers was not prevented, which can lead to ambiguous
behavior in Terraform. Providers are always indexed by the providers
fully qualified name, so duplicate local names cannot be differentiated.
We can skip providers which already have a node in the graph for their
type.
@jbardin jbardin force-pushed the jbardin/validate-provider-local-names branch from ce5758f to d7238d5 Compare June 10, 2022 14:40
@jbardin jbardin self-assigned this Jun 10, 2022
@jbardin jbardin requested review from a team and removed request for apparentlymart June 14, 2022 18:09
// was required via a different name.
impliedLocalName := req.Type.Type
// We have to search through the configs for a match, since the keys contains any aliases.
for _, pc := range mod.ProviderConfigs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also the related situation of a provider requirement being implied by a resource or data block which doesn't set the provider argument and whose resource type name starts with a prefix that doesn't match any explicitly-declared identifier. Should we catch that one too, or does it end up not mattering because the resource ends up belonging to the correct provider configuration anyway?

(Specifically: I don't recall if that backward-compatibility behavior causes us to implicitly declare a new provider configuration with the local name matching the prefix, or if it instead just hooks it up to whatever provider configuration it finds that has the assumed provider source address. If it's the latter then I guess it probably doesn't really matter, although it's still potentially confusing to have e.g. aws_instance automatically associated with provider "notaws" just because required_providers declared notaws as being hashicorp/aws. 🤔)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it does work out that the implied provider based on the resource prefix will just get hooked up to the correct provider, even if it has a different local name, perhaps it's best to warn about that case too? I didn't delve into that here because it doesn't cause any known issues, and it's not as straightforward as dealing with the provider configs alone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it seems a little unclear what we should do here... I do like the consistency of treating a provider dependency implied by a resource the same way as a provider dependency implied by a provider configuration, since it seems like both of those things are a common source of confusion for folks anyway. But if it is particularly gnarly to do it then I would agree probably not worth it.

I have some memory of a function somewhere which takes something describing a provider and encapsulates the logic for deciding which provider configuration it ought to belong to, which if we could find it might allow all of these situations together as one rule (a provider = aws.foo can also imply a dependency on hashicorp/aws if there isn't already an aws in scope), but if we'd end up having to duplicate all of the logic for detecting implicit provider requirements in here then I'd say that's probably too high a cost for just a little consistency. (Unfortunately I don't remember where this function is, assuming I'm not misremembering it existing in the first place, but maybe you have seen it on your travels while working on this... 🤔)

Copy link
Member Author

@jbardin jbardin Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking this final case wasn't that bad, we've already collected the required_providers local names, so we just need to iterate over the resources looking for implied providers which aren't in that set.

The newest commit adds a diagnostic like:

│ Warning: Duplicate required provider
...
│ Provider "null" was implicitly required via resource "null_resource.a", but listed in
│ required_providers as "nope". Either the local name in required_providers must match the resource
│ name, or the "nope" provider must be assigned within the resource block.

This diagnostic output only uses the local names, because this situation can only happen with providers from the default namespace with a default provider name inferred from the resource name.

Copy link
Member

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry... more noodling here as I dug into the new logic you wrote. Each time I read new code in this PR it reminds me of things I'd forgotten about the provider resolution logic, and so I'm sorry I'm kinda just dribbling context slowly over the course of many round-trips. 😖

Comment on lines 171 to 172
"Provider %q was implicitly required via resource %q, but listed in required_providers as %q. Either the local name in required_providers must match the resource name, or the %q provider must be assigned within the resource block.",
localName, r.Addr(), prevLocalName, prevLocalName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subjectively, I think it'd help a little to make that first %q be defAddr instead of just localName, even though we know that it'll always just be hashicorp/ followed by the localName anyway, just to add a little nudge here for those who aren't yet aware of this implied dependency mechanism and might benefit from us linking it with what they can see in the terraform init and terraform providers output.

Probably we'll refine this message some more once we see what sorts of questions people ask about it, since it's hard to guess right now what a typical recipient of this message might know or not know, so otherwise I think this is a fine place to start!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had that at first, but changed it as I cut the message down for brevity ;) THat's a fair point though, and will put it back.

Comment on lines +150 to +153
// We're looking for resources with no specific provider reference
if r.ProviderConfigRef != nil {
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we also looking for resources that explicitly specify a provider local name that wasn't declared?

  provider = aws.foo

I believe (though I haven't tested) that if we encounter the above without there being an aws in required_providers then we treat it as an implied requirement for hashicorp/aws, just as we would if it were not set at all but the type name begins with "aws".

Resource.ProviderConfigAddr seems to be the definition of which local name a particular resource is associated with, and so maybe we can use that here and then take the LocalName field of the result as the local name to use below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that is validated already which is why I didn't check it here, but not until core tries to connect the providers in the graph.

This additional check was only intended to get the case of an implied local name not matching the required_providers local name, in which which case ProviderConfigAddr hides the fact that provider wasn't set. I guess we could also add more validation to resource provider arguments during config loading, but that was outside of the goal of this PR.

continue
}

defAddr := addrs.NewDefaultProvider(localName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could instead be a call to Module.ImpliedProviderForUnqualifiedType, which I believe aims to encapsulate the logic for "what is the fallback to use if this doesn't match an explicit provider requirement?".

That does also consult the required providers for the module, and so calling it directly wouldn't allow distinguishing between a reference to an explicit declaration vs. a backward-compatibility fallback. So maybe then the better answer would be to skip over that layer and go directly to addrs.ImpliedProviderForUnqualifiedType, which at least deals with the special case that terraform means terraform.io/builtin/terraform rather than hashicorp/terraform. Or maybe we could change the signature of Module.ImpliedProviderForUnqualifiedType to return an additional result that signals explicitly whether this was taken from an explicit required_providers declaration or just a compatibility fallback, in the interests of keeping all of the logic using mostly the same codepaths. 🤔

Generally I'd encourage leaning on these existing methods as much as they make sense, because they are there to encapsulate all of the little oddities that arose in the process of developing the provider source scheme, and so if we skip over them and try to reimplement their logic out here it's likely that we'll miss some edge case that we've forgotten about in the meantime and make this not behave consistently with the logic elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could add another return parameter to ImpliedProviderForUnqualifiedType to show how it was resolved, but I purposely did avoid that here. The validation is only looking for undeclared implied provider names which are named something different in required_providers. This means we have to skip the automatic resolution mechanisms like ImpliedProviderForUnqualifiedType, since those are what would connect us to the provider declaration that we are trying to avoid.

I was kind of view this from the perspective that the methods we present like Module.ImpliedProviderForUnqualifiedType rely on the underly structured data being valid, but here we are doing the actual validation of that data.

Though here I did mean to use addrs.ImpliedProviderForUnqualifiedType, but ended grabbing the older NewDefaultProvider function. That shouldn't make any difference in this case since you can't add the internal terraform provider into required_providers, but yes, we should continue to use them more consistently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember us making any special effort to treat source = "terraform.io/builtin/terraform" as invalid, so as far as I know you can include it there if you really want to give it a local name other than terraform, though of course that would be a pretty bizarre thing to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, you can, not sure why I thought we disallowed that 🤔

@jbardin jbardin force-pushed the jbardin/validate-provider-local-names branch from 35f082a to 69f0c7d Compare June 15, 2022 17:30
Copy link
Member

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Thanks for indulging me on my gradual trip through reloading the context about how all of this works in order to actually review it. 🙄

@jbardin jbardin merged commit 0d3d954 into main Jun 15, 2022
@jbardin jbardin deleted the jbardin/validate-provider-local-names branch June 15, 2022 17:51
@github-actions
Copy link

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
1.2-backport If you add this label to a PR before merging, backport-assistant will open a new PR once merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error: Invalid provider configuration
2 participants