'Error: Failed to read module directory' after upgrading to terraform 1.2.7 #31615

Closed
pickgr opened this issue Aug 10, 2022 · 30 comments
Labels: bug, confirmed (a Terraform Core team member has reproduced this issue), registry

Comments

@pickgr
Contributor

pickgr commented Aug 10, 2022

Terraform Version

$ terraform version
Terraform v1.2.7
on darwin_amd64

Terraform Configuration Files

module "vpc_endpoints_nocreate" {
  source  = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"
  version = "3.7.0"

  create = false
}

Debug Output

2022-08-10T11:55:40.986-1000 [TRACE] modsdir: writing modules manifest to .terraform/modules/modules.json
╷
│ Error: Failed to read module directory
│
│ Module directory .terraform/modules/platform.vpc_endpoints_nocreate/modules/vpc-endpoints does not exist or cannot be read.

Expected Behavior

No errors during init.

Actual Behavior

Init failed with the error shown above.

Steps to Reproduce

  1. terraform init -upgrade

Additional Context

References

I believe this may be related to #31573

Also see the public submodule in the registry at https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest/submodules/vpc-endpoints

Note how there are two forward slashes in the path for the source specified in the example.

Unfortunately, I don't have time to dig into it further right now.

@pickgr
Contributor Author

pickgr commented Aug 10, 2022

FYI @radeksimko

@sblask

sblask commented Aug 10, 2022

Not the same error, but I believe the same problem:

Error: Unsupported argument

  on .terraform/modules/network/aws-network/vpc.tf line 8, in module "subnet_addrs":
   8:   base_cidr_block = var.base_cidr_block

An argument named "base_cidr_block" is not expected here.

Error: Unsupported argument

  on .terraform/modules/network/aws-network/vpc.tf line 9, in module "subnet_addrs":
   9:   networks = [

An argument named "networks" is not expected here.

When using https://registry.terraform.io/modules/hashicorp/subnets/cidr/latest

In terraform init it looks fine:

Downloading registry.terraform.io/hashicorp/subnets/cidr 1.0.0 for network.subnet_addrs...
- network.subnet_addrs in .terraform/modules/network.subnet_addrs

But .terraform/modules/network.subnet_addrs is empty...

@jensenbox
Contributor

Solved with: 😅

    terraform {
        required_version = "> 1.0, < 1.2.7"
    }

Totally tongue in cheek but yea, I really had to do this - everything I ran was breaking.

╷
│ Error: Failed to read module directory
│ 
│ Module directory .terraform/modules/atlantis.alb_http_sg/modules/http-80 does not exist or cannot be read.
╵

That said, what was really interesting was that I downgraded, ran once, and upgraded again, and everything worked. This tells me it is exclusive to the downloader portion of the code.

@radeksimko
Member

I haven't done any debugging, but I don't see how this could be related to #31573, as that PR touched provider address validation, not module source. I will let someone from the Core team chime in.

@mesaugat

Problem with v1.2.7 and I reverted back to v1.2.6 which works fine. terraform init seems to work but terraform plan has a bunch of "Unsupported argument" errors.

I am using the terraform-aws-modules/s3-bucket/aws module.

│ Error: Unsupported argument
│
│   on ../main.tf line 9, in module "s3_bucket":
│    9:   bucket = var.name
│
│ An argument named "bucket" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on ../main.tf line 10, in module "s3_bucket":
│   10:   acl    = "private"
│
│ An argument named "acl" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on ../main.tf line 12, in module "s3_bucket":
│   12:   attach_policy = var.policy == {} ? false : true
│
│ An argument named "attach_policy" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on ../main.tf line 13, in module "s3_bucket":
│   13:   policy        = var.policy
│
│ An argument named "policy" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on ../main.tf line 15, in module "s3_bucket":
│   15:   tags = merge(
│
│ An argument named "tags" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on ../main.tf line 23, in module "s3_bucket":
│   23:   versioning = {
│
│ An argument named "versioning" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on ../main.tf line 27, in module "s3_bucket":
│   27:   server_side_encryption_configuration = {
│
│ An argument named "server_side_encryption_configuration" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on ../main.tf line 34, in module "s3_bucket":
│   34:   lifecycle_rule = var.enable_std_lifecycle == true ? [{
│
│ An argument named "lifecycle_rule" is not expected here.

@pascalmtts

Terraform is totally useless with 1.2.7. Everything breaks

@AlexEndris

We experienced the same thing with the terraform-aws-modules/lambda/aws module as @mesaugat. This doesn't occur with version 1.2.6.

@SuchismitaGoswami

We are also experiencing the same problem with this module - terraform-aws-modules/ec2-instance (terraform-version=1.2.7)

@thomaskvnze

thomaskvnze commented Aug 11, 2022

We have a similar issue. We use a module from the registry. Unfortunately, the module is downloaded, but the submodules used by that module are not, leading to half the module code not being there.

@jbardin
Member

jbardin commented Aug 11, 2022

Thanks everyone! We are currently investigating the issue.

@jbardin jbardin added the registry and confirmed (a Terraform Core team member has reproduced this issue) labels and removed the new (new issue not yet triaged) label Aug 11, 2022
@tibz-enex

Same issue for us running terraform version 1.2.7 and trying to download module terraform-aws-modules/iam/aws/modules/iam-role-for-service-accounts-eks

@pickgr pickgr changed the title from "Error: Failed to read module directory after upgrading to terraform 1.2.7" to "'Error: Failed to read module directory' after upgrading to terraform 1.2.7" Aug 11, 2022
@jbardin
Member

jbardin commented Aug 11, 2022

The problem appears to have originated from the registry and numerous incorrectly cached responses. Please let us know if there are any modules which continue to exhibit this behavior with v1.2.7.

@pickgr
Contributor Author

pickgr commented Aug 11, 2022

> The problem appears to have originated from the registry and numerous incorrectly cached responses. Please let us know if there are any modules which continue to exhibit this behavior with v1.2.7.

I'm still seeing this with the original module I reported?

╷
│ Error: Failed to read module directory
│
│ Module directory .terraform/modules/platform.vpc_endpoints_nocreate/modules/vpc-endpoints does not exist or cannot be read.
╵

@jbardin
Member

jbardin commented Aug 11, 2022

Thanks @pickgr, I'll let them know not all the URLs have been purged.

@jcolfej

jcolfej commented Aug 11, 2022

Hello, we also have the problem with this module : https://registry.terraform.io/modules/terraform-aws-modules/alb/aws/6.10.0
Thx for your help @jbardin ;)

@apparentlymart
Member

apparentlymart commented Aug 11, 2022

Hi all! Thanks for reporting this incorrect behavior.

If you need to use a module that has an incorrect cache entry that hasn't yet been purged, I believe it should work to stay on Terraform CLI v1.2.6 for the moment (since the cached registry responses for that version are still correct) until the modules you need to use have had their caches purged.

Please do let us know if you've found a problem with a module that wasn't already mentioned above, though, so we can take full stock of the scope of this when we run a retrospective later. If possible it would be ideal to see exactly what you have in both the source and version arguments in your module block, just so that we can get a better sense of what is and is not affected.


As a little additional context about what seems to be going on here, for those who are following along with the details right now, or those who might find this issue in future and wonder what was going on:

Terraform Registry implements the module registry protocol, which is essentially just an indirection over module sources that layers on the idea of there being multiple versions of each logical module. The registry is therefore really just an index of module packages published elsewhere, and doesn't truly host anything itself.

For the public Terraform Registry in particular, the "elsewhere" is GitHub repositories, and so when Terraform CLI asks registry.terraform.io a question like "Where can I find version 3.7.0 of terraform-aws-modules/vpc/aws?", the registry responds by returning a module package address just like you might've written directly into the source argument if you weren't using the registry, referring to a path in the underlying GitHub repository.
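
The lookup described above can be reproduced by hand to see what address the registry hands back. This is a minimal sketch, assuming the public registry serves the module registry protocol under /v1/modules/ and reports the package address in the X-Terraform-Get response header. Both function names are hypothetical helpers for illustration, not part of Terraform.

```shell
#!/bin/sh
# Print the value of the first X-Terraform-Get header found in HTTP response
# headers on stdin. Header names are case-insensitive, and CRs are stripped.
extract_terraform_get() {
  tr -d '\r' | awk '!found && tolower($0) ~ /^x-terraform-get:/ {
    found = 1
    sub(/^[^:]*:[ \t]*/, "")
    print
  }'
}

# Ask the public registry where a module version lives.
# Requires curl and network access.
# Usage: module_download_address <namespace> <name> <system> <version>
module_download_address() {
  curl -sS -D - -o /dev/null \
    "https://registry.terraform.io/v1/modules/$1/$2/$3/$4/download" \
    | extract_terraform_get
}
```

For the module in the original report, the call would be `module_download_address terraform-aws-modules vpc aws 3.7.0`. The printed address is what Terraform CLI would then treat as the module source.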

For reasons we're not yet quite sure about, it seems that the registry's CDN cache for certain module versions got "poisoned" with a legacy URL that doesn't refer to the right directory within the module package. So far it seems that the cached response was an old-style URL to a source tarball on GitHub. GitHub's source tarballs put the repository content into a subdirectory named after the repository rather than directly in the root, so the subdirectory path for the module mentioned in the leading comment would really be //terraform-aws-vpc-v3.7.0/modules/vpc-endpoints rather than just //modules/vpc-endpoints. When Terraform looked at the incorrect path the registry returned, it found the directory missing.
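
The directory mismatch can be illustrated without Terraform at all. The sketch below fakes an unpacked GitHub source tarball in a temporary directory, using the directory names from the paragraph above, and checks both subdirectory paths. `has_subdir` is a hypothetical helper, and the layout is a simulation rather than a real download.

```shell
#!/bin/sh
# Return success if <subdir> exists inside the unpacked package at <pkg_root>.
has_subdir() {
  [ -d "$1/$2" ]
}

# Simulate an unpacked GitHub source tarball: all content sits under a
# top-level directory named after the repository and ref.
pkg=$(mktemp -d)
mkdir -p "$pkg/terraform-aws-vpc-v3.7.0/modules/vpc-endpoints"

# Check the path the module source asks for, then the path that would
# actually exist inside a tarball laid out this way.
for sub in "modules/vpc-endpoints" "terraform-aws-vpc-v3.7.0/modules/vpc-endpoints"; do
  if has_subdir "$pkg" "$sub"; then
    echo "found:   //$sub"
  else
    echo "missing: //$sub"
  fi
done
```

Only the path that includes the tarball's top-level directory resolves, which matches the "does not exist or cannot be read" error reported in this issue.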

We're still investigating what exactly happened here. At first blush it seems that a backward-compatibility heuristic somehow miscategorized Terraform v1.2.7 as an older version of Terraform requiring a protocol shim, and then that response got cached for certain modules. For the moment we're doing quick mitigation by purging the cache for specific modules that have incorrect entries, in order to make them work again, but we also need to track down the root cause for why the incorrect response was returned in the first place.

@llamahunter

llamahunter commented Aug 11, 2022

> Please do let us know if you've found a problem with a module that wasn't already mentioned above, though, so we can take full stock of the scope of this when we run a retrospective later. If possible it would be ideal to see exactly what you have in both the source and version arguments in your module block, just so that we can get a better sense of what is and is not affected.

We see this problem with private modules that are hosted at github and referenced by their github url and tag. Works fine with 1.2.6, broken with 1.2.7 (with similar "An argument named "blah" is not expected here." errors)

@mconigliaro

We're seeing this with https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/ since yesterday afternoon.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.24.1"
  ...
}

@jensenbox
Contributor

If you are still looking for examples of what was failing...

module "log_bucket" {
  source = "registry.terraform.io/terraform-aws-modules/s3-bucket/aws"
}

No version applied.

@acdha

acdha commented Aug 11, 2022

I know terraform-aws-modules/lambda/aws was mentioned above but either it wasn't purged or the purge is not enough to fix the problem. It's still failing on our completely clean CI runners with no local cache.

@mdodsworth

I'm still seeing issues with https://registry.terraform.io/modules/terraform-aws-modules/rds-aurora.

source  = "terraform-aws-modules/rds-aurora/aws"
version = "7.2.2"

@jbardin
Member

jbardin commented Aug 11, 2022

OK, we located the URLs which were known to be affected, and purging the cache for those is now complete.
If you did init with a bad response, removing the .terraform directory to get rid of the cached modules may be necessary to force Terraform to re-download the correct URL.

@llamahunter

llamahunter commented Aug 11, 2022

> OK, we located the URLs which were known to be affected, and purging the cache for those is now complete. If you did init with a bad response, removing the .terraform directory to get rid of the cached modules may be necessary to force Terraform to re-download the correct URL.

Still not working with our atlantis deploys. Is there some way to tell atlantis to clean its cache directory? Also, why not release 1.2.8 that fixes whatever cache corruption 1.2.7 introduced?

@crw
Collaborator

crw commented Aug 11, 2022

> OK, we located the URLs which were known to be affected, and purging the cache for those is now complete.

In this case, the caches being referred to here are the caches Terraform Registry uses to look up the correct response to a query for a particular module+version. These are all "server-side" from the perspective of a Terraform CLI user. With the caches "purged," they are now returning the correct responses to the queries.

> Also, why not release 1.2.8 that fixes whatever cache corruption 1.2.7 introduced?

The local .terraform folder cache is not corrupted per se; it may just contain incorrect URL data. I am not sure that it would be possible to detect which URLs were incorrect in any reliable way. This is a case in which you likely know if you are impacted by the issue or not, and can enact the remediation on your end.

Unfortunately, as we do not develop or maintain Atlantis, I do not believe we know how to clear the cache folder for Atlantis. (edit: "we" as in the Terraform Core team). Thanks for your questions!

@pickgr
Contributor Author

pickgr commented Aug 12, 2022

I've confirmed everything is working for me with 1.2.7 now. Note that running terraform init -upgrade may be an alternative to manually removing the .terraform directory.

Thanks for the quick turnaround everyone!

@apparentlymart
Member

apparentlymart commented Aug 12, 2022

I think my comment above may have created some confusion when considered in conjunction with the other kind of "cache" some are discussing here, so just to clarify:

In my case, I was discussing the remote cache living in the CDN that provides the Terraform Registry service. That cache is under the control of our Terraform Registry team and so they are able to proactively purge it; that is what @jbardin meant above when he said that we have purged the caches.

Terraform CLI, in terraform init, also saves itself a local manifest file to remember what it has installed. That file is an implementation detail, but in current Terraform it is a JSON file living under the .terraform directory. It includes (amongst other things) the "subdirectory" path within the locally-cached module package to use for each particular module. This deals with the fact that a module package can potentially contain many different modules, so Terraform needs to "remember" which one to use when you subsequently run terraform apply.
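
For anyone wanting to check whether their local manifest points at directories that are not there, a rough sketch follows. It assumes the manifest is the .terraform/modules/modules.json file seen in the debug output at the top of this issue, and that each entry records its path under a "Dir" key relative to the working directory. That layout is an implementation detail and may differ between Terraform versions. The function names are hypothetical, and jq would be a more robust parser than grep and sed where it is available.

```shell
#!/bin/sh
# List every "Dir" value recorded in a modules manifest file.
# Assumes plain JSON strings with no escaped quotes inside the paths.
manifest_dirs() {
  grep -o '"Dir"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/^"Dir"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Print recorded module directories that do not exist under <workdir>.
# Usage: missing_module_dirs [workdir]   (defaults to the current directory)
missing_module_dirs() {
  workdir=${1:-.}
  manifest="$workdir/.terraform/modules/modules.json"
  [ -f "$manifest" ] || { echo "no manifest at $manifest" >&2; return 1; }
  manifest_dirs "$manifest" | while IFS= read -r dir; do
    [ -d "$workdir/$dir" ] || echo "$dir"
  done
}
```

Running `missing_module_dirs` from the root module directory prints nothing when every recorded path exists. Any path it does print is a candidate for the stale entries described in this thread.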

Since this issue effectively caused the registry to report incorrect subdirectory paths, it seems like some of you now have incorrect subdirectory paths in the local manifest file too. We cannot proactively purge that because it's on your own local computers, but as @pickgr noted one way to deal with it is to run terraform init -upgrade since the -upgrade option effectively forces terraform init to ignore what's in the manifest file. However, -upgrade also causes Terraform to ignore .terraform.lock.hcl and might thereby also perform unwanted provider upgrades, if you're relying on the dependency lock file to retain your currently-selected provider versions.

A more "surgical" answer, focusing only on modules, is to delete the .terraform/modules directory and all of its contents before you run terraform init. That removes both the JSON manifest file and your local caches of the modules, forcing Terraform to reinstall them using the now-purged remote registry cache, which should recreate the manifest file with the correct paths.
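
As a concrete sketch of that module-only cleanup: the helper below (a hypothetical name, not a Terraform command) removes just the module cache and leaves installed providers and the dependency lock file alone.

```shell
#!/bin/sh
# Remove only the local module cache (including its modules.json manifest)
# under <workdir>, leaving providers and .terraform.lock.hcl untouched.
# Usage: purge_module_cache [workdir]   (defaults to the current directory)
purge_module_cache() {
  workdir=${1:-.}
  rm -rf "$workdir/.terraform/modules"
}

# Typical use, run from the root module directory:
#   purge_module_cache . && terraform init
```

Because the lock file is not touched and -upgrade is not passed, the follow-up terraform init should keep the currently selected provider versions while reinstalling the modules.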

@arohter

arohter commented Aug 14, 2022

> Is there some way to tell atlantis to clean its cache directory?

atlantis unlock will clear out the workspace on disk, including .terraform dir.

@llamahunter

> > Is there some way to tell atlantis to clean its cache directory?
>
> atlantis unlock will clear out the workspace on disk, including .terraform dir.

I had tried that, but it seemed to still be poisoning the cache. Or maybe it wasn't 'fixed' yet?

@jbardin
Member

jbardin commented Aug 29, 2022

Closing since the registry issue was resolved and there have been no further incident reports.

@jbardin jbardin closed this as completed Aug 29, 2022
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 29, 2022