Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

terraform init: add warning and guidance when lock file is incomplete #31399

Merged
merged 5 commits into from Jul 20, 2022

Conversation

liamcervante
Copy link
Member

@liamcervante liamcervante commented Jul 7, 2022

This PR addresses part of the concerns raised in #29958

It also builds on the change in #31389

terraform init will now print a warning when any provider will only have a single checksum written into the lock file. The warning will include a suggestion that terraform providers lock could fix any potential problems.

My understanding is that this is usually because terraform has only been able to generate a checksum for the current architecture locally, and has otherwise been unable to download the checksums for other architectures. Usually this is a result of initializing through a file system or remote mirror or a local package cache instead of the registry directly.

terraform providers lock will however be able to pull expected checksums for all architectures so terraform lets the user know about the potential fix in case they see this problem later.

@liamcervante liamcervante marked this pull request as ready for review July 7, 2022 14:40
@liamcervante liamcervante requested a review from a team July 7, 2022 14:41
Copy link
Member

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a reasonable heuristic since it seems unlikely (though technically possible) for a provider to really have only one checksum. The following would both have to be true for that to occur:

  1. The provider only targets exactly one platform. This is rare but not without precedent; for example, a weird provider that interacted with a local OS API instead of a network service would be naturally specific to the OS it's targeting, and its author might choose to only target one architecture too. (though that's admittedly becoming increasingly rare with the increasing popularity of arm64 in addition to amd64)
  2. We'd already updated the registry protocol to use the new h1: hashing scheme instead of the legacy zh: hashing scheme, and updated the Terraform CLI provider installer to make use of that by not including the legacy hashes in the lock file at all.

Point 2 here is the one that I think makes this heuristic totally fine: we'd need to change Terraform CLI's installer to expect to get new-style hashes from an updated version of the registry protocol anyway, and so we could update this heuristic at that time to work in a different way that avoids a false positive.

It does seem like a bit of a shame that our abstraction here is preventing achieving an exact answer: the lower layers of the plugin installation process do know exactly whether they are calculating a checksum locally based on an already-downloaded package or if they are returning the signed checksums from the registry, and so in principle we could be exact here and show this only in the first case, but in practice we've got that all encapsulated in Installer.InstallProviderVersions. I suppose one alternative approach would be to add a new method to InstallerEvents to notify the UI of details of how the installer is populating the depsfile.Locks that it returns, but that does seem like quite a heavy lift just to avoid a heuristic that seems sufficient for right now.


I think then my one remaining concern is how we can "leave ourselves a note" to remember to revisit this heuristic in the event that we make criteria 2 above true in the future, which is something we'd like to do eventually to address various other awkward edges in the lock file mechanism. I wonder if we can find a suitable place inside the getproviders package (which is a component we'd almost definitely be changing in order to change the checksum handling for the registry) to make this visible to a future maintainer.

My first thought was somewhere in the vicinity of getproviders.RegistrySource, but its PackageMeta implementation just thinly wraps the unexported registry client, so I ended up digging deeper and ultimately finding signatureAuthentication.AcceptableHashes as the place where we finally actually parse the checksums from the registry and interpret them as "ziphash" (zh:) checksums.

Cross-referencing comments between two codepaths that are so far removed from each other is a pretty fallible thing to do, of course -- there is quite some chance that we'll update or remove one without updating the other in the future -- but I'm aiming for an increased chance of us noticing that there's a dependency here, rather than a total guarantee.

@liamcervante
Copy link
Member Author

I suppose one alternative approach would be to add a new method to InstallerEvents to notify the UI of details of how the installer is populating the depsfile.Locks that it returns, but that does seem like quite a heavy lift just to avoid a heuristic that seems sufficient for right now.

I've given this a go in #31406. I've a feeling I may have over complicated the logic a bit though, would be interested to see what you think.

Copy link
Member

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking at your follow-up PR that builds on this one I re-read the warning message and had some notes about it, but GitHub wouldn't let me post that comment over in the other PR because you didn't change those lines there. 😀

I tried to leave a lot of context in the comment about my thought process in the hope of sharing some general notes on how we think about the diagnostic messages in Terraform. This stuff is always subjective, so I've no problem with you changing it if you disagree with exactly how I chose to meet each of my goals once you've reflected on them. 😀

tfdiags.Warning,
"Incomplete validation for providers",
fmt.Sprintf(
"Terraform has only been able to record incomplete checksums for some providers (%s) so you may encounter problems when the lock file is used on machines with a different architecture. This can normally be fixed by running `terraform providers lock -platform=os_arch` for all target architectures.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some copyedits of this warning message which are admittedly subjective but my idea here is firstly to make this "sound like" the Terraform UI voice (which isn't defined strongly, but I think you'll get a sense for it over time) and also to give a little more information about what Terraform just did:

Warning: Incomplete lock file information for providers

Due to your customized provider installation methods, Terraform was forced
to calculate lock file checksums locally for the following providers:
  - hashicorp/aws
  - hashicorp/null
  - somethingelse/somethingelse

The current .terraform.lock.hcl file only includes checksums for
darwin_arm64, so Terraform running on another platform will fail to
install these providers.

To calculate additional checksums for another platform, run:
  terraform providers lock -platform=linux_amd64
(where linux_amd64 is the platform to generate)

My thought process as I wrote this was:

  • Subjectively, feels nicer to enumerate the providers as a "bullet list", rather than as a comma-separated list, because the list could potentially be quite long and contain long strings like foo.example.com/bleep-bloop/bar-baz that probably won't word-wrap nicely.

  • State that this is caused by the customized installation methods, which hopefully hints at a potential way to avoid this problem altogether (use the default settings) but without implying that it's wrong to use customized settings, since there are of course various good reasons to do so.

  • Mention the lock file name specifically so it's easier to see that we're talking about the same file that Terraform presumably also mentioned elsewhere in this output.

  • Mention the platform the user is currently using, honestly mainly just to give another example of what a platform string might look like, since this might be the user's first exposure to them.

  • Show the copy-pasteable command line on a line of its own, to avoid readers being confused about whether the delimiters are important. (This is our typical style with other error and warning messages which include commands to run).

    I also kinda arbitrarily chose to use linux_amd64 as a "real example" of a platform string, rather than the "os_arch" placeholder, in the hope of making it easier to see what to do by showing another actual platform string. I chose linux_amd64 in recognition that a typical way this problem manifests is someone writing a configuration originally on a Windows or macOS workstation but then trying to run it "for real" in automation running in Linux containers.

  • Note that our diagnostic renderer in the CLI will automatically wrap any line of text that has no leading whitespace, assuming it's a paragraph, but will not wrap lines which start with leading whitespace. In the above example, that applies to both the "bullet points" for the provider addresses and to the copy-pasteable command line at the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! All makes sense to me.

* create installer event for tracking provider lock hashes

* address comments

* fix tests
Copy link
Member

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Thanks for trying out the other approach with the UI event hooks.

internal/command/init.go Outdated Show resolved Hide resolved
ui := new(cli.MockUi)
ui := cli.NewMockUi()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you've discovered another example of the historical mistake of not properly initializing the mock UI! This does still come up from time to time, and indeed this is the way we typically fix it. (By which I mean: this probably won't be the last time you make a change like this!)

Co-authored-by: Martin Atkins <mart@degeneration.co.uk>
@liamcervante liamcervante added the 1.2-backport If you add this label to a PR before merging, backport-assistant will open a new PR once merged label Jul 20, 2022
@liamcervante liamcervante merged commit 83e84e5 into main Jul 20, 2022
@github-actions
Copy link

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
1.2-backport If you add this label to a PR before merging, backport-assistant will open a new PR once merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants