Skip to content

Refactoring of module source addresses and module installation #28854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 3, 2021

Conversation

apparentlymart
Copy link
Contributor

@apparentlymart apparentlymart commented Jun 1, 2021

It's been a long while since we gave close attention to the codepaths for module source address parsing and external module package installation. Due to their age, these codepaths often diverged from our modern practices such as representing address types in the addrs package, and encapsulating package installation details only in a particular location.

In particular, this refactor makes source address parsing a separate step from module installation, which therefore makes the result of that parsing available to other Terraform subsystems which work with the configuration representation objects. The start of that is included here as some new EntersNewPackage methods in the configs package for recognizing the difference between a local module call and a call that enters a new remote package; we'll make more use of that in later PRs.

This also presented the opportunity to better encapsulate our use of go-getter into a new package getmodules (echoing getproviders), which is intended to be the only part of Terraform that directly interacts with go-getter.

This is largely just a refactor of the existing functionality into a new code organization, but there is one notable change in behavior here: the source address parsing now happens during configuration loading rather than module installation, which may cause errors about invalid addresses to be returned in different situations than before. That counts as backward compatible because we only promise to remain compatible with configurations that are "valid", which means that they can be initialized, planned, and applied without any errors. This doesn't introduce any new error cases, and instead just makes a pre-existing error case be detected earlier.

Our module registry client is still using its own special module address type from registry/regsrc for now, with a small shim from the new addrs.ModuleSourceRegistry type. Hopefully in a later commit we'll also rework the registry client to work with the new address type, but this PR is already big enough as it is.

I tried to break this into some smaller commits for easier commit-by-commit review, although there is a big meaty middle where I had to change a bunch of things all at once in order for it to keep compiling. That middle commit should just be moving things around and not materially changing behavior (aside from parsing earlier as mentioned above).

@apparentlymart apparentlymart requested a review from a team June 1, 2021 19:43
@apparentlymart apparentlymart self-assigned this Jun 1, 2021
As the comment notes, this hostname is the default for provide source
addresses. We'll shortly be adding some address types to represent module
source addresses too, and so we'll also have DefaultModuleRegistryHost
for that situation.

(They'll actually both contain the the same hostname, but that's a
coincidence rather than a requirement.)
This new package aims to encapsulate all of our interactions with
go-getter to fetch remote module packages, to ensure that the rest of
Terraform will only use the small subset of go-getter functionality that
our modern module installer uses.

In older versions of Terraform, go-getter was the entire implementation
of module installation, but along the way we found that several aspects of
its design are poor fit for Terraform's needs, and so now we're using it
as just an implementation detail of Terraform's handling of remote module
packages only, hiding it behind this wrapper API which exposes only
the services that our module installer needs.

This new package isn't actually used yet, but in a later commit we will
change all of the other callers to go-getter to only work indirectly
through this package, so that this will be the only package that actually
imports the go-getter packages.
We've previously had the syntax and representation of module source
addresses pretty sprawled around the codebase and intermingled with other
systems such as the module installer.

I've created a factored-out implementation here with the intention of
enabling some later refactoring to centralize the address parsing as part
of configuration decoding, and thus in turn allow the parsing result to
be seen by other parts of Terraform that interact with configuration
objects, so that they can more robustly handle differences between local
and remote modules, and can present module addresses consistently in the
UI.
It's been a long while since we gave close attention to the codepaths for
module source address parsing and external module package installation.
Due to their age, these codepaths often diverged from our modern practices
such as representing address types in the addrs package, and encapsulating
package installation details only in a particular location.

In particular, this refactor makes source address parsing a separate step
from module installation, which therefore makes the result of that parsing
available to other Terraform subsystems which work with the configuration
representation objects.

This also presented the opportunity to better encapsulate our use of
go-getter into a new package "getmodules" (echoing "getproviders"), which
is intended to be the only part of Terraform that directly interacts with
go-getter.

This is largely just a refactor of the existing functionality into a new
code organization, but there is one notable change in behavior here: the
source address parsing now happens during configuration loading rather
than module installation, which may cause errors about invalid addresses
to be returned in different situations than before. That counts as
backward compatible because we only promise to remain compatible with
configurations that are _valid_, which means that they can be initialized,
planned, and applied without any errors. This doesn't introduce any new
error cases, and instead just makes a pre-existing error case be detected
earlier.

Our module registry client is still using its own special module address
type from registry/regsrc for now, with a small shim from the new
addrs.ModuleSourceRegistry type. Hopefully in a later commit we'll also
rework the registry client to work with the new address type, but this
commit is already big enough as it is.
Now that we (in the previous commit) refactored how we deal with module
sources to do the parsing at config loading time rather than at module
installation time, we can expose a method to centralize the determination
for whether a particular module call (and its resulting Config object)
enters a new external package.

We don't use this for anything yet, but in later commits we will use this
for some cross-module features that are available only for modules
belonging to the same package, because we assume that modules grouped
together in a package can change together and thus it's okay to permit a
little more coupling of internal details in that case, which would not
be appropriate between modules that are versioned separately.
We have some tests in this package that install real modules from the real
registry at registry.terraform.io. Those tests were written at an earlier
time when the registry's behavior was to return the URL of a .tar.gz
archive generated automatically by GitHub, which included an extra level
of subdirectory that would then be reflected in the paths to the local
copies of these modules.

GitHub started rate limiting those tar archives in a way that Terraform's
module installer couldn't authenticate to, and so the registry switched
to returning direct git repository URLs instead, which don't have that
extra subdirectory and so the local paths on disk now end up being a
little different, because the actual module directories are at a different
subdirectory of the package.
@apparentlymart apparentlymart force-pushed the f-module-call-is-local branch from 2939109 to 4b34cc3 Compare June 2, 2021 18:43
Previously we had a separation between ModuleSourceRemote and
ModulePackage as a way to represent within the type system that there's an
important difference between a module source address and a package address,
because module packages often contain multiple modules and so a
ModuleSourceRemote combines a ModulePackage with a subdirectory to
represent one specific module.

This commit applies that same strategy to ModuleSourceRegistry, creating
a new type ModuleRegistryPackage to represent the different sort of
package that we use for registry modules. Again, the main goal here is
to try to reflect the conceptual modelling more directly in the type
system so that we can more easily verify that uses of these different
address types are correct.

To make use of that, I've also lightly reworked initwd's module installer
to use addrs.ModuleRegistryPackage directly, instead of a string
representation thereof. This was in response to some earlier commits where
I found myself accidentally mixing up package addresses and source
addresses in the installRegistryModule method; with this new organization
those bugs would've been caught at compile time, rather than only at
unit and integration testing time.

While in the area anyway, I also took this opportunity to fix some
historical confusing names of fields in initwd.ModuleInstaller, to be
clearer that they are only for registry packages and not for all module
source address types.
@apparentlymart apparentlymart merged commit 51b0aee into main Jun 3, 2021
@apparentlymart apparentlymart deleted the f-module-call-is-local branch June 3, 2021 15:50
@github-actions
Copy link
Contributor

github-actions bot commented Jul 4, 2021

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants