Terragrunt IAC Engine Plugin System #3103

Open
2 tasks
yhakbar opened this issue Apr 26, 2024 · 1 comment
Labels
pending-decision Pending decision from maintainers rfc Request For Comments


yhakbar commented Apr 26, 2024

Summary

Introduce the ability to integrate with plugins to drive custom behavior in the underlying IAC tool orchestrated by Terragrunt (like OpenTofu or Terraform).

Motivation

Users have been lacking two significant capabilities that are addressed by this RFC:

  1. The ability to customize the usage of tofu and terraform when called by Terragrunt.

    Users have had to either ensure that a particular version of each tool is set up before executing terragrunt, or use a shim to alter the execution of the underlying IAC tool.

  2. The ability to alter the context of the IAC execution separate from the Terragrunt execution.

    So far, there has been no way to isolate the IAM access that the underlying IAC tool has from the access that Terragrunt has. The IAC tool has also had to run in the same compute environment and on the same filesystem as Terragrunt.
    Heavy users of Terragrunt would like to be able to isolate the compute resources allocated to Terragrunt from the compute allocated to the underlying IAC tool so that they can fan out IAC updates across multiple instances/containers/pods.

Proposal

Allow users to optionally specify an IAC engine, which controls how underlying IAC operations such as plans and applies are carried out, instead of having Terragrunt call the tofu or terraform binaries directly.

Users will be able to use a configuration block that looks like the following to configure their engine in the relevant terragrunt.hcl:

engine {
   source  = "github.com/acme/terragrunt-plugin-custom-opentofu"
   version = "v0.0.1" # Optionally specify version
   type    = "shared" # Optionally specify the type of plugin
}

The source field would be either the path to a local binary (signified by starting the value with . or /) or a URL pointing to a GitHub repository with a releases page containing an asset that can be used (the appropriate architecture and platform would be guessed based on detected values of the host machine, and can be explicitly set via environment variables).
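
As a rough illustration of the rule above, the following Go sketch classifies a source value and builds a release asset name from the host platform. The function names and the asset naming pattern are assumptions made for this example; the RFC does not define them.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// SourceKind says how an engine source value would be resolved.
type SourceKind int

const (
	LocalBinary SourceKind = iota // path to a binary on the local filesystem
	GitHubRepo                    // repository whose releases page hosts the asset
)

// classifySource applies the rule from the RFC: values starting with "." or
// "/" are local paths, anything else is treated as a GitHub repository.
func classifySource(source string) SourceKind {
	if strings.HasPrefix(source, ".") || strings.HasPrefix(source, "/") {
		return LocalBinary
	}
	return GitHubRepo
}

// assetName builds a release asset name for a platform. The
// "<repo>_<version>_<os>_<arch>" pattern is an assumption for illustration.
func assetName(repo, version, goos, goarch string) string {
	name := repo[strings.LastIndex(repo, "/")+1:]
	return fmt.Sprintf("%s_%s_%s_%s", name, version, goos, goarch)
}

func main() {
	src := "github.com/acme/terragrunt-plugin-custom-opentofu"
	if classifySource(src) == GitHubRepo {
		// runtime.GOOS and runtime.GOARCH stand in for the "detected values
		// of the host machine"; an explicit override would replace them here.
		fmt.Println(assetName(src, "v0.0.1", runtime.GOOS, runtime.GOARCH))
	}
}
```

Passing the platform in as parameters keeps the guess testable and leaves room for the environment variable override mentioned above.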

The optional version field would indicate the git tag of the release to download when the source is not a local binary. Setting it for a local path would throw an error, and for remote sources it would default to the latest release.

The optional type field would indicate the type of plugin used by the engine. The default shared value would indicate that the plugin is a shared library using the Golang plugin package. For performance and simplicity, this will be the first type of plugin supported by this RFC. It's possible that in the future, a secondary type of plugin leveraging HashiCorp's go-plugin would be used with type rpc.
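
The field rules described so far could be validated along these lines. This is a minimal sketch: the EngineConfig struct, the normalize function, the "latest" placeholder, and the error messages are hypothetical, and rejecting rpc reflects only that shared is the first type this RFC would support.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// EngineConfig mirrors the three fields of the proposed engine block.
// The struct itself is hypothetical.
type EngineConfig struct {
	Source  string
	Version string // empty means "not set"
	Type    string // empty means "not set"
}

// normalize validates a config and fills in the defaults the RFC describes.
func normalize(c EngineConfig) (EngineConfig, error) {
	if c.Source == "" {
		// Assumption: a source is always required.
		return c, errors.New("engine: source is required")
	}
	local := strings.HasPrefix(c.Source, ".") || strings.HasPrefix(c.Source, "/")
	if local && c.Version != "" {
		return c, errors.New("engine: version cannot be set for a local source")
	}
	if !local && c.Version == "" {
		c.Version = "latest" // placeholder meaning "resolve the latest release"
	}
	switch c.Type {
	case "":
		c.Type = "shared" // default plugin type
	case "shared":
		// supported as-is
	case "rpc":
		return c, errors.New(`engine: type "rpc" is not supported yet`)
	default:
		return c, fmt.Errorf("engine: unknown type %q", c.Type)
	}
	return c, nil
}

func main() {
	c, err := normalize(EngineConfig{Source: "github.com/acme/terragrunt-plugin-custom-opentofu"})
	fmt.Println(c, err)
}
```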

Technical Details

This proposal impacts how and if tofu and terraform get called by Terragrunt.

To support this change, the following will have to be done:

  • Ensure that all calls to tofu/terraform are mediated by logic that checks whether an engine block has been configured, and directly executes one of the binaries only if none has.
  • Ensure that all those calls have public interfaces that plugin authors can use to verify that their usage in engine blocks will work.
  • Create the architecture for downloading assets from GitHub releases using a tool like go-gh, or something equivalent.
  • The assets should have their integrity verified by computing their checksums.
  • Concurrent access to the same plugin should be thought through from an early stage due to the nature of Terragrunt. A locking mechanism will be needed to ensure that concurrent attempts to download the same plugin don't result in race conditions.
  • Plugins should be centrally cached, with the location of that cache configurable by users via an environment variable.
  • HCL config parsing will need to be updated to respect the new engine block.
  • When an engine block is configured in the terragrunt.hcl that is used for a terragrunt command, and the type is shared, dynamically fetch, verify, then load the plugin, then execute IAC updates using it.
  • Logging should be introduced to signal to users that control has shifted from Terragrunt to the plugin.
  • Documentation on the engine system should be written up, and include warnings that for shared plugins, panics in the plugin will cause a panic in Terragrunt.
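
The caching, checksum, and locking items in the list above might fit together roughly as follows. This is a sketch under stated assumptions: SHA-256 is assumed as the checksum algorithm, all names (cacheDir, keyedLocks, fetchOnce, EXAMPLE_ENGINE_CACHE_DIR) are hypothetical, and the in-process mutex only serializes goroutines within one Terragrunt process. Separate Terragrunt processes sharing a cache would need a file-based lock instead.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// cacheDir returns the central plugin cache, honoring a user override.
func cacheDir(override string) string {
	if override != "" {
		return override
	}
	return filepath.Join(os.TempDir(), "terragrunt-engine-cache")
}

// verifyChecksum compares the SHA-256 of data with an expected hex digest.
func verifyChecksum(data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != wantHex {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

// keyedLocks hands out one mutex per plugin key, so requests for the same
// plugin serialize while requests for different plugins do not block.
type keyedLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (k *keyedLocks) get(key string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.locks == nil {
		k.locks = map[string]*sync.Mutex{}
	}
	if k.locks[key] == nil {
		k.locks[key] = &sync.Mutex{}
	}
	return k.locks[key]
}

// fetchOnce downloads, verifies, and caches a plugin unless it is cached already.
func fetchOnce(k *keyedLocks, dir, key, wantHex string, download func() ([]byte, error)) (string, error) {
	l := k.get(key)
	l.Lock()
	defer l.Unlock()

	path := filepath.Join(dir, key)
	if _, err := os.Stat(path); err == nil {
		return path, nil // already cached by an earlier caller
	}
	data, err := download()
	if err != nil {
		return "", err
	}
	if err := verifyChecksum(data, wantHex); err != nil {
		return "", err
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(path, data, 0o755); err != nil {
		return "", err
	}
	return path, nil
}

func main() {
	payload := []byte("fake plugin bytes")
	sum := sha256.Sum256(payload)
	dir := cacheDir(os.Getenv("EXAMPLE_ENGINE_CACHE_DIR")) // hypothetical variable name
	path, err := fetchOnce(&keyedLocks{}, dir, "example-engine", hex.EncodeToString(sum[:]),
		func() ([]byte, error) { return payload, nil })
	fmt.Println(path, err)
}
```

Verifying before writing means a corrupted download never lands in the cache. A fuller version would likely write to a temporary file and rename it into place.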

To ensure that this functionality can be developed smoothly with minimal risk of regression, the functionality should be introduced under a feature flag that is enabled by setting the environment variable TG_EXPERIMENTAL_ENGINE=1. Users should be made aware that leveraging this functionality in production is risky while the functionality is being battle tested.
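
A minimal sketch of how the feature flag could gate the choice between direct execution and an engine. The chooseRunner function and its return values are hypothetical, and returning an error when an engine block is present without the flag is an assumption; the RFC does not say what should happen in that case.

```go
package main

import (
	"fmt"
	"os"
)

const experimentFlag = "TG_EXPERIMENTAL_ENGINE"

// engineEnabled reports whether the experimental flag is set to "1".
// getenv is injected so the check can be exercised without touching the
// real environment.
func engineEnabled(getenv func(string) string) bool {
	return getenv(experimentFlag) == "1"
}

// chooseRunner decides which execution path a run would take.
func chooseRunner(engineConfigured bool, getenv func(string) string) (string, error) {
	switch {
	case !engineConfigured:
		return "direct", nil // call tofu/terraform as Terragrunt does today
	case engineEnabled(getenv):
		return "engine", nil // hand IAC operations to the configured engine
	default:
		// Assumption: fail loudly rather than silently ignore the block.
		return "", fmt.Errorf("engine block found, but %s=1 is not set", experimentFlag)
	}
}

func main() {
	runner, err := chooseRunner(true, os.Getenv)
	fmt.Println(runner, err)
}
```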

Documentation will need to be authored that demonstrates how to write an IAC Engine plugin and guidance on testing it.

In addition, Gruntwork will host two plugins that will demonstrate how to author plugins following best practices:

  • terragrunt-iac-engine-opentofu
  • terragrunt-iac-engine-terraform

They will execute tofu and terraform in the same way Terragrunt currently does. Users will be able to use the repositories as springboards for their custom implementation of the same functionality.
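
To give a feel for what such a default engine might look like, here is a sketch that shells out to a binary on PATH. The Engine interface and BinaryEngine type are hypothetical; the RFC does not define the interface that plugins would implement.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// Engine is a hypothetical interface between Terragrunt and an IAC engine.
type Engine interface {
	// Run executes one IAC command (e.g. "plan") in workDir and returns
	// the combined stdout and stderr.
	Run(workDir string, args ...string) (string, error)
}

// BinaryEngine runs a binary found on PATH, which is roughly what a
// pass-through engine for tofu or terraform would do.
type BinaryEngine struct {
	Binary string // e.g. "tofu" or "terraform"
}

func (e BinaryEngine) Run(workDir string, args ...string) (string, error) {
	cmd := exec.Command(e.Binary, args...)
	cmd.Dir = workDir
	var out bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &out
	err := cmd.Run()
	return out.String(), err
}

func main() {
	var engine Engine = BinaryEngine{Binary: "tofu"}
	out, err := engine.Run(".", "version")
	if err != nil {
		fmt.Println("run failed:", err)
		return
	}
	fmt.Print(out)
}
```

A custom engine would satisfy the same interface but could, for example, dispatch the run to another machine or assume a different role before executing.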

References:

Press Release

A new engine block allowing you to customize and configure your IAC updates orchestrated by Terragrunt!

To try it out, all you need to do is include the following in your terragrunt.hcl:

engine {
   source  = "github.com/gruntwork-io/terragrunt-iac-engine-opentofu"
}

Because this functionality is still experimental and not recommended for general production usage, set the following environment variable to opt in:

export TG_EXPERIMENTAL_ENGINE=1

Note that this functionality is not currently supported on Windows.

The next time you call Terragrunt, it will dynamically fetch and load the Gruntwork OpenTofu IAC Engine plugin for Terragrunt to use instead of calling OpenTofu directly.

You can find the plugin here.

If you'd like to customize how OpenTofu is used when orchestrated by Terragrunt, feel free to fork the repository and call your own version of the plugin!

Drawbacks

  • This will complicate core Terragrunt functionality for invoking IAC tools, and could introduce regressions into it.
  • Additional maintenance burden will be imposed on the maintainers in that IAC Engine plugins will have to remain compatible with default direct invocations of tofu and terraform.
  • The initial version would not be supported in Windows, as the plugin package does not support the platform:

    "Plugins are currently supported only on Linux, FreeBSD, and macOS, making them unsuitable for applications intended to be portable."

  • Troubleshooting can become more complicated for users if errors exist in the implementation of their IAC Engines rather than in anything Terragrunt ships.
  • Ensuring that the plugin system works well introduces an entirely new source of burden for how Terragrunt is maintained.

Alternatives

  • Start with the HashiCorp go-plugin approach instead of the Golang plugin package approach. This was not chosen due to the performance implications of interfacing with the plugin over RPC instead of loading the symbols and using them directly from the shared library. We are sacrificing platform compatibility by doing this, because Windows is not supported by the plugin package. We are also giving up the advantages listed in the go-plugin README.md, including fault isolation when panics occur in the plugin.
  • Avoid introducing any plugin system at all. This was not chosen due to the scaling limits customers are reaching with our current architecture. They need more control over how IAC execution works, so it's deemed worth it to explore this avenue.

Migration Strategy

This shouldn't require customers to make any adjustments for their existing code bases to remain compatible.

IAC Engines should remain an optional feature of Terragrunt for the foreseeable future.

Unresolved Questions

  • Would the majority of users prefer that we start with the rpc plugin type instead of the shared plugin type? If so, please make your voice heard in the comments on this RFC.
  • How much time would it take to build out this system? We can incrementally release functionality behind the TG_EXPERIMENTAL_ENGINE feature flag to release incomplete functionality early. There may be a long waiting period before we consider this functionality ready for general audiences, however.
  • What is the appetite users have for building their own IAC Engine plugins? Is this something that only a small minority of users would use, and how much would it impact their workflows?
  • What best practices in Golang plugin architectures can we be sure to adopt out of the gate to ensure that this feature is successful?

References

Proof of Concept Pull Request

No response



yhakbar commented May 9, 2024

Some feedback about the performance implications of this plugin system has been shared offline. It stemmed from a lack of clarity in this RFC about the difference between the shared and rpc plugin types, which are described below.

Shared

The shared type of plugin leverages the built-in Golang plugin package. This kind of plugin is a shared library (typically having the extension .so) that would be dynamically loaded by the running Terragrunt process, and have its exported functions called directly from the Terragrunt process.

There would be no Inter-Process Communication (IPC) between Terragrunt and a second process running along with Terragrunt, and it would be largely equivalent to calling the functions from directly within the Terragrunt binary from a performance and usage perspective.
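
For reference, loading a shared plugin with the standard plugin package looks roughly like this. The exported symbol name Run and its signature are assumptions for this sketch, not something the RFC defines. The plugin itself would be built with go build -buildmode=plugin.

```go
package main

import (
	"fmt"
	"plugin"
)

// RunFunc is the signature this sketch expects the plugin to export under
// the symbol name "Run". Both the name and the signature are hypothetical.
type RunFunc = func(workDir string, args []string) error

// loadRun opens the shared library at path and resolves its Run symbol.
func loadRun(path string) (RunFunc, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open plugin %s: %w", path, err)
	}
	sym, err := p.Lookup("Run")
	if err != nil {
		return nil, fmt.Errorf("lookup Run: %w", err)
	}
	// For an exported function, the looked-up symbol is the function value.
	run, ok := sym.(RunFunc)
	if !ok {
		return nil, fmt.Errorf("symbol Run has unexpected type %T", sym)
	}
	return run, nil
}

func main() {
	run, err := loadRun("./engine.so") // hypothetical path
	if err != nil {
		fmt.Println("could not load engine:", err)
		return
	}
	// From here on, the call is an ordinary in-process function call.
	if err := run(".", []string{"plan"}); err != nil {
		fmt.Println("engine run failed:", err)
	}
}
```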

The downsides of this approach are that the plugin must be written in Golang and built with a version of Golang that is compatible with the one used to compile Terragrunt (see the warnings here). It also prevents most fault isolation of panics in the plugin, as the plugin would be running in the same process as Terragrunt.
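
To make the limited fault isolation concrete: a wrapper like the hypothetical safeCall below can turn a panic on the calling goroutine into an error, but it cannot catch a panic raised in a goroutine that the plugin starts itself. A panic there would still bring down the whole Terragrunt process.

```go
package main

import "fmt"

// safeCall runs fn and converts a panic raised on the calling goroutine
// into an error. It offers no protection against panics in goroutines
// spawned inside fn.
func safeCall(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("engine panicked: %v", r)
		}
	}()
	return fn()
}

func main() {
	err := safeCall(func() error {
		panic("boom") // stands in for a bug inside a shared plugin
	})
	fmt.Println(err)
}
```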

RPC

The rpc type of plugin leverages the HashiCorp go-plugin package. It is how provider plugins work in OpenTofu and Terraform. This type of plugin is spun up as a secondary process, and Terragrunt would establish a client-server connection with the plugin.

The advantages of this approach are that the plugin can be written in languages other than Golang, as long as they have good support for the protocol used by the plugin system (e.g. gRPC), and it allows for panics to happen in a secondary process, making it easier to prevent blow-ups in the engine from impacting the Terragrunt process. You can see a number of other advantages here.

The downside of this approach is a detectable impact on performance. There can be significant overhead in spinning up one or more Engine plugins, which then spin up one or more Provider plugins, with IPC happening between all of them.
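
The client-server shape can be illustrated with the standard library's net/rpc package. Note that this is not go-plugin: it is a stand-in that keeps the example self-contained, and it connects the two sides over an in-memory pipe instead of launching a second process. The type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// RunArgs and RunReply are the hypothetical request and response messages.
type RunArgs struct {
	WorkDir string
	Args    []string
}

type RunReply struct {
	Output string
}

// EngineServer plays the role of the plugin process.
type EngineServer struct{}

// Run follows the method shape net/rpc requires: exported, two arguments
// (the second a pointer), and an error return.
func (s *EngineServer) Run(args RunArgs, reply *RunReply) error {
	reply.Output = fmt.Sprintf("would run %v in %s", args.Args, args.WorkDir)
	return nil
}

// callOverRPC wires a server and a client together and performs one call.
// Every request and reply is serialized across the connection, which is
// where the overhead relative to a shared plugin comes from.
func callOverRPC(args RunArgs) (string, error) {
	serverConn, clientConn := net.Pipe()

	srv := rpc.NewServer()
	if err := srv.RegisterName("Engine", &EngineServer{}); err != nil {
		return "", err
	}
	go srv.ServeConn(serverConn) // returns once the client hangs up

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var reply RunReply
	if err := client.Call("Engine.Run", args, &reply); err != nil {
		return "", err
	}
	return reply.Output, nil
}

func main() {
	out, err := callOverRPC(RunArgs{WorkDir: "/tmp/unit", Args: []string{"plan"}})
	fmt.Println(out, err)
}
```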
