Avoid overriding the specified backend configuration in the user's input #288

Open · loheagn opened this issue Mar 23, 2022 · 27 comments

loheagn (Contributor) commented Mar 23, 2022

This issue discusses how to detect the backend configuration in the user's input Terraform configuration, which can be a piece of HCL code inlined in the terraform.core.oam.dev/Configuration YAML file or a set of files in a remote git repository, so that we avoid overriding a backend the user has explicitly specified with the default k8s configuration.

Since terraform init && terraform apply will be split into two stages, terraform init and terraform apply, in #284, we may be able to use the following approach to solve the problem described above.

  1. Do nothing about the backend configuration in the rendering stage.

  2. If terraform init executes successfully, then we can perform the following steps before terraform apply:

    1. Check if there is a terraform.tfstate file in the .terraform directory.

    2. If there is, the user's input configuration contains a valid backend statement. There is no need to add the default k8s backend configuration to the user's input, and we can run terraform apply directly. (The terraform.tfstate file is actually a JSON file containing a backend block; we can parse it to get more information about the backend configuration if necessary.)

    3. If there is not, the user's input configuration doesn't contain a valid backend statement. We need to add a backend.tf file to the working directory, write the default k8s configuration into it, and run terraform init again to apply this change (see the sketch below). Since other work, such as preparing the providers, is done by the first terraform init, the second run is "light" and only needs to handle the newly added backend statement.
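
Here is a minimal sketch of steps 2.1–2.3, assuming the check runs as a small Go helper inside the init container; the file layout and the default backend snippet are illustrative, not the controller's actual code:

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

// defaultK8sBackend is an illustrative default backend block; the real
// controller would render the namespace and suffix from the Configuration.
const defaultK8sBackend = `
terraform {
  backend "kubernetes" {
    secret_suffix     = "state"
    in_cluster_config = true
  }
}
`

// ensureBackend implements the check described above: after `terraform init`,
// a configured backend leaves .terraform/terraform.tfstate behind; its
// absence means the user's input has no backend block.
func ensureBackend(workDir string) error {
    stateFile := filepath.Join(workDir, ".terraform", "terraform.tfstate")
    if _, err := os.Stat(stateFile); err == nil {
        return nil // user-specified backend detected, nothing to do
    } else if !os.IsNotExist(err) {
        return err
    }

    // No backend detected: write the default Kubernetes backend and
    // re-run a (cheap) `terraform init` to pick it up.
    if err := os.WriteFile(filepath.Join(workDir, "backend.tf"), []byte(defaultK8sBackend), 0o644); err != nil {
        return err
    }
    cmd := exec.Command("terraform", "init")
    cmd.Dir = workDir
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    return cmd.Run()
}

func main() {
    if err := ensureBackend("."); err != nil {
        fmt.Fprintln(os.Stderr, "backend detection failed:", err)
        os.Exit(1)
    }
}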

Compared with the previous solution (which detects the backend configuration in the rendering stage):

Advantages

  1. Avoid executing terraform init from scratch twice.

  2. It doesn't matter if the configuration is inline code or a separate git repository.

Disadvantages

  1. Have to maintain another image (we need to add extra logic to the official Terraform image).

zzxwill (Collaborator) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.
  2. The terraform init logic is in an initContainer; will you use a normal container to perform the action?

loheagn (Contributor, Author) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.

Do you mean the deletion that happens when updating the configuration? In my opinion, we need to execute terraform init and detect the backend configuration again.

loheagn (Contributor, Author) commented Mar 23, 2022

  1. The terraform init logic is in an initContainer; will you use a normal container to perform the action?

I think an initContainer is fine.

zzxwill (Collaborator) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.

Do you mean the deletion that happens when updating the configuration? In my opinion, we need to execute terraform init and detect the backend configuration again.

A database is successfully provisioned, then its password is changed. Will a new database be created?

loheagn (Contributor, Author) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.

Do you mean the deletion that happens when updating the configuration? In my opinion, we need to execute terraform init and detect the backend configuration again.

A database is successfully provisioned, then its password is changed. Will a new database be created?

Sorry, I don't get your point. Do you mean that we should restore the Terraform state before executing terraform init?

zzxwill (Collaborator) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.

Do you mean the deletion that happens when updating the configuration? In my opinion, we need to execute terraform init and detect the backend configuration again.

A database is successfully provisioned, then its password is changed. Will a new database be created?

Sorry, I don't get your point. Do you mean that we should restore the Terraform state before executing terraform init?

I mean the backend state might be lost, and a new database might get provisioned if you set up the backend again.

loheagn (Contributor, Author) commented Mar 23, 2022

In my opinion, the state is stored in a remote backend in most cases, except for the local type. Should we back up the state manually after terraform apply or terraform destroy using a command like terraform state pull, and restore it (push the state to the backend) before creating the next job?

zzxwill (Collaborator) commented Mar 24, 2022

In my opinion, the state is stored in a remote backend in most cases, except for the local type. Should we back up the state manually after terraform apply or terraform destroy using a command like terraform state pull, and restore it (push the state to the backend) before creating the next job?

Correct. But how do you plan to back up the state, given that we might have various backend types?

loheagn (Contributor, Author) commented Mar 27, 2022

Sorry for the late reply. I drew a flowchart to show how the terraform-controller job works. The newly added steps are labeled and start with New:.

graph TD
    A[terraform-controller job starts] --> B(initContainer0: copy the configuration from configmap to the workDir)
    B --> C(initContainer1: optional, clone git repo to the workDir)
    C --> D(initContainer2: `terraform init`)
    D --> E{initContainer2: check if there is a valid backend configuration}
    E --> |Yes| J(New: Container: `terraform state push`, restore the terraform state)
    J --> F(Container: `terraform apply` or `terraform destroy`)
    E --> |No| G(New: initContainer2: create backend.tf in the workdir, and write official k8s backend configuration to it)
    G --> H(New: initContainer2: exec `terraform init` again)
    H --> J
    F --> I(New: Container: `terraform state pull`, store it into the backup secret)
    I --> K[terraform-controller job ends]

Here are more details:

  1. We needn't handle the backend configuration (if it exists) in the rendering stage; instead, we can detect the backend configuration after the first terraform init and decide whether it's necessary to add the official Kubernetes backend configuration.

  2. Backup. We can use the terraform state pull command to fetch the current version of the Terraform state and store it in the "backup secret" after terraform apply or terraform destroy. terraform state pull writes the state (a JSON string) to stdout, no matter what the backend type is.

  3. Restore. Before the next terraform apply or terraform destroy is executed, the state stored in the backend may have been lost, so we can use terraform state push to push the backup state (stored in secrets) to the backend and restore the environment. terraform state push works across different backend types, even if the backend type changes between "backup" and "restore".

The above is all the details I can think of. If there is something I didn't consider, or there are some mistakes, please let me know :).
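
To make the backup and restore steps above concrete, here is a rough sketch using plain exec calls; how the pulled state gets persisted into the backup secret is omitted, and the package and function names are just placeholders:

package statebackup

import (
    "bytes"
    "os"
    "os/exec"
)

// pullState runs `terraform state pull` in workDir and returns the state
// JSON from stdout, regardless of the configured backend type. The caller
// would store the result in the backup secret.
func pullState(workDir string) ([]byte, error) {
    var out bytes.Buffer
    cmd := exec.Command("terraform", "state", "pull")
    cmd.Dir = workDir
    cmd.Stdout = &out
    cmd.Stderr = os.Stderr
    if err := cmd.Run(); err != nil {
        return nil, err
    }
    return out.Bytes(), nil
}

// pushState restores a previously backed-up state file into whatever
// backend the working directory is currently initialized with.
func pushState(workDir, backupFile string) error {
    cmd := exec.Command("terraform", "state", "push", backupFile)
    cmd.Dir = workDir
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    return cmd.Run()
}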

zzxwill (Collaborator) commented Apr 6, 2022

Here are some concerns:

  • When "New: Container: terraform state push, restore the terraform state" executes, terraform apply hasn't started yet. Is there any real resource state to back up, given that no cloud resources have been provisioned at that moment?
  • What will happen if "Container: terraform apply or terraform destroy" fails and the state hasn't been backed up?
  • Are state pull and state push in the wrong places in the flow?

loheagn (Contributor, Author) commented Apr 6, 2022

Thanks for the reply! @zzxwill

  1. It's certain that we don't need to push the backup state to the remote backend before the first terraform apply. But if the configuration is changed, the terraform-controller will delete the job and start a new one (I learned this from this comment; if I'm wrong, please tell me). In this case, it's necessary to sync the state to the remote backend.

  2. Because terraform state pull and terraform apply or terraform destroy will be executed in the same container, I think we can handle the error and pull the state anyway.

  3. Actually, state push will not be executed before the first state pull. But the flowchart demonstrates all possible cases, and state push will be executed before terraform apply if some cloud resources have already been provisioned. (Again, if I misunderstand how the job works, please tell me :).

zzxwill (Collaborator) commented Apr 7, 2022

Thanks for the reply! @zzxwill

  1. It's certain that we don't need to push the backup state to the remote backend before the first terraform apply. But if the configuration is changed, the terraform-controller will delete the job and start a new one (I learned this from this comment; if I'm wrong, please tell me). In this case, it's necessary to sync the state to the remote backend.

So how can you tell whether it's the first terraform apply or whether the configuration has changed? As you know, there is no state when executing terraform apply from scratch.

  1. Because terraform state pull and terraform apply or terraform destroy will be executed in the same container, I think we can handle the error and pull the state anyway.

How do you know whether terraform apply or destroy completed or hit issues in the container?

  1. Actually, state push will not be executed before the first state pull. But the flowchart demonstrates all possible cases, and state push will be executed before terraform apply if some cloud resources have already been provisioned. (Again, if I misunderstand how the job works, please tell me :).

How does terraform state push work with the natural state pushing?

The terraform state push command is used to manually upload a local state file to remote state. This command also works with local state.

This command should rarely be used. It is meant only as a utility in case manual intervention is necessary with the remote state.

loheagn (Contributor, Author) commented Apr 7, 2022

So how can you tell whether it's the first terraform apply or whether the configuration has changed? As you know, there is no state when executing terraform apply from scratch.

I think we can determine whether it's the first terraform apply by checking whether a backup secret exists. If there is no secret labeled for the configuration, there is no need to run terraform state push.
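
As a sketch of that check (the label key is hypothetical, and the real controller may track this differently), it could be a labeled-Secret lookup with client-go:

package statebackup

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// hasBackupSecret reports whether a backup Secret labeled for this
// Configuration already exists. If it doesn't, this is the first apply
// and there is nothing to push. The label key used here is hypothetical.
func hasBackupSecret(ctx context.Context, c kubernetes.Interface, namespace, configuration string) (bool, error) {
    list, err := c.CoreV1().Secrets(namespace).List(ctx, metav1.ListOptions{
        LabelSelector: "terraform.core.oam.dev/configuration=" + configuration,
    })
    if err != nil {
        return false, err
    }
    return len(list.Items) > 0, nil
}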

How do you know whether terraform apply or destroy completed or hit issues in the container?

I find that terraform apply or destroy will be executed in the oam-dev/docker-terraform container in the current design. I think we can add more logic to the container and use Go code to call the terraform binary. In this way, we can handle the potential errors.
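
For example, a hedged sketch of such a wrapper (not the actual oam-dev/docker-terraform entrypoint) could make sure the state pull runs whether or not the apply succeeds:

package main

import (
    "log"
    "os"
    "os/exec"
)

// run invokes the terraform binary in workDir and streams its output.
func run(workDir string, args ...string) error {
    cmd := exec.Command("terraform", args...)
    cmd.Dir = workDir
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    return cmd.Run()
}

func main() {
    workDir := "/data" // illustrative working directory

    applyErr := run(workDir, "apply", "-auto-approve")

    // Pull the state even if the apply failed, so partially created
    // resources are still captured; a real helper would capture stdout
    // here and persist it into the backup secret instead of streaming it.
    if err := run(workDir, "state", "pull"); err != nil {
        log.Printf("state pull failed: %v", err)
    }

    if applyErr != nil {
        log.Fatalf("terraform apply failed: %v", applyErr)
    }
}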

How does terraform state push work with the natural state pushing?

Sorry, but what do you mean by "the natural state pushing"?

zzxwill (Collaborator) commented Apr 8, 2022

So how can you tell whether it's the first terraform apply or whether the configuration has changed? As you know, there is no state when executing terraform apply from scratch.

I think we can determine whether it's the first terraform apply by checking whether a backup secret exists. If there is no secret labeled for the configuration, there is no need to run terraform state push.

What's a backup secret?

How do you know whether terraform apply or destroy completed or hit issues in the container?

I find that terraform apply or destroy will be executed in the oam-dev/docker-terraform container in the current design. I think we can add more logic to the container and use Go code to call the terraform binary. In this way, we can handle the potential errors.

That works. But some users use their own customized Terraform container, in which they embed their private Terraform provider. In that case, we would lose this feature.

How does terraform state push work with the natural state pushing?

Sorry, but what do you mean by "the natural state pushing"?

After terraform apply, Terraform itself will push the state.

zzxwill (Collaborator) commented Apr 8, 2022

@loheagn After our in-depth discussion, we find it's complex to figure out whether a remote HCL configuration contains a Terraform backend or not, and there is a high probability that the proposal will destabilize our simple terraform apply/destroy retry mechanism.

Here is a possible solution:

We ask end-users to tell us whether there is a custom Terraform backend and what it is in the Configuration spec.

What do you think?

loheagn (Contributor, Author) commented Apr 8, 2022

We ask end-users to tell us whether there is a custom Terraform backend and what it is in the Configuration spec.

In this way, when should we back up and restore the Terraform state?

zzxwill (Collaborator) commented Apr 8, 2022

We don't manually back up and restore. If end-users don't specifically set a backend, we set the default Kubernetes backend; otherwise we use the customized one. Terraform itself will keep the state automatically, as it does now.

loheagn (Contributor, Author) commented Apr 8, 2022

Okay, I understand. Actually, I was confused as to why we should manually back up and restore the Terraform state, since Terraform stores its state in the backend itself……

So, the second point of the LFX issue #239 doesn't seem necessary, does it?

zzxwill (Collaborator) commented Apr 8, 2022

Okay, I understand. Actually, I was confused as to why we should manually back up and restore the Terraform state, since Terraform stores its state in the backend itself……

So, the second point of the LFX issue #239 doesn't seem necessary, does it?

My bad. The second point is essential. It means backing up the Terraform backend state to an external storage system, like AWS S3, Alibaba Cloud OSS, or some local storage system. And we need to allow end-users to restore the state. This will be a great help if a disaster happens, like a crash of the k8s cluster.

loheagn (Contributor, Author) commented Apr 8, 2022

OK. So we also need to provide some options in the Configuration YAML for end-users to configure their state backup location. And should the "backup and restore" actions happen only when end-users choose the default official k8s backend, or should we back up the backend state whatever backend type they use?

loheagn (Contributor, Author) commented Apr 10, 2022

Hi @zzxwill, I've updated the flowchart:

graph TD
    A[terraform controller starts] --> L(pre-check and generate the configuration meta)
    L --> T(New: Optional: read the backup tf state and store it into a configmap)
    T --> M[start job]
    M --> B(initContainer0: copy the configuration from configmap to the workDir)
    B --> C(initContainer1: optional, clone git repo to the workDir)
    C --> D(initContainer2: terraform init)
    D --> E(New: Optional: initContainer3: mount the configmap which contains the backup tf state and push the backup data to the remote backend using `terraform state push`)
    E --> F(Container: terraform apply or terraform destroy)
    F --> I(New: Container: terraform state pull, store it into the backup secret)
    I --> K[job ends]
    K --> N{New: Is the job successful?}
    N --> |Y| O{New: Is the backend type the official kubernetes backend?}
    O --> |Y| P(read the tf state from the state secret)
    O --> |N| Q(New: create a working dir, and add a backend.tf file which contains the backend configuration code to the dir)
    Q --> R(New: execute `terraform init` and `terraform state pull`, and read the tf state from the stdout)
    R --> S(New: store the tf state json to the location specified by the user in the configuration yaml file)
    P --> S

  1. The backup action should happen at the same time as getTFOutputs, after the job is Available.

    1. If the backend type is the official k8s type, we can read the state directly from the backend secret (see the sketch after this list).

    2. Otherwise, we should create a working directory and prepare the base Terraform environment by adding backend.tf and executing terraform init; then we can execute terraform state pull to get the state from the remote backend. But if the user uses the local type, this method will not work.

  2. I don't think the restore action should happen every time the job starts; it should only happen when the end-user specifies in the Configuration YAML that the backup state should be restored to the remote backend. So we can use another initContainer to execute terraform state push to recover the remote backend.
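
Here is a sketch of the first backup path, reading the state kept by the official kubernetes backend. The Secret name pattern (tfstate-default-<secret_suffix>) and the gzip-compressed tfstate data key are assumptions about the backend's storage layout and should be verified against the Terraform version in use:

package statebackup

import (
    "bytes"
    "compress/gzip"
    "context"
    "fmt"
    "io"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// readK8sBackendState reads the Terraform state that the official
// "kubernetes" backend keeps in a Secret. The Secret name pattern and the
// gzip-compressed "tfstate" data key are assumptions about the backend's
// storage layout, not verified facts.
func readK8sBackendState(ctx context.Context, c kubernetes.Interface, namespace, secretSuffix string) ([]byte, error) {
    name := fmt.Sprintf("tfstate-default-%s", secretSuffix)
    secret, err := c.CoreV1().Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    zr, err := gzip.NewReader(bytes.NewReader(secret.Data["tfstate"]))
    if err != nil {
        return nil, err
    }
    defer zr.Close()
    return io.ReadAll(zr) // the decompressed bytes are the state JSON
}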

zzxwill (Collaborator) commented Apr 11, 2022

@loheagn Let's open another issue to track the backup and restore discussion.

Generally, backup and restore is a system level capability. Users don't need to set anything in a Configuration.

loheagn (Contributor, Author) commented Apr 11, 2022

OK. So the conclusion is that we should allow end-users to specify the backend configuration when they use git configurations, and the specification may look like the following:

apiVersion: terraform.core.oam.dev/v1beta1
kind: Configuration
metadata:
  name: alibaba-rds-mysql-hcl
spec:
  remote: https://github.com/kubevela-contrib/terraform-modules.git
  path: alibaba/rds

  # backend specification example for git type
  backend: 
    type: s3
    config:
      bucket: "mybucket"
      key: "path/to/my/key"
      region: "my-region"

  variable:
    instance_name: "poc"
    account_name: "oamtest"
    password: "Xyfff83jfewGGfaked"
    security_ips:
      - "0.0.0.0/0"
      - "192.168.1.34"

  writeConnectionSecretToRef:
    name: rds-conn
    namespace: default

If the end-users use the inline HCL configuration, we will parse the HCL code manually and detect the backend declaration in the rendering stage.

If we cannot detect any backend declarations, we will use the default Kubernetes backend type.
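
A sketch of that detection for the inline case, assuming we parse with hashicorp/hcl v2; the schema below only looks for terraform/backend blocks and ignores everything else:

package render

import (
    "github.com/hashicorp/hcl/v2"
    "github.com/hashicorp/hcl/v2/hclparse"
)

// hasBackendBlock reports whether the inline HCL contains a
// terraform { backend "<type>" { ... } } declaration.
func hasBackendBlock(src []byte) (bool, error) {
    file, diags := hclparse.NewParser().ParseHCL(src, "main.tf")
    if diags.HasErrors() {
        return false, diags
    }
    // Only look at terraform blocks; ignore resources, providers, etc.
    content, _, diags := file.Body.PartialContent(&hcl.BodySchema{
        Blocks: []hcl.BlockHeaderSchema{{Type: "terraform"}},
    })
    if diags.HasErrors() {
        return false, diags
    }
    for _, tfBlock := range content.Blocks {
        inner, _, diags := tfBlock.Body.PartialContent(&hcl.BodySchema{
            Blocks: []hcl.BlockHeaderSchema{{Type: "backend", LabelNames: []string{"type"}}},
        })
        if diags.HasErrors() {
            return false, diags
        }
        if len(inner.Blocks) > 0 {
            return true, nil // the user already declared a backend
        }
    }
    return false, nil
}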

If there is no problem with this proposal, I think I can make a PR to implement it first and then we can discuss the details about the backup and restore.

zzxwill (Collaborator) commented Apr 11, 2022

The backend sample looks good to me; it's more native. However, Terraform users tend to set the backend as below:

terraform {
  backend "s3" {
    bucket = "mybucket"
    key    = "path/to/my/key"
    region = "us-east-1"
  }
}

So do you think we should also support the inline way to set a backend?

loheagn (Contributor, Author) commented Apr 11, 2022

I think that's OK. We can support both ways.

Another question: if the user provides the backend declaration both in the inline HCL code and directly in the Configuration YAML, which takes priority?

zzxwill (Collaborator) commented Apr 11, 2022

I think that's OK. We can support both ways.

Another question: if the user provides the backend declaration both in the inline HCL code and directly in the Configuration YAML, which takes priority?

spec.backend comes first, as it reflects the user's intention. And it would be better if we surface a warning in the Configuration status.
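
A small sketch of that priority rule, using simplified stand-in types rather than the real CRD structs:

package render

// BackendSpec and backendChoice are simplified stand-ins for the real
// CRD types; they only exist to illustrate the priority rule.
type BackendSpec struct {
    Type   string
    Config map[string]string
}

type backendChoice struct {
    backend *BackendSpec // nil means: keep whatever the HCL declares
    warning string       // surfaced in the Configuration status if non-empty
}

// chooseBackend applies the rule above: an explicit spec.backend wins over a
// backend declared inside the HCL, and the conflict is reported as a warning.
func chooseBackend(specBackend *BackendSpec, hclHasBackend bool) backendChoice {
    if specBackend != nil {
        choice := backendChoice{backend: specBackend}
        if hclHasBackend {
            choice.warning = "backend declared in both spec.backend and the HCL; using spec.backend"
        }
        return choice
    }
    if hclHasBackend {
        return backendChoice{} // keep the user's HCL backend untouched
    }
    // Neither is set: fall back to the default Kubernetes backend.
    return backendChoice{backend: &BackendSpec{Type: "kubernetes"}}
}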

loheagn (Contributor, Author) commented Apr 11, 2022

Got it. Thanks for your guidance! @zzxwill
