Avoid overriding the specified backend configuration in the user's input #288

Open · loheagn opened this issue Mar 23, 2022 · 27 comments

loheagn (Contributor) commented Mar 23, 2022

This issue discusses how to detect the backend configuration in the user's input Terraform configuration, which can be a piece of HCL code inlined in the terraform.core.oam.dev/Configuration YAML file or a set of files in a remote git repository, so that we avoid overriding a backend the user has explicitly specified with the default k8s configuration.

Since terraform init && terraform apply will be split into two stages, terraform init and terraform apply, in #284, we may be able to use the following approach to solve the problem described above.

  1. Do nothing about the backend configuration in the rendering stage.

  2. If terraform init executes successfully, then we can perform the following steps before terraform apply:

    1. Check if there is a terraform.tfstate file in the .terraform directory.

    2. If there is, the user's input configuration contains a valid backend statement. There is no need to add the default k8s backend configuration to the user's input, and we can run terraform apply directly. (The terraform.tfstate file is actually a JSON file containing a backend block; we can parse it to get more information about the backend configuration if necessary.)

    3. If there is not, the user's input configuration doesn't contain a valid backend statement. We need to add a backend.tf file to the working directory, write the default k8s configuration into it, and run terraform init again to apply this change (see the sketch below). Since other work, such as preparing the providers, is done by the first terraform init, the second run is "light" and only needs to handle the newly added backend statement.
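
Here is a minimal sketch of steps 2.1–2.3, assuming the check runs as a small Go helper inside the init container; the file layout and the default backend snippet are illustrative, not the controller's actual code:

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

// defaultK8sBackend is an illustrative default backend block; the real
// controller would render the namespace and suffix from the Configuration.
const defaultK8sBackend = `
terraform {
  backend "kubernetes" {
    secret_suffix     = "state"
    in_cluster_config = true
  }
}
`

// ensureBackend implements the check described above: after `terraform init`,
// a configured backend leaves .terraform/terraform.tfstate behind; its
// absence means the user's input has no backend block.
func ensureBackend(workDir string) error {
    stateFile := filepath.Join(workDir, ".terraform", "terraform.tfstate")
    if _, err := os.Stat(stateFile); err == nil {
        return nil // user-specified backend detected, nothing to do
    } else if !os.IsNotExist(err) {
        return err
    }

    // No backend detected: write the default Kubernetes backend and
    // re-run a (cheap) `terraform init` to pick it up.
    if err := os.WriteFile(filepath.Join(workDir, "backend.tf"), []byte(defaultK8sBackend), 0o644); err != nil {
        return err
    }
    cmd := exec.Command("terraform", "init")
    cmd.Dir = workDir
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    return cmd.Run()
}

func main() {
    if err := ensureBackend("."); err != nil {
        fmt.Fprintln(os.Stderr, "backend detection failed:", err)
        os.Exit(1)
    }
}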

Compared with the previous solution (which detects the backend configuration in the rendering stage):

Advantages

  1. Avoid executing terraform init from scratch twice.

  2. It doesn't matter if the configuration is inline code or a separate git repository.

Disadvantages

  1. Have to maintain another image (we need to add extra logic to the official Terraform image).

zzxwill (Collaborator) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.
  2. The terraform init logic is in an initContainer; will you use a normal container to perform the action?

loheagn (Contributor, Author) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.

Do you mean the deletion that happens when updating the configuration? In my opinion, we need to execute terraform init and detect the backend configuration again.

loheagn (Contributor, Author) commented Mar 23, 2022

  1. The terraform init logic is in an initContainer; will you use a normal container to perform the action?

I think an initContainer is fine.

zzxwill (Collaborator) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.

Do you mean the deletion that happens when updating the configuration? In my opinion, we need to execute terraform init and detect the backend configuration again.

A database is successfully provisioned, then its password is changed. Will a new database be created?

loheagn (Contributor, Author) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.

Do you mean the deletion that happens when updating the configuration? In my opinion, we need to execute terraform init and detect the backend configuration again.

A database is successfully provisioned, then its password is changed. Will a new database be created?

Sorry, I don't get your point. Do you mean that we should restore the Terraform state before executing terraform init?

zzxwill (Collaborator) commented Mar 23, 2022

  1. What will happen if a variable of the configuration changes? The job will be deleted.

Do you mean the deletion that happens when updating the configuration? In my opinion, we need to execute terraform init and detect the backend configuration again.

A database is successfully provisioned, then its password is changed. Will a new database be created?

Sorry, I don't get your point. Do you mean that we should restore the Terraform state before executing terraform init?

I mean the backend state might be lost, and a new database might get provisioned if you set up the backend again.

loheagn (Contributor, Author) commented Mar 23, 2022

In my opinion, the state is stored in a remote backend in most cases, except for the local type. Should we back up the state manually after terraform apply or terraform destroy using a command like terraform state pull, and restore it (push the state to the backend) before creating the next job?

zzxwill (Collaborator) commented Mar 24, 2022

In my opinion, the state is stored in a remote backend in most cases, except for the local type. Should we back up the state manually after terraform apply or terraform destroy using a command like terraform state pull, and restore it (push the state to the backend) before creating the next job?

Correct. But how do you plan to back up the state, given that we might have various backend types?

loheagn (Contributor, Author) commented Mar 27, 2022

Sorry for the late reply. I drew a flowchart to show how the terraform-controller job works. The newly added steps are labeled and start with New:.

graph TD
    A[terraform-controller job starts] --> B(initContainer0: copy the configuration from configmap to the workDir)
    B --> C(initContainer1: optional, clone git repo to the workDir)
    C --> D(initContainer2: `terraform init`)
    D --> E{initContainer2: check if there is a valid backend configuration}
    E --> |Yes| J(New: Container: `terraform state push`, restore the terraform state)
    J --> F(Container: `terraform apply` or `terraform destroy`)
    E --> |No| G(New: initContainer2: create backend.tf in the workdir, and write official k8s backend configuration to it)
    G --> H(New: initContainer2: exec `terraform init` again)
    H --> J
    F --> I(New: Container: `terraform state pull`, store it into the backup secret)
    I --> K[terraform-controller job ends]

Here are more details:

  1. We needn't handle the backend configuration (if it exists) in the rendering stage; instead, we can detect the backend configuration after the first terraform init and decide whether it's necessary to add the official Kubernetes backend configuration.

  2. Backup. We can use the terraform state pull command to fetch the current version of the Terraform state and store it in the "backup secret" after terraform apply or terraform destroy. terraform state pull writes the state (a JSON string) to stdout, no matter what the backend type is.

  3. Restore. Before the next terraform apply or terraform destroy is executed, the state stored in the backend may have been lost, so we can use terraform state push to push the backup state (stored in secrets) to the backend and restore the environment. terraform state push works across different backend types, even if the backend type changes between "backup" and "restore".

The above is all the details I can think of. If there is something I didn't consider, or there are some mistakes, please let me know :).
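
To make the backup and restore steps above concrete, here is a rough sketch using plain exec calls; how the pulled state gets persisted into the backup secret is omitted, and the package and function names are just placeholders:

package statebackup

import (
    "bytes"
    "os"
    "os/exec"
)

// pullState runs `terraform state pull` in workDir and returns the state
// JSON from stdout, regardless of the configured backend type. The caller
// would store the result in the backup secret.
func pullState(workDir string) ([]byte, error) {
    var out bytes.Buffer
    cmd := exec.Command("terraform", "state", "pull")
    cmd.Dir = workDir
    cmd.Stdout = &out
    cmd.Stderr = os.Stderr
    if err := cmd.Run(); err != nil {
        return nil, err
    }
    return out.Bytes(), nil
}

// pushState restores a previously backed-up state file into whatever
// backend the working directory is currently initialized with.
func pushState(workDir, backupFile string) error {
    cmd := exec.Command("terraform", "state", "push", backupFile)
    cmd.Dir = workDir
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    return cmd.Run()
}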

zzxwill (Collaborator) commented Apr 6, 2022

Here are some concerns:

  • When "New: Container: terraform state push, restore the terraform state" executes, terraform apply hasn't started yet. Is there any real resource state to back up, given that no cloud resources have been provisioned at that moment?
  • What will happen if "Container: terraform apply or terraform destroy" fails and the state hasn't been backed up?
  • Are state pull and state push in the wrong places in the flow?

loheagn (Contributor, Author) commented Apr 6, 2022

Thanks for the reply! @zzxwill

  1. It's certain that we don't need to push the backup state to the remote backend before the first terraform apply. But if the configuration is changed, the terraform-controller will delete the job and start a new one (I learned this from this comment; if I'm wrong, please tell me). In this case, it's necessary to sync the state to the remote backend.

  2. Because terraform state pull and terraform apply or terraform destroy will be executed in the same container, I think we can handle the error and pull the state anyway.

  3. Actually, state push will not be executed before the first state pull. But the flowchart demonstrates all possible cases, and state push will be executed before terraform apply if some cloud resources have already been provisioned. (Again, if I misunderstand how the job works, please tell me :).

zzxwill (Collaborator) commented Apr 7, 2022

Thanks for the reply! @zzxwill

  1. It's certain that we don't need to push the backup state to the remote backend before the first terraform apply. But if the configuration is changed, the terraform-controller will delete the job and start a new one (I learned this from this comment; if I'm wrong, please tell me). In this case, it's necessary to sync the state to the remote backend.

So how can you tell whether it's the first terraform apply or whether the configuration has changed? As you know, there is no state when executing terraform apply from scratch.

  1. Because terraform state pull and terraform apply or terraform destroy will be executed in the same container, I think we can handle the error and pull the state anyway.

How do you know whether terraform apply or destroy completed or hit issues in the container?

  1. Actually, state push will not be executed before the first state pull. But the flowchart demonstrates all possible cases, and state push will be executed before terraform apply if some cloud resources have already been provisioned. (Again, if I misunderstand how the job works, please tell me :).

How does terraform state push work with the natural state pushing?

The terraform state push command is used to manually upload a local state file to remote state. This command also works with local state.

This command should rarely be used. It is meant only as a utility in case manual intervention is necessary with the remote state.

loheagn (Contributor, Author) commented Apr 7, 2022

So how can you tell whether it's the first terraform apply or whether the configuration has changed? As you know, there is no state when executing terraform apply from scratch.

I think we can determine whether it's the first terraform apply by checking whether a backup secret exists. If there is no secret labeled for the configuration, there is no need to run terraform state push.
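
As a sketch of that check (the label key is hypothetical, and the real controller may track this differently), it could be a labeled-Secret lookup with client-go:

package statebackup

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// hasBackupSecret reports whether a backup Secret labeled for this
// Configuration already exists. If it doesn't, this is the first apply
// and there is nothing to push. The label key used here is hypothetical.
func hasBackupSecret(ctx context.Context, c kubernetes.Interface, namespace, configuration string) (bool, error) {
    list, err := c.CoreV1().Secrets(namespace).List(ctx, metav1.ListOptions{
        LabelSelector: "terraform.core.oam.dev/configuration=" + configuration,
    })
    if err != nil {
        return false, err
    }
    return len(list.Items) > 0, nil
}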

How do you know whether terraform apply or destroy completed or hit issues in the container?

I find that terraform apply or destroy will be executed in the oam-dev/docker-terraform container in the current design. I think we can add more logic to the container and use Go code to call the terraform binary. In this way, we can handle the potential errors.
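
For example, a hedged sketch of such a wrapper (not the actual oam-dev/docker-terraform entrypoint) could make sure the state pull runs whether or not the apply succeeds:

package main

import (
    "log"
    "os"
    "os/exec"
)

// run invokes the terraform binary in workDir and streams its output.
func run(workDir string, args ...string) error {
    cmd := exec.Command("terraform", args...)
    cmd.Dir = workDir
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    return cmd.Run()
}

func main() {
    workDir := "/data" // illustrative working directory

    applyErr := run(workDir, "apply", "-auto-approve")

    // Pull the state even if the apply failed, so partially created
    // resources are still captured; a real helper would capture stdout
    // here and persist it into the backup secret instead of streaming it.
    if err := run(workDir, "state", "pull"); err != nil {
        log.Printf("state pull failed: %v", err)
    }

    if applyErr != nil {
        log.Fatalf("terraform apply failed: %v", applyErr)
    }
}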

How does terraform state push work with the natural state pushing?

Sorry, but what do you mean by "the natural state pushing"?

zzxwill (Collaborator) commented Apr 8, 2022

So how can you tell whether it's the first terraform apply or whether the configuration has changed? As you know, there is no state when executing terraform apply from scratch.

I think we can determine whether it's the first terraform apply by checking whether a backup secret exists. If there is no secret labeled for the configuration, there is no need to run terraform state push.

What's a backup secret?

How do you know whether terraform apply or destroy completed or hit issues in the container?

I find that terraform apply or destroy will be executed in the oam-dev/docker-terraform container in the current design. I think we can add more logic to the container and use Go code to call the terraform binary. In this way, we can handle the potential errors.

That works. But some users use their own customized Terraform container, in which they embed their private Terraform provider. In that case, we would lose this feature.

How does terraform state push work with the natural state pushing?

Sorry, but what do you mean by "the natural state pushing"?

After terraform apply, Terraform itself will push the state.

zzxwill (Collaborator) commented Apr 8, 2022

@loheagn After our in-depth discussion, we find it's complex to figure out whether a remote HCL configuration contains a Terraform backend or not, and there is a high probability that the proposal will destabilize our simple terraform apply/destroy retry mechanism.

Here is a possible solution:

We ask end-users to tell us whether there is a custom Terraform backend and what it is in the Configuration spec.

What do you think?

loheagn (Contributor, Author) commented Apr 8, 2022

We ask end-users to tell us whether there is a custom Terraform backend and what it is in the Configuration spec.

In this way, when should we back up and restore the Terraform state?

zzxwill (Collaborator) commented Apr 8, 2022

We don't manually back up and restore. If end-users don't specifically set a backend, we set the default Kubernetes backend; otherwise we use the customized one. Terraform itself will keep the state automatically, as it does now.

loheagn (Contributor, Author) commented Apr 8, 2022

Okay, I understand. Actually, I was confused as to why we should manually back up and restore the Terraform state, since Terraform stores its state in the backend itself……

So, the second point of the LFX issue #239 doesn't seem necessary, does it?

zzxwill (Collaborator) commented Apr 8, 2022

Okay, I understand. Actually, I was confused as to why we should manually back up and restore the Terraform state, since Terraform stores its state in the backend itself……

So, the second point of the LFX issue #239 doesn't seem necessary, does it?

My bad. The second point is essential. It means backing up the Terraform backend state to an external storage system, like AWS S3, Alibaba Cloud OSS, or some local storage system. And we need to allow end-users to restore the state. This will be a great help if a disaster happens, like a crash of the k8s cluster.

loheagn (Contributor, Author) commented Apr 8, 2022

OK. So we also need to provide some options in the Configuration YAML for end-users to configure their state backup location. And should the "backup and restore" actions happen only when end-users choose the default official k8s backend, or should we back up the backend state whatever backend type they use?

loheagn (Contributor, Author) commented Apr 10, 2022

Hi @zzxwill, I've updated the flowchart:

graph TD
    A[terraform controller starts] --> L(pre-check and generate the configuration meta)
    L --> T(New: Optional: read the backup tf state and store it into a configmap)
    T --> M[start job]
    M --> B(initContainer0: copy the configuration from configmap to the workDir)
    B --> C(initContainer1: optional, clone git repo to the workDir)
    C --> D(initContainer2: terraform init)
    D --> E(New: Optional: initContainer3: mount the configmap which contains the backup tf state and push the backup data to the remote backend using `terraform state push`)
    E --> F(Container: terraform apply or terraform destroy)
    F --> I(New: Container: terraform state pull, store it into the backup secret)
    I --> K[job ends]
    K --> N{New: Is the job successful?}
    N --> |Y| O{New: Is the backend type the official kubernetes backend?}
    O --> |Y| P(read the tf state from the state secret)
    O --> |N| Q(New: create a working dir, and add a backend.tf file which contains the backend configuration code to the dir)
    Q --> R(New: execute `terraform init` and `terraform state pull`, and read the tf state from the stdout)
    R --> S(New: store the tf state json to the location specified by the user in the configuration yaml file)
    P --> S

  1. The backup action should happen at the same time as getTFOutputs, after the job is Available.

    1. If the backend type is the official k8s type, we can read the state directly from the backend secret (see the sketch after this list).

    2. Otherwise, we should create a working directory and prepare the base Terraform environment by adding backend.tf and executing terraform init; then we can execute terraform state pull to get the state from the remote backend. But if the user uses the local type, this method will not work.

  2. I don't think the restore action should happen every time the job starts; it should only happen when the end-user specifies in the Configuration YAML that the backup state should be restored to the remote backend. So we can use another initContainer to execute terraform state push to recover the remote backend.
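
Here is a sketch of the first backup path, reading the state kept by the official kubernetes backend. The Secret name pattern (tfstate-default-<secret_suffix>) and the gzip-compressed tfstate data key are assumptions about the backend's storage layout and should be verified against the Terraform version in use:

package statebackup

import (
    "bytes"
    "compress/gzip"
    "context"
    "fmt"
    "io"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// readK8sBackendState reads the Terraform state that the official
// "kubernetes" backend keeps in a Secret. The Secret name pattern and the
// gzip-compressed "tfstate" data key are assumptions about the backend's
// storage layout, not verified facts.
func readK8sBackendState(ctx context.Context, c kubernetes.Interface, namespace, secretSuffix string) ([]byte, error) {
    name := fmt.Sprintf("tfstate-default-%s", secretSuffix)
    secret, err := c.CoreV1().Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    zr, err := gzip.NewReader(bytes.NewReader(secret.Data["tfstate"]))
    if err != nil {
        return nil, err
    }
    defer zr.Close()
    return io.ReadAll(zr) // the decompressed bytes are the state JSON
}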

zzxwill (Collaborator) commented Apr 11, 2022

@loheagn Let's open another issue to track the backup and restore discussion.

Generally, backup and restore is a system level capability. Users don't need to set anything in a Configuration.

loheagn (Contributor, Author) commented Apr 11, 2022

OK. So the conclusion is that we should allow end-users to specify the backend configuration when they use git configurations, and the specification may look like the following:

apiVersion: terraform.core.oam.dev/v1beta1
kind: Configuration
metadata:
  name: alibaba-rds-mysql-hcl
spec:
  remote: https://github.com/kubevela-contrib/terraform-modules.git
  path: alibaba/rds

  # backend specification example for git type
  backend: 
    type: s3
    config:
      bucket: "mybucket"
      key: "path/to/my/key"
      region: "my-region"

  variable:
    instance_name: "poc"
    account_name: "oamtest"
    password: "Xyfff83jfewGGfaked"
    security_ips:
      - "0.0.0.0/0"
      - "192.168.1.34"

  writeConnectionSecretToRef:
    name: rds-conn
    namespace: default

If the end-users use the inline HCL configuration, we will parse the HCL code manually and detect the backend declaration in the rendering stage.

If we cannot detect any backend declarations, we will use the default Kubernetes backend type.
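
A sketch of that detection for the inline case, assuming we parse with hashicorp/hcl v2; the schema below only looks for terraform/backend blocks and ignores everything else:

package render

import (
    "github.com/hashicorp/hcl/v2"
    "github.com/hashicorp/hcl/v2/hclparse"
)

// hasBackendBlock reports whether the inline HCL contains a
// terraform { backend "<type>" { ... } } declaration.
func hasBackendBlock(src []byte) (bool, error) {
    file, diags := hclparse.NewParser().ParseHCL(src, "main.tf")
    if diags.HasErrors() {
        return false, diags
    }
    // Only look at terraform blocks; ignore resources, providers, etc.
    content, _, diags := file.Body.PartialContent(&hcl.BodySchema{
        Blocks: []hcl.BlockHeaderSchema{{Type: "terraform"}},
    })
    if diags.HasErrors() {
        return false, diags
    }
    for _, tfBlock := range content.Blocks {
        inner, _, diags := tfBlock.Body.PartialContent(&hcl.BodySchema{
            Blocks: []hcl.BlockHeaderSchema{{Type: "backend", LabelNames: []string{"type"}}},
        })
        if diags.HasErrors() {
            return false, diags
        }
        if len(inner.Blocks) > 0 {
            return true, nil // the user already declared a backend
        }
    }
    return false, nil
}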

If there is no problem with this proposal, I think I can make a PR to implement it first and then we can discuss the details about the backup and restore.

zzxwill (Collaborator) commented Apr 11, 2022

The backend sample looks good to me; it's more native. However, Terraform users tend to set the backend as below:

terraform {
  backend "s3" {
    bucket = "mybucket"
    key    = "path/to/my/key"
    region = "us-east-1"
  }
}

So do you think we should also support the inline way to set a backend?

loheagn (Contributor, Author) commented Apr 11, 2022

I think that's OK. We can support both ways.

Another question: if the user provides the backend declaration both in the inline HCL code and directly in the Configuration YAML, which takes priority?

zzxwill (Collaborator) commented Apr 11, 2022

I think that's OK. We can support both ways.

Another question: if the user provides the backend declaration both in the inline HCL code and directly in the Configuration YAML, which takes priority?

spec.backend comes first, as it reflects the user's intention. And it would be better if we surface a warning in the Configuration status.
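
A small sketch of that priority rule, using simplified stand-in types rather than the real CRD structs:

package render

// BackendSpec and backendChoice are simplified stand-ins for the real
// CRD types; they only exist to illustrate the priority rule.
type BackendSpec struct {
    Type   string
    Config map[string]string
}

type backendChoice struct {
    backend *BackendSpec // nil means: keep whatever the HCL declares
    warning string       // surfaced in the Configuration status if non-empty
}

// chooseBackend applies the rule above: an explicit spec.backend wins over a
// backend declared inside the HCL, and the conflict is reported as a warning.
func chooseBackend(specBackend *BackendSpec, hclHasBackend bool) backendChoice {
    if specBackend != nil {
        choice := backendChoice{backend: specBackend}
        if hclHasBackend {
            choice.warning = "backend declared in both spec.backend and the HCL; using spec.backend"
        }
        return choice
    }
    if hclHasBackend {
        return backendChoice{} // keep the user's HCL backend untouched
    }
    // Neither is set: fall back to the default Kubernetes backend.
    return backendChoice{backend: &BackendSpec{Type: "kubernetes"}}
}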

loheagn (Contributor, Author) commented Apr 11, 2022

Got it. Thanks for your guidance! @zzxwill
