Fix terraform state persistence issue in all repos #6038

Closed
20 tasks
ewastempel opened this issue Jan 22, 2024 · 3 comments
Assignees
Labels
bug Something isn't working other-team-dependency This ticket requires work from another team: include in DOR and DOD

Comments


ewastempel commented Jan 22, 2024

Expected Behavior

When terraform apply runs in a pipeline, the Terraform state should be saved without error.

Actual Behavior

The terraform state persistence fails with the following error:

Error: Failed to save state

Error saving state: failed to upload state: operation error S3: PutObject,
failed to rewind transport stream for retry, request stream is not seekable

Error: Failed to persist state to backend

The error shown above has prevented Terraform from writing the updated state
to the configured backend. To allow for recovery, the state has been written
to the file "errored.tfstate" in the current working directory.

Running "terraform apply" again at this point will create a forked state,
making it harder to recover.

To retry writing this state, use the following command:
    terraform state push errored.tfstate

See this pipeline failure as an example.

Additional information and the fix that needs to be rolled out

See this issue for details #5859

The error may be a Terraform bug; we see the same issue in v1.6.6:
hashicorp/terraform#34528

Additionally, CloudTrail shows no errors for the S3 PutObject request that Terraform sends to AWS. This suggests the problem lies with Terraform: the request succeeds and the state is saved, but Terraform still reports a failure.

The state push fix (#6020 and #6022) for the state persistence failure is now rolled out to the scheduled baseline workflow, with Slack alerts temporarily suppressed when the state push succeeds.

Roll out the same state push fix (#6020 and #6022) to all workflows in all repos (see the full list in the Definition of Done section).
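
For illustration only, here is a minimal sketch of what a state push fallback could look like in a workflow step. It is not the code from #6020 or #6022. The function name apply_with_state_recovery is hypothetical, and treating a successful push as an overall success is an assumption that could hide an unrelated apply error. The errored.tfstate file name and the terraform state push command come from the Terraform error message quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual fix from #6020/#6022):
# run terraform apply and, if Terraform leaves errored.tfstate behind,
# retry the upload with terraform state push.
apply_with_state_recovery() {
  local apply_status=0
  terraform apply -auto-approve "$@" || apply_status=$?

  # Terraform writes errored.tfstate only when it could not persist state.
  if [ -f errored.tfstate ]; then
    echo "State persistence failed; retrying with terraform state push" >&2
    if terraform state push errored.tfstate; then
      # Assumption: a successful push counts as a successful run.
      rm -f errored.tfstate
      return 0
    fi
    echo "State push failed; errored.tfstate kept for manual recovery" >&2
    return 1
  fi

  # No errored.tfstate: pass through whatever terraform apply returned.
  return "$apply_status"
}
```

A workflow step would call apply_with_state_recovery in place of a bare terraform apply. Keeping errored.tfstate on disk when the push also fails leaves the file available for a manual terraform state push.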

Version

n/a

Modules

n/a

Account

all

Definition of done

Roll the fix out to the below repos/workflows:

  • modernisation-platform/.github/workflows/bootstrap-sprinkler.yml
  • modernisation-platform/.github/workflows/core-logging-deployment.yml
  • modernisation-platform/.github/workflows/core-network-services-deployment.yml
  • modernisation-platform/.github/workflows/core-security-deployment.yml
  • modernisation-platform/.github/workflows/core-shared-services-deployment.yml
  • modernisation-platform/.github/workflows/core-vpc-development-deployment.yml
  • modernisation-platform/.github/workflows/core-vpc-preproduction-deployment.yml
  • modernisation-platform/.github/workflows/core-vpc-production-deployment.yml
  • modernisation-platform/.github/workflows/core-vpc-test-deployment.yml
  • modernisation-platform/.github/workflows/modernisation-platform-account.yml
  • modernisation-platform/.github/workflows/new-environment.yml
  • modernisation-platform/.github/workflows/scheduled-baseline.yml
  • modernisation-platform/.github/workflows/terraform-github.yml
  • modernisation-platform/.github/workflows/terraform-member-environment.yml
  • modernisation-platform/.github/workflows/terraform-pagerduty.yml
  • modernisation-platform-ami-builds/.github/workflows/components.yml
  • modernisation-platform-ami-builds/.github/workflows/example.yml
  • modernisation-platform-ami-builds/.github/workflows/modernisation-platform.yml
  • modernisation-platform-environments/.github/workflows/nuke-redeploy.yml
  • modernisation-platform-environments/.github/workflows/reusable_terraform_plan_apply.yml
@SimonPPledger

park this and wait for the fix from terraform

SimonPPledger added the Low Priority and other-team-dependency labels and removed the Low Priority label on Feb 15, 2024
@SimonPPledger

Waiting for fix from terraform

@dms1981

dms1981 commented Mar 18, 2024

Terraform 1.7.5 was released on March 13th.
We had a successful baselines run on March 14th using this version.

We can leave the state error handling in place in the interim, but could remove it at a later date if the code isn't serving a purpose.

@dms1981 dms1981 closed this as completed Mar 18, 2024
@dms1981 dms1981 self-assigned this Mar 18, 2024
Projects
Status: Done