Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix accidental mutation of shared cty.Paths in ValueMarks funcs #32543

Merged
merged 1 commit into from Jan 20, 2023

Conversation

nfagerlund
Copy link
Contributor

@nfagerlund nfagerlund commented Jan 19, 2023

Go's append() reserves the right to mutate (the backing storage of) its primary argument in-place, and expects the caller to assign its return value to the same variable that was passed as the primary argument. Due to what was almost definitely a typo (followed by copy-paste mishap), the configschema Block.ValueMarks and Object.ValueMarks functions were treating it like an immutable function that returns a new slice.

In rare and hard-to-reproduce cases, this was causing bizarre malfunctions when marking sensitive schema attributes in deeply-nested block structures -- omitting the marks for some sensitive values (🚨), and marking other entire blocks as sensitive (which is supposed to be impossible). The chaotic and unreliable nature of the bugs is likely related to append()'s automatic slice reallocation behavior (if the append operation overflows the original array allocation, the resulting behavior can look immutable), but there might be other contributing factors too.

This commit fixes existing instances of the problem, and wraps the desired copy-and-append behavior in a helper function to simplify handling shared parent paths in an immutable way.

Discussion for Reviewers

We originally started investigating this as a structured log bug (@brandonc had a previous PR for that, on which @alisdair pointed out that the "bug" should have been impossible), and ultimately chased it back to legit inaccurate info in the after_sensitive object in the JSON plan.

Reproducing this problem is incredibly dicey! Minimal repro attempts using the tfcoremock and alisdair/nested providers all failed, and I only eventually succeeded by copy-pasting the actual schema from rancher/rancher2 into a dummy provider and playing with the nesting levels.

To test the problem (on main or any released 1.x Terraform) and the fix (on this branch), use:

Once you've got the plan json, jump to the after_sensitive property for the one resource_change:

  • The block corresponding to the etcd block should have both cert and key attrs marked as sensitive.
  • The innermost s3_backup_config block should have both access_key and secret_key attrs marked as sensitive.
  • No complete blocks should be marked as sensitive, since that's impossible.

In trying to pin down the problem, I duplicated the problematic nested schema a few times in my dummy resource, with one instance at the original nesting level, one instance nested another level down, and instances out-dented by one and two levels. As you can see in this side-by-side of excerpted values, the bug behaves differently for all of them.

  • The original nesting level (second from left) gets it the worst, as a whole block gets eaten and the etcd attrs are gone.
  • The additional nesting level (left) keeps the etcd attrs but loses ONE of the innermost s3_backup_config attrs.
  • Outdented by one level (third from left) is fine.
  • Outdented by two levels (right) loses THE OTHER of the innermost attrs.

To me, that smells like something profoundly order-dependent and haunted.

And that, dear reader, is why I don't have tests on this PR -- nailing down this chaotic behavior enough to get a minimal, reliable repro using only an in-memory schema was going to take at least another two days, and since the problem originates with some blatant and inarguable typos, I couldn't quite justify the extra expense to my team.

(...But if reviewers insist on giving me an excuse to, obviously I'm chomping at the bit to blow another two days nailing a stake through this thing's heart, y'all know what I'm about. Just, trying to be pragmatic here.)

Target Release

1.4.x

Draft CHANGELOG entry

BUG FIXES

  • Fixed a rare bug causing inaccurate before_sensitive / after_sensitive annotations in JSON plan output for deeply nested structures. This was only observed in the wild on the rancher/rancher2 provider, and resulted in glitched display in Terraform Cloud's structured plan log view.

Go's `append()` reserves the right to mutate its primary argument in-place, and
expects the caller to assign its return value to the same variable that was
passed as the primary argument. Due to what was almost definitely a typo
(followed by copy-paste mishap), the configschema `Block.ValueMarks` and
`Object.ValueMarks` functions were treating it like an immutable function that
returns a new slice.

In rare and hard-to-reproduce cases, this was causing bizarre malfunctions when
marking sensitive schema attributes in deeply-nested block structures --
omitting the marks for some sensitive values (🚨), and marking other entire
blocks as sensitive (which is supposed to be impossible). The chaotic and
unreliable nature of the bugs is likely related to `append()`'s automatic slice
reallocation behavior (if the append operation overflows the original array
allocation, the resulting behavior can _look_ immutable), but there might be
other contributing factors too.

This commit fixes existing instances of the problem, and wraps the desired
copy-and-append behavior in a helper function to simplify handling shared parent
paths in an immutable way.
Copy link
Contributor

@brandonc brandonc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sick!

@nfagerlund
Copy link
Contributor Author

nfagerlund commented Jan 20, 2023

I think I now have a slightly more nuanced (but still incomplete) theory of how the bad behavior occurs.

If you have a path (step slice) with a bit of extra headroom left in its underlying array, then any of its direct children who append to it will successively clobber the next slot out past the parent's length. So you'll end up with multiple PathValueMarks, one for each sibling child, but every one of their paths will be set to whatever the LAST one to be processed was. And then when they get marshalled into their final structure, they get de-duplicated, leaving only one.

That de-duplication probably favors the first one, even though the last one was the one that contributed the path.

So:

  • The requirement for headroom in the backing array is how you get the weirdly alternating misbehaviors at different nesting levels! The extra nesting level in my repro probably filled in the remaining headroom and forced reallocations for its children's appends.
  • The "dedupe to first, which happens to be using the path from last" resolution is how you get a whole block marked as a sensitive scalar value. We must have processed either etcd.cert and etcd.key first and second (whichever order, doesn't matter), and then traversed into etcd.backup_config to find its nested sensitive values, clobbering the path of the first two sensitive marks. (I think we still would have had PathValueMarks for those nested values? But we must have thrown them away as invalid during marshalling or something, not wholly sure how that worked.)
  • The ordering that determines the exact result is still a complete mystery, possibly influenced by phase of the moon.

Copy link
Member

@alisdair alisdair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

Copy link
Contributor

@Uk1288 Uk1288 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@nfagerlund nfagerlund merged commit 3b26f68 into main Jan 20, 2023
@github-actions
Copy link

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

@nfagerlund nfagerlund deleted the nf/jan23-attr-path-value-marks-corruption branch January 20, 2023 18:18
@nfagerlund nfagerlund added the 0.13-backport If you add this label to a PR before merging, backport-assistant will open a new PR once merged label Jan 20, 2023
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
0.13-backport If you add this label to a PR before merging, backport-assistant will open a new PR once merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants