
manage cockroachdb cluster version with blueprints #5603

Merged
28 commits merged into main on May 25, 2024

Conversation

@iliana (Contributor) commented Apr 23, 2024

To upgrade CockroachDB, we'll need to manage the cluster.preserve_downgrade_option cluster setting to give ourselves the opportunity to roll back an upgrade. I was initially planning to manage this with database migrations, but SET CLUSTER SETTING cannot be run as part of a multi-statement transaction.

In the limit, the Reconfigurator will need to do this anyway as it performs rolling upgrades of CockroachDB nodes, so we may as well teach it to manage cluster settings today.

@iliana added this to the Release 9 milestone on Apr 23, 2024
@iliana commented Apr 23, 2024

Labeling for Release 9 since this is relatively high-risk and, I think, not worth slipping the schedule for. This will slip CockroachDB 22.2 to R10.

@davepacheco (Collaborator) left a comment:

This is looking great. I have some nits here and some larger things but this definitely feels like the right direction!

Review comments (since resolved) on:
  • nexus/types/src/deployment/planning_input.rs
  • nexus/reconfigurator/planning/src/planner.rs
  • nexus/reconfigurator/execution/src/cockroachdb.rs
@iliana commented Apr 23, 2024

Do you have thoughts on whether it makes sense to keep the Blueprint implementation as-is (we add a column specifically for the value we want to set the cluster.preserve_downgrade_option setting to), or to instead have an associated table of cluster settings and values we want to ensure are set to particular values?

The latter makes more sense to me, although it might be overkill. But I know that when we are able to move beyond v23.2, which provides a different mechanism for performing upgrades, we'll target the version setting directly instead of ensuring a particular value for cluster.preserve_downgrade_option. (That design would also more clearly indicate whether or not a blueprint ensures that a particular setting has a particular value.)

@davepacheco (Collaborator) replied:

> Do you have thoughts on whether it makes sense to keep the Blueprint implementation as-is (we add a column specifically for the value we want to set the cluster.preserve_downgrade_option setting to), or to instead have an associated table of cluster settings and values we want to ensure are set to particular values?
>
> The latter makes more sense to me, although it might be overkill. But I know that when we are able to move beyond v23.2, which provides a different mechanism for performing upgrades, we'll target the version setting directly instead of ensuring a particular value for cluster.preserve_downgrade_option.

Hmm. When you first mentioned it, I thought the separate table made sense. But if we do that and have a separate table of (setting_name TEXT, setting_value TEXT), we have to make sure names are unique (easy), we have to deal with missing values, we can't easily use types for the values, etc. If we only ever expect to set values for a known set of parameters, then using columns for them (even if there are many) may make sense. That way we always have a value for each setting, we can use stronger types for them, etc.
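
A minimal sketch of that tradeoff (hypothetical types, not the actual Omicron code): with a key-value table every setting round-trips through strings and may be absent, while a column per known setting gives the Rust model a typed field that always exists.

```rust
use std::collections::BTreeMap;

// Key-value approach: stringly-typed, and any given setting may be missing.
type SettingsMap = BTreeMap<String, String>;

// Column-per-setting approach: every known setting has a dedicated, typed
// field, so there is no "setting row missing" case to handle.
#[derive(Debug, PartialEq)]
struct BlueprintSettings {
    // NULL in the database maps to None: "don't touch this setting".
    preserve_downgrade_option: Option<String>,
}

// With the key-value table, every reader must handle absence explicitly.
fn parse_preserve_downgrade(map: &SettingsMap) -> Option<String> {
    map.get("cluster.preserve_downgrade_option").cloned()
}

fn main() {
    let mut map = SettingsMap::new();
    assert_eq!(parse_preserve_downgrade(&map), None); // missing value

    map.insert(
        "cluster.preserve_downgrade_option".to_string(),
        "22.1".to_string(),
    );
    assert_eq!(parse_preserve_downgrade(&map), Some("22.1".to_string()));

    // The typed struct can't represent a missing setting at all.
    let typed = BlueprintSettings {
        preserve_downgrade_option: Some("22.1".to_string()),
    };
    println!("{:?}", typed);
}
```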

@iliana commented Apr 23, 2024

That makes sense. And, yeah, I don't really expect setting cluster options to be something we regularly do outside of performing upgrades.

I think I will change the column definition to more clearly reflect that it is a 1:1 mapping to that particular setting, and that NULL means "the blueprint does not touch this setting". (The state of "the blueprint should reset this setting to the default" will be an empty string, which is what the preserve_downgrade_option setting is by default.)
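
The three column states described above can be sketched like this (a hypothetical enum for illustration; the PR later settles on the real `CockroachDbPreserveDowngrade` type):

```rust
/// Hypothetical sketch of the nullable column's semantics.
#[derive(Debug, PartialEq)]
enum PreserveDowngradeIntent {
    /// Column is NULL: the blueprint does not touch this setting.
    DoNotModify,
    /// Column is "" (the setting's default): reset the setting.
    ResetToDefault,
    /// Column holds a version string, e.g. "22.1": ensure that value.
    Set(String),
}

fn intent_from_column(value: Option<&str>) -> PreserveDowngradeIntent {
    match value {
        None => PreserveDowngradeIntent::DoNotModify,
        Some("") => PreserveDowngradeIntent::ResetToDefault,
        Some(v) => PreserveDowngradeIntent::Set(v.to_string()),
    }
}

fn main() {
    assert_eq!(intent_from_column(None), PreserveDowngradeIntent::DoNotModify);
    assert_eq!(
        intent_from_column(Some("")),
        PreserveDowngradeIntent::ResetToDefault
    );
    assert_eq!(
        intent_from_column(Some("22.1")),
        PreserveDowngradeIntent::Set("22.1".to_string())
    );
}
```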

@iliana commented Apr 24, 2024

Hmm. Something else we need to think about: if a rack is initialized on or after a "tick" release and before its corresponding "tock" release, the cluster is created with the major version after the one we specify by policy. The policy will never be effective in that state. We also can't downgrade that particular rack to an older version of CockroachDB in a future Oxide software release.

Idly wondering how ridiculous it would be to have RSS spin up a CockroachDB cluster with an older zone, set cluster.preserve_downgrade_option, then shut it down and bring it up again with the new zone software, before bringing up Nexus. I'm not sure I see a better way at the moment.

(edit) I suppose that's a problem for the release when we actually want to upgrade the CockroachDB software though. This particular change has to land in a release before then :)

@iliana commented Apr 24, 2024

I figured out a way to, in a single statement, verify some basic cluster state against a caller-provided token and prevent modifying settings if the state doesn't match. It uses a combination of two versions, the current cluster version and the major version of the CockroachDB node we're currently talking to. (Since that will give us some pain when we start doing rolling upgrades of CockroachDB nodes, I should move that part of execution from the beginning to the end.)
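
The check-and-set pattern might look like the following sketch. This is not the actual implementation (the real fingerprint is computed in SQL and returns a STRING value); the function names and hashing scheme here are assumptions purely for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical fingerprint: combine the current cluster version with the
// major version of the CockroachDB node we're talking to into one token.
fn fingerprint(cluster_version: &str, node_major_version: &str) -> u64 {
    let mut h = DefaultHasher::new();
    cluster_version.hash(&mut h);
    node_major_version.hash(&mut h);
    h.finish()
}

// Refuse to modify settings unless the caller-provided token still matches
// the observed cluster state; a stale token from an older blueprint fails.
fn try_set_setting(
    expected: u64,
    cluster_version: &str,
    node_major_version: &str,
) -> Result<(), String> {
    if fingerprint(cluster_version, node_major_version) != expected {
        return Err("stale fingerprint; cluster state has changed".to_string());
    }
    Ok(()) // ...the real code would run SET CLUSTER SETTING here
}

fn main() {
    let token = fingerprint("22.1", "22.1");
    // State unchanged: the guarded write proceeds.
    assert!(try_set_setting(token, "22.1", "22.1").is_ok());
    // Node upgraded underneath us: the stale token is rejected.
    assert!(try_set_setting(token, "22.1", "22.2").is_err());
}
```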

Other things I still need to do:

  • Write a Big Theory Statement, somewhere. There's more documentation in the individual parts that are most load-bearing now but it'll be nice to have it all in one place, along with what someone needs to do step-by-step to cause the next release to be a "tick" or a "tock".
  • Write a dang test.
  • See if I can figure out a way to convince CockroachDB to start a cluster on an older cluster version, for being able to test the "tick" side of this? Unsure if that's feasible.
  • Double-check usage of "cluster settings" vs "CockroachDB settings".

@iliana commented May 1, 2024

I've written up a file in docs that should be a self-contained high-level explanation of what we're doing, but I'd appreciate a read to verify that (and also verify that some of the larger docstrings in this PR don't conflict with it!).

Still need to write tests of the blueprint planning and execution bits.

@iliana marked this pull request as ready for review on May 8, 2024 at 23:19
@iliana commented May 8, 2024

Marking ready for review; the only thing that's left is to update the display/diff output of blueprints with this data, but @andrewjstone tells me he's in the middle of refactoring all that, so I think it's maybe fine to do it in a follow-up PR.

///
/// This fingerprint should return a STRING value. It is safe to modify how this
/// fingerprint is calculated between Nexus releases; the stale fingerprint in
/// the previous blueprint will be rejected.
Reviewer (Contributor) comment:

This is an excellent comment

schema/crdb/dbinit.sql — review comment (since resolved)
policy
} else {
// Ensure `cluster.preserve_downgrade_option` is reset
""
Reviewer (Contributor) comment:

I think the answer to this is "no" but wanted to ask just in case - should we check here for "policy is older than version"? If I understand correctly, that should be impossible (that means we think we want to run a CRDB major version that is behind the current cluster version), so I'm guessing it's not worth checking for, since such a situation would blow up long before we got here.

@iliana (author) replied:

Based on the conversations in #oxide-update it sounds like we want to use the policy mechanism to hold back cluster versions as we upgrade CockroachDB, but to use the newer cluster version for any newly-initialized racks. In that case, "policy is older than version" is something that will indeed happen. As part of converting the value of cockroachdb_settings_preserve_downgrade to an enum, I was able to make this distinction a lot more clear and handle this case.

nexus/types/src/deployment.rs — review comment (since resolved)
@iliana commented May 22, 2024

Good news: this all works!

root@oxz_switch1:~# omdb nexus blueprints regenerate -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::4]:12221
generated new blueprint f407d0ac-e59d-4748-97ae-727d2f8c6002

---

root@[fd00:1122:3344:102::3]:32221/omicron> select * from blueprint;
                   id                  |         parent_blueprint_id          |         time_created          |               creator                |              comment              | internal_dns_version | external_dns_version |         cockroachdb_fingerprint          | cockroachdb_setting_preserve_downgrade
---------------------------------------+--------------------------------------+-------------------------------+--------------------------------------+-----------------------------------+----------------------+----------------------+------------------------------------------+-----------------------------------------
  9e072531-2cd6-4358-8666-ee76b7bc7be0 | NULL                                 | 2024-05-22 01:15:37.081336+00 | RSS                                  | initial blueprint from rack setup |                    1 |                    2 |                                          | NULL
  f407d0ac-e59d-4748-97ae-727d2f8c6002 | 9e072531-2cd6-4358-8666-ee76b7bc7be0 | 2024-05-22 03:16:42.372404+00 | 07538e1b-79f3-4faf-a168-db073746d346 |                                   |                    1 |                    2 | d4d87aa2ad877a4cc2fddd0573952362739110de | 22.1
(2 rows)


Time: 2ms total (execution 2ms / network 0ms)

root@[fd00:1122:3344:102::3]:32221/omicron> show cluster setting cluster.preserve_downgrade_option;
  cluster.preserve_downgrade_option
-------------------------------------

(1 row)


Time: 1ms total (execution 0ms / network 0ms)

---

root@oxz_switch1:~# omdb -w nexus blueprints target set f407d0ac-e59d-4748-97ae-727d2f8c6002 inherit
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::4]:12221
set target blueprint to f407d0ac-e59d-4748-97ae-727d2f8c6002
root@oxz_switch1:~# omdb -w nexus blueprints target enable f407d0ac-e59d-4748-97ae-727d2f8c6002
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::4]:12221
set target blueprint f407d0ac-e59d-4748-97ae-727d2f8c6002 to enabled

---

root@[fd00:1122:3344:102::3]:32221/omicron> show cluster setting cluster.preserve_downgrade_option;
  cluster.preserve_downgrade_option
-------------------------------------
  22.1
(1 row)


Time: 0ms total (execution 0ms / network 0ms)

Question I have now that I've tested deploying this: since we currently do not enable execution of blueprints, should RSS do anything special when it creates the initial blueprint, and/or should RSS separately try to set the cluster.preserve_downgrade_option setting to the policy? (I assume not, and that for one-off testing installations like this one I deployed to madrid, this is ultimately fine.)

@jgallagher (Contributor) replied:
> Good news: this all works!

Nice! 🎉

> Question I have now that I've tested deploying this: since we currently do not enable execution of blueprints, should RSS do anything special when it creates the initial blueprint, and/or should RSS separately try to set the cluster.preserve_downgrade_option setting to the policy? (I assume not, and that for one-off testing installations like this one I deployed to madrid, this is ultimately fine.)

I agree it doesn't matter for systems like madrid, but if we were setting up a new production rack today, would we want cluster.preserve_downgrade_option to be set to the policy? (I'm assuming we'll set up more production racks before we transition to a world where RSS's initial blueprint is enabled or Nexus does automatic blueprint regeneration / target updates.) My gut feeling is "yes", I think? Would it make sense for Nexus to do that in the handoff it receives from RSS? If so, we could further decide whether RSS should set that value in the blueprint, or whether Nexus should modify the blueprint to match the policy - it does something similar for the changes it makes to external DNS.

@iliana commented May 22, 2024

That indeed seems like the right place to do it. I'll add that.

@iliana force-pushed the iliana/crdb-cluster-version-v2 branch from 7e1486d to aa252b2 on May 23, 2024 at 01:50
@iliana commented May 24, 2024

Okay, I think this has approached a state where it's ready for final review.

/// Whether to set `cluster.preserve_downgrade_option` and what to set it to
/// (`None` if this value was retrieved from the database and was invalid)
pub cockroachdb_setting_preserve_downgrade:
Option<CockroachDbPreserveDowngrade>,
Reviewer (Contributor) comment:

Small nit, feel free to ignore - do you think this would be clearer as a Result<CockroachDbPreserveDowngrade, _> where the error case indicates "something is wrong in the DB" and probably holds the actual string we couldn't deserialize?

@iliana (author) replied:

Hmm, so a Result<CockroachDbPreserveDowngrade, String>, and then handle that in the blueprint diff spots? idk if that would be clearer but it might be nicer regardless.

Reviewer replied:

Yeah, or maybe a newtype around a string (for clarity of what that string is). I don't feel strongly about this either way.

AllowUpgrade,
/// Ensure the setting is set to a given cluster version.
Set(CockroachDbClusterVersion),
}
Reviewer (Contributor) comment:

Thanks for changing this; I think it's a nice improvement.
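
A sketch of how execution might act on the enum quoted above. The `DoNotModify` variant and the string return type here are assumptions for illustration (the real `Set` variant holds a `CockroachDbClusterVersion`, not a `String`); the empty-string-resets-the-setting behavior is as described earlier in this thread.

```rust
// Hypothetical stand-in for the enum discussed above.
#[derive(Debug)]
enum CockroachDbPreserveDowngrade {
    /// Leave the setting alone.
    DoNotModify,
    /// Reset the setting to its default, allowing the upgrade to finalize.
    AllowUpgrade,
    /// Ensure the setting is set to a given cluster version.
    Set(String),
}

// Map the blueprint's intent to the value (if any) we would write with
// SET CLUSTER SETTING; None means "issue no statement at all".
fn desired_value(action: &CockroachDbPreserveDowngrade) -> Option<String> {
    match action {
        CockroachDbPreserveDowngrade::DoNotModify => None,
        // The setting's default is the empty string.
        CockroachDbPreserveDowngrade::AllowUpgrade => Some(String::new()),
        CockroachDbPreserveDowngrade::Set(v) => Some(v.clone()),
    }
}

fn main() {
    assert_eq!(
        desired_value(&CockroachDbPreserveDowngrade::DoNotModify),
        None
    );
    assert_eq!(
        desired_value(&CockroachDbPreserveDowngrade::AllowUpgrade),
        Some(String::new())
    );
    assert_eq!(
        desired_value(&CockroachDbPreserveDowngrade::Set("22.1".to_string())),
        Some("22.1".to_string())
    );
}
```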

@iliana commented May 25, 2024

Sigh, BlueprintMetadata implements JsonSchema as part of the nexus-internal API, and the default implementation of JsonSchema for Result<T, E> doesn't pass openapi-lint. I will leave it as an Option at least for now.

@iliana force-pushed the iliana/crdb-cluster-version-v2 branch from 4a73ca2 to 0871b51 on May 25, 2024 at 00:10
@iliana enabled auto-merge (squash) on May 25, 2024 at 00:11
@iliana merged commit ec69e00 into main on May 25, 2024
20 checks passed
@iliana deleted the iliana/crdb-cluster-version-v2 branch on May 25, 2024 at 01:10
3 participants