[controller][admin-tool] Streamline controller configs and harden update-store workflow #946
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Streamline controller configs and harden update-store workflow
This PR unifies the
UpdateStore
logic between child controller and parent controllers. We've faced many issues due to the structure of the code (especiallyVeniceParentHelixAdmin
) where a set of store-update operations puts it into a non-healthy state. These are all the changes in this PR:UpdateStoreUtils
that is called by both parent and child controllers to apply the store update and perform the necessary validations.NON_AGGREGATE
data replication policyACTIVE_ACTIVE
data replication policy is only supported when Active-Active replication is enabledACTIVE_ACTIVE
,AGGREGATE
orNONE
ordinal
fromBackupStrategy
enum was being used to write to the admin channel. This is problematic when the enums evolve. Added agetValue
function and used that instead.multi.region
config to control certain features instead of controlling them via explicit configs.LiveConfig
for store migration purposes.enable.native.replication.for.batch.only
enable.native.replication.for.hybrid
enable.native.replication.as.default.for.batch.only
enable.native.replication.as.default.for.hybrid
enable.active.active.replication.as.default.for.batch.only.store
admin.topic.remote.consumption.enabled
enable.incremental.push.for.hybrid.active.active.user.stores
active.active.enabled.on.controller
controller.enable.batch.push.from.admin.in.child
controller.external.superset.schema.generation.enabled
has been added to replacecontroller.parent.external.superset.schema.generation.enabled
. since external superset schema generation must be allowed in single-region mode also.controller.parent.external.superset.schema.generation.enabled
has been marked deprecated, but it has not been completely removed yet.VeniceTwoLayerMultiRegionMultiClusterWrapper
TestFabricBuildout
does this and tests for admin execution id. This is incorrect as that is not how a new region will get added. We need the test setup to support adding blank regions so that we can simulate the true fabric buildout process. Because of this, I've disabled these tests for now.Some side-effects of this change are:
SupersetSchemaGeneratorWithCustomProp
had a bug where if the first schema has a custom prop, the future superset schema generation would fail as Avro doesn't allow overriding custom props. This got caught as the update store logic now also tries creating superset schema if a store enabled RC or WC, or if it previously had a superset schema.There are a few other changes that we should do, but are not done in this PR:
All major operations should only be allowed on the parent controller - Create a store, delete a store, add schemas. We should exclude some system stores from this check like we have for the check allowing batch push to admin in child
Recommendation for Reviewers
I recommend going through the changes in at least two passes. In the first pass, look through all the stuff that has been purely deleted. (Skip
ParentControllerConfigUpdateUtils
as that has been renamed toPrimaryControllerConfigUpdateUtils
and it's contents partially moved toUpdateStoreUtils
. So, GitHub doesn't detect is as a renaming). While doing this pass, you can also glance through various small changes that do not need much pondering over.In the second pass, follow this review order:
VeniceControllerClusterConfig
,VeniceControllerConfig
,VeniceControllerMultiClusterConfig
VeniceControllerService
Admin
UpdateStoreUtils
PrimaryControllerConfigUpdateUtils
6.
AdminUtils
UpdateStoreWrapper
VeniceHelixAdmin
VeniceParentHelixAdmin
clients/venice-pulsar/readme.md
docker/venice-client/create-store.sh
How was this PR tested?
GH CI
Does this PR introduce any user-facing changes?