Bugfix/corrects replication in clusters of different sizes #5567

Open · wants to merge 5 commits into base: master

Conversation

@davidporter-id-au (Contributor) commented Dec 29, 2023

What changed?

This tests and fixes replication between clusters of different sizes. It is not thoroughly tested yet, so please treat it as an alpha feature.

Intention and where this works

Smaller number of shards -> larger cluster: this should be OK.

This is a bugfix for replication between clusters whose shard counts differ. The high-level approach is to support this replication only partially, with a very simple fix for now: straightforward shard filtering, in which requests for shards higher than the receiving cluster supports are simply dropped.

The intention is to fix this for the use-case where a cluster has a small number of shards and is going to be connected to a larger cluster for expansion. In this use-case, traffic for domains active on the old (smaller) cluster should replicate to the new (larger) cluster as usual. I've done some preliminary testing to confirm this is the case.

During this replication, workflows moving from the smaller cluster to the larger one will land in different history shards in the database as they are replicated. I.e., if a workflow in the small cluster hashed to shard 123, it would hash to some other shard in the new cluster. This shouldn't affect anything. A sketch of the filtering step follows.
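
To make the approach concrete, here is a minimal sketch of the filtering step. The actual function is filterReplicationShards in service/frontend/adminHandler.go (quoted further down in this thread); the types below are trimmed stand-ins rather than the real Cadence types, and the ShardID field name is an assumption.

```go
// Sketch of the filtering idea only; trimmed stand-in types, not the real
// Cadence ones. The actual implementation lives in
// service/frontend/adminHandler.go as filterReplicationShards.
package replication

type replicationToken struct {
	ShardID int32 // assumed field name; cursor fields omitted
}

type getReplicationMessagesRequest struct {
	Tokens []*replicationToken
}

// filterReplicationShards drops every token whose shard ID falls outside the
// receiving cluster's shard range, so requests for non-existent shards are
// simply ignored rather than causing errors.
func filterReplicationShards(numHistoryShards int, request *getReplicationMessagesRequest) *getReplicationMessagesRequest {
	filtered := make([]*replicationToken, 0, len(request.Tokens))
	for _, token := range request.Tokens {
		// Local shards are numbered 0..numHistoryShards-1; anything above
		// can only come from a cluster configured with more shards.
		if int(token.ShardID) < numHistoryShards {
			filtered = append(filtered, token)
		}
	}
	return &getReplicationMessagesRequest{Tokens: filtered}
}
```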

Caveats

Don't use this to replicate from larger clusters to smaller:

However, in the scenario where a domain is active in the new (larger) cluster and replicating back to the old one, workflows will be partially dropped. The drop is only partial because it's more convenient to drop the out-of-range replication requests than to drop all of them: cluster metadata for the requesting cluster isn't available on the receiving side, so it's not possible to know the requesting cluster's shard count without more significant API changes.

Why not use some shard translation scheme like /2?

When I was first poking around with this problem I intended to see whether I could connect clusters with power-of-two size differences and halve/double the shards in translation. This isn't possible, however, without a stateful intermediary, because of the replication API: it keeps track of offsets on the producer side, meaning that any reduction of two shards into a single one would need to solve the problem of keeping track of both offsets somehow.
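
To illustrate the constraint (my sketch; the field names mirror the replication token shape as I understand it and may not match the real types exactly): each token carries one cursor pair for exactly one shard, so folding source shards 2k and 2k+1 into destination shard k would require the producer to track two independent cursors where the protocol only has room for one.

```go
// Illustrative only: the per-shard cursor shape that makes a 2-to-1 shard
// mapping hard without a stateful intermediary. Field names are assumptions,
// not a verbatim copy of the Cadence types.
type replicationCursor struct {
	ShardID                int32
	LastRetrievedMessageID int64 // single read cursor per shard
	LastProcessedMessageID int64 // single ack cursor per shard
}
```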

Why?

How did you test it?
Tested locally with a multicluster setup

Potential risks
This change could break replication and should be treated with care.

Release notes

Documentation Changes

This change probably requires updating the migration documentation for replication-based migration.

@davidporter-id-au changed the title from "Bugfix/corrects inter shard replication" to "Bugfix/corrects replication in clusters of different sizes" on Dec 29, 2023
@davidporter-id-au marked this pull request as ready for review on December 29, 2023 13:04
service/frontend/adminHandler.go

// Ensures that shards received are within the range of the server,
// or they'll be filtered out. This necessarily means that replication requests
// that are for shards that don't exist are dropped.

Contributor:

If the replication fundamentally can't be done completely in such setups, should we even attempt to do the partial replication?

Contributor Author:

It can be - as far as I can tell, it can successfully be done when scaling a cluster up in terms of the number of shards, just not down. That strikes me as useful.

I don't think there's any value in doing partial replication; it's just hard to prevent it in such scenarios. The poller information doesn't presently convey the shard count, and such an API change seems like overkill.

// higher than what it can handle, it'll drop them.
filteredRequest := filterReplicationShards(adh.config.NumHistoryShards, request)
if len(filteredRequest.Tokens) != len(request.Tokens) {
adh.GetLogger().Warn("Warning! Received replication request from a cluster with a greater number of shards."+

Contributor:

Including the number of shards in the current and requesting clusters in the log would be useful.

Contributor Author:

There is no obvious way (that I'm aware of) to get the requesting cluster's shard count, or else I would.

goMock := gomock.NewController(t)
mockhistoryclient := history.NewMockClient(goMock)

mockhistoryclient.EXPECT().GetReplicationMessages(gomock.Any(), td.expectedOut).Return(&types.GetReplicationMessagesResponse{}, nil)

Contributor:

Does this mock line validate that GetReplicationMessages is called with td.expectedOut, or does it need Times(1)?

Contributor:

I think no explicit Times is equivalent to MinTimes(1)... but that's probably worth verifying.
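
(Side note, not from the review itself: if the exact call count matters for the assertion, it can be pinned explicitly rather than relying on the default. A sketch reusing the names from the quoted test snippet:)

```go
// Fails the test if GetReplicationMessages is called zero times or more than
// once, instead of depending on gomock's default expectation count.
mockhistoryclient.EXPECT().
	GetReplicationMessages(gomock.Any(), td.expectedOut).
	Return(&types.GetReplicationMessagesResponse{}, nil).
	Times(1)
```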

@CalebAlbers

@davidporter-id-au thank you so much for putting this together. I was hoping to get some insight into testing strategy for this. Do you think it is likely that this change will get merged into the next release?

I'm just getting a feel for whether we should plan on cherry-picking this PR and making an internal build for our use, versus grabbing an upcoming release instead.

I really do appreciate you and the team doing this. Let me know if there's anything I can do to help!

// higher than what it can handle, it'll drop them.
filteredRequest := filterReplicationShards(adh.config.NumHistoryShards, request)
if len(filteredRequest.Tokens) != len(request.Tokens) {
adh.GetLogger().Warn("Warning! Received replication request from a cluster with a greater number of shards."+


This is definitely a useful log message to have, but unfortunately it also gets logged very frequently. For example, if you are going from 1k shards to 8k (e.g., MySQL to Cassandra), you will get 7k of these logged on every replication cycle.

Could we change it to either be debug level or only log once per cluster, etc.? I've attached a screenshot of what the logs look like for a very small reproduction case (1 vs 4 shards) I put together in Docker Compose using this PR's patch on top of v1.2.6.

[Screenshot: repeated shard-mismatch warning lines from the 1-vs-4-shard reproduction]
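
(One possible shape for the de-duplication, sketched with the standard library rather than Cadence's logger, and not part of this PR: emit the warning once per process and stay silent, or drop to debug, afterwards.)

```go
package replication

import (
	"log"
	"sync"
)

// shardMismatchOnce suppresses repeats of the "requesting cluster has more
// shards" warning so it fires once per process instead of once per token per
// replication poll. In the real handler this would wrap adh.GetLogger().Warn.
var shardMismatchOnce sync.Once

func warnShardMismatchOnce(droppedTokens, numHistoryShards int) {
	shardMismatchOnce.Do(func() {
		log.Printf(
			"dropping %d replication tokens above NumHistoryShards=%d; suppressing further warnings",
			droppedTokens, numHistoryShards)
	})
}
```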

Contributor Author:

Sorry for not being on the ball with this one.

That's way too much noise, agreed. When you're running that, however, is the active side on the larger cluster? That should be the scenario that's broken and will not work.


Thanks for digging into it! In this case, the active side is the lower shard count one (with an unfortunate name due to historical reasons). We actually didn't have any domains set up to replicate - these logs were happening out of the box.

I'm happy to share my code, if that would help with testing. It's a fairly concise docker-compose with 3 clusters (one with fewer shards) set up to replicate.

Contributor Author:

Yeah, if you could post the config, that'd be helpful for me to understand what's going on. I'm blocked from testing it more internally due to another PR we need to fix, but otherwise I'll hopefully be able to take more of a look later next week.

@@ -1038,3 +1038,71 @@ func Test_UpdateDomainIsolationGroups(t *testing.T) {
})
}
}

Contributor:

Leaving this comment here as a reminder to update documentation https://github.com/uber/Cadence-Docs/pull/181/files

@oulman

oulman commented Mar 29, 2024

👋 Hi folks. Re: the docs update in uber/Cadence-Docs#181 I was wondering if there is still a plan to merge this PR and support migrating from a smaller -> larger number of shards? Thanks!
