Support client partition data reassign #1608

Closed
6 of 9 tasks
zuston opened this issue Mar 29, 2024 · 2 comments · Fixed by #1609 or #1693 · May be fixed by #1617
Comments

@zuston
Member

zuston commented Mar 29, 2024

Motivation

After reviewing #1445 again (partition data reassign, which is disabled by default in the master branch), I found some bugs and design problems. I will use this issue to track the further optimizations.

Subtasks tracking

Design thought

reassign rpc between the Spark driver and executors

This part is covered in the #1445 design doc, so I will not describe it further.

reassign signal propagation

In the current codebase, the latest reassigned partition -> servers plan won't be propagated to tasks that start later.
To solve this problem, I will make the writer always get the latest partition -> servers plan. Once a reassign happens, the cached shuffleHandleInfo will be updated with the result returned by the reassign rpc.

For a task (task2) that starts after the reassigned tasks have finished, task2 will get the latest plan from the replacement + normal server list. This avoids writing to the faulty servers again (see the sketch below).
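As a rough illustration of this idea, the writer could resolve servers through a small cache that the reassign rpc refreshes. This is a hypothetical sketch: `ShuffleHandleInfo` stands for the cached handle mentioned above, and the method names are assumptions, not the actual Uniffle API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache one handle per shuffle and refresh it whenever
// the reassign rpc returns an updated partition -> servers plan, so tasks
// that start later always see the latest plan.
public class ShuffleHandleInfoCache {
  private final Map<Integer, ShuffleHandleInfo> handles = new ConcurrentHashMap<>();

  // Called by the writer before building each send event.
  public ShuffleHandleInfo latest(int shuffleId) {
    return handles.get(shuffleId);
  }

  // Called with the plan returned by the reassign rpc.
  public void refresh(int shuffleId, ShuffleHandleInfo updated) {
    handles.put(shuffleId, updated);
  }
}
```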

reassign multiple servers for one partition

This topic is scoped to the single-replica case.

Different partition types will use different strategies for the partition -> multiple servers assignment (a minimal sketch of both strategies follows).
For a huge partition, I hope that after recognizing the huge_partition, we will request multiple reassigned servers via rpc, and each task will pick its own partition server by hashing its taskAttemptId, which keeps the load balanced.

For a normal partition, multiple servers only appear when the partition has been reassigned multiple times due to unexpected problems. In this case, the task will always write to the last assigned server.
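A minimal sketch of the two selection strategies, assuming `ShuffleServerInfo` is the server descriptor and the list is ordered oldest-to-newest; the helper names are for illustration only:

```java
import java.util.List;

// Hypothetical sketch of the two strategies described above.
public final class PartitionServerSelector {

  // Huge partition: hash the taskAttemptId over all assigned servers
  // so concurrent tasks spread their writes evenly.
  static ShuffleServerInfo forHugePartition(
      List<ShuffleServerInfo> servers, long taskAttemptId) {
    int index = (int) Math.floorMod(taskAttemptId, (long) servers.size());
    return servers.get(index);
  }

  // Normal partition: extra servers only exist because of repeated
  // reassignments, so always write to the most recently assigned one.
  static ShuffleServerInfo forNormalPartition(List<ShuffleServerInfo> servers) {
    return servers.get(servers.size() - 1);
  }
}
```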


@rickyma
Contributor

rickyma commented Mar 29, 2024

> I found some bugs and design problems

What kind of bugs did you find? What will the bug cause? Data loss, or ... ? Can you elaborate more in this issue?

@zuston zuston changed the title from "[Improvement] One partition data could be written to multiple servers" to "[Improvement] Optimize partition data reassignment" Mar 30, 2024
@zuston
Member Author

zuston commented Mar 30, 2024

> I found some bugs and design problems
>
> What kind of bugs did you find? What will the bug cause? Data loss, or ... ? Can you elaborate more in this issue?

This is just about the partition data reassignment, which does not affect data correctness or cause data loss.

@zuston zuston changed the title from "[Improvement] Optimize partition data reassignment" to "[Improvement] Partition data reassignment" Mar 30, 2024
@zuston zuston changed the title from "[Improvement] Partition data reassignment" to "[Feature] Partition data reassignment" Apr 1, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 1, 2024
zuston added a commit that referenced this issue Apr 2, 2024
…ulty servers in one stage (#1609)

### What changes were proposed in this pull request?

1. Lock the `shuffleHandle` to ensure thread safety when reassigning partial servers for tasks (see the sketch after this list)
2. Only share the replacement servers for faulty servers within one stage rather than across the whole app
3. Simplify the reassignment logic: the single replacement server case will be supported in the future, so remove it for now
4. Correct the `partitionIds` type from `string` to `int` in the proto definition
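A minimal sketch of the locking in point 1; `isReplaced`/`replace` and the driver rpc are assumptions for illustration, not the actual API:

```java
// Hypothetical sketch: serialize reassignment on the handle so concurrent
// failing tasks cannot request overlapping replacements for the same server.
public class ReassignCoordinator {
  private final Object shuffleHandleLock = new Object();

  void reassignOnFailure(ShuffleHandleInfo handle, ShuffleServerInfo faulty) {
    synchronized (shuffleHandleLock) {
      // Another task may have already replaced this server while we waited.
      if (handle.isReplaced(faulty)) {
        return;
      }
      handle.replace(faulty, requestReplacementFromDriver(faulty));
    }
  }

  private ShuffleServerInfo requestReplacementFromDriver(ShuffleServerInfo faulty) {
    throw new UnsupportedOperationException("rpc omitted in this sketch");
  }
}
```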

### Why are the changes needed?

Fix: #1608

In the current implementation of partition reassignment, the same replacement servers are shared across different stages, which will crash for an app without registry.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

UTs
@zuston zuston reopened this Apr 2, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 2, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 2, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 3, 2024
zuston added a commit that referenced this issue Apr 8, 2024
…ble block resend (#1610)

### What changes were proposed in this pull request?

1. Avoid releasing blocks prematurely when block resend is enabled
2. Introduce a max retry count for blocks

### Why are the changes needed?

For: #1608

The current partition reassignment code has some bugs, as follows:
1. Data has already been released when resending.
2. If blocks fail to resend, the task may fail fast without retrying (see the retry sketch below).
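A retry-loop sketch for point 2, with hypothetical `send`/`release` helpers; only the retry structure is the point here:

```java
// Hypothetical sketch: resend a failed block up to a configurable max retry
// count instead of failing fast on the first error, and only release the
// block's data after a send has actually succeeded.
void sendWithRetry(ShuffleBlockInfo block, int maxRetryTimes) {
  for (int attempt = 1; ; attempt++) {
    try {
      send(block);     // keep the block's buffer alive until this succeeds
      release(block);  // safe to free the data only after a successful send
      return;
    } catch (Exception e) {
      if (attempt >= maxRetryTimes) {
        throw new RuntimeException(
            "Failed to send block after " + maxRetryTimes + " attempts", e);
      }
      // A reassign may have replaced the target server in the meantime;
      // the next attempt picks up the latest assignment.
    }
  }
}
```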

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`RssShuffleWriterTest#blockFailureResendTest` tests the block resend mechanism.
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 8, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 8, 2024
@zuston zuston changed the title from "[Feature] Partition data reassignment" to "Support partition data reassignment" Apr 10, 2024
@zuston zuston changed the title from "Support partition data reassignment" to "Support partition data reassign" Apr 10, 2024
zuston added a commit that referenced this issue Apr 17, 2024
…signed servers (#1615)

### What changes were proposed in this pull request?

Support reading from partition block data reassignment servers.

### Why are the changes needed?

For: #1608

The writer may have written data to the reassigned servers, so it's necessary to read from those servers as well. Since blockIds are stored on the servers that own each partition, this PR reads blockIds from all of these servers while still satisfying the min-replica requirement (a sketch of the read path follows).
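A sketch of that read path; `fetchBlockIds` is a hypothetical stand-in for the real getShuffleResult-style rpc, and blockIds are assumed to be tracked as Roaring64NavigableMap bitmaps:

```java
import java.util.List;
import org.roaringbitmap.longlong.Roaring64NavigableMap;

// Hypothetical sketch: merge the blockId bitmaps reported by every server
// that ever owned the partition (original + reassigned replacements).
Roaring64NavigableMap collectBlockIds(List<ShuffleServerInfo> owners, int partitionId) {
  Roaring64NavigableMap merged = Roaring64NavigableMap.bitmapOf();
  for (ShuffleServerInfo server : owners) {
    merged.or(fetchBlockIds(server, partitionId));
  }
  return merged;
}

Roaring64NavigableMap fetchBlockIds(ShuffleServerInfo server, int partitionId) {
  throw new UnsupportedOperationException("rpc omitted in this sketch");
}
```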

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`PartitionBlockDataReassignTest` integration test.
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 17, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 24, 2024
zuston added a commit that referenced this issue May 9, 2024

### What changes were proposed in this pull request?

1. Make the write client always use the latest available assignment for subsequent writes when a block reassign happens.
2. Support multiple retries for partition reassign.
3. Limit the max number of reassigned servers for one partition.
4. Refactor the reassign rpc.
5. Rename faultyServer -> receivingFailureServer.

#### Reassign whole process
![image](https://github.com/apache/incubator-uniffle/assets/8609142/8afa5386-be39-4ccb-9c10-95ffb3154939)

#### Always using the latest assignment

To achieve always using the latest assignment, I introduce `TaskAttemptAssignment` to provide the latest assignment for the current task. The creation of each AddBlockEvent also applies the latest assignment via `TaskAttemptAssignment`.

It is updated by the `reassignOnBlockSendFailure` rpc. That means the original reassign rpc response is refactored and replaced by the whole latest `shuffleHandleInfo` (a minimal sketch follows).
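A minimal sketch of that flow; `getServersFor` is a hypothetical accessor on the handle, and the real API may differ:

```java
import java.util.List;

// Hypothetical sketch: each task resolves servers through a per-attempt view
// that is swapped out whenever a send failure triggers a reassign.
public class TaskAttemptAssignment {
  private volatile ShuffleHandleInfo handleInfo;

  public TaskAttemptAssignment(ShuffleHandleInfo initial) {
    this.handleInfo = initial;
  }

  // Consulted while building every AddBlockEvent.
  public List<ShuffleServerInfo> retrieve(int partitionId) {
    return handleInfo.getServersFor(partitionId);
  }

  // Applied with the whole latest handle returned by the
  // reassignOnBlockSendFailure rpc.
  public void update(ShuffleHandleInfo latest) {
    this.handleInfo = latest;
  }
}
```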

### Why are the changes needed?

This PR is the subtask for #1608.

Leveraging #1615 / #1610 / #1609, we have implemented the server reassign mechanism for when the write client encounters a failed or unhealthy server. But this is not good enough yet, because it does not share the faulty server state with unstarted tasks and later `AddBlockEvent`s.

### Does this PR introduce _any_ user-facing change?

Yes. 

### How was this patch tested?

Unit and integration tests.

Integration tests as follows:
1. `PartitionBlockDataReassignBasicTest` validates the basic reassign mechanism.
2. `PartitionBlockDataReassignMultiTimesTest` tests the partition reassign mechanism with multiple retries.

---------

Co-authored-by: Enrico Minack <github@enrico.minack.dev>
zuston added a commit to zuston/incubator-uniffle that referenced this issue May 9, 2024
@zuston zuston changed the title from "Support partition data reassign" to "Support client partition data reassign" May 11, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue May 11, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue May 11, 2024
zuston added a commit to zuston/incubator-uniffle that referenced this issue May 11, 2024
zuston added a commit that referenced this issue May 15, 2024
### What changes were proposed in this pull request?

Verify the sent block count in Spark write tasks

### Why are the changes needed?

For #1608.
After introducing the reassign mechanism, a block's storage location can change dynamically.
To guard against possible or potential bugs, it's necessary to verify the block count (a sketch of such a check follows).
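A sketch of such a check at the end of a write task, with hypothetical counters; the point is the invariant, not the exact API:

```java
// Hypothetical sketch: after all sends complete, the number of acknowledged
// blocks must equal the number the writer produced, regardless of how many
// times their target servers were reassigned along the way.
void verifySentBlockCount(long producedBlocks, long acknowledgedBlocks) {
  if (producedBlocks != acknowledgedBlocks) {
    throw new IllegalStateException(
        "Block count mismatch after reassign: produced " + producedBlocks
            + " blocks but only " + acknowledgedBlocks + " were acknowledged");
  }
}
```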

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests are enough to ensure safety.
zuston added a commit that referenced this issue May 15, 2024
…g options (#1693)

### What changes were proposed in this pull request?

1. Add docs about the reassign mechanism.
2. Rename the config from `rss.client.blockSendFailureRetry.enabled` to `rss.client.reassign.enabled` (see the usage example below).
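For illustration, enabling the feature after this rename might look like the following on the Spark client, assuming the usual `spark.` prefix for Uniffle client options:

```java
import org.apache.spark.SparkConf;

// Sketch: turn on client-side partition data reassign after the rename.
SparkConf conf = new SparkConf()
    .set("spark.rss.client.reassign.enabled", "true");
```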

### Why are the changes needed?

Fix: #1608

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Not needed.
dingshun3016 pushed a commit to dingshun3016/incubator-uniffle that referenced this issue May 15, 2024