[Improvement] [TEZ][MR] Block ID overflows when the shuffle data is large enough. #1398

zhengchenyu · 2023-12-27T08:09:20Z

Code of Conduct

I agree to follow this project's Code of Conduct

Search before asking

I have searched in the issues and found no similar issues.

What would you like to be improved?

We found that the block ID overflows when the shuffle data is large enough, especially when the data is skewed.

For MR and TEZ, a block ID consists of four fields: a sequentially increasing block number, the task attempt id, the partition id, and the task id.
The highest 12 bits hold the sequentially increasing block number, the next 6 bits hold the task attempt id, the next 24 bits hold the partition id, and the lowest 21 bits hold the task id. (Note: the highest bit of the 64-bit value is ignored, leaving 63 usable bits.)
So if the number of blocks in one partition exceeds 2^12 (4096), the block ID overflows.
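
To make the layout above concrete, here is a minimal sketch of how the four fields could be packed into a 63-bit value. The class name `BlockIdLayoutSketch` and its methods are hypothetical and are not taken from the Uniffle code base; only the bit widths (12, 6, 24, 21) come from the description above.

```java
public class BlockIdLayoutSketch {
    // Bit widths as described in this issue; the sign bit of the long is unused.
    static final int SEQUENCE_BITS = 12;
    static final int ATTEMPT_BITS = 6;
    static final int PARTITION_BITS = 24;
    static final int TASK_BITS = 21;

    // Hypothetical helper: rejects values that do not fit in the given width.
    static void checkFits(long value, int bits, String name) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalArgumentException(
                name + "=" + value + " does not fit in " + bits + " bits");
        }
    }

    // Packs the fields from highest to lowest: sequence, attempt, partition, task.
    static long pack(long sequenceNo, long attemptId, long partitionId, long taskId) {
        checkFits(sequenceNo, SEQUENCE_BITS, "sequenceNo");
        checkFits(attemptId, ATTEMPT_BITS, "attemptId");
        checkFits(partitionId, PARTITION_BITS, "partitionId");
        checkFits(taskId, TASK_BITS, "taskId");
        return (sequenceNo << (ATTEMPT_BITS + PARTITION_BITS + TASK_BITS))
            | (attemptId << (PARTITION_BITS + TASK_BITS))
            | (partitionId << TASK_BITS)
            | taskId;
    }

    // Extracts the sequence number again (the top 12 of the 63 usable bits).
    static long sequenceNo(long blockId) {
        return blockId >>> (ATTEMPT_BITS + PARTITION_BITS + TASK_BITS);
    }

    public static void main(String[] args) {
        // The largest sequence number that still fits is 2^12 - 1 = 4095.
        long last = pack(4095, 1, 7, 42);
        System.out.println("sequenceNo of last valid block: " + sequenceNo(last));
        try {
            // Block number 4096 needs a 13th bit, which this layout does not have.
            pack(4096, 1, 7, 42);
        } catch (IllegalArgumentException e) {
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

The sketch rejects an out-of-range sequence number explicitly. Without such a check, the extra bit would spill outside the 63 usable bits instead of being reported.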

For Spark, up to 2^18 blocks are supported. The lowest 21 bits come from TaskContext::taskAttemptId, a sequentially increasing number that identifies both the task and the task attempt, so no separate bits are needed for the task attempt id.

I think we could move the task attempt id into the lowest 21 bits and reduce the number of bits allocated to it. In general, the task attempt id is not large.
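
As a rough illustration of this proposal, the sketch below shares the lowest 21 bits between the task id and the task attempt id and gives the block number 18 bits. The 4-bit attempt width is taken from the commit title referenced in this issue. The position of the attempt inside the 21-bit field, and the class and method names, are assumptions made for illustration only and may differ from the linked pull request.

```java
public class ProposedBlockIdLayoutSketch {
    static final int SEQUENCE_BITS = 18;      // 12 + 6 bits once the attempt field is moved
    static final int PARTITION_BITS = 24;
    static final int TASK_ATTEMPT_BITS = 21;  // lowest field, now shared
    static final int ATTEMPT_BITS = 4;        // from the commit title; placement is assumed
    static final int TASK_BITS = TASK_ATTEMPT_BITS - ATTEMPT_BITS; // 17 bits left for the task id

    // Hypothetical helper: rejects values that do not fit in the given width.
    static void checkFits(long value, int bits, String name) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalArgumentException(
                name + "=" + value + " does not fit in " + bits + " bits");
        }
    }

    // Assumption: the attempt id sits above the task id inside the 21-bit field.
    static long taskAttemptField(long taskId, long attemptId) {
        checkFits(taskId, TASK_BITS, "taskId");
        checkFits(attemptId, ATTEMPT_BITS, "attemptId");
        return (attemptId << TASK_BITS) | taskId;
    }

    static long pack(long sequenceNo, long partitionId, long taskId, long attemptId) {
        checkFits(sequenceNo, SEQUENCE_BITS, "sequenceNo");
        checkFits(partitionId, PARTITION_BITS, "partitionId");
        return (sequenceNo << (PARTITION_BITS + TASK_ATTEMPT_BITS))
            | (partitionId << TASK_ATTEMPT_BITS)
            | taskAttemptField(taskId, attemptId);
    }

    // Number of distinct block numbers available under this layout.
    static long maxBlocks() {
        return 1L << SEQUENCE_BITS;
    }

    public static void main(String[] args) {
        System.out.println("block numbers available: " + maxBlocks());
        // Block number 4096 overflowed the 12-bit field but fits in 18 bits.
        System.out.println("packed: " + pack(4096, 7, 42, 1));
    }
}
```

The trade-off in this sketch is that the task id shrinks from 21 to 17 bits and the attempt id from 6 to 4 bits, in exchange for 64 times as many block numbers.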

How should we improve?

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!

qijiale76 · 2023-12-27T08:12:40Z

Please assign this issue to me.

zhengchenyu assigned zhengchenyu and unassigned zhengchenyu Dec 27, 2023

zhengchenyu assigned qijiale76 Dec 27, 2023

zhengchenyu mentioned this issue Dec 27, 2023

[Improvement] support sequential unique block id #1399

Open

3 tasks

qijiale76 linked a pull request Jan 4, 2024 that will close this issue

[#1398] fix(mr,tez): Make attempId computable and move it to taskAttemptId in BlockId layout. #1418

Open

qijiale76 added a commit to qijiale76/incubator-uniffle that referenced this issue Jan 17, 2024

[apache#1398] fix(MR)(TEZ): Limit attemptId to 4 bit and move it from…

071c46e

… 18 bit atomicInt to 21 bit taskAttemptId in 63 bit BlockId.

qijiale76 added a commit to qijiale76/incubator-uniffle that referenced this issue Jan 17, 2024

[apache#1398] [FOLLOW UP]

40cfd3e

EnricoMi mentioned this issue Feb 8, 2024

[Improvement] Replace taskAttemptId in blockId with mapIndex #1512

Closed

3 tasks
