[Improvement] [TEZ][MR] block id overflows when the shuffle data is large enough #1398

Open
3 tasks done
zhengchenyu opened this issue Dec 27, 2023 · 1 comment · May be fixed by #1418

@zhengchenyu
Collaborator

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

We found that the block id overflows when the shuffle data is large enough, especially when the data is skewed.

For MR and Tez, a block id is composed of a sequentially increasing sequence number, the task attempt id, the partition id, and the task id.
The highest 12 bits hold the sequence number, the next 6 bits the task attempt id, the next 24 bits the partition id, and the lowest 21 bits the task id (12 + 6 + 24 + 21 = 63 bits; the sign bit is unused).
So if a partition has more than 2^12 (4096) blocks, the sequence number overflows.
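A minimal Java sketch of the 63-bit MR/Tez layout described above, to make the overflow concrete. The class and method names are hypothetical (this is not Uniffle's actual API), and each field is simply truncated to its width so the effect of overflow is visible; the real client may detect or surface the overflow differently.

```java
// HYPOTHETICAL sketch of the MR/Tez block id layout described above (sign bit unused):
// [ 12-bit seq no | 6-bit attempt id | 24-bit partition id | 21-bit task id ]
final class MrBlockIdLayout {
    static final int TASK_ID_BITS = 21;
    static final int PARTITION_ID_BITS = 24;
    static final int ATTEMPT_ID_BITS = 6;
    static final int SEQ_NO_BITS = 12;

    static final int PARTITION_SHIFT = TASK_ID_BITS;                      // 21
    static final int ATTEMPT_SHIFT = PARTITION_SHIFT + PARTITION_ID_BITS; // 45
    static final int SEQ_SHIFT = ATTEMPT_SHIFT + ATTEMPT_ID_BITS;         // 51

    /** Returns a mask with the lowest {@code bits} bits set. */
    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    /** Packs the fields, truncating each one to its width (no range check). */
    static long pack(long seqNo, long attemptId, long partitionId, long taskId) {
        return ((seqNo & mask(SEQ_NO_BITS)) << SEQ_SHIFT)
                | ((attemptId & mask(ATTEMPT_ID_BITS)) << ATTEMPT_SHIFT)
                | ((partitionId & mask(PARTITION_ID_BITS)) << PARTITION_SHIFT)
                | (taskId & mask(TASK_ID_BITS));
    }

    /** Extracts the sequence number from a packed id. */
    static long seqNo(long blockId) {
        return (blockId >>> SEQ_SHIFT) & mask(SEQ_NO_BITS);
    }

    public static void main(String[] args) {
        long maxSeqNo = mask(SEQ_NO_BITS);
        System.out.println("max sequence number: " + maxSeqNo);
        // Same attempt, partition, and task; only the sequence number differs.
        long firstBlock = pack(0, 1, 7, 42);
        long blockAfterLimit = pack(maxSeqNo + 1, 1, 7, 42);
        // With truncation, the 4097th block (seq no 4096) wraps to seq no 0 and collides.
        System.out.println("collision: " + (firstBlock == blockAfterLimit));
    }
}
```

Running `main` prints `max sequence number: 4095` and `collision: true`: once the 12-bit field is exhausted, two different blocks of the same task attempt and partition map to the same id.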

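For Spark, a partition supports up to 2^18 blocks, because the sequence number gets all 18 remaining high bits. This works because the lowest 21 bits come from TaskContext::taskAttemptId, a sequentially increasing number that identifies both the task and the task attempt, so no separate task attempt id field is needed.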
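For comparison, a hedged sketch of the Spark-side layout as the issue describes it (18-bit sequence number, 24-bit partition id, 21-bit taskAttemptId). Names are hypothetical and the range checks are an illustrative choice; it only shows where the 2^18 figure and the 64x headroom over MR/Tez come from.

```java
// HYPOTHETICAL sketch of the Spark block id layout described in the issue (sign bit unused):
// [ 18-bit seq no | 24-bit partition id | 21-bit taskAttemptId ]
final class SparkBlockIdLayout {
    static final int TASK_ATTEMPT_ID_BITS = 21;
    static final int PARTITION_ID_BITS = 24;
    static final int SEQ_NO_BITS = 63 - PARTITION_ID_BITS - TASK_ATTEMPT_ID_BITS; // 18

    static final int PARTITION_SHIFT = TASK_ATTEMPT_ID_BITS;          // 21
    static final int SEQ_SHIFT = PARTITION_SHIFT + PARTITION_ID_BITS; // 45

    /** Throws if {@code value} does not fit in an unsigned field of {@code bits} bits. */
    static void checkRange(String name, long value, int bits) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalArgumentException(name + "=" + value + " does not fit in " + bits + " bits");
        }
    }

    /** Number of distinct values the 18-bit sequence number can take. */
    static long seqNoCapacity() {
        return 1L << SEQ_NO_BITS;
    }

    /** Packs the fields after checking that each one fits its width. */
    static long pack(long seqNo, long partitionId, long taskAttemptId) {
        checkRange("seqNo", seqNo, SEQ_NO_BITS);
        checkRange("partitionId", partitionId, PARTITION_ID_BITS);
        checkRange("taskAttemptId", taskAttemptId, TASK_ATTEMPT_ID_BITS);
        return (seqNo << SEQ_SHIFT) | (partitionId << PARTITION_SHIFT) | taskAttemptId;
    }

    public static void main(String[] args) {
        System.out.println("sequence numbers available: " + seqNoCapacity());
        // 2^18 / 2^12: headroom compared with the 12-bit MR/Tez sequence number.
        System.out.println("ratio vs MR/Tez: " + (seqNoCapacity() >> 12));
    }
}
```

Running `main` prints `sequence numbers available: 262144` and `ratio vs MR/Tez: 64`.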

I think we could move the task attempt id into the lowest 21 bits (alongside the task id) and allocate fewer bits to it, since the task attempt id is generally small. The sequence number could then use all 18 high bits, as it does for Spark.
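A hypothetical sketch of what the proposed MR/Tez layout could look like. The issue does not say how the lowest 21 bits would be split; the 3-bit attempt id / 18-bit task id split below is purely an assumption for illustration, as are all names.

```java
// HYPOTHETICAL sketch of the proposed MR/Tez layout: the attempt id moves into the
// lowest 21 bits next to the task id, freeing all 18 high bits for the sequence number.
// The 3-bit attempt / 18-bit task split is an illustrative assumption, not a decided design.
// [ 18-bit seq no | 24-bit partition id | 3-bit attempt id | 18-bit task id ]
final class ProposedBlockIdLayout {
    static final int LOW_BITS = 21;
    static final int ATTEMPT_ID_BITS = 3;                             // assumption
    static final int TASK_ID_BITS = LOW_BITS - ATTEMPT_ID_BITS;       // 18
    static final int PARTITION_ID_BITS = 24;
    static final int SEQ_NO_BITS = 63 - PARTITION_ID_BITS - LOW_BITS; // 18

    static final int PARTITION_SHIFT = LOW_BITS;                      // 21
    static final int SEQ_SHIFT = PARTITION_SHIFT + PARTITION_ID_BITS; // 45

    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    /** Throws if {@code value} does not fit in an unsigned field of {@code bits} bits. */
    static void checkRange(String name, long value, int bits) {
        if (value < 0 || value > mask(bits)) {
            throw new IllegalArgumentException(name + "=" + value + " does not fit in " + bits + " bits");
        }
    }

    static long pack(long seqNo, long partitionId, long attemptId, long taskId) {
        checkRange("seqNo", seqNo, SEQ_NO_BITS);
        checkRange("partitionId", partitionId, PARTITION_ID_BITS);
        checkRange("attemptId", attemptId, ATTEMPT_ID_BITS);
        checkRange("taskId", taskId, TASK_ID_BITS);
        long low = (attemptId << TASK_ID_BITS) | taskId; // attempt id occupies bits 18..20
        return (seqNo << SEQ_SHIFT) | (partitionId << PARTITION_SHIFT) | low;
    }

    static long seqNo(long blockId)       { return (blockId >>> SEQ_SHIFT) & mask(SEQ_NO_BITS); }
    static long partitionId(long blockId) { return (blockId >>> PARTITION_SHIFT) & mask(PARTITION_ID_BITS); }
    static long attemptId(long blockId)   { return (blockId >>> TASK_ID_BITS) & mask(ATTEMPT_ID_BITS); }
    static long taskId(long blockId)      { return blockId & mask(TASK_ID_BITS); }

    public static void main(String[] args) {
        // Sequence number 4096 would overflow the current 12-bit field but fits in 18 bits.
        long id = pack(4096, 7, 1, 42);
        System.out.println(seqNo(id) + " " + partitionId(id) + " " + attemptId(id) + " " + taskId(id));
    }
}
```

Running `main` prints `4096 7 1 42`. Under this assumed split the task id range shrinks from 2^21 to 2^18 and at most 8 attempts are representable; how many of the low bits to give the attempt id is the trade-off the issue leaves open.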

How should we improve?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@qijiale76
Contributor

Please assign this issue to me.

qijiale76 added a commit to qijiale76/incubator-uniffle that referenced this issue Jan 17, 2024
… 18 bit atomicInt to 21 bit taskAttemptId in 63 bit BlockId.
qijiale76 added a commit to qijiale76/incubator-uniffle that referenced this issue Jan 17, 2024
2 participants