fix:read inconsistent when local file read handler init before flushing event to disk #1636

dingshun3016 · 2024-04-10T08:26:34Z

What changes were proposed in this pull request?

re init local file read handler when index file name is empty

Why are the changes needed?

some case will cause app to fail because of blocks read inconsistent

Does this PR introduce any user-facing change?

No.

How was this patch tested?

(Please test your changes, and provide instructions on how to test it:

If you add a feature or fix a bug, add a test to cover your changes.
If you fix a flaky test, repeat it for many times to prove it works.)

…ng event to disk

zuston · 2024-04-10T08:55:02Z

I don't see the root cause of this bug. @dingshun3016 Could you help add test case to simulate this or add more description

github-actions · 2024-04-10T08:59:48Z

Test Results

2 363 files ±0 2 363 suites ±0 4h 30m 9s ⏱️ - 1m 2s
912 tests ±0 911 ✅ ±0 1 💤 ±0 0 ❌ ±0
10 585 runs ±0 10 571 ✅ ±0 14 💤 ±0 0 ❌ ±0

Results for commit a7d8604. ± Comparison against base commit ed39955.

dingshun3016 · 2024-04-10T10:23:58Z

I don't see the root cause of this bug. @dingshun3016 Could you help add test case to simulate this or add more description

I post one case log in our production env.
First
Init LocalFileServerReadHandler, but no index files and data files in application_1711720273776_2156538_-779157491-37794/0/1176-1176

[INFO] 2024-04-02 14:23:13,880 Grpc-116 LocalFileServerReadHandler prepareFilePath - index files not find, baseFolder is /uniffle_data/application_1711720273776_2156538_-779157491-37794/0/1176-1176, appId application_1711720273776_2156538_-779157491-37794 shuffleId 0 partitionId 1176 partitionNumPerRange 1 partitionNum 2048 storageBasePath /uniffle_data

[INFO] 2024-04-02 14:23:13,880 Grpc-116 ShuffleServerGrpcService getLocalShuffleIndex - Successfully getShuffleIndex cost 0 ms for 0 bytes with appId[application_1711720273776_2156538_-779157491-37794], shuffleId[0], partitionId[1176]

Second
Shuffer server flush event from memory to local file

[DEBUG] 2024-04-02 14:25:51,696 LocalFileFlushEventThreadPool-42 ShuffleFlushManager processEvent - Flush to file success in 22 ms and release 6765262 bytes event ShuffleDataFlushEvent: eventId=31941657, appId=application_1711720273776_2156538_-779157491-37794, shuffleId=0, startPartition=1176, endPartition=1176, retryTimes=0, underStorage=LocalStorage, isPended=false

Third
Because LocalFileServerReadHandler has been initialized in the first step and is same, so index file and data file is still empty, but in fact there are not empty in this directory at this time.

[INFO] 2024-04-02 14:26:24,636 Grpc-13 ShuffleServerGrpcService getLocalShuffleIndex - Successfully getShuffleIndex cost 0 ms for 0 bytes with appId[application_1711720273776_2156538_-779157491-37794], shuffleId[0], partitionId[1176]

App log

24/04/02 14:26:27 ERROR [Executor task launch worker for task 2507] Executor: Exception in task 1176.3 in stage 5.0 (TID 2507) org.apache.uniffle.common.exception.RssException: Blocks read inconsistent: expected 506 blocks, actual 459 blocks at org.apache.uniffle.common.util.RssUtils.checkProcessedBlockIds(RssUtils.java:375) at org.apache.uniffle.client.impl.ShuffleReadClientImpl.checkProcessedBlockIds(ShuffleReadClientImpl.java:279) at org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:131) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31) at org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.hasNext(RssShuffleReader.java:297) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.sort_addToSorter_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:47) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.findNextInnerJoinRows$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:47) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:748) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)

rickyma · 2024-04-10T11:54:37Z

Could you check the value of the metric total_dropped_event_num in every shuffle server? I encountered "Blocks read inconsistent" exception in #1620 when there are events being dropped.

Also I did not find the log you mentioned above "index files not find" in the latest code. Your code must be different.

dingshun3016 · 2024-04-10T12:41:41Z

Could you check the value of the metric total_dropped_event_num in every shuffle server? I encountered "Blocks read inconsistent" exception in #1620 when there are events being dropped.

Also I did not find the log you mentioned above "index files not find" in the latest code. Your code must be different.

"index files not find" was added when I was troubleshooting the problem.It seem not same problem as #1620, after I added these codes, this case no longer occurred.

rickyma · 2024-04-10T13:11:54Z

Could you check the value of the metric total_dropped_event_num in every shuffle server? I encountered "Blocks read inconsistent" exception in #1620 when there are events being dropped.
Also I did not find the log you mentioned above "index files not find" in the latest code. Your code must be different.

"index files not find" was added when I was troubleshooting the problem.It seem not same problem as #1620, after I added these codes, this case no longer occurred.

What about the metric total_dropped_event_num. Can you check it again when this issue occurred.

dingshun3016 · 2024-04-11T02:54:57Z

Could you check the value of the metric total_dropped_event_num in every shuffle server? I encountered "Blocks read inconsistent" exception in #1620 when there are events being dropped.
Also I did not find the log you mentioned above "index files not find" in the latest code. Your code must be different.

"index files not find" was added when I was troubleshooting the problem.It seem not same problem as #1620, after I added these codes, this case no longer occurred.

What about the metric total_dropped_event_num. Can you check it again when this issue occurred.

total_dropped_event_num at that time was 0.
BTW, our prod environment is based on version 0.8, so these two questions may differ

rickyma · 2024-04-11T03:26:27Z

Can you provide a UT to reproduce this issue? That will be clear for us to figure out the root cause.

zuston · 2024-04-11T03:32:48Z

Thanks for adding detailed logs, I have understood your point, the root cause is to partition shuffle reader handler is cached.
After the memory data is flushed into local file, the cached read handler don't recognized this case and ignore it.

But from my side, I don't think your code running in your producation env is the same with the current upstream master code. In current uniffle master code, if the indexFile is not found, it will throw exception, so the log may not the same.

Based the above suppose, I think we just to handle the map.computeIfAbsent(partitionKey, key -> newReadHandler(request)); carefully and you can throw exception if no index file.

zuston · 2024-04-11T03:37:33Z

Please create the issue to track this bug @dingshun3016

rickyma · 2024-04-22T11:32:36Z

Do we still need this?

dingshun3016 · 2024-04-23T06:41:15Z

Thanks for adding detailed logs, I have understood your point, the root cause is to partition shuffle reader handler is cached. After the memory data is flushed into local file, the cached read handler don't recognized this case and ignore it.

But from my side, I don't think your code running in your producation env is the same with the current upstream master code. In current uniffle master code, if the indexFile is not found, it will throw exception, so the log may not the same.

Based the above suppose, I think we just to handle the map.computeIfAbsent(partitionKey, key -> newReadHandler(request)); carefully and you can throw exception if no index file.

After thinking about it, I think throwing exception may cause call getLocalShuffleData to fail. @zuston

fix:read inconsistent when local file read handler init before flushi…

a7d8604

…ng event to disk

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix:read inconsistent when local file read handler init before flushing event to disk #1636

fix:read inconsistent when local file read handler init before flushing event to disk #1636

dingshun3016 commented Apr 10, 2024

zuston commented Apr 10, 2024

github-actions bot commented Apr 10, 2024

dingshun3016 commented Apr 10, 2024 •

edited

rickyma commented Apr 10, 2024

dingshun3016 commented Apr 10, 2024

rickyma commented Apr 10, 2024

dingshun3016 commented Apr 11, 2024

rickyma commented Apr 11, 2024

zuston commented Apr 11, 2024 •

edited

zuston commented Apr 11, 2024

rickyma commented Apr 22, 2024

dingshun3016 commented Apr 23, 2024

fix:read inconsistent when local file read handler init before flushing event to disk #1636

Are you sure you want to change the base?

fix:read inconsistent when local file read handler init before flushing event to disk #1636

Conversation

dingshun3016 commented Apr 10, 2024

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

zuston commented Apr 10, 2024

github-actions bot commented Apr 10, 2024

Test Results

dingshun3016 commented Apr 10, 2024 • edited

rickyma commented Apr 10, 2024

dingshun3016 commented Apr 10, 2024

rickyma commented Apr 10, 2024

dingshun3016 commented Apr 11, 2024

rickyma commented Apr 11, 2024

zuston commented Apr 11, 2024 • edited

zuston commented Apr 11, 2024

rickyma commented Apr 22, 2024

dingshun3016 commented Apr 23, 2024

dingshun3016 commented Apr 10, 2024 •

edited

zuston commented Apr 11, 2024 •

edited