[SUPPORT] java.lang.OutOfMemoryError: Requested array size exceeds VM limit on data ingestion to COW table #11122
Comments
The same issue was reported in the past and is still open for RCA: #7800
@TarunMootala Is it possible for you to upgrade to Hudi 0.14.1 and check whether you still see this issue? The other issue was related to loading of the archival timeline during sync, which was fixed in later releases: #7561
@ad1happy2go
@TarunMootala Can you check the size of the timeline files? Can you post the driver logs?
@ad1happy2go Attached driver logs.
@TarunMootala The size itself doesn't look that big. I couldn't locate the error in the log. Can you check once?
When AWS Glue encounters an OutOfMemoryError, it kills the JVM immediately. That could be the reason the error is not available in the driver logs. However, the error is present in the output logs and is the same as the one given in the overview.
@TarunMootala Can you share the timeline? Do you know how many file groups there are in the clean instant?
Can you elaborate on this? Are you referring to the number of files in that particular cleaner run?
Describe the problem you faced
We have a Spark streaming job that reads data from an input stream and appends it to a COW table partitioned on subject area. The streaming job has a batch interval of 120 seconds.
Intermittently, the job fails with `java.lang.OutOfMemoryError: Requested array size exceeds VM limit`.
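For reference, a minimal sketch of how such a micro-batch writer is typically configured (the table name, record key, and paths below are assumptions for illustration, not taken from the actual job):

```python
# Hypothetical Hudi writer options for an append-style micro-batch job writing
# to a COPY_ON_WRITE table partitioned by subject area. All names are assumed.
hudi_options = {
    "hoodie.table.name": "events_cow",                      # assumed table name
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "insert",          # append-style writes
    "hoodie.datasource.write.recordkey.field": "event_id",  # assumed key field
    "hoodie.datasource.write.partitionpath.field": "subject_area",
}

def write_batch(batch_df, batch_id):
    """foreachBatch sink: append each 120-second micro-batch to the Hudi table."""
    (batch_df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://my-bucket/tables/events_cow"))  # assumed table path
```

Each micro-batch produces one Hudi commit on the active timeline, which is why the timeline grows quickly at a 120-second trigger interval.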
To Reproduce
No specific steps.
Expected behavior
The job should commit the data successfully and continue with the next micro-batch.
Environment Description
Hudi version : 0.12.1 (Glue 4.0)
Spark version : Spark 3.3.0
Hive version : N/A
Hadoop version : N/A
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Additional context
We are not sure of the exact fix and root cause. However, a workaround (not ideal) is to manually delete (archive) a few of the oldest Hudi metadata files from the active timeline (the `.hoodie` folder) and reduce `hoodie.keep.max.commits`. This only works when we reduce max commits, and whenever max commits is reduced the job runs fine for a few months before failing again.

Our requirement is to retain 1500 commits to enable incremental query capability over the last 2 days of changes. Initially we started with max commits of 1500 and gradually came down to 400.
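The 1500-commit requirement is consistent with the 120-second batch interval: two days of micro-batches, at one commit per batch, comes to roughly 1440 commits. A quick check of that arithmetic:

```python
# One Hudi commit per micro-batch at a 120-second trigger interval.
batch_interval_s = 120
commits_per_day = 24 * 3600 // batch_interval_s   # 86400 / 120 = 720
commits_for_two_days = 2 * commits_per_day        # 1440

print(commits_per_day, commits_for_two_days)      # 720 1440
```

This is why the workaround conflicts with the requirement: keeping two full days of incremental-query history needs `hoodie.keep.max.commits` near 1500, while the job only stays stable after the value is lowered.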
Hudi Config
Stacktrace
Debugging multiple failure logs shows the job always failing at the stage `collect at HoodieSparkEngineContext.java:118` (CleanPlanActionExecutor).