Memory leak on writes/merges #2522
Comments
Can you do the following: change your script to sleep for 30 seconds first, then create the table, sleep for another 30 seconds, and only then start the loop of 50 writes? I think the slow increase in resident size might just be because every write updates the table state at the end of the commit, and that state now includes more info.
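A minimal sketch of the phased measurement suggested above, not the reporter's actual script. The helper names (`current_rss_bytes`, `profile_phases`) and the report keys are hypothetical, and the table creation and write steps are passed in as callables so the harness itself needs only the standard library. It assumes a Unix-like system: it reads current RSS from `/proc/self/statm` on Linux and falls back to peak RSS elsewhere.

```python
import os
import time
from typing import Callable, Dict, List


def current_rss_bytes() -> int:
    """Best-effort resident set size of this process, in bytes."""
    try:
        # Linux: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, ValueError, IndexError):
        # Fallback for other Unix systems: *peak* RSS, which never
        # decreases, so it can show growth but not reclamation.
        import resource
        import sys

        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS, kilobytes on Linux.
        return peak if sys.platform == "darwin" else peak * 1024


def profile_phases(
    create_table: Callable[[], None],
    write_once: Callable[[int], None],
    iterations: int = 50,
    settle_seconds: float = 30.0,
    read_rss: Callable[[], int] = current_rss_bytes,
) -> Dict[str, object]:
    """Sleep, create the table, sleep again, then write in a loop,
    sampling memory at each phase boundary and after every write."""
    time.sleep(settle_seconds)          # idle baseline (imports only)
    baseline = read_rss()

    create_table()
    time.sleep(settle_seconds)          # let table creation settle
    after_create = read_rss()

    per_write: List[int] = []
    for i in range(iterations):
        write_once(i)
        per_write.append(read_rss())

    last = per_write[-1] if per_write else after_create
    return {
        "baseline": baseline,
        "after_create": after_create,
        "per_write": per_write,
        # Average growth per write, measured from the post-create level.
        "growth_per_write": (last - after_create) / iterations if iterations else 0.0,
    }
```

In the scenario from this issue, `create_table` and `write_once` would wrap the initial table write and the repeated `write_deltalake` or merge call respectively. Separating the phases makes it easier to tell a one-time cost (imports, table creation) from growth that continues with every write.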
Merge operations probably hold more info, so this looks normal to me.
@ion-elgreco The for the last checkpoint after running a script with 1000 merges resulting in the following memory increase: The amount by which memory increased seems much larger than the metadata.
@echai58 The checkpoint is compressed, and its size would never translate 1:1 from disk to memory anyway, afaik.
@ion-elgreco Profiling a script that just instantiates the same Delta table gives the following: ~13 MB, which is still much less than the >100 MB seen from the merge script.
Environment
Binding: python
Bug
What happened:
We're noticing constantly rising memory in our processes that write to deltalake. I wrote a minimal reproduction that loops and writes to deltalake, and the memory usage seems to indicate a memory leak.
What you expected to happen:
Memory to be reclaimed after writes.
How to reproduce it:
This is my script I tested with:
More details:
Here's the memray graph:
I also tested this with just write_deltalake(mode="append"), and the issue also seems to persist:
I saw #2068 and tried setting that env var, and got the following (doesn't seem to help):