Jackson LockFreePool recycler pool causes memory issues #21356

Closed
davseitsev opened this issue Apr 2, 2024 · 14 comments

@davseitsev

After upgrading Trino from 409 to 443 we started to have memory issues. Some workers disappear from the cluster due to intensive major GC. In the logs we can see messages like:

INFO    Notification Thread     io.airlift.stats.JmxGcMonitor   Major GC: application 61ms, stopped 9353ms: 91.59GB -> 91.13GB

On the graphs we can see memory starvation:

Heap dump analysis shows it comes from JsonUtil#OBJECT_MAPPED_UNORDERED. It creates a JsonFactory with the default implementation of RecyclerPool, which is LockFreePool. This implementation is relatively young, has raised some concerns, and there is a proposal not to keep it as the default: FasterXML/jackson-core#1256


We have changed the factory below to use JsonRecyclerPools.threadLocalPool():

public static JsonFactory createJsonFactory()
{
    return jsonFactoryBuilder().disable(CANONICALIZE_FIELD_NAMES).build();
}
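
For reference, a minimal sketch of the change described above, assuming a plain JsonFactoryBuilder in place of the jsonFactoryBuilder() helper shown in the snippet (the holder class name is hypothetical; only the builder and pool calls come from Jackson's public API):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonFactoryBuilder;
import com.fasterxml.jackson.core.util.JsonRecyclerPools;

import static com.fasterxml.jackson.core.JsonFactory.Feature.CANONICALIZE_FIELD_NAMES;

public final class JsonFactoryHolder
{
    private JsonFactoryHolder() {}

    // Same factory settings as above, but with the recycler pool pinned to the
    // ThreadLocal-based implementation instead of Jackson 2.17.0's default LockFreePool.
    public static JsonFactory createJsonFactory()
    {
        return new JsonFactoryBuilder()
                .disable(CANONICALIZE_FIELD_NAMES)
                .recyclerPool(JsonRecyclerPools.threadLocalPool())
                .build();
    }
}

threadLocalPool() restores the pre-2.17 behavior of reusing one BufferRecycler per thread, which avoids the retention pattern reported here.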

I'm not sure it's the best implementation for Trino, but with this change we can no longer reproduce the issue.

@hashhar (Member) commented Apr 3, 2024

Should now be addressed with #21361.

Can you validate the change @davseitsev if possible since you seem to have a way to reproduce this?

@davseitsev (Author)

@hashhar I've applied your fix and am testing it right now. I will update later.

@davseitsev (Author)

Looks good, I can't reproduce the issue. I will keep testing and update if anything changes.

@wendigo (Contributor) commented Apr 3, 2024

Closing then. Release should happen this week. Thanks!

wendigo closed this as completed Apr 3, 2024
@marvin-roesch

This issue appears to still be affecting us on Trino 444. We are running into massive memory issues with this. In our case, heap dumps indicate that JsonTypeUtil.parseJson is at fault.

Some quick debugging shows that the issue is due to the SORTED_MAPPER.writeValue call using the default JsonGenerator instance, which in turn gets created by a default JsonFactory that uses the LockFreePool implementation. Unless upstream changes the default back again, it might be necessary to perform a more systematic survey of Jackson usages to make sure this doesn't negatively affect more places.

Ideally, all ObjectMapper instances would get created with the customized JsonFactory that contains the configuration.
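
As an illustration only (the createJsonFactory name follows the sketch earlier in the thread; this is not Trino's actual wiring), building every mapper on top of the customized factory would look roughly like this:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class Mappers
{
    private Mappers() {}

    // Hypothetical helper: every ObjectMapper is constructed on the customized
    // JsonFactory, so the recycler-pool choice applies to all of them instead
    // of only the mappers that happen to be configured explicitly.
    public static ObjectMapper newMapper(JsonFactory customizedFactory)
    {
        return new ObjectMapper(customizedFactory);
    }
}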

Alternatively, maybe explicitly downgrading the Jackson libraries is an option?

@wendigo (Contributor) commented Apr 10, 2024

SORTED_MAPPER is using ObjectMapperProvider, which configures the JsonFactoryBuilder to use JsonRecyclerPools.threadLocalPool(). If this is ineffective, I guess we need to report a Jackson issue.
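
For illustration, a minimal sketch of that distinction, assuming airlift's io.airlift.json.ObjectMapperProvider is used as a plain provider via its get() method (the surrounding class is illustrative, not code from this issue):

import com.fasterxml.jackson.databind.ObjectMapper;
import io.airlift.json.ObjectMapperProvider;

public final class MapperSources
{
    public static void main(String[] args)
    {
        // Goes through airlift's provider, which builds the mapper on a
        // JsonFactory configured with the thread-local recycler pool.
        ObjectMapper configured = new ObjectMapperProvider().get();

        // Bypasses that configuration: a bare ObjectMapper uses Jackson's
        // default JsonFactory and therefore the library's default recycler
        // pool (LockFreePool in Jackson 2.17.0).
        ObjectMapper plain = new ObjectMapper();
    }
}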

Update: it was fixed in master but has not yet been released (part of the Airlift 244 update: airlift/airlift@6d36ba3 and 00d867b).

Please test with 445 once it's released.

@marvin-roesch

That's great to hear, thanks! I wasn't checking other dependencies' upstream changes when investigating this. Will report back once 445 is out.

@mariofusco

I'm one of the authors of the new Jackson pool and I would like to investigate the memory issue it caused, but so far I haven't been able to reproduce it on my side. Can anybody please explain how I could reproduce this problem? Thanks.

@mariofusco commented Apr 15, 2024

Note that we're also discussing this issue here, so if you have any suggestions on how to reproduce the problem, it would be great to report them there as well. In particular, it would be very interesting if you could take a look at this comment and let me know whether it could be relevant for your environment.

@wendigo (Contributor) commented Apr 15, 2024

@cowtowncoder

I am also hoping to work on figuring out the underlying issue. I created a test:

FasterXML/jackson-core#1265

which suggests some abnormal temporary growth in retention, but not quite a smoking gun. So I'm hoping to learn whether there is something specific that might trigger or amplify the issue.

Another thing that would be good to know (although I realize it may not be easy to try out) is whether using one of the alternate new pools, JsonRecyclerPools.newConcurrentDequePool(), would show good performance without requiring use of ThreadLocal. The reason I am interested in this choice is that for Jackson 2.17.1 there is a need to change the default, and the choices are reverting back to the pre-2.17 ThreadLocal pool or switching to concurrentDequePool. It would be great to base that decision on observed performance from a real concurrency-heavy use case, which I think Trino qualifies as. See the sketch below for how such a comparison could be wired in.
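
A minimal sketch of trying that alternative, assuming the pool is wired in through the JsonFactoryBuilder (the class and method names here are illustrative, not Trino's actual code):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonFactoryBuilder;
import com.fasterxml.jackson.core.util.JsonRecyclerPools;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class ConcurrentDequePoolExample
{
    private ConcurrentDequePoolExample() {}

    // Builds a factory that uses the ConcurrentDeque-based recycler pool,
    // one of the non-ThreadLocal alternatives mentioned above, and wraps
    // it in an ObjectMapper for benchmarking against the other pools.
    public static ObjectMapper newMapper()
    {
        JsonFactory factory = new JsonFactoryBuilder()
                .recyclerPool(JsonRecyclerPools.newConcurrentDequePool())
                .build();
        return new ObjectMapper(factory);
    }
}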

@marvin-roesch

@wendigo This situation has improved for us now that we're on 445 with the explicit ThreadLocal pool setting. Unfortunately, however, there still appear to be places where an ObjectMapper is created without any additional settings, which still surfaces this problem, albeit at a much slower pace.

A heap dump I took indicates JsonInputFunctions is the majority contributor this time around.

Jackson 2.17.1 has been released with a revert to the old pool as the default; are there plans to incorporate that into Airlift and an upcoming Trino release?

@wendigo (Contributor) commented May 8, 2024

@marvin-roesch Jackson is already updated in master and will be released with 447.

@wendigo (Contributor) commented May 8, 2024

Follow-up to address Json[Input|Output]Functions ObjectMapper usage: #21867
