
InternCache - replace synchronized with ReentrantLock #1251

Merged: 3 commits merged into FasterXML:2.18 on Mar 28, 2024

Conversation

@pjfanning (Member)

If we are going all-in on ReentrantLocks, I would still like to do each change in its own PR so that each change gets reviewed on its own merits.

@cowtowncoder (Member)

I am fine with this if there's something to suggest performance is better (even if just for newer JVMs).

@pjfanning (Member, Author)

> I am fine with this if there's something to suggest performance is better (even if just for newer JVMs).

@cowtowncoder so you need micro benchmarks to prove this?

@cowtowncoder (Member) commented Mar 27, 2024

@pjfanning I'd be fine with tests done for a similar use case somewhere else; an article, etc. Just in case there are counter-examples; or, more likely, as a sanity check so we aren't just making things worse with no benefit.

So no, I don't absolutely require one specifically for this change.

Although... if easy enough, it'd of course be great and this case might be easy enough.

@pjfanning (Member, Author)

@cowtowncoder I can guarantee that you can write use cases where you get worse performance (older JDKs, and probably also newer JDKs if you don't use virtual threads) and use cases where you get better performance (virtual threads on the latest JDKs). So I think benchmarking alone is fairly pointless. We need to decide whether we want to make virtual-thread users happy while imposing a small penalty on all other users, or whether we want to code this so that everyone is happy by making it configurable.

@cowtowncoder (Member)

@pjfanning Ok, I was hoping for "tends to be no slower, and possibly faster on newer JVMs" kind of general results. Or even just "not meaningfully slower", given it's useful for virtual threads.

This particular thing (making it configurable) is difficult to wire, so I am not really looking forward to doing that.

With that... yeah, a jmh-based test for the 2 cases would be great, perhaps for

https://github.com/FasterXML/jackson-benchmarks/

with whatever number of threads (100?), interning a repeating set of Strings.

We don't have to decide on this today either; we can keep this open to discuss, etc.
I only hesitate because this is a potential performance hot spot.
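(For reference, a minimal sketch of the kind of jmh benchmark described above. All class and package names here are hypothetical, and the flush-on-max logic is a simplification of what InternCache actually does; it only illustrates the synchronized-vs-ReentrantLock comparison under contention.)

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.locks.ReentrantLock;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Threads;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.Throughput)
    @Threads(100) // heavy contention, as suggested above
    public class InternCacheContentionBench {
        private static final int MAX_ENTRIES = 100;

        private final ConcurrentHashMap<String,String> syncCache = new ConcurrentHashMap<>();
        private final ConcurrentHashMap<String,String> lockCache = new ConcurrentHashMap<>();
        private final ReentrantLock lock = new ReentrantLock();

        // repeating set of Strings, so the cache-flush path gets exercised
        private String nextKey() {
            return "key" + ThreadLocalRandom.current().nextInt(500);
        }

        @Benchmark
        public String withSynchronized() {
            String key = nextKey();
            if (syncCache.size() >= MAX_ENTRIES) {
                synchronized (syncCache) {
                    if (syncCache.size() >= MAX_ENTRIES) {
                        syncCache.clear();
                    }
                }
            }
            String interned = key.intern();
            syncCache.putIfAbsent(interned, interned);
            return interned;
        }

        @Benchmark
        public String withReentrantLock() {
            String key = nextKey();
            if (lockCache.size() >= MAX_ENTRIES) {
                lock.lock();
                try {
                    if (lockCache.size() >= MAX_ENTRIES) {
                        lockCache.clear();
                    }
                } finally {
                    lock.unlock();
                }
            }
            String interned = key.intern();
            lockCache.putIfAbsent(interned, interned);
            return interned;
        }
    }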

@cowtowncoder (Member)

Wow. This thing ("ReentrantLock vs synchronized") has been around for a while, so many hits from 10 years ago :)

@pjfanning changed the title from "IntenCache - remove synchronized" to "InternCache - remove synchronized" on Mar 27, 2024
@cowtowncoder (Member) commented Mar 28, 2024

Hmmh. Not much useful info in articles like:

https://medium.com/javarevisited/synchronized-vs-reentrantlock-how-to-choose-cfb5306080e7

which basically suggests synchronized is fine except for heavily-contended cases.
That may or may not be the case here: in the normal use case, InternCache is only called when a symbol table lookup fails, which is a sign of poor JsonFactory reuse.
(It is also called for JsonDeserializers by databind; but deserializers are similarly cached, so the load should be light.)

Then again, thinking out loud: since it only matters in the high-contention case, and rarely for "normal" (good) usage, there is little downside to doing this. So I am leaning towards merging this change: we do get reports of high lock contention from time to time.

@pjfanning (Member, Author)

I've done some testing in my own jmh project - https://github.com/pjfanning/jackson-nest-bench/blob/main/src/jmh/java/org/example/jackson/bench/InternCacheBench.java

With 5 threads, I'm finding that InternCache with ReentrantLock calls, as in this PR, is slower than with synchronized.

I made a change to use tryLock instead, so that if another thread has the lock, the current thread skips the clear section. This runs faster. It does mean that under heavy contention the cache can grow past the max size, but before long some thread will get the lock and clear down the cache.

It would take a pretty extraordinary set of circumstances for the cache size to grow much beyond the max size. If this were a real worry, we could do something like the following (I don't think this extra complexity is warranted):

// GUIDELINE_MAX_ENTRIES = 180
// ABSOLUTE_MAX_ENTRIES = 200
// With this code we should rarely exceed ABSOLUTE_MAX_ENTRIES: once we reach
// GUIDELINE_MAX_ENTRIES we try to clear(), but we don't block waiting for the
// lock unless we've reached ABSOLUTE_MAX_ENTRIES
        if (size() >= GUIDELINE_MAX_ENTRIES) {
            // opportunistic path: clear only if the lock is free
            if (_lock.tryLock()) {
                try {
                    if (size() >= GUIDELINE_MAX_ENTRIES) {
                        clear();
                    }
                } finally {
                    _lock.unlock();
                }
            } else if (size() >= ABSOLUTE_MAX_ENTRIES) {
                // hard limit reached: block until the lock is acquired
                _lock.lock();
                try {
                    if (size() >= GUIDELINE_MAX_ENTRIES) {
                        clear();
                    }
                } finally {
                    _lock.unlock();
                }
            }
        }

@cowtowncoder (Member)

Yeah, no, let's keep things simple. Thank you for checking out the performance aspects with a jmh benchmark.

I think I'm ok with slower operation for low-concurrency cases.

One ask: could you add an entry in release-notes/VERSION-2.x for this change (ok to refer to this PR)?

@pjfanning (Member, Author) commented Mar 28, 2024

@cowtowncoder this is the simpler change that I would like to make to this PR. It can let the cache grow a little bigger than the max, but it is faster than the change currently in this PR:

        if (size() >= MAX_ENTRIES) {
            // skip the clear if another thread already holds the lock
            if (lock.tryLock()) {
                try {
                    if (size() >= MAX_ENTRIES) {
                        clear();
                    }
                } finally {
                    lock.unlock();
                }
            }
        }

@pjfanning marked this pull request as draft on March 28, 2024 17:53
@cowtowncoder (Member)

@pjfanning yes, that's fine; max-size enforcement can be approximate, and if this is faster for the common case, good.

Just let me know when this is ready (I guess once converted to non-Draft) and I will merge.

@pjfanning marked this pull request as ready for review on March 28, 2024 18:19
@pjfanning (Member, Author)

@cowtowncoder ready for review

@cowtowncoder (Member) left a review comment

Ahhh. And I even forgot that this only applies to the rather rare case of flushing a full cache.
So it is even less impactful for most use cases.

Approved.

@cowtowncoder merged commit 1fd8cb1 into FasterXML:2.18 on Mar 28, 2024
5 checks passed
@cowtowncoder changed the title from "InternCache - remove synchronized" to "InternCache - replace synchronized with ReentrantLock" on Mar 28, 2024
@pjfanning deleted the interncache-lock branch on March 28, 2024 18:29
@cowtowncoder (Member)

Will also increase the default size from 100 to 200; see #1257. It seems rather low for a global cache. Possibly relevant for abnormal cases of no/low symbol-table reuse.

@zshao9 commented Apr 23, 2024

@cowtowncoder @pjfanning When will we make a release that includes this patch? This is a crucial perf improvement that we are looking forward to!

@cowtowncoder (Member)

@zshao9 The original intent was to add this in 2.18.0; but since it appears safe, I think we could consider backporting it into 2.17.1.

But I am curious as to whether this really is important: have you measured its impact with load testing? My experience has been that finding this code path as a hot spot tends to indicate a different problem altogether; specifically, lack of reuse of JsonFactory and/or ObjectMappers.

@cowtowncoder (Member)

@pjfanning WDYT? If we think this is a good patch (I think so) and safe, maybe it should be backported to 2.17.1, which I hope to release next week.

@pjfanning (Member, Author)

v2.17.1 already has a lot of changes. Without an actual description of why this is affecting anyone in a serious way, I think we should wait till v2.18.0.

@cowtowncoder (Member)

@pjfanning Yes, valid point. Will wait for actual supporting evidence before considering backporting.

@zshao9 commented Apr 26, 2024

@cowtowncoder @pjfanning Our main use case is in an Apache Spark executor, where 32 threads (on 32 vcores) can be busy parsing large JSON documents with Spark's GET_JSON_OBJECT UDF, which internally uses jackson-core. We have seen 31 out of 32 threads waiting for the lock while 1 thread holds it! I would say this is a huge performance issue for us.

Reference: https://issues.apache.org/jira/browse/SPARK-47959

@cowtowncoder (Member)

@zshao9 Ok, but more importantly: is there a way to validate that the change would help? Maybe you could test with a 2.18.0-SNAPSHOT of jackson-core that has it (built locally, or using Sonatype OSS repo's auto-built snapshots)?
Without some form of validation this is a speculative fix, and as such not worth backporting into a patch update (but good enough for a minor version, as we do not think there are negative effects).

But equally importantly: this could be a sign of lack of reuse of ObjectMapper (most common), or of directly not reusing JsonFactory. In both cases the problem is that the underlying symbol table is not getting reused (reuse would prevent the need to re-intern() Strings once decoded from byte[] or char[]); and since InternCache is global, it can become contended.
But reducing that contention is only a small part of the significant performance gain from properly reusing the underlying JsonFactory (or, more specifically, the symbol table it uses), either directly or indirectly via ObjectMapper.
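(To illustrate the reuse point, a generic sketch of the standard pattern; this is not code from this PR, and the class and method names are made up:)

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonParsing {
        // GOOD: one ObjectMapper (and hence one JsonFactory with its reusable
        // symbol tables) shared across threads; ObjectMapper is thread-safe
        // once configured.
        private static final ObjectMapper MAPPER = new ObjectMapper();

        public static JsonNode parseShared(String json) throws Exception {
            return MAPPER.readTree(json);
        }

        // BAD: a new ObjectMapper per call throws away the symbol table every
        // time, so every field name must be re-decoded and re-intern()ed,
        // funneling all threads through the global InternCache.
        public static JsonNode parsePerCall(String json) throws Exception {
            return new ObjectMapper().readTree(json);
        }
    }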

@zshao9 commented Apr 28, 2024

@vadim has helped us disable Jackson field name interning, and the problem disappeared entirely.

> But equally importantly: this could be a sign of lack of reuse of ObjectMapper (most common), or of directly not reusing JsonFactory.

I am not familiar with the logic that decides which keys to cache and intern. What if the keys in the JSON object are non-repeating UUID strings?

@cowtowncoder (Member)

@zshao9 The setting is global to the JsonFactory, so your only decision is whether to canonicalize and/or intern everything, or not.

Since you are having performance problems, I would recommend first turning off canonicalization and seeing how that affects performance, and then turning off interning and doing the same.
It sounds like you have done the latter.
But in addition to disabling interning, you might also want to look at disabling canonicalization; that might be an even better option.
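(For concreteness, these settings correspond to JsonFactory.Feature flags; a minimal sketch using the builder API available in recent 2.x versions, with made-up variable names:)

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class FactoryConfig {
        // Keep canonicalization but skip String.intern() for field names:
        static final JsonFactory NO_INTERN = JsonFactory.builder()
                .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
                .build();

        // Disable canonicalization altogether (interning is then moot):
        static final JsonFactory NO_CANONICALIZE = JsonFactory.builder()
                .disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES)
                .build();

        static final ObjectMapper MAPPER = new ObjectMapper(NO_INTERN);
    }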

Basically there are 2 main possibilities as to why you are having issues with concurrency:

  1. Your symbol tables are not being reused: either the JsonFactory is not reused, or JsonParsers are not being close()d (they are automatically closed when reading to the end of input; otherwise you need to call parser.close()).
  2. The number of symbols in your documents is huge -- the symbol table maximum size is about 6,000 -- and they do not repeat (at all, or not much: UUIDs are unique, for example).

In the first case, your real problem would be solved by reuse, or by ensuring parsers get closed, and NOT by changing interning/canonicalization. In the second case, you would benefit from changing these settings.

I have yet to see a case where locking in InternCache is not due to one of (1) or (2): it really is usually a symptom.
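(A sketch of the close() point in case (1), assuming a shared JsonFactory; class and method names are hypothetical. try-with-resources guarantees the parser is closed even when parsing stops before end-of-input, which lets its symbol table be merged back for reuse:)

    import java.io.IOException;

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonParser;

    public class TokenScanner {
        // reused across calls and threads, so symbol tables get reused too
        private static final JsonFactory FACTORY = new JsonFactory();

        public static void readAll(byte[] doc) throws IOException {
            try (JsonParser p = FACTORY.createParser(doc)) {
                while (p.nextToken() != null) {
                    // process tokens here
                }
            }
        }
    }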

@vadim commented Apr 30, 2024

Hey @zshao9 and @cowtowncoder -- as @JoshRosen mentioned here, the option to disable canonicalization does not seem to be present in jackson-core 2.15.2. I believe his experiments showed no effect when toggling that setting on 2.15.2, while disabling interning removed the bottleneck faced by @zshao9. Any thoughts on versioning?

@cowtowncoder (Member)

@vadim Both options have been around for a very long time, without many changes.
So I don't think the Jackson version should matter much here.

The difference between disabling canonicalization (no help?) and disabling interning (helps) is interesting; I'm not sure when/how that would happen.
Either way, symbol table use is likely not working optimally for your use case.

I would then go with disabling interning, since that helps. The improvement discussed here will go into 2.18.0, but in general it is unlikely to be super important compared to the other options (IMO).
