
Optimize JsonNodeDeserialization wrt recursion #3397

Open · cowtowncoder opened this issue Feb 13, 2022 · 9 comments

Labels: CVE (issues related to public CVEs / security vuln reports), performance (issues related to performance problems or enhancements)
Milestone: 2.13.0

Comments
@cowtowncoder (Member)

(note: cleaved off of #2816, used to be bundled)

The current implementation of JsonNodeDeserializer is expensive for deeply nested Object and Array values because it uses recursion: each additional nesting level -- for arrays, just 2 bytes of input to encode [ and ] -- creates a new stack frame.
In practical terms this means that a document with nesting on the order of ten thousand(s) levels, depending on settings, can exhaust the JVM stack.

It should be possible, however, to replace the basic recursion with iteration, to at least significantly reduce this amplification and prevent the cheapest potential DoS concerns.
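A minimal sketch of the failure mode, assuming jackson-databind 2.12 or earlier (the class name and depth value are illustrative, not from this issue):

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class DeepNestingRepro {
    public static void main(String[] args) throws Exception {
        int depth = 100_000; // deep enough to overflow a default-sized thread stack
        // Each nesting level costs only two bytes of input: '[' and ']'
        StringBuilder sb = new StringBuilder(2 * depth);
        for (int i = 0; i < depth; i++) sb.append('[');
        for (int i = 0; i < depth; i++) sb.append(']');

        // With the recursive deserializer (pre-2.13) this throws
        // StackOverflowError; the iterative 2.13+ implementation parses it.
        new ObjectMapper().readTree(sb.toString());
    }
}
```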

@cowtowncoder added the 2.13, CVE, and performance labels and removed the to-evaluate label on Feb 13, 2022
@cowtowncoder added this to the 2.13.0 milestone on Feb 13, 2022
@cowtowncoder (Member, Author)

Note: the implementation was merged from branch exp/iterative-jsonnode-2.13 (hash f93fd41028b6efcc7c41401dd271aa7d81da6cf3 ?) and included in release 2.13.0 -- this issue was added retroactively for tracking purposes.

@deniz-husaj

The same issue occurs when using e.g. ObjectMapper.readTree(InputStream in), where in is a JSON string of plain opening and closing square brackets, e.g. "[[[[[]]]]]", but with a depth of 50 million nested arrays. It takes a very long time to finish deserialization or throw an exception (with 50 million levels I never actually reached the end).

Looks like the processing in BaseNodeDeserializer._deserializeContainerNoRecursion is the reason for that. Reproducible with jackson-databind 2.13.2.2.

I assume this is related to this issue, or should I open a new bug for it? Will the provided fix cover that part as well?
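A sketch of the kind of input described, streamed so that the 100 MB document never has to be materialized up front (the helper is hypothetical, not from the original comment):

```java
import java.io.InputStream;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ManyNestedArrays {
    // Streams `depth` opening brackets followed by `depth` closing brackets.
    static InputStream brackets(final long depth) {
        return new InputStream() {
            private long pos;
            @Override public int read() {
                if (pos < depth)     { pos++; return '['; }
                if (pos < 2 * depth) { pos++; return ']'; }
                return -1; // end of stream
            }
        };
    }

    public static void main(String[] args) throws Exception {
        // 50 million levels: 100 million tokens for readTree to process
        new ObjectMapper().readTree(brackets(50_000_000L));
    }
}
```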

@yawkat (Member) commented Apr 4, 2022

@denizhusaj #3416 (which fixed the CVE, #2816) specifically fixed the StackOverflowError that can be caused by deeply nested JSON. This is not an issue for JsonNode anymore, because in 2.13.0 the JsonNode implementation was changed to be iterative.

However, even the iterative implementation can still take a while if you feed it 50M arrays. It is simply a lot of tokens :)

@deniz-husaj

@yawkat So there are no plans to forbid deeply nested JSON arrays?

But somehow I still get a StackOverflowError with deeply nested JSON objects, using input like {"abc0": {"abc1": {"abc2": {"abc3": {"abc4": {}}}}}} but with a depth of e.g. 50,000. Is this expected? Same scenario as in my comment above.

@yawkat (Member) commented Apr 4, 2022

@denizhusaj I cannot reproduce that issue. I tried a nested JsonNode object with 50,000 levels, like in your example, and it parsed just fine.

Maybe your error comes from JsonNode.toString? That will still error; however, it is more of a debugging method.
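A sketch of the distinction, assuming jackson-databind 2.13+ (the depth and key name are illustrative): parsing the deeply nested object succeeds, while serializing it back recurses one stack frame per level.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ToStringOverflow {
    public static void main(String[] args) throws Exception {
        int depth = 50_000;
        // Builds {"k": {"k": ... 0 ... }} nested `depth` levels deep
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("{\"k\":");
        sb.append("0");
        for (int i = 0; i < depth; i++) sb.append('}');

        JsonNode node = new ObjectMapper().readTree(sb.toString()); // OK: iterative since 2.13
        node.toString(); // StackOverflowError: serialization is still recursive
    }
}
```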

@deniz-husaj

@yawkat yes, sorry, you are right: it comes from toString()...

Method threw 'java.lang.StackOverflowError' exception. Cannot evaluate com.fasterxml.jackson.databind.node.ObjectNode.toString()

But regarding deeply nested JSON arrays: will there be no depth limit?

@yawkat (Member) commented Apr 4, 2022

I can't say; that's Tatu's decision.

However, I'm not convinced a depth limit would completely solve "long inputs take a long time to parse". In general you can also allocate a lot of objects without very deep JSON, e.g. [[[... 4000 levels ...]], [[... 4000 levels ...]], ... thousands of repetitions ...]. This would not run into the depth limit, but would still be fairly slow to parse (simply because there are lots of tokens); see the sketch below.
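A hypothetical generator for such "wide but shallow" input (the names and sizes are illustrative): the depth never exceeds groupDepth + 1, yet the token count grows without bound with the number of repetitions.

```java
// Builds [ [[...]], [[...]], ... ]: `groups` groups of `groupDepth`-deep
// arrays inside one outer array. Depth stays bounded; token count does not.
static String wideAndShallow(int groupDepth, int groups) {
    StringBuilder sb = new StringBuilder();
    sb.append('[');
    for (int g = 0; g < groups; g++) {
        if (g > 0) sb.append(',');
        for (int i = 0; i < groupDepth; i++) sb.append('[');
        for (int i = 0; i < groupDepth; i++) sb.append(']');
    }
    sb.append(']');
    return sb.toString();
}
```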

There is one problem that is unique to deeply nested JSON in particular (as opposed to other ways of producing many tokens): the code that grows the ContainerStack limits each expansion to at most 4000 elements, which means you get a fairly large allocation (and copy) for every 4000 levels of nesting. However, at most two of these arrays are alive at a time, so it should not lead to excessive memory use, and hence should not be a security risk. It does, however, reduce parsing performance for such a document.
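A paraphrase of the growth policy being described, assuming it works roughly like a capped ArrayList-style expansion (this is a sketch, not the verbatim jackson-databind source):

```java
import java.util.Arrays;
import com.fasterxml.jackson.databind.node.ContainerNode;

class CappedStack {
    private ContainerNode<?>[] nodes = new ContainerNode<?>[10];

    void grow() {
        // Grow by ~50%, but never by more than 4000 slots at a time: a very
        // deep document therefore pays a reallocation-plus-copy roughly every
        // 4000 levels, while at most two arrays (old + new) are live at once.
        int increment = Math.min(4000, Math.max(16, nodes.length >> 1));
        nodes = Arrays.copyOf(nodes, nodes.length + increment);
    }
}
```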

@cowtowncoder (Member, Author)

@denizhusaj Could you file a separate issue for ObjectNode.toString(), please? That sounds like a side issue worth addressing.

As to plans: yes, there is a plan, but it would be via the lower-level streaming API:

FasterXML/jackson-core#637

Handling it separately in every deserializer would mean more implementation work and more configurability, so this aspect (maximum input document size/complexity limits) seems better addressed with more general functionality.
That said, I have not started work here; while conceptually simple, an actual high-performance implementation is not trivial, and there is a bit of API work to consider as well (wrt how to pass such configuration limit settings).
API/performance comes into play when passing limits to parser/generator input/output stack implementations; there is currently no way to pass such information.
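Until such limits exist in the streaming layer (jackson-core later added StreamReadConstraints for this, in 2.15), an application could approximate a nesting cap itself with a streaming pre-pass; a minimal sketch, with the class name and limit being hypothetical, at the cost of parsing the document twice:

```java
import java.io.IOException;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class DepthGuard {
    // Rejects input nested deeper than maxDepth, in one streaming pass,
    // before any tree is built.
    static void checkDepth(String json, int maxDepth) throws IOException {
        try (JsonParser p = new JsonFactory().createParser(json)) {
            int depth = 0;
            JsonToken t;
            while ((t = p.nextToken()) != null) {
                if (t.isStructStart()) {          // '[' or '{'
                    if (++depth > maxDepth) {
                        throw new IOException("nesting depth exceeds " + maxDepth);
                    }
                } else if (t.isStructEnd()) {     // ']' or '}'
                    depth--;
                }
            }
        }
    }
}
```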

@deniz-husaj

@cowtowncoder yes, sure: #3447
