Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON - Long term view of moving away from Jackson to avaje-jsonb (my json library) #2564

Open
rbygrave opened this issue Feb 17, 2022 · 7 comments

Comments

@rbygrave
Copy link
Member

I believe longer term we will move away from using Jackson for JSON parsing and generation for the reasons of - weight, speed and security.

  • Weight - Jackson-databind weights in at 1.5Mb + jackson-core at 375Kb (vs avaje-jsonb at 182Kb)
  • Speed - we can never have enough speed wrt json parsing and generation
  • Security - Jackson has to effectively maintain a blacklist, hence the CVE's and the versions. avaje-jsonb works on a whitelist approach (ie. explicit @Json) and I believe it will prove to be much more stable (very stable).

I've put some effort over the past while into finding a replacement and then failing that ultimately building what I think is going to be the replacement - avaje-jsonb. What does this mean for Ebean?

Ebean internally uses the jackson-core API to support it's json serialization and deserialization + detects and uses Jackson ObjectMapper when necessary. This functionality is also very important for ebean-elastic. Ultimately we can replace this with avaje-jsonb.

The next steps are:

  1. for me to get more people using avaje-jsonb and kicking the tires - hopefully more people see what I see.
  2. At some point after that we will get to the point where we can modify Ebean swapping out the internal use of jackson core with avaje-jsonb (which should be transparent to users of Ebean).
  3. Look at the ObjectMapper use cases and look to support both ObjectMapper and avaje-jsonb for those cases.

This is a pretty long term view, nothing is going to change for a few months yet. I just want people to be aware that I've got this plan in mind and I'm pretty confident in avaje-jsonb now. Ebean's use of json is actually one of the motivators for getting this done. Way way back in the day I contemplated using the then "java jsonp" standard (now "jakarta jsonp") but quite frankly it wasn't good enough then and it isn't good enough now and I don't have faith that "jakarta jsonp" will arrive where I want them to be. Ultimately that lead to avaje-jsonb ended up with it's own internal parser/generator, it also can support and use jackson-core (and jakarta jsonp) but there is little reason to use that in practice now as the built in parser/generator with avaje-jsonb is now faster and lighter than jackson-core - it's still be very useful for performance testing and it's good to have and prove that abstraction in avaje-jsonb.

Please go and try out avaje-jsonb when you get the chance, thanks.
https://github.com/avaje/avaje-jsonb

@jnehlmeier
Copy link

Does avaje-jsonb support custom escaping of values? Or is that something that falls into step 3?

Recently I did something like

    var rows = ... // findList() using QueryBeans;

    var jsonCtx = db.json();
    var jsonWriter = new StringWriter();
    try (var generator = jsonCtx.createGenerator(jsonWriter).setCharacterEscapes(new HtmlCharacterEscapes())) {
      jsonCtx.toJson(rows, generator);
    }
    return jsonWriter.toString();

and it felt a bit clunky. I had to do that because the resulting JSON has been embedded into a HTML page script block and entities in rows contain user input in some fields.

@rbygrave
Copy link
Member Author

rbygrave commented Feb 22, 2022

Does avaje-jsonb support custom escaping of values?

Well no it doesn't - looks like an omission of a useful feature

With avaje-jsonb we have 3 options:

  1. Use avaje-jsonb with jackson-core under the hood for parsing/generation
  2. Use avaje-jsonb with it's own parser/generator under the hood
  3. Not considered: Use avaje-jsonb with Jakarta JSONP under the hood ... just way to slow so not considered worthy.

To explain a little more - avaje-jsonb has an abstraction layer between "binding" and "streaming" meaning it can use jackson-core (or jakarta-jsonp, or it's own parser/generator) under the hood to actually do the generation/parsing. It started off first cut only supporting and using jackson-core under the hood. Having done the next iteration with jakarta jsonp (parrson) and performance testing with comparison to dsl-json and ObjectMapper avaje-jsonb came to have it's own internal "streaming" mechanism as well (for performance and weight reasons).

That said, the ability to change the escaped chars is not exposed via avaje-jsonb (so we should look to make that change PLUS allow the ability to unwrap the underlying parser/generator if needed). As a note: the jakarta libraries as they stand are sooo slooow that I think of them as almost academic. I proposed some PR's to enable performance improvements to them but I can't see those being merged or those eclipse projects moving forward meaningfully in terms of performance.

The other known useful Jackson feature it does not have yet is the mixin feature.


Side note: There is a performance "trick" that we are not going to be able to get when using jackson-core when parsing related to "known keys" (during parsing, identifying known keys without decoding them into String). This trick allows avaje-jsonb to perform near the level of dsl-json (the fastest json java library)

@rPraml
Copy link
Contributor

rPraml commented Mar 28, 2022

Hello Rob, we use jackson a lot, also some other frameworks like Spring rely on this - so it is not an easy task to move away from jackson.

As long as I understand is, that the Ebean-API stays the same, so we still can use Jackson

We have one "uncommon" use case: We produce "human readable" jsons where we

  • indent them
  • do not wrap certain characters like \r \n \t
  • allow comments in JSONs

These Jsons are checked in into GIT and we need the special treatments of line breaks etc, so that PRs of Json changes are reviewable.

This is our ObjectMapper configuration

private static class HumanReadableCharacterEscapes extends CharacterEscapes {
	private static final long serialVersionUID = 1L;

	private static final int[] asciiEscapes = CharacterEscapes.standardAsciiEscapesForJSON();
	static {
		asciiEscapes['\t'] = CharacterEscapes.ESCAPE_NONE;
		asciiEscapes['\r'] = CharacterEscapes.ESCAPE_NONE;
		asciiEscapes['\n'] = CharacterEscapes.ESCAPE_NONE;
	}

	@Override
	public SerializableString getEscapeSequence(final int ch) {
		return null;
	}

	@Override
	public int[] getEscapeCodesForAscii() {
		return asciiEscapes;
	}
}

@Nonnull
public static ObjectMapper configure(@Nonnull final ObjectMapper mapper) {
	// set up all required modules
	mapper.registerModule(new JavaTimeModule()); // the java time module
	final SimpleModule module = new SimpleModule();
	module.addAbstractTypeMapping(Set.class, LinkedHashSet.class);
	mapper.registerModule(module);
	final SimpleModule moduleMap = new SimpleModule();
	moduleMap.addAbstractTypeMapping(Map.class, LinkedHashMap.class);
	mapper.registerModule(moduleMap);
	mapper.setDefaultPropertyInclusion(
			JsonInclude.Value.construct(Include.NON_NULL, Include.USE_DEFAULTS)); // ignore nulls
	return mapper;
}

public static final JsonFactory JSON_FACTORY = new JsonFactory()
		.configure(Feature.ALLOW_COMMENTS, true)
		.configure(Feature.ALLOW_YAML_COMMENTS, true)
		.configure(Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true) // required to read line breaks
		.configure(Feature.ALLOW_TRAILING_COMMA, true);

public static final JsonFactory HUMAN_READABLE_JSON_FACTORY = JSON_FACTORY.copy()
		.setCharacterEscapes(new HumanReadableCharacterEscapes());

// This is the default mapper, that we mainly used.
public static final ObjectMapper DEFAULT_MAPPER = configure(new ObjectMapper(JSON_FACTORY));

// For JSON that should be readable, but meet the JSON spec, we use this mapper
public static final ObjectMapper INDENTING_MAPPER = DEFAULT_MAPPER.copy()
		.enable(SerializationFeature.INDENT_OUTPUT);

// For some special cases (configuration etc. that are editable or stored in a Git-repo) we use this mapper
public static final ObjectMapper HUMAN_READABLE_MAPPER = configure(new ObjectMapper(HUMAN_READABLE_JSON_FACTORY))
		.enable(SerializationFeature.INDENT_OUTPUT);

@rbygrave
Copy link
Member Author

We have one "uncommon" use case: We produce "human readable" jsons where we ...

Not so uncommon I'd say in that this is JSON5 - https://json5.org/

PRs of Json changes are reviewable.

Yes, I also do this. Noting that ebean-test has JsonAssertContains as part of it's functionality. I'm not sure if you are using that or doing asserts against json strings or just reviewing changes in json ... but have a look at JsonAssertContains if you haven't already. I've used it a lot in component tests.

use jackson a lot, also some other frameworks like Spring rely on this

Yes, I realise this. We still will need to support ObjectMapper and in terms of jackson-core use via JsonParser/JsonGenerator these are still usable via avaje-jsonb (avaje-jsonb-jackson abstracts over JsonParser/JsonGenerator) so apps relying on jackson-core should still be supported this way.

That all said, I'm suggesting that my long term view is that Jackson has issues and will not be my preferred json binding or parsing library shortly - I realise it is a serious statement given how embedded it is in the ecosystem and how ubiquitous JSON binding and parsing is. Much in the same way that I see spring as legacy today - not in a bad way in that "it works" etc but in the way that I don't deem it be the best choice today for people deploying into Kubernetes and for me I see Kubernetes as the dominant deployment option going forward.

So support the past and current, but plan and prepare for the future - more "Ahead of Time", less memory consumption, less security vulnerabilities, more stable, simpler ....

Ideally more people see Jackson closer to how I see it but I'm aware that most devs are too busy "getting shit done" to stop and think more critically about infrastructure that today works reasonably well. So yes, I expect a lot of push back on avaje-jsonb and ultimately we need to support ObjectMapper use regardless.

@rbygrave
Copy link
Member Author

Not so uncommon I'd say in that this is JSON5

Well that isn't correct really. I'd say this is similar to JSON5 wrt support for multi-line string content and comments.

@rbygrave rbygrave pinned this issue Apr 6, 2022
@rbygrave
Copy link
Member Author

HtmlCharacterEscapes

jsonCtx.createGenerator(jsonWriter).setCharacterEscapes(new HtmlCharacterEscapes()))

With avaje-jsonb I've added @Json.Raw which we can put on a String property that can contain arbitrary json content. This is then read and written (as a string containing raw json).

This is similar to Jackson @JsonRawValue except that the jackson feature only works for writing and the avaje-jsonb one works for both writing and reading. Note that when using jackson-core under the hood it uses parser.getCodec().readTree(parser).toString(). When using the built-in streaming parser/generator it effectively uses the skipValue() mechanism to find the start and end of the content.

I'm saying this because this could be an alternative approach for use cases needing HtmlCharacterEscapes().

@rbygrave
Copy link
Member Author

rbygrave commented Apr 21, 2022

Adds example showing use with custom Jackson JsonFactory - avaje/avaje-jsonb#20 ... your use cases @rPraml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants