
C++: Built-In JSON Serialization is Extremely Slow #6907

Closed
bmahler opened this issue Nov 19, 2019 · 6 comments

@bmahler

bmahler commented Nov 19, 2019

In the system I work on, we weren't aware of the provided protobuf-to-JSON facilities (or they didn't exist yet when we started, many years ago), so we built our own protobuf-to-JSON conversion based on protobuf reflection, with rapidjson handling the JSON encoding/serialization. It appears that the provided protobuf-to-JSON conversion is much slower than our approach.

Here are the median times, in seconds, over 10 runs [1] of serializing a large message:

SerializeAsString: 1.99 seconds
MessageToJsonString: 24.24 seconds
To JSON via our reflection-based approach: 4.33 seconds

So converting a protobuf message to serialized JSON with the built-in facility takes about 5.6x as long as (460% longer than) our approach, which you can find here:

https://github.com/apache/mesos/blob/1.9.0/3rdparty/stout/include/stout/protobuf.hpp#L810-L1042
which leverages https://github.com/apache/mesos/blob/1.9.0/3rdparty/stout/include/stout/jsonify.hpp

Is this expected? It seems the protobuf built-in facility, at least for going from a message to serialized JSON, is extremely slow and could be much faster.

I suppose an even faster technique than ours would be to have the protobuf compiler generate the JSON serialization logic for each message (much as it generates wire-format serialization), avoiding reflection entirely.

Would love to hear any thoughts or comments on this topic.

@aaron-bray
Contributor

It looks like 3.11.0 introduced something that really slowed down JSON parsing in C++. If you use 3.10.1, the performance is great.

@macdew

macdew commented Jan 15, 2020

I'll add a "me too" to this. Parsing our JSON has gone from about 200 ms in 3.10 to 20-odd minutes in 3.11. The JSON is about 15 MB.

@tredpath

For JSON parsing in C++, the slowdown appears to be caused by the introduction of StringPiece in JsonStreamParser::GetNextTokenType in #6634. That function calls HasPrefixString three times, and HasPrefixString takes std::string arguments, so every call forces the StringPiece to be copied into a newly allocated std::string. That is slow for large strings, and GetNextTokenType is called a great many times. Switching to StringPiece::starts_with instead of HasPrefixString recovers almost all of the lost performance, and I believe it performs the same comparison.

@bysin

bysin commented Apr 22, 2020

It looks like this issue has been fixed in #7230.

@deryck1031
Copy link

doesnt look like #7230 has fixed the MessageToJsonString. I compared older and newer version, performance looks same to me.

@elharo added the c++ label Aug 21, 2021
@elharo added the json label Sep 13, 2021
@mcy
Contributor

mcy commented Sep 1, 2022

The JSON codec is not designed to be fast; we do not consider JSON to be the canonical encoding format (you should be using the Protobuf wire format if you need performance, since that's where we focus all of our cycles).

@mcy closed this as not planned Sep 1, 2022