C++: Built-In JSON Serialization is Extremely Slow #6907
It looks like 3.11.0 introduced something that really slowed down the JSON parsing in C++.
I'll add a "me too" to this. Our JSON parsing has gone from about 200 ms in 3.10 to 20-odd minutes in 3.11. The JSON is about 15 MB.
For JSON parsing in C++, the slowdown appears to be caused by the addition of StringPiece in JsonStreamParser::GetNextTokenType in #6634. HasPrefixString is called three times, and since it takes std::string arguments, each call forces the StringPiece to allocate a new std::string. That is slow for large strings, and GetNextTokenType is called many times. Switching to StringPiece::starts_with instead of HasPrefixString regains almost all of the lost performance, and I believe it performs the same comparison.
It looks like this issue has been fixed in #7230. |
It doesn't look like #7230 has fixed MessageToJsonString. I compared the older and newer versions, and performance looks the same to me.
The JSON codec is not designed to be fast; we do not consider JSON to be the canonical encoding format (you should be using the Protobuf wire format if you need performance, since that's where we focus all of our cycles).
In the system I work on, we weren't aware of the provided protobuf-to-JSON facilities (or they didn't exist so many years ago), so we built our own protobuf-to-JSON conversion based on protobuf reflection plus rapidjson for the JSON encoding/serialization. The provided protobuf-to-JSON conversion facility appears to be much slower than our approach.
Here are the median numbers in seconds for 10 runs [1] of serializing a large message:
SerializeAsString: 1.99 seconds
MessageToJsonString: 24.24 seconds
To json via our reflection-based approach: 4.33 seconds
So the built-in facility for protobuf message to serialized json takes 460% longer than our approach, which you can find here:
https://github.com/apache/mesos/blob/1.9.0/3rdparty/stout/include/stout/protobuf.hpp#L810-L1042
Which leverages https://github.com/apache/mesos/blob/1.9.0/3rdparty/stout/include/stout/jsonify.hpp
Is this expected? It seems the protobuf built-in facility, at least for going from message to serialized JSON, is extremely slow and could be much faster.
I suppose another technique that would be even faster than our approach would be for the json serialization logic to be generated by the protobuf compiler on a per message basis (much like protobuf serialization is generated), allowing reflection to be avoided entirely.
Would love to hear any thoughts or comments on this topic.