-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
handle different decimal separators in floating point JSON serialization #528
base: master
Are you sure you want to change the base?
handle different decimal separators in floating point JSON serialization #528
Conversation
7501120
to
a149d59
Compare
Hello @langchr86 , Thanks for PR fixing a long-time bug! The proposed solution looks good. Regards, |
a149d59
to
f0cb450
Compare
Thank you for your feedback. I restructured the added code to have all new functions in the proposed I am not sure if I found the correct place where JSON serialization happens. Currently I have identified the I also added unit tests for these new functions. They are only executed in a system where the needed locales are available. Maybe we could update the CI environment to meet these requirements or check which matching locales are already available and change the tests accordingly. Originally I wanted to execute some benchmarks. Do you already have some infrastructure which can be used? Or should I simply write throw-away code for e.g. google-benchmark and paste only the results here? What is your suggestion? |
f0cb450
to
22b5f01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure that the original text isn't modified during the parsing.
const char* start = caret.getCurrData(); | ||
const char* end = caret.getData() + caret.getDataSize(); | ||
// here we remove constness and manipulate the internal memory of Caret directly | ||
parser::json::Utils::convertFirstDecimalSeparatorFromJsonToLocale((p_char8) start, (p_char8) end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't do that. Caret
is for parsing only, not for modification of data.
Instead, we should do the following:
- read all chars that are representing possible
float
value to some buffer ex.:oatpp::String
. - modify separator char in this independent buffer
- use
std::strtof
on that independent buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I now added another commit with a preview of another approach. I will clean up later.
This new approach combines the decimal separator replacing and the string extraction into a separate buffer. Like this we save another loop after the extraction for the decimal separator replacement. Please let me know your thoughts, that I can cleanup and finish later.
I have some recommendations for performance improvements - but first, let's have the correct implementation for this improvement. Also, it's understood that this improvement will affect performance a bit - I'm ok with that. |
…ion/parsing This depends on locale settings of process. E.g. in de_* locations a "," is used instead of ".". Signed-off-by: Christian Lang <christian.lang@scs.ch>
22b5f01
to
820df3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @langchr86 ,
The latest changes look good to me.
I have some minor things I want to improve -
I've forked your repo and later will create a PR to your branch for you to review.
Any news on this one? Was able to set |
Fixes #201
Based on the discussion in #201 I tried to find a simple but in most of the time not (significant) slower solution for the JSON floating point number serialization/parsing.
Problem:
A floating point number in JSON uses always the "." character as decimal separator (radix) as defined here: https://www.rfc-editor.org/rfc/rfc7159.html#section-6. Because oatpp uses
std::strtof
andstd::strtod
for float parsing andsnprintf
for serialization the implementation does not always use ".". Instead these functions respect the locale set to the process. Therefore in some locales a different character as "." are used. E.g. in "de_*" locations a "," is used. See: https://en.wikipedia.org/wiki/Decimal_separator or https://docs.oracle.com/cd/E19455-01/806-0169/overview-8/index.html.Solution:
My proposal is based on the check if we are in a locale where the separator is different from the needed ".". This is done by using https://en.cppreference.com/w/c/locale/localeconv to query the currently used separator. If it matches "." nothing is changed in the current implementations.
But if the character is different we try to prepare the to serialize/parse string to match the expected separator character by
std::strtof
/std::strtod
/snprintf
. Therefore we replace the locale defined separator with "." after serialization and the received "." by the expected locale defined separator before parsing.Notes:
This is just a proposal to check if it is worth to continue working on this. Further useful work would consist of:
json_decimal_point
only defined in one central point.