Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle different decimal separators in floating point JSON serialization #528

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

langchr86
Copy link
Contributor

Fixes #201

Based on the discussion in #201 I tried to find a simple but in most of the time not (significant) slower solution for the JSON floating point number serialization/parsing.

Problem:
A floating point number in JSON uses always the "." character as decimal separator (radix) as defined here: https://www.rfc-editor.org/rfc/rfc7159.html#section-6. Because oatpp uses std::strtof and std::strtod for float parsing and snprintf for serialization the implementation does not always use ".". Instead these functions respect the locale set to the process. Therefore in some locales a different character as "." are used. E.g. in "de_*" locations a "," is used. See: https://en.wikipedia.org/wiki/Decimal_separator or https://docs.oracle.com/cd/E19455-01/806-0169/overview-8/index.html.

Solution:
My proposal is based on the check if we are in a locale where the separator is different from the needed ".". This is done by using https://en.cppreference.com/w/c/locale/localeconv to query the currently used separator. If it matches "." nothing is changed in the current implementations.
But if the character is different we try to prepare the to serialize/parse string to match the expected separator character by std::strtof/std::strtod/snprintf. Therefore we replace the locale defined separator with "." after serialization and the received "." by the expected locale defined separator before parsing.

Notes:
This is just a proposal to check if it is worth to continue working on this. Further useful work would consist of:

  • Implement functional unit tests.
  • Implement and document benchmark.
  • Refactor code. E.g. to have the json_decimal_point only defined in one central point.

@langchr86 langchr86 force-pushed the fix/issue201_json_float_with_comma branch from 7501120 to a149d59 Compare December 5, 2021 13:14
src/oatpp/core/utils/ConversionUtils.cpp Outdated Show resolved Hide resolved
src/oatpp/core/utils/ConversionUtils.cpp Outdated Show resolved Hide resolved
src/oatpp/core/utils/ConversionUtils.cpp Outdated Show resolved Hide resolved
@lganzzzo
Copy link
Member

lganzzzo commented Dec 5, 2021

Hello @langchr86 ,

Thanks for PR fixing a long-time bug!

The proposed solution looks good.
The only comment - please move utility methods to json::Utils - since those are json specifics.

Regards,
Leonid

@langchr86 langchr86 force-pushed the fix/issue201_json_float_with_comma branch from a149d59 to f0cb450 Compare December 6, 2021 22:42
@langchr86
Copy link
Contributor Author

langchr86 commented Dec 6, 2021

Thank you for your feedback.

I restructured the added code to have all new functions in the proposed json::Utils class.

I am not sure if I found the correct place where JSON serialization happens. Currently I have identified the ConsistentOutputStream::writeAsString as the place to enhance the serialization behavior.

I also added unit tests for these new functions. They are only executed in a system where the needed locales are available. Maybe we could update the CI environment to meet these requirements or check which matching locales are already available and change the tests accordingly.

Originally I wanted to execute some benchmarks. Do you already have some infrastructure which can be used? Or should I simply write throw-away code for e.g. google-benchmark and paste only the results here? What is your suggestion?

src/oatpp/core/data/stream/Stream.cpp Outdated Show resolved Hide resolved
src/oatpp/core/data/stream/Stream.cpp Outdated Show resolved Hide resolved
src/oatpp/core/parser/Caret.cpp Outdated Show resolved Hide resolved
src/oatpp/core/parser/Caret.cpp Outdated Show resolved Hide resolved
@langchr86 langchr86 force-pushed the fix/issue201_json_float_with_comma branch from f0cb450 to 22b5f01 Compare December 22, 2021 17:34
Copy link
Member

@lganzzzo lganzzzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure that the original text isn't modified during the parsing.

src/oatpp/core/parser/Caret.cpp Outdated Show resolved Hide resolved
const char* start = caret.getCurrData();
const char* end = caret.getData() + caret.getDataSize();
// here we remove constness and manipulate the internal memory of Caret directly
parser::json::Utils::convertFirstDecimalSeparatorFromJsonToLocale((p_char8) start, (p_char8) end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do that. Caret is for parsing only, not for modification of data.

Instead, we should do the following:

  • read all chars that are representing possible float value to some buffer ex.: oatpp::String.
  • modify separator char in this independent buffer
  • use std::strtof on that independent buffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I now added another commit with a preview of another approach. I will clean up later.
This new approach combines the decimal separator replacing and the string extraction into a separate buffer. Like this we save another loop after the extraction for the decimal separator replacement. Please let me know your thoughts, that I can cleanup and finish later.

src/oatpp/parser/json/mapping/Deserializer.cpp Outdated Show resolved Hide resolved
@lganzzzo
Copy link
Member

Originally I wanted to execute some benchmarks. Do you already have some infrastructure which can be used? Or should I simply write throw-away code for e.g. google-benchmark and paste only the results here? What is your suggestion?

I have some recommendations for performance improvements - but first, let's have the correct implementation for this improvement.

Also, it's understood that this improvement will affect performance a bit - I'm ok with that.

…ion/parsing

This depends on locale settings of process. E.g. in de_* locations
a "," is used instead of ".".

Signed-off-by: Christian Lang <christian.lang@scs.ch>
@langchr86 langchr86 force-pushed the fix/issue201_json_float_with_comma branch from 22b5f01 to 820df3e Compare January 22, 2022 15:01
Copy link
Member

@lganzzzo lganzzzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @langchr86 ,

The latest changes look good to me.

I have some minor things I want to improve -
I've forked your repo and later will create a PR to your branch for you to review.

@rozmansi
Copy link

Any news on this one? Was able to set setlocale(LC_NUMERIC, "C") in my project for the time being, but hoping for a proper fix. Pretty please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Decimal comma in Float32 in JSON
3 participants