Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3966: [Java] Fix default value serialisation for fixed and bytes #2823

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mitasov-ra
Copy link
Contributor

@mitasov-ra mitasov-ra commented Mar 26, 2024

What is the purpose of the change

Default value for fixed and bytes schemas is converted to JSON in the wrong (according to docs) way.

Also, it's possible to store any unicode string as default for bytes, like:

{
  "type" : "record",
  "name" : "TestRecord",
  "fields" : [ {
    "name" : "testFixed",
    "type" : {
      "type" : "fixed",
      "name" : "Code",
      "size" : 3
    },
    "default" : "Привет"
  } ]
}

The above example would be serialised into Avro format as "?????".

In this PR:

  • added explicit conversion from String to byte[] using ISO_8859_1 charset (the one that used during byte[] to JSON conversion and during reading Avro into Java objects);
  • explicitly stated in doc-blocks of methods bytesDefault and fixedDefault that this conversion is performed;
  • byte[] to JSON conversion itself now is done according to doc: by translating bytes to u-escaped sequences (e.g. [2F, 00, 11, FF] into \u002F\u0000\u0011\u00FF)

Verifying this change

This change is already covered by existing tests, such as:

  • org.apache.avro.generic.TestGenericData
  • org.apache.avro.TestFixed

DISCUSSION IS NEEDED

In the test TestGenericData I've had to rewrite test toStringEscapesControlCharsInBytes. The new behaviour encodes "a\nb" as "\u0061\u000A\u0062", thus breaking some compatibility (kinda, more like breaking readability).

It is possible still to not encode ASCII characters (from U+0020 to U+007E) and persist full compatibility with previous version, but then the behaviour would differ from one described in the documentation.

Documentation

  • Does this pull request introduce a new feature? No, it just makes old feature work exactly how it's documented.

@github-actions github-actions bot added the Java Pull Requests for Java binding label Mar 26, 2024
@mitasov-ra
Copy link
Contributor Author

@Fokko there's an issue with checks you've triggered, I can't see logs of failed runs. Can you please rerun one more time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
1 participant