
Added Two Way Handling for Standard Json #249

Merged
merged 4 commits into linkedin:master on Aug 19, 2022

Conversation

xmcqueen
Contributor

per #247

I've added support for going from Avro-encoded data to standard JSON (no decoration of unions). We already had the ability to go from standard JSON to Avro-encoded data, so this supports the full two-way flow of data. The old code is still in there and unchanged. To get the two-way functionality, use something like this to construct a codec:

func NewCodecForStandardJsonFull(schemaSpecification string) (*Codec, error) {
        return NewCodecFrom(schemaSpecification, &codecBuilder{
                buildCodecForTypeDescribedByMap,
                buildCodecForTypeDescribedByString,
                buildCodecForTypeDescribedBySliceTwoWayJson,
        })
}

I've also added a new function for anybody who might want only the one-way behavior, giving it a more obvious name:

func NewCodecForStandardJsonOneWay(schemaSpecification string) (*Codec, error) {
        return NewCodecForStandardJSON(schemaSpecification)
}
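
For anyone skimming the thread, here is a minimal sketch (not part of this PR; the schema and values are illustrative) of what the two-way codec changes in practice, using the constructor name as defined above (later renamed NewCodecForStandardJSONFull):

package main

import (
	"fmt"

	"github.com/linkedin/goavro/v2"
)

func main() {
	schema := `["null", "string"]`

	// Spec-compliant codec: unions are decorated in textual form.
	strict, err := goavro.NewCodec(schema)
	if err != nil {
		panic(err)
	}

	// Two-way standard-JSON codec added in this PR.
	full, err := goavro.NewCodecForStandardJsonFull(schema)
	if err != nil {
		panic(err)
	}

	// Undecorated JSON in...
	native, _, err := full.NativeFromTextual([]byte(`"hello"`))
	if err != nil {
		panic(err)
	}

	// ...decorated JSON out of the original codec, undecorated out of the new one.
	decorated, _ := strict.TextualFromNative(nil, native)
	undecorated, _ := full.TextualFromNative(nil, native)
	fmt.Println(string(decorated))   // {"string":"hello"}
	fmt.Println(string(undecorated)) // "hello"
}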

@mihaitodor
Contributor

This is just what I was looking to propose adding to goavro, so thank you for implementing it! I guess it supersedes #201, right?

Regarding the code, here are a few small nitpicks:

  • Rename NewCodecForStandardJsonOneWay -> NewCodecForStandardJSONOneWay
  • Rename NewCodecForStandardJsonFull -> NewCodecForStandardJSONFull
  • Similar to NewCodec, consider adding some comments above NewCodecForStandardJSON, NewCodecForStandardJSONOneWay and NewCodecForStandardJSONFull to explain what they do, highlighting the differences between them

Otherwise, it looks great! I tested it with a certain "problematic" logical type and it does emit the expected JSON:

package main

import (
	"testing"

	"github.com/linkedin/goavro/v2"
	"github.com/stretchr/testify/require"
)

func TestLogicalTypeToJSON(t *testing.T) {
	schema := `
{
	"type": "record",
	"name": "bytesdecimal",
	"fields": [
		{
			"default": null,
			"name": "pos_0_33333333",
			"type": [
				"null",
				{
					"logicalType": "decimal",
					"precision": 16,
					"scale": 2,
					"type": "bytes"
				}
			]
		}
	]
}`

	jsonString := `
{
	"pos_0_33333333": "!"
}`

	codec, err := goavro.NewCodecForStandardJsonFull(schema)
	require.NoError(t, err)

	native, _, err := codec.NativeFromTextual([]byte(jsonString))
	require.NoError(t, err)

	bs, err := codec.TextualFromNative(nil, native)
	require.NoError(t, err)

	require.Equal(t, `{"pos_0_33333333":"!"}`, string(bs))
}

I did however open #253 to see if there's any interest to support additional logical type conversions, similar to Java.

@xmcqueen
Contributor Author

Thanks for the review and the testing. I was not aware of the previous similar PR. I do want to use this one, as it is designed to be an extension of the one-way JSON support that is already in place. I was also not aware of that older issue, so it does look like we should get this merged and close those older issues.

I'll get this updated per your comments and merged soon. Maybe we'll get some more reviews soon. Thanks again.

@mihaitodor
Contributor

Sounds good to me, thank you!

@loicalleyne

@xmcqueen I tested your code with some OCF files we use in production that use primitive and complex types (no logical types), and the issue of type literals being added to the data is no longer present when using TextualFromNative.
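
For anyone wanting to repeat this kind of check, here is a rough sketch (the file name is hypothetical and error handling is kept minimal) of reading an OCF file and emitting standard JSON for each record with the new codec (post-rename name shown):

package main

import (
	"fmt"
	"os"

	"github.com/linkedin/goavro/v2"
)

func main() {
	// Hypothetical OCF file; substitute one of your own.
	f, err := os.Open("events.avro")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	ocfr, err := goavro.NewOCFReader(f)
	if err != nil {
		panic(err)
	}

	// Build the two-way standard-JSON codec from the schema embedded in the OCF.
	full, err := goavro.NewCodecForStandardJSONFull(ocfr.Codec().Schema())
	if err != nil {
		panic(err)
	}

	// Emit one line of undecorated JSON per record.
	for ocfr.Scan() {
		native, err := ocfr.Read()
		if err != nil {
			panic(err)
		}
		textual, err := full.TextualFromNative(nil, native)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(textual))
	}
	if err := ocfr.Err(); err != nil {
		panic(err)
	}
}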

@xmcqueen
Contributor Author

Excellent. I'll get this thing going.

@xmcqueen
Contributor Author

OK, I've done the renaming and commenting suggested by @mihaitodor. Thanks y'all for the contributions and reviews.

@xmcqueen xmcqueen merged commit 442327d into linkedin:master Aug 19, 2022
@mihaitodor
Contributor

mihaitodor commented Aug 19, 2022

Awesome, thank you! 🎉

@karrick
Contributor

karrick commented Oct 27, 2022

I believe this might have broken the ability to properly encode and decode Avro data with union data types.

The Avro specification is clear about the requirement to encode a union in JSON using a JSON object whose single property name is the name of the data type being encoded (in this case, "string") and whose property value is the value being encoded. This change breaks compatibility and should be backed out.
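
For concreteness, a small illustrative test (not from this thread; the schema and values are made up) of the encoding rule being cited, using the original spec-compliant codec on a ["null", "string"] union:

package main

import (
	"testing"

	"github.com/linkedin/goavro/v2"
	"github.com/stretchr/testify/require"
)

func TestUnionDecorationPerSpec(t *testing.T) {
	codec, err := goavro.NewCodec(`["null", "string"]`)
	require.NoError(t, err)

	// Non-null branch: goavro's native form for a union is a single-key map,
	// and the textual form decorates the value with the branch name.
	bs, err := codec.TextualFromNative(nil, map[string]interface{}{"string": "hello"})
	require.NoError(t, err)
	require.Equal(t, `{"string":"hello"}`, string(bs))

	// Null branch: encoded as a bare JSON null.
	bs, err = codec.TextualFromNative(nil, nil)
	require.NoError(t, err)
	require.Equal(t, `null`, string(bs))
}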

@mihaitodor
Contributor

mihaitodor commented Oct 27, 2022

Is this related to #252? That should be fixed too, but I haven't gotten a chance to dig into it. It will require a major version bump and I'd like to also address #253 as part of that...

We already used NewCodecForStandardJsonFull in https://github.com/benthosdev/benthos and I'd like to better understand what can be done to retain this functionality and address the incompatibility you found. Would you be able to provide an example?

@xmcqueen
Contributor Author

Hi @karrick!

Currently there are three ways to get data into and out of the Avro world here, with varying levels of expected compatibility:

  1. full strict Avro spec compliance via the original API, which remains in place
  2. full compatibility, but undecorated (standard) JSON is accepted on input only and is converted/coerced: available via NewCodecForStandardJSONOneWay
  3. two-way undecorated JSON with uncertain compatibility: NewCodecForStandardJSONFull

I did try to get option 3 to be fully compatible, and it does seem to pass the tests. It is certainly convenient and popular. For people who require strict compliance, goavro is expected to still provide it at the same standard upheld before, via the same API used before. If there are problems with option 3, I think some users would be happy to see them patched, as long as they can retain the convenient ingress/egress to the library.
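
To put the three options above in code, a minimal sketch using the post-rename constructor names (the package and helper function names are illustrative):

package example

import "github.com/linkedin/goavro/v2"

func buildCodecs(schema string) (strict, oneWay, full *goavro.Codec, err error) {
	// 1. Strict Avro spec compliance: unions decorated on both input and output.
	if strict, err = goavro.NewCodec(schema); err != nil {
		return
	}
	// 2. Undecorated (standard) JSON accepted on input; output still uses the
	//    decorated Avro JSON encoding.
	if oneWay, err = goavro.NewCodecForStandardJSONOneWay(schema); err != nil {
		return
	}
	// 3. Undecorated JSON accepted and emitted on both sides (added in this PR).
	full, err = goavro.NewCodecForStandardJSONFull(schema)
	return
}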
