Reading Avro with specified reader and writer schemas #34

Closed
pszymczyk opened this issue Dec 7, 2016 · 5 comments

@pszymczyk

Hi

In the original Avro library, when I read data I can specify both the reader and the writer schema. This is related to backward/forward compatibility:

SpecificDatumReader r = new SpecificDatumReader(writer, reader);
BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(inputStream, null);
return r.read(null, decoder);

Is it possible to specify writer and reader schemas in AvroMapper?
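
For readers unfamiliar with what the two schemas buy you, here is a minimal, library-free sketch of the idea. It assumes the usual Avro resolution rules for records: fields are matched by name, writer-only fields are dropped, and reader-only fields take the reader's default. The class and method names (`SchemaResolutionSketch`, `resolve`) are hypothetical and are not part of Avro or Jackson; real resolution also covers type promotion, aliases, unions and missing defaults, none of which is modelled here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of writer/reader schema resolution for a single record.
// All names here are hypothetical; this is not the Avro or Jackson API.
final class SchemaResolutionSketch {

    /**
     * @param writerFields  field names in the order the writer serialized them
     * @param writtenValues decoded values, positionally aligned with writerFields
     * @param readerFields  field names the reader expects, mapped to their defaults
     * @return the record as the reader sees it
     */
    static Map<String, Object> resolve(List<String> writerFields,
                                       List<Object> writtenValues,
                                       Map<String, Object> readerFields) {
        if (writerFields.size() != writtenValues.size()) {
            throw new IllegalArgumentException("one value is required per writer field");
        }
        // The payload carries no field names, so it can only be decoded in writer order.
        Map<String, Object> decoded = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            decoded.put(writerFields.get(i), writtenValues.get(i));
        }
        // Project onto the reader's fields: keep matches, fill defaults, drop the rest.
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerFields.entrySet()) {
            String name = field.getKey();
            result.put(name, decoded.containsKey(name) ? decoded.get(name) : field.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Reader expects "name" and "email"; the writer produced "name" and "age".
        Map<String, Object> reader = new LinkedHashMap<>();
        reader.put("name", "");
        reader.put("email", "unknown");
        List<String> writerFields = Arrays.asList("name", "age");
        List<Object> values = Arrays.<Object>asList("Ann", 30);
        System.out.println(resolve(writerFields, values, reader)); // prints {name=Ann, email=unknown}
    }
}
```

The point of the sketch is that the writer schema is needed to decode the bytes at all, while the reader schema only decides how the decoded record is presented to the application.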

@cowtowncoder
Member

Not currently. I am not sure what I think of the ability to do this, although if a suitable API could be specified, this might make sense.

@pszymczyk
Author

pszymczyk commented Dec 7, 2016

My case is that at our company we have implemented a generic SerDe library built on top of the original Avro tools. Now we'd like to migrate to Jackson, but we can't because it does not expose a method that takes both reader and writer schemas :(

@cowtowncoder
Member

@pszymczyk I guess my question is about the value of separate schemas. Since Jackson data-binding is somewhat more flexible, it should often be enough to just use the writer schema that was used for generation? And since that schema is always required to be present, the value of a second schema (essentially, to rename fields) may not be that high.

On API: the main challenge is just how to pass it via the general Jackson API. I suspect that, if necessary, it would be possible to modify the AvroSchema wrapper (or whatever the name was) to allow a secondary schema, and so maybe support this quite seamlessly? Avro parsers and generators must know how to handle AvroSchema anyway, so this might actually be a relatively simple change.

If you happen to have the time and interest, I would be happy to help get it integrated via a PR.

@cowtowncoder
Member

@panchenko do you happen to have a simple case of reader+writer schema that I could use? I think I should be able to implement this quite easily after all.

cowtowncoder changed the title from "[avro] Reading Avro with specified reader and writer schemas" to "Reading Avro with specified reader and writer schemas" on Jan 17, 2017
cowtowncoder removed the 2.9 label on Feb 13, 2017
cowtowncoder added this to the 2.8.7 milestone on Feb 13, 2017
@cowtowncoder
Member

This is now implemented for the upcoming 2.8.7. Usage is via AvroSchema, with something like:

// Schema the data was written with
AvroSchema writerSchema = mapper.schemaFrom(src);
// Combine it with the schema the reader expects
AvroSchema resolving = writerSchema.withReaderSchema(
     mapper.schemaFrom(srcForReaderSchema));

and then just using the resulting `resolving` schema for reading, as you would any other AvroSchema.

Note on performance: while there may be some overhead for schema evolution, there should not be significant differences during parsing, since most of the work is done during construction of the "resolving" schema definition. Because of this, it is strongly recommended that schemas be reused: instances are thread-safe, so they can be safely shared and reused across threads.
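
One way to follow the reuse advice above is to build each resolving schema once and hand the same instance to every thread. The sketch below is an illustration under assumptions, not part of Jackson: `ResolvedSchemaCache` is a hypothetical helper, the type parameter `S` stands in for `AvroSchema`, and the `builder` function stands in for the `schemaFrom(...)` / `withReaderSchema(...)` construction shown earlier, with any checked exceptions wrapped by the caller.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical helper: caches one resolved schema per (writer, reader) pair.
// S is a placeholder for the schema type (AvroSchema in the thread above).
final class ResolvedSchemaCache<S> {

    private final Map<List<String>, S> cache = new ConcurrentHashMap<>();
    private final BiFunction<String, String, S> builder;

    // builder performs the expensive construction from the two schema sources.
    ResolvedSchemaCache(BiFunction<String, String, S> builder) {
        this.builder = builder;
    }

    // Returns the cached instance, constructing it at most once per pair.
    S get(String writerSchemaJson, String readerSchemaJson) {
        // A two-element list gives value-based equals/hashCode for the pair.
        List<String> key = Arrays.asList(writerSchemaJson, readerSchemaJson);
        return cache.computeIfAbsent(key, k -> builder.apply(writerSchemaJson, readerSchemaJson));
    }

    public static void main(String[] args) {
        // Stand-in builder: a real one would construct the resolving schema here.
        ResolvedSchemaCache<String> schemas =
                new ResolvedSchemaCache<>((w, r) -> "resolved(" + w + " -> " + r + ")");
        String first = schemas.get("writer-v1", "reader-v2");
        String second = schemas.get("writer-v1", "reader-v2");
        System.out.println(first == second); // prints true: the same instance is reused
    }
}
```

Keying on the schema source text is a simplification; an application that already tracks schema ids or versions would likely key on those instead.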
