We want to use KSQL to run stream queries on many of our Kafka topics, but we're running into some trouble, and we were hoping some of you kind folks could help us out.
Our current setup is that we have nested .avsc files which define our schema.
We use the avro-maven-plugin to convert the avsc file into POJOs.
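For illustration, the nested schemas are in this general shape (the record and field names here are made-up stand-ins, not our real schema):

```json
{
  "type": "record",
  "name": "OurClass",
  "namespace": "com.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "payload", "type": {
        "type": "record",
        "name": "SubField",
        "fields": [
          { "name": "name", "type": "string" },
          { "name": "inner", "type": {
              "type": "record",
              "name": "SubSubField",
              "fields": [ { "name": "value", "type": "int" } ]
          } }
        ]
    } }
  ]
}
```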
Our application services (Kafka producers) then create POJOs and put them onto various Kafka topics.
We're using serdes which look like this...
```java
public class OurClassSerde extends WrapperSerde<OurClass>
{
    public OurClassSerde()
    {
        super(new AvroSerializerBase<OurClass>(), new OurClassDeserializer());
    }
}
```
`clazz` (inside our deserialiser) is the class of the POJO we're deserialising.
As far as I'm aware, this does not use the Confluent wire format. This may be the problem.
This all works, and has been functioning for years in its own right.
We now want to use KSQL, as we anticipate a very large increase in the amount of data we need to process, and stream processing will scale better than running aggregations over static data.
We're running in Docker (Docker Desktop backed by WSL2, for dev purposes), and have used the recommended zookeeper / kafka / ksqlDB server / ksql CLI / schema-registry compose file. Just in case that makes a difference.
We've created a stream using...
```sql
CREATE STREAM teststream WITH (KAFKA_TOPIC='TopicOurData', VALUE_FORMAT='AVRO');
```
This doesn't fail.
When we describe the stream, it has correctly identified all of the sub-fields and sub-sub-fields.
As soon as we run a query against the stream, all of the data is instantly consumed, along with the following error...
The error we're seeing (Unknown magic byte!) makes me think that we're not using the wire format.
If we try to manually specify the shape of the data using a STRUCT, we get the same error.
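For reference, the STRUCT-based attempt looked roughly like this (the column and field names are illustrative stand-ins, not our real schema):

```sql
CREATE STREAM teststream (
    ID VARCHAR,
    PAYLOAD STRUCT<
        NAME VARCHAR,
        INNER STRUCT<VALUE INT>>
) WITH (KAFKA_TOPIC='TopicOurData', VALUE_FORMAT='AVRO');
```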
We have a fundamental problem with schema-registry itself: when we set it up, it doesn't seem to "support" nested schemas.
I can't find a way to manually create a schema as a single JSON object that schema-registry will accept. We can create a file that the avro-maven-plugin will use, but schema-registry doesn't like it.
If we un-nest the schema and register the sub-sub-fields, sub-fields, and fields separately (as in the avsc example above), schema-registry accepts them; but when we then create a stream, it adds all the sub-sub-fields first, then the sub-fields, and the actual object last of all. That doesn't feel correct.
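In case it helps diagnose: our working theory is that schema-registry wants one self-contained schema per subject, with nested records defined inline, registered via its versions endpoint. Something like this (the subject name and the trimmed-down schema are placeholders for our real ones):

```
POST /subjects/TopicOurData-value/versions HTTP/1.1
Content-Type: application/vnd.schemaregistry.v1+json

{"schema": "{\"type\":\"record\",\"name\":\"OurClass\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"}
```

Note the schema itself is a JSON-escaped string inside the request body, which is easy to get wrong by hand.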
We have so many questions!
Is the problem we're seeing (Unknown magic byte!) definitely down to the fact that we're not using the wire format?
As the POJO data we are inserting into Kafka has a built-in schema, and seems to be recognised by KSQL, do we need schema-registry? We have the schemas in one place, in one file, which we use to create the POJOs. If we can get the topic to auto-register the schema from the POJOs, that's a relief for us, as we only need to maintain a single point of truth.
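For context, if auto-registration is viable, we'd expect the producer configuration to look something like this (the broker and registry addresses are placeholders for our environment, and we haven't verified this against our versions):

```properties
bootstrap.servers=kafka:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://schema-registry:8081
auto.register.schemas=true
```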
If we need schema-registry, how can we import our single AVSC file (which contains all shared sub-sub-fields, sub-fields, and fields)? If we have to have one file per top-level object, how can we import the sub-field and sub-sub-field objects in a generic manner, so each one isn't associated specifically with a topic, but is just present?
Is there a Kafka-provided Java class to perform wire-format serialisation and deserialisation, or do we have to pack the magic byte and the 4-byte schema ID ourselves, manually? If we need to, that's fine, but if there's built-in functionality, I'd rather we used it.
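For concreteness, our understanding of the wire format is: one 0x00 magic byte, then the schema-registry schema ID as a 4-byte big-endian integer, then the Avro-encoded payload. This is a minimal sketch of what we mean by packing it ourselves (class and method names are ours, not from any Kafka library):

```java
import java.nio.ByteBuffer;

// Hypothetical helper (our naming) illustrating Confluent wire-format framing:
// [0x00 magic byte][4-byte big-endian schema ID][Avro payload bytes].
public class WireFormatSketch {

    public static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buf.put((byte) 0x00);   // magic byte
        buf.putInt(schemaId);   // ByteBuffer is big-endian by default
        buf.put(avroPayload);
        return buf.array();
    }

    public static int schemaIdOf(byte[] framed) {
        if (framed.length < 5 || framed[0] != 0x00) {
            throw new IllegalArgumentException("Not Confluent wire format");
        }
        return ByteBuffer.wrap(framed, 1, 4).getInt();
    }
}
```

That said, we'd obviously rather use something like Confluent's KafkaAvroSerializer if that's the intended route, instead of hand-rolling this.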
If anyone can help us out here, we'd be extremely grateful.
Thanks.