Unable to verify if the value schema for topic: is compatible with ksqlDB #10216

Open
aviadr1 opened this issue Feb 4, 2024 · 1 comment

aviadr1 commented Feb 4, 2024

When I try to create a stream using a complex JSON_SR schema generated from Pydantic classes, ksqlDB fails with "Unable to verify if the value schema for topic: is compatible with ksqlDB".

SCHEMA:

{
  "$defs": {
    "ChatMessage": {
      "additionalProperties": false,
      "description": "Represents a chat message.",
      "properties": {
        "message_id": {
          "description": "Unique identifier for the message",
          "title": "Message Id",
          "type": "string"
        },
        "channel_id": {
          "description": "Unique identifier for the channel",
          "title": "Channel Id",
          "type": "string"
        },
        "user_id": {
          "description": "Unique user id of the sender",
          "title": "User Id",
          "type": "string"
        },
        "text": {
          "description": "Content of the chat message",
          "title": "Text",
          "type": "string"
        },
        "timestamp": {
          "description": "Timestamp when the message was sent",
          "format": "date-time",
          "title": "Timestamp",
          "type": "string"
        }
      },
      "required": [
        "message_id",
        "channel_id",
        "user_id",
        "text",
        "timestamp"
      ],
      "title": "ChatMessage",
      "type": "object"
    },
    "ChatMessageAnalysis": {
      "additionalProperties": false,
      "description": "Represents analyzed metadata for a Twitch chat message.",
      "properties": {
        "message_id": {
          "description": "Unique identifier of the analyzed chat message",
          "title": "Message Id",
          "type": "string"
        },
        "channel_id": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Unique identifier for the channel",
          "title": "Channel Id"
        },
        "spam": {
          "default": true,
          "description": "Indicates if the message is considered spam",
          "title": "Spam",
          "type": "boolean"
        },
        "sentiment": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "choices": [
            "positive",
            "negative"
          ],
          "default": null,
          "description": "Sentiment of the message",
          "title": "Sentiment"
        },
        "intent": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "choices": [
            "humor",
            "compliment",
            "question",
            "speculation",
            "statement",
            "criticism",
            "suggestion",
            "agreement"
          ],
          "default": null,
          "description": "Classified intent of the message",
          "title": "Intent"
        },
        "entities": {
          "description": "List of named entity recognition (NER) entities discovered in the chat message",
          "items": {
            "$ref": "#/$defs/Entity"
          },
          "title": "Entities",
          "type": "array"
        },
        "topic_name": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Identifier for the topic in this channel",
          "title": "Topic Name"
        }
      },
      "required": [
        "message_id"
      ],
      "title": "ChatMessageAnalysis",
      "type": "object"
    },
    "ChatMessageExcerpt": {
      "additionalProperties": false,
      "description": "Represents an excerpt from a chat message related to a topic.",
      "properties": {
        "message_id": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "The id of the message the excerpt came from",
          "title": "Message Id"
        },
        "user_id": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "The id of the user who wrote the message",
          "title": "User Id"
        },
        "excerpt": {
          "description": "Abridged version of the chat message",
          "title": "Excerpt",
          "type": "string"
        }
      },
      "required": [
        "excerpt"
      ],
      "title": "ChatMessageExcerpt",
      "type": "object"
    },
    "ChatTopicSnapshot": {
      "additionalProperties": false,
      "description": "Represents the state of a topic at a specific point in a topics group.\n\nThis includes the prevalence of the topic in the current batch of messages, excerpts\nfrom messages that represent the topic, and a summary. Each state is linked to a\nChatTopic by the topic_id.",
      "properties": {
        "channel_id": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Unique identifier for the channel in which the topic is discussed",
          "title": "Channel Id"
        },
        "topic_name": {
          "description": "Name of the topic",
          "title": "Topic Name",
          "type": "string"
        },
        "prevalence": {
          "anyOf": [
            {
              "maximum": 1.0,
              "minimum": 0.0,
              "type": "number"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Measure of how prevalent the topic is in the current batch of messages",
          "title": "Prevalence"
        },
        "excerpts": {
          "description": "List of excerpts from chat messages that are representative of the topic in this state",
          "items": {
            "$ref": "#/$defs/ChatMessageExcerpt"
          },
          "title": "Excerpts",
          "type": "array"
        },
        "summary": {
          "description": "Summary of the topic's state in the current batch",
          "title": "Summary",
          "type": "string"
        },
        "keywords": {
          "default": [],
          "description": "Keywords highly relevant for the topic",
          "items": {
            "type": "string"
          },
          "title": "Keywords",
          "type": "array"
        },
        "entities": {
          "default": [],
          "description": "Entities highly relevant for the topic",
          "items": {
            "type": "string"
          },
          "title": "Entities",
          "type": "array"
        }
      },
      "required": [
        "topic_name",
        "summary"
      ],
      "title": "ChatTopicSnapshot",
      "type": "object"
    },
    "Entity": {
      "additionalProperties": false,
      "description": "Represents an entity identified within a chat message.",
      "properties": {
        "entity": {
          "title": "Entity",
          "type": "string"
        },
        "entity_type": {
          "title": "Entity Type",
          "type": "string"
        }
      },
      "required": [
        "entity",
        "entity_type"
      ],
      "title": "Entity",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "description": "Represents a batch of chat analyses.\n\nThis includes the original chat messages, the overall topic analysis,\nand individual message analyses. Each item in `message_analysis` corresponds to an item in `messages` by index.",
  "properties": {
    "channel_id": {
      "default": null,
      "description": "Unique identifier for the channel in which the analysis happened",
      "title": "Channel Id",
      "type": "string"
    },
    "overall_analysis": {
      "additionalProperties": {
        "$ref": "#/$defs/ChatTopicSnapshot"
      },
      "description": "Ordered dictionary of overall topic analysis, keyed by topic name.",
      "title": "Overall Analysis",
      "type": "object"
    },
    "message_analysis": {
      "description": "List of analyses for individual messages.",
      "items": {
        "$ref": "#/$defs/ChatMessageAnalysis"
      },
      "title": "Message Analysis",
      "type": "array"
    },
    "messages": {
      "anyOf": [
        {
          "items": {
            "$ref": "#/$defs/ChatMessage"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "List of original chat messages.",
      "title": "Messages"
    },
    "llm_processing_start": {
      "anyOf": [
        {
          "format": "date-time",
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Timestamp when messages were sent to LLM for processing",
      "title": "Llm Processing Start"
    },
    "llm_processing_end": {
      "anyOf": [
        {
          "format": "date-time",
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Timestamp when results returned from LLM processing",
      "title": "Llm Processing End"
    },
    "fully_processed_timestamp": {
      "anyOf": [
        {
          "format": "date-time",
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Timestamp when the batch was fully processed",
      "title": "Fully Processed Timestamp"
    }
  },
  "required": [
    "overall_analysis",
    "message_analysis"
  ],
  "title": "ChatAnalysisBatch",
  "type": "object"
}

SQL:

CREATE STREAM ai_chat_messages_analyzed_batches_stream WITH (
    KAFKA_TOPIC='ai.chat_messages.analyzed_batches',
    VALUE_FORMAT='JSON_SR',
    PARTITIONS=10,
    REPLICAS=1
);
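The HTTP 400 in this report suggests the statement was sent through ksqlDB's REST API. A minimal standard-library sketch of doing the same, assuming a server on the default listener `http://localhost:8088` and the documented `POST /ksql` endpoint; `build_ksql_request` and `submit` are hypothetical helper names. On an HTTP error, `submit` returns the decoded JSON error body instead of raising, so the `statement_error` payload can be inspected.

```python
import json
import urllib.error
import urllib.request
from typing import Any

# Assumption: ksqlDB listening on its default REST port.
KSQLDB_URL = "http://localhost:8088"
KSQL_MEDIA_TYPE = "application/vnd.ksql.v1+json"


def build_ksql_request(statement: str, server: str = KSQLDB_URL) -> urllib.request.Request:
    """Build a POST /ksql request carrying one or more semicolon-terminated statements."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/ksql",
        data=body,
        headers={
            "Content-Type": f"{KSQL_MEDIA_TYPE}; charset=utf-8",
            "Accept": KSQL_MEDIA_TYPE,
        },
        method="POST",
    )


def submit(statement: str, server: str = KSQLDB_URL) -> Any:
    """Send the statement and return the decoded JSON response or error body."""
    try:
        with urllib.request.urlopen(build_ksql_request(statement, server)) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # ksqlDB puts a JSON error object (e.g. "@type": "statement_error") in the body.
        return json.loads(err.read())


# The statement from this issue; ksqlDB requires the trailing semicolon.
CREATE_STREAM = """CREATE STREAM ai_chat_messages_analyzed_batches_stream WITH (
    KAFKA_TOPIC='ai.chat_messages.analyzed_batches',
    VALUE_FORMAT='JSON_SR',
    PARTITIONS=10,
    REPLICAS=1
);"""
```

Building the request separately from sending it keeps the payload easy to inspect or log without a running server.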

HTTP Error occurred: 400 - {"@type":"statement_error","error_code":40001,"message":"Unable to verify if the value schema for topic: ai.chat_messages.analyzed_batches is compatible with ksqlDB.\nReason: Invalid null value for required STRING field\n\nPlease see https://github.com/confluentinc/ksql/issues/ to see if this particular reason is already known.\nIf not, please log a new issue, including this full error message.\nSchema:{\"$defs\":{\"ChatMessage\":{\"additionalProperties\":false,\"description\":\"Represents a chat message.\",\"properties\":{\"message_id\":{\"description\":\"Unique identifier for the message\",\"title\":\"Message Id\",\"type\":\"string\"},\"channel_id\":{\"description\":\"Unique identifier for the channel\",\"title\":\"Channel Id\",\"type\":\"string\"},\"user_id\":{\"description\":\"Unique user id of the sender\",\"title\":\"User Id\",\"type\":\"string\"},\"text\":{\"description\":\"Content of the chat message\",\"title\":\"Text\",\"type\":\"string\"},\"timestamp\":{\"description\":\"Timestamp when the message was sent\",\"format\":\"date-time\",\"title\":\"Timestamp\",\"type\":\"string\"}},\"required\":[\"message_id\",\"channel_id\",\"user_id\",\"text\",\"timestamp\"],\"title\":\"ChatMessage\",\"type\":\"object\"},\"ChatMessageAnalysis\":{\"additionalProperties\":false,\"description\":\"Represents analyzed metadata for a chat message.\",\"properties\":{\"message_id\":{\"description\":\"Unique identifier of the analyzed chat message\",\"title\":\"Message Id\",\"type\":\"string\"},\"channel_id\":{\"anyOf\":[{\"type\":\"string\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"Unique identifier for the channel\",\"title\":\"Channel Id\"},\"spam\":{\"default\":true,\"description\":\"Indicates if the message is considered spam\",\"title\":\"Spam\",\"type\":\"boolean\"},\"sentiment\":{\"anyOf\":[{\"type\":\"string\"},{\"type\":\"null\"}],\"choices\":[\"positive\",\"negative\"],\"default\":null,\"description\":\"Sentiment of the 
message\",\"title\":\"Sentiment\"},\"intent\":{\"anyOf\":[{\"type\":\"string\"},{\"type\":\"null\"}],\"choices\":[\"humor\",\"compliment\",\"question\",\"speculation\",\"statement\",\"criticism\",\"suggestion\",\"agreement\"],\"default\":null,\"description\":\"Classified intent of the message\",\"title\":\"Intent\"},\"entities\":{\"description\":\"List of named entity recognition (NER) entities discovered in the chat message\",\"items\":{\"$ref\":\"#/$defs/Entity\"},\"title\":\"Entities\",\"type\":\"array\"},\"topic_name\":{\"anyOf\":[{\"type\":\"string\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"Identifier for the topic in this channel\",\"title\":\"Topic Name\"}},\"required\":[\"message_id\"],\"title\":\"ChatMessageAnalysis\",\"type\":\"object\"},\"ChatMessageExcerpt\":{\"additionalProperties\":false,\"description\":\"Represents an excerpt from a chat message related to a topic.\",\"properties\":{\"message_id\":{\"anyOf\":[{\"type\":\"string\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"The id of the message the excerpt came from\",\"title\":\"Message Id\"},\"user_id\":{\"anyOf\":[{\"type\":\"string\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"The id of the user who wrote the message\",\"title\":\"User Id\"},\"excerpt\":{\"description\":\"Abridged version of the chat message\",\"title\":\"Excerpt\",\"type\":\"string\"}},\"required\":[\"excerpt\"],\"title\":\"ChatMessageExcerpt\",\"type\":\"object\"},\"ChatTopicSnapshot\":{\"additionalProperties\":false,\"description\":\"Represents the state of a topic at a specific point in a topics group.\\n\\nThis includes the prevalence of the topic in the current batch of messages, excerpts\\nfrom messages that represent the topic, and a summary. 
Each state is linked to a\\nChatTopic by the topic_id.\",\"properties\":{\"channel_id\":{\"anyOf\":[{\"type\":\"string\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"Unique identifier for the channel in which the topic is discussed\",\"title\":\"Channel Id\"},\"topic_name\":{\"description\":\"Name of the topic\",\"title\":\"Topic Name\",\"type\":\"string\"},\"prevalence\":{\"anyOf\":[{\"maximum\":1.0,\"minimum\":0.0,\"type\":\"number\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"Measure of how prevalent the topic is in the current batch of messages\",\"title\":\"Prevalence\"},\"excerpts\":{\"description\":\"List of excerpts from chat messages that are representative of the topic in this state\",\"items\":{\"$ref\":\"#/$defs/ChatMessageExcerpt\"},\"title\":\"Excerpts\",\"type\":\"array\"},\"summary\":{\"description\":\"Summary of the topic's state in the current batch\",\"title\":\"Summary\",\"type\":\"string\"},\"keywords\":{\"default\":[],\"description\":\"Keywords highly relevant for the topic\",\"items\":{\"type\":\"string\"},\"title\":\"Keywords\",\"type\":\"array\"},\"entities\":{\"default\":[],\"description\":\"Entities highly relevant for the topic\",\"items\":{\"type\":\"string\"},\"title\":\"Entities\",\"type\":\"array\"}},\"required\":[\"topic_name\",\"summary\"],\"title\":\"ChatTopicSnapshot\",\"type\":\"object\"},\"Entity\":{\"additionalProperties\":false,\"description\":\"Represents an entity identified within a chat message.\",\"properties\":{\"entity\":{\"title\":\"Entity\",\"type\":\"string\"},\"entity_type\":{\"title\":\"Entity Type\",\"type\":\"string\"}},\"required\":[\"entity\",\"entity_type\"],\"title\":\"Entity\",\"type\":\"object\"}},\"additionalProperties\":false,\"description\":\"Represents a batch of chat analyses.\\n\\nThis includes the original chat messages, the overall topic analysis,\\nand individual message analyses. 
Each item in message_analysiscorresponds to an item inmessages by index.\",\"properties\":{\"channel_id\":{\"default\":null,\"description\":\"Unique identifier for the channel in which the analysis happened\",\"title\":\"Channel Id\",\"type\":\"string\"},\"overall_analysis\":{\"additionalProperties\":{\"$ref\":\"#/$defs/ChatTopicSnapshot\"},\"description\":\"Ordered dictionary of overall topic analysis, keyed by topic name.\",\"title\":\"Overall Analysis\",\"type\":\"object\"},\"message_analysis\":{\"description\":\"List of analyses for individual messages.\",\"items\":{\"$ref\":\"#/$defs/ChatMessageAnalysis\"},\"title\":\"Message Analysis\",\"type\":\"array\"},\"messages\":{\"anyOf\":[{\"items\":{\"$ref\":\"#/$defs/ChatMessage\"},\"type\":\"array\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"List of original chat messages.\",\"title\":\"Messages\"},\"llm_processing_start\":{\"anyOf\":[{\"format\":\"date-time\",\"type\":\"string\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"Timestamp when messages were sent to LLM for processing\",\"title\":\"Llm Processing Start\"},\"llm_processing_end\":{\"anyOf\":[{\"format\":\"date-time\",\"type\":\"string\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"Timestamp when results returned from LLM processing\",\"title\":\"Llm Processing End\"},\"fully_processed_timestamp\":{\"anyOf\":[{\"format\":\"date-time\",\"type\":\"string\"},{\"type\":\"null\"}],\"default\":null,\"description\":\"Timestamp when the batch was fully processed\",\"title\":\"Fully Processed Timestamp\"}},\"required\":[\"overall_analysis\",\"message_analysis\"],\"title\":\"ChatAnalysisBatch\",\"type\":\"object\"}","statementText":"CREATE STREAM AI_CHAT_MESSAGES_ANALYZED_BATCHES_STREAM WITH (KAFKA_TOPIC='ai.chat_messages.analyzed_batches', KEY_FORMAT='KAFKA', PARTITIONS=10, REPLICAS=1, VALUE_FORMAT='JSON_SR');","entities":[]}
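The useful part of this response is the `Reason:` line inside the `message` field, and it names only the type (STRING), not the field. A small sketch for pulling that line out of a `statement_error` body in client code; `extract_reason` is a hypothetical helper, and `SAMPLE` is an abridged copy of the body reported above with the schema text and statement dropped.

```python
import json
from typing import Optional


def extract_reason(error_body: str) -> Optional[str]:
    """Return the text after 'Reason:' in a ksqlDB error message, or None."""
    message = json.loads(error_body).get("message", "")
    for line in message.splitlines():
        if line.startswith("Reason:"):
            return line[len("Reason:"):].strip()
    return None


# Abridged from the error body in this issue (schema text omitted).
SAMPLE = json.dumps({
    "@type": "statement_error",
    "error_code": 40001,
    "message": (
        "Unable to verify if the value schema for topic: "
        "ai.chat_messages.analyzed_batches is compatible with ksqlDB.\n"
        "Reason: Invalid null value for required STRING field\n\n"
        "Please see https://github.com/confluentinc/ksql/issues/ "
        "to see if this particular reason is already known."
    ),
    "entities": [],
})

print(extract_reason(SAMPLE))
```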

aviadr1 commented Feb 4, 2024

It would be good to have better error messages that say specifically which field is problematic.
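Until ksqlDB names the field, the offending property can be located client-side. The reason text suggests ksqlDB rejects a property whose `default` is `null` while its `type` excludes null. A minimal sketch under that assumption: the hypothetical helper `null_default_conflicts` walks `properties`, `$defs`, `items`, `additionalProperties` and `anyOf`/`oneOf`/`allOf`, and does not resolve `$ref`. Walking the full schema above by hand, the only match should be `#/properties/channel_id` (root level, `"type": "string"` with `"default": null`).

```python
from typing import Any, Iterator


def _allows_null(node: dict[str, Any]) -> bool:
    """True if the node's type, or any anyOf/oneOf branch, admits null."""
    declared = node.get("type")
    branches = [b for b in node.get("anyOf", []) + node.get("oneOf", []) if isinstance(b, dict)]
    if declared is None and not branches:
        # No local type constraint (e.g. an unresolved $ref): don't flag it.
        return True
    if declared == "null" or (isinstance(declared, list) and "null" in declared):
        return True
    return any(_allows_null(b) for b in branches)


def null_default_conflicts(node: Any, path: str = "#") -> Iterator[str]:
    """Yield JSON-pointer-style paths whose default is null but whose type forbids null."""
    if not isinstance(node, dict):
        return
    if "default" in node and node["default"] is None and not _allows_null(node):
        yield path
    for key in ("$defs", "definitions", "properties"):
        for name, child in node.get(key, {}).items():
            yield from null_default_conflicts(child, f"{path}/{key}/{name}")
    for key in ("items", "additionalProperties"):
        yield from null_default_conflicts(node.get(key), f"{path}/{key}")
    for key in ("anyOf", "oneOf", "allOf"):
        for i, child in enumerate(node.get(key, [])):
            yield from null_default_conflicts(child, f"{path}/{key}/{i}")


# Abridged from the schema in this issue: one offending field, two nullable ones.
SCHEMA = {
    "$defs": {
        "ChatMessageExcerpt": {
            "properties": {
                "message_id": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": None},
                "excerpt": {"type": "string"},
            },
            "type": "object",
        }
    },
    "properties": {
        "channel_id": {"default": None, "title": "Channel Id", "type": "string"},
        "messages": {
            "anyOf": [{"items": {"$ref": "#/$defs/ChatMessage"}, "type": "array"}, {"type": "null"}],
            "default": None,
        },
    },
    "type": "object",
}

for pointer in null_default_conflicts(SCHEMA):
    print(pointer)
```

The same function can be pointed at the full schema once it is loaded with `json.load`.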

The problem can be reproduced with a minimal Pydantic class like this:

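from pydantic import BaseModel, Field


class ChatAnalysisBatch(BaseModel):
    # Annotated as non-nullable `str` but defaulting to None, so the generated
    # JSON schema is {"type": "string", "default": null}.
    channel_id: str = Field(
        None,
        description="Unique identifier for the channel in which the analysis happened",
    )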

It can be fixed by making the annotation nullable:

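from pydantic import BaseModel, Field


class ChatAnalysisBatch(BaseModel):
    # `str | None` (Python 3.10+) makes Pydantic emit
    # "anyOf": [{"type": "string"}, {"type": "null"}], which ksqlDB accepts.
    channel_id: str | None = Field(
        None,
        description="Unique identifier for the channel in which the analysis happened",
    )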
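If the model itself cannot be changed, the generated schema could instead be patched before it is registered. A minimal sketch, assuming the plain dict returned by Pydantic v2's `model_json_schema()` (whose output uses `$defs`, as in the schema above); `make_null_defaults_nullable` is a hypothetical helper, not part of Pydantic. It rewrites each property whose default is null but whose type excludes null into the same `anyOf`-with-null shape that the `str | None` annotation produces.

```python
import copy
from typing import Any

# Keywords that describe the property rather than constrain its type; they stay
# at the top level, mirroring what Pydantic emits for `str | None` fields.
ANNOTATION_KEYS = {"default", "description", "title"}


def _needs_null_branch(prop: dict[str, Any]) -> bool:
    """True when the default is null but the declared type excludes null."""
    declared = prop.get("type")
    if declared is None or "default" not in prop or prop["default"] is not None:
        return False
    if isinstance(declared, list):
        return "null" not in declared
    return declared != "null"


def _visit(node: Any) -> None:
    if not isinstance(node, dict):
        return
    props = node.get("properties", {})
    for name, prop in list(props.items()):
        if isinstance(prop, dict) and _needs_null_branch(prop):
            kept = {k: v for k, v in prop.items() if k in ANNOTATION_KEYS}
            branch = {k: v for k, v in prop.items() if k not in ANNOTATION_KEYS}
            props[name] = {"anyOf": [branch, {"type": "null"}], **kept}
    # Recurse into nested schemas: properties, $defs, array items, maps, unions.
    for child in list(props.values()) + list(node.get("$defs", {}).values()):
        _visit(child)
    for key in ("items", "additionalProperties"):
        _visit(node.get(key))
    for key in ("anyOf", "oneOf", "allOf"):
        for child in node.get(key, []):
            _visit(child)


def make_null_defaults_nullable(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a patched deep copy; the input schema is left unchanged."""
    patched = copy.deepcopy(schema)
    _visit(patched)
    return patched


# Fragment mirroring the root-level `channel_id` from the schema in this issue.
SCHEMA = {
    "properties": {
        "channel_id": {
            "default": None,
            "description": "Unique identifier for the channel in which the analysis happened",
            "title": "Channel Id",
            "type": "string",
        },
        "overall_analysis": {"title": "Overall Analysis", "type": "object"},
    },
    "required": ["overall_analysis"],
    "type": "object",
}

print(make_null_defaults_nullable(SCHEMA)["properties"]["channel_id"])
```

Changing the annotation remains the cleaner fix, since it also lets Pydantic accept an explicit `None`; the patch only makes the registered schema match data that may already contain `null`.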
