Proposal for MIME type encoding #513
I think the “binary data” reference is confusing this a bit. Whether data is binary or utf8 or ascii doesn’t change the discussion. That nit aside, the way I’ve been thinking about self-describing schemaless data such as h264 video is that there would be no schema at all, and the channel encoding would be the IANA-registered MIME type.
I guess in my mind the message encoding is the mime type, rather than the message encoding being I think we originally tried using actual mime types for the existing standardized message encodings, I can't remember why that idea got dropped (probably because there is no mime type for "ros 1 message"). But it seems reasonable to add some other well known ones for h264, jpg, etc. As @jhurliman mentioned, there is no need for a schema for those formats (h264, jpg) because they are already self describing. |
Both of your comments helped me understand the difference between the channel message encoding and schema encoding better; thanks for that! It makes sense that the channel message encoding could be sufficient here. I guess I tried to use the schema to help disambiguate between the message encodings which aren't mime types and those that are, but that might not be necessary.
Yeah - it's probably not clear from the current set of recommended channel/schema encodings, but schemas are intended to be optional. For self-describing messages like json or jpeg there is no need for a schema. The reason that channel and schema encodings are specified separately is that they don't always match 1:1. For example:
Those combinations aren't supported in Foxglove Studio today, but we wanted the flexibility.
Some thoughts: The purpose of the message_encoding field is to tell you what binary serialization format the message data is in, specifically with the goal of "how to deserialize it". Sometimes We could consider The other is to leverage the schema-less feature of channel records and use a schema id of
@defunctzombie yeah, your second paragraph was my reasoning for the PR as it was initially proposed. Since "mime" isn't an encoding I agree with the above consensus that a schema isn't necessary; the mime type itself should be sufficient as the message encoding. I do like the
so perhaps |
I'm in favor of just dropping the I think the only reason we didn't just use media types to specify message encoding is that there are none registered for the initial encodings we wanted to support (ros 1, ros 2 cdr, protobuf, flatbuffer) - only json is registered. But using them going forward seems sensible. |
This seems like a strange fallback. What if next year we add support for a
They would treat it like an unknown type, same as they do today. If you want to be more specific, we could say that any message encoding containing a But also, if we add |
I guess in a world where we use the media type syntax for all types, that would make sense. Would we use media types for both message encoding and schema encoding (when a schema is used)?
I thought |
It seems like their answer is
I mean in theory what we are trying to do is already solved by media types. In practice, there is no registered media type for "protobuf filedescriptorset" or "jsonschema" or "concatenated ros1 msg files". So it seems like we either need to go and register a bunch of media types, or we need to have some "override" shorthand values that are not registered media types. Turning this into a concrete proposal, we could say something like:
Thoughts?
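The two options described above (registered media types versus shorthand values that are not registered media types) could coexist in a reader roughly as follows. This is only an illustrative sketch and not the concrete proposal from the comment: the function name and the returned labels are hypothetical, the shorthand set is limited to the encodings named in this thread, and the media type check validates syntax only (it does not consult the IANA registry).

```python
import re

# Shorthand message encodings mentioned in this thread. Most have no
# registered media type; json does, but is kept as a shorthand here.
WELL_KNOWN_SHORTHANDS = {"ros1", "cdr", "protobuf", "flatbuffer", "json"}

# RFC 6838 "restricted-name": an alphanumeric first character followed by
# up to 126 characters from a limited set.
_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}"
_MEDIA_TYPE = re.compile(_NAME + "/" + _NAME)


def classify_message_encoding(encoding: str) -> str:
    """Return 'shorthand', 'media-type', or 'unknown' (hypothetical labels)."""
    if encoding in WELL_KNOWN_SHORTHANDS:
        return "shorthand"
    # fullmatch: the whole string must have type/subtype syntax.
    if _MEDIA_TYPE.fullmatch(encoding):
        return "media-type"
    return "unknown"
```

A reader built this way would keep treating anything it does not recognize as an unknown type, which matches the fallback behaviour discussed earlier in the thread.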
Some prior art from gRPC, a typical call uses from https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#requests:
Closing now that #563 has landed
@jhurliman is #563 separate from the ask here? #563 is a rename of our existing language. This PR wanted to explore expanding the spec to allow a media type as the message encoding for channels.
@alkasm as your use of mcap has evolved, do you still think this issue is worth exploring?
@defunctzombie you're right that the attachment field name is not the same as this issue. For now, we have standardized on accepting a mimetype as the message encoding, with no schema encoding, i.e. language similar to:
We're primarily using the mimetype for imagery or video data, e.g. video/h264 or image/jpeg, and also some raw data streams come through as application/octet-stream.
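The convention described in this comment (a media type as the message encoding, and no schema) might look like the following. This is a minimal sketch under stated assumptions: `ChannelSketch` is a hypothetical stand-in that mirrors the fields of an MCAP channel record and is not a type from any MCAP library, and the topic names are invented for illustration. It relies on the MCAP convention that a channel schema id of 0 means the channel has no schema.

```python
from dataclasses import dataclass, field
from typing import Dict

NO_SCHEMA = 0  # in MCAP, a channel schema_id of 0 means "no schema"


@dataclass
class ChannelSketch:
    """Hypothetical stand-in for a channel record; not a library type."""
    topic: str
    message_encoding: str
    schema_id: int = NO_SCHEMA
    metadata: Dict[str, str] = field(default_factory=dict)

    def is_schemaless(self) -> bool:
        return self.schema_id == NO_SCHEMA


# Topic names are made up; the encodings are the ones named in the comment.
channels = [
    ChannelSketch(topic="/camera/video", message_encoding="video/h264"),
    ChannelSketch(topic="/camera/still", message_encoding="image/jpeg"),
    ChannelSketch(topic="/sensor/raw", message_encoding="application/octet-stream"),
]
```

Each channel carries everything a reader needs in `message_encoding` alone, so no schema record has to be written for these self-describing payloads.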
Would it be worth adding to our spec appendix that well known media types are explicitly allowed in the message encoding field? E.g.
Not just timestamps - but also which format (annexb or avcc) and how many NAL packets. I can't say with certainty since I've not done enough research on it but my quick read of the In my experiments making a web viewer for h264 data in an mcap file I used the following message encodings which I would assume we'd define in the mcap well-known spec. We could do the same for That aside - I do think there is value in being clear in the spec about media type use within message_encoding. It seems like a nice way to allow for storing images and other well-known formats as messages. |
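To illustrate why the framing (annexb or avcc) and the number of NAL units per message matter to a reader, here is a rough heuristic for telling the two framings apart. It is a sketch, not part of any spec or library: the function names are hypothetical, a 4-byte length prefix is assumed for AVCC, and the check is ambiguous by nature (an AVCC buffer whose first length prefix happens to look like a start code would be reported as Annex B), which is one argument for recording the framing explicitly rather than guessing.

```python
def looks_like_annexb(data: bytes) -> bool:
    """True if the buffer begins with a 3- or 4-byte Annex B start code."""
    return data.startswith(b"\x00\x00\x01") or data.startswith(b"\x00\x00\x00\x01")


def count_avcc_nal_units(data: bytes, length_size: int = 4) -> int:
    """Walk length-prefixed NAL units.

    Returns the number of units, or -1 if the prefixes do not tile the
    buffer exactly. An empty buffer yields 0.
    """
    offset = 0
    count = 0
    while offset < len(data):
        if offset + length_size > len(data):
            return -1  # truncated length prefix
        nal_len = int.from_bytes(data[offset:offset + length_size], "big")
        offset += length_size
        if nal_len == 0 or offset + nal_len > len(data):
            return -1  # empty or overrunning NAL unit
        offset += nal_len
        count += 1
    return count


def guess_h264_framing(data: bytes) -> str:
    """Return 'annexb', 'avcc', or 'unknown' (heuristic only)."""
    if looks_like_annexb(data):
        return "annexb"
    if count_avcc_nal_units(data) > 0:
        return "avcc"
    return "unknown"
```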
We would need to be careful with the wording, "explicitly allowed" is not quite right because we don't disallow strings that are not IANA registered media types. Maybe provide an example of using For video, I think we need to keep researching and provide a working proof of concept before adding to the spec. We need to answer whether |
I'm gonna close this out. Thanks for the discussion, but for now we won't be moving on this.
Public-Facing Changes
This PR proposes a new well-known encoding to the spec for MIME types.
Description
Channels that pass data without a schema may contain content with a known MIME type. This PR proposes `mime` as a supported channel message encoding, with the schema encoding name referencing the MIME type/subtype directly.

There may be other ways to achieve this that are preferred, but this seemed like a good way to start a conversation about it.