Proposal for MIME type encoding #513

alkasm · 2022-08-04T05:31:07Z

Public-Facing Changes

This PR proposes a new well-known encoding to the spec for MIME types.

Description

Channels that pass data without a schema may contain content with a known MIME type. This PR proposes mime as a supported channel message encoding, with the schema encoding name referencing the MIME type/subtype directly.

There may be other ways to achieve this that are preferred, but this seemed like a good way to start a conversation about it.

CLAassistant · 2022-08-04T05:31:11Z

All committers have signed the CLA.

jhurliman · 2022-08-04T05:41:14Z

I think the “binary data” reference is confusing this a bit. Whether data is binary or utf8 or ascii doesn’t change the discussion.

That nit aside, the way I’ve been thinking about self-describing schemaless data such as h264 video is there would be no schema at all, and the channel encoding would be the IANA-registered MIME type.

amacneil · 2022-08-04T05:48:02Z

I guess in my mind the message encoding is the mime type, rather than the message encoding being mime. The message encoding should tell you what format the binary data in that message is.

I think we originally tried using actual mime types for the existing standardized message encodings, I can't remember why that idea got dropped (probably because there is no mime type for "ros 1 message"). But it seems reasonable to add some other well known ones for h264, jpg, etc.

As @jhurliman mentioned, there is no need for a schema for those formats (h264, jpg) because they are already self describing.

alkasm · 2022-08-04T05:51:32Z

Both of your comments helped me understand the difference between the channel message encoding and schema encoding better; thanks for that! It makes sense that the channel message encoding could be sufficient here. I guess I tried to use the schema to help disambiguate between the message encodings which aren't mime types and those that are---but that might not be necessary.

amacneil · 2022-08-04T05:58:04Z

Yeah - it's probably not clear from the current set of recommended channel/schema encodings, but schemas are intended to be optional. For self-describing messages like json or jpeg there is no need for a schema.

The reason that channel and schema encodings are specified separately is that they don't always match 1:1. For example:

json messages might use either https://json-schema.org/ or https://typeschema.org/ to define their schema.
cdr messages (ROS 2) might use either ros2msg or ros2idl to define their schema.

Those combinations aren't supported in Foxglove Studio today, but we wanted the flexibility.
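To illustrate why the two fields are kept separate, here is a minimal Python sketch of a reader-side lookup. The table contents and the names `ALLOWED_SCHEMA_ENCODINGS` and `is_supported_pair` are assumptions made for illustration (in particular the `jsonschema` and `typeschema` strings); they are not taken from the spec or from any library.

```python
# Hypothetical table: which schema encodings a reader might accept for each
# message encoding. The names are illustrative, not a normative registry.
ALLOWED_SCHEMA_ENCODINGS = {
    "json": {"", "jsonschema", "typeschema"},  # "" means no schema at all
    "cdr": {"ros2msg", "ros2idl"},
    "protobuf": {"protobuf"},
}


def is_supported_pair(message_encoding: str, schema_encoding: str) -> bool:
    """Return True if this reader knows how to combine the two encodings."""
    allowed = ALLOWED_SCHEMA_ENCODINGS.get(message_encoding, set())
    return schema_encoding in allowed


print(is_supported_pair("cdr", "ros2idl"))     # one message encoding, two schema options
print(is_supported_pair("cdr", "jsonschema"))  # a pairing this reader does not know
```

Keeping the mapping one-to-many is what lets a self-describing encoding such as json appear with or without a schema.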

defunctzombie · 2022-08-04T17:05:03Z

Some thoughts:

The purpose of the message_encoding field is to tell you what binary serialization format the message data is in - specifically with the goal of "how to deserialize it". Sometimes message_encoding is still not enough (for example protobuf), and you need an additional schema. In mcap files, the pairing of schema+message_encoding should be sufficient to deserialize the message data now and forever.

We could consider mime as one of those situations if we use mime for message_encoding - but then what type of Schema record do you pair it with? We could say that the schema_encoding would be mime or text/plain and the _schema_ is image/jpeg. Though what would be the name? That's one approach we could take.

The other is to leverage the schema-less feature of channel records and use a schema id of 0. Then the message_encoding would need to be something like image/jpeg or mime:image/jpeg whatever we decide.
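A minimal Python sketch of the second, schema-less option. The `Channel` dataclass here is a stand-in that mirrors the channel record fields; it is not the API of any mcap library, and the topic name and `needs_schema` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Illustrative stand-in for a channel record (not a library type)."""
    id: int
    schema_id: int          # 0 means "this channel has no schema"
    topic: str
    message_encoding: str
    metadata: dict = field(default_factory=dict)


def needs_schema(channel: Channel) -> bool:
    """A reader only has to look up a Schema record when schema_id is non-zero."""
    return channel.schema_id != 0


# A JPEG channel: the media type alone says how to decode each message.
camera = Channel(id=1, schema_id=0, topic="/camera/front",
                 message_encoding="image/jpeg")
print(needs_schema(camera))
```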

alkasm · 2022-08-07T04:44:42Z

@defunctzombie yeah, your second paragraph was my reasoning for the PR as it was initially proposed. Since "mime" isn't an encoding I agree with the above consensus that a schema isn't necessary; the mime type itself should be sufficient as the message encoding. I do like the mime: prefix. I guess "media types" is the currently preferred nomenclature though: https://www.iana.org/assignments/media-types/media-types.xhtml

> Media Types (formerly known as MIME types)

so perhaps media: as a prefix? All mime types are of the form type/subtype; there's always a slash. Potentially that is enough?

amacneil · 2022-08-07T21:28:55Z

I'm in favor of just dropping the mime: / media: prefix, and specifying in the spec that implementations should interpret any unknown message encoding as a media type.

I think the only reason we didn't just use media types to specify message encoding is that there are none registered for the initial encodings we wanted to support (ros 1, ros 2 cdr, protobuf, flatbuffer) - only json is registered. But using them going forward seems sensible.

jtbandes · 2022-08-08T16:28:36Z

> implementations should interpret any unknown message encoding as a media type

This seems like a strange fallback. What if next year we add support for a ros3 message encoding? Old tools will treat that as "unknown media type"?

amacneil · 2022-08-08T16:31:12Z

They would treat it like an unknown type, same as they do today.

If you want to be more specific, we could say that any message encoding containing a / is assumed to be a media type, and others should come from our shorthand list.

But also, if we add ros3 next year maybe we should just use media type syntax going forward and use something like application/x-ros3-msg?

jtbandes · 2022-08-08T17:17:39Z

> They would treat it like an unknown type, same as they do today.

I guess in a world where we use the media type syntax for all types, that would make sense.

Would we use media types for both message encoding and schema encoding (when a schema is used)?

> maybe we should just use media type syntax going forward and use something like application/x-ros3-msg?

I thought x- wasn't a thing anymore 😅 https://www.rfc-editor.org/rfc/rfc6648.html

amacneil · 2022-08-08T23:07:24Z

> I thought x- wasn't a thing anymore 😅 https://www.rfc-editor.org/rfc/rfc6648.html

It seems like their answer is x- is no longer necessary because we made the registration process easier. So if we are to copy that model, we should similarly recommend against using non-standard encodings, and instead encourage users to register any custom type they are using in our appendix (possibly with a vnd. prefix if it is company-specific).

> Would we use media types for both message encoding and schema encoding (when a schema is used)?

I mean in theory what we are trying to do is already solved by media types. In practice, there is no registered media type for "protobuf filedescriptorset" or "jsonschema" or "concatenated ros1 msg files".

So it seems like we either need to go and register a bunch of media types, or we need to have some "override" shorthand values that are not registered media types.

Turning this into a concrete proposal, we could say something like:

The message_encoding and schema_encoding must be interpreted as either (a) if it does not contain a forward slash, a well-known encoding registered in our spec appendix, or (b) if it contains a forward slash, a well-known media type.
Implementers are free to put non-standard data in the message or schema encoding fields, but are strongly encouraged to register their string in one of these two databases.
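The forward-slash rule above could be sketched in Python roughly as follows. The `WELL_KNOWN` set is an assumption that simply lists the encodings named earlier in this thread, and `classify_encoding` is a hypothetical helper, not part of any mcap library.

```python
# Assumed shorthand list, taken from the encodings mentioned in this thread.
WELL_KNOWN = frozenset({"ros1", "cdr", "protobuf", "flatbuffer", "json"})


def classify_encoding(value: str) -> str:
    """Apply the proposed rule to a message_encoding or schema_encoding string."""
    if "/" in value:
        return "media_type"   # (b) contains a forward slash
    if value in WELL_KNOWN:
        return "well_known"   # (a) shorthand from the spec appendix
    return "unknown"          # non-standard string; allowed but discouraged


for encoding in ("image/jpeg", "protobuf", "ros3"):
    print(encoding, "->", classify_encoding(encoding))
```

Under this rule an old tool that meets a new shorthand such as ros3 reports it as unknown rather than guessing that it is a media type.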

Thoughts?

alkasm · 2022-08-10T06:06:13Z

Some prior art from gRPC, a typical call uses application/grpc+proto (not registered with IANA).

from https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#requests:

Content-Type → "content-type" "application/grpc" [("+proto" / "+json" / {custom})]
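As a rough illustration of that grammar, a Python sketch that splits the value into the base type and the optional suffix. `split_grpc_content_type` is a hypothetical helper written for this example; it only handles the form quoted above and is not how gRPC itself parses the header.

```python
from typing import Optional, Tuple


def split_grpc_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split 'application/grpc[+suffix]' into (base, suffix or None)."""
    base = "application/grpc"
    if not value.startswith(base):
        raise ValueError(f"not a gRPC content type: {value!r}")
    rest = value[len(base):]
    if rest == "":
        return base, None
    if rest.startswith("+") and len(rest) > 1:
        return base, rest[1:]
    raise ValueError(f"unexpected trailing text: {rest!r}")


print(split_grpc_content_type("application/grpc+proto"))
```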

jhurliman · 2022-09-29T16:11:26Z

Closing now that #563 has landed

defunctzombie · 2022-09-29T16:15:13Z

@jhurliman is #563 separate from the ask here? #563 is a rename of our existing language. This PR wanted to explore expanding the spec to allow media_type as the message encoding for channels.

defunctzombie · 2022-09-29T16:17:15Z

@alkasm as your use of mcap has evolved do you still think this issue is worth exploring?

alkasm · 2022-10-04T05:12:15Z

@defunctzombie you're right that the attachment field name is not the same as this issue.

For now, we have standardized on accepting a mimetype as the message encoding, with no schema encoding, i.e. language similar to:

Channel message_encoding: MUST be one of protobuf, json, or <mimetype>
Schema encoding: MUST be one of protobuf or "" (empty string)
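A minimal Python sketch of a check for that rule. The regular expression is a simplified approximation of the type/subtype syntax, and `validate_channel` is a hypothetical helper; the third check encodes the "mimetype with no schema encoding" reading of the sentence above.

```python
import re

# Simplified type/subtype pattern; an approximation, not a full media type grammar.
MEDIA_TYPE_RE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*$"
)


def validate_channel(message_encoding: str, schema_encoding: str) -> list:
    """Return a list of rule violations (an empty list means the pair is accepted)."""
    errors = []
    is_media_type = MEDIA_TYPE_RE.match(message_encoding) is not None
    if message_encoding not in ("protobuf", "json") and not is_media_type:
        errors.append("message_encoding must be protobuf, json, or a media type")
    if schema_encoding not in ("protobuf", ""):
        errors.append('schema encoding must be protobuf or ""')
    if is_media_type and schema_encoding != "":
        errors.append("media type channels must not carry a schema encoding")
    return errors


print(validate_channel("video/h264", ""))
print(validate_channel("image/jpeg", "protobuf"))
```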

We're primarily using the mimetype for imagery or video data, e.g. video/h264 or image/jpeg, and also some raw data streams come through as application/octet-stream.

amacneil · 2022-10-05T06:34:09Z

Would it be worth adding to our spec appendix that well known media types are explicitly allowed in the message encoding field? E.g. image/jpeg seems like a no-brainer to me.

video/h264 I also think would be worth explicitly stating in the appendix how we expect it to be stored with respect to timestamps.

defunctzombie · 2022-10-05T17:22:33Z

> video/h264 I also think would be worth explicitly stating in the appendix how we expect it to be stored with respect to timestamps.

Not just timestamps - but also which format (annexb or avcc) and how many NAL packets. I can't say with certainty since I've not done enough research on it but my quick read of the video/h264 media type does not lead me to think it is sufficient as the message_encoding value.

In my experiments making a web viewer for h264 data in an mcap file I used the following message encodings which I would assume we'd define in the mcap well-known spec. We could do the same for video/h264 but that might not align 100% with media type video/h264.

That aside - I do think there is value in being clear in the spec about media type use within message_encoding. It seems like a nice way to allow for storing images and other well-known formats as messages.

jhurliman · 2022-10-05T22:24:02Z

We would need to be careful with the wording, "explicitly allowed" is not quite right because we don't disallow strings that are not IANA registered media types. Maybe provide an example of using image/jpeg with schema_id=0 to convey that this is a good practice.

For video, I think we need to keep researching and provide a working proof of concept before adding to the spec. We need to answer whether video/h264 is sufficient, or if it should be video/h264; codecs="avc1.4d002a", or something even more specific (ex: messages contain NAL Access Units in Annex-B format where Decode Order equals Output Order, i.e. no B-frames).
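If the longer form with a codecs parameter were chosen, a reader would need to separate the bare type from its parameters. A small Python sketch using the standard library's `email.message.Message`, which parses Content-Type style values; `parse_media_type` is a hypothetical helper, and nothing here settles whether the parameter is sufficient to describe the stream.

```python
from email.message import Message


def parse_media_type(value: str):
    """Split a media type string into (type/subtype, {parameter: value})."""
    msg = Message()
    msg["Content-Type"] = value
    essence = msg.get_content_type()      # lowercased "type/subtype"
    # get_params() returns (name, value) pairs; the first pair is the type itself.
    params = dict(msg.get_params()[1:])
    return essence, params


essence, params = parse_media_type('video/h264; codecs="avc1.4d002a"')
print(essence, params)
```

A reader could then dispatch on the bare type and treat the parameters as optional hints.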

defunctzombie · 2024-05-28T21:33:23Z

I'm gonna close this out. Thanks for the discussion but for now we won't be moving on this.

Propose MIME type encoding

d55ab1f

amacneil mentioned this pull request Sep 6, 2022

Rename attachment content_type field to media_type #563

Closed

jhurliman closed this Sep 29, 2022

defunctzombie reopened this Oct 4, 2022

defunctzombie closed this May 28, 2024
