add a note about MIME types of font attachments #518

Open
mbunkus opened this issue Jun 18, 2021 · 16 comments
Assignees
Labels
enhancement spec_codecs Codec Matroska spec document target

Comments

@mbunkus
Contributor

mbunkus commented Jun 18, 2021

As explained over on doom9's forum, attached fonts in Matroska have long used a legacy MIME type, as official MIME types for fonts weren't available for a very long time. Liisachan requested that we add a note about which MIME types to use for new files and which MIME types players should support in order to be able to play older files as well.

Here's their suggestion for a starting point for that note:

Matroska was born more than 10 years before the font MIME types were standardized. Initially the generic type application/octet-stream was used for font files, and then starting in 2004, application/x-truetype-font was used for TTF fonts embedded for subtitle tracks (through a patch by Haali), and this private MIME type was the de facto standard for many years. For backward compatibility, a player that supports embedded fonts for subtitles SHOULD treat application/x-truetype-font as font/ttf, and application/vnd.ms-opentype as font/otf. Such a player also MAY check the extension stored in FileName when it can't recognize the value in FileMimeType: case-insensitive .ttf, .otf, .ttc strongly suggest that the attached file is font/ttf, font/otf, font/collection, respectively.

This cannot be used as is, of course. I'll whip up a PR soon(ish).
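
For illustration only, the lookup described in the suggested note could be sketched in Python roughly as follows. The function name `normalize_font_mime` and the table names are hypothetical and not taken from any player; the mappings themselves are the ones given in the note.

```python
import os

# Official font media types named in the suggested note.
OFFICIAL_FONT_TYPES = {"font/ttf", "font/otf", "font/collection"}

# Legacy types and the official type the note suggests treating them as.
LEGACY_TO_OFFICIAL = {
    "application/x-truetype-font": "font/ttf",
    "application/vnd.ms-opentype": "font/otf",
}

# Extension fallback; extensions are compared case-insensitively.
EXTENSION_TO_OFFICIAL = {
    ".ttf": "font/ttf",
    ".otf": "font/otf",
    ".ttc": "font/collection",
}


def normalize_font_mime(file_mime_type, file_name):
    """Return an official font type, or None if no font is recognized."""
    mime = file_mime_type.strip().lower()
    if mime in OFFICIAL_FONT_TYPES:
        return mime
    if mime in LEGACY_TO_OFFICIAL:
        return LEGACY_TO_OFFICIAL[mime]
    # FileMimeType was not recognized: fall back to the FileName extension.
    ext = os.path.splitext(file_name)[1].lower()
    return EXTENSION_TO_OFFICIAL.get(ext)
```

With this sketch, `normalize_font_mime("application/octet-stream", "Arial.TTF")` yields `font/ttf`, because the extension is lowercased before the lookup.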

@mbunkus mbunkus self-assigned this Jun 18, 2021
@Liisachan

There are the following 10 types:

(1) These MIME types are officially registered, and MUST be supported:

  • font/ttf
  • font/otf
  • font/collection
  • font/sfnt [*1]
  • application/font-sfnt [*2]

(2) This is a valid generic type, and MAY be supported if the player checks the file extension.

  • application/octet-stream

(3) These two have been the de facto standard types, and SHOULD (or MUST) be supported:

  • application/x-truetype-font == font/ttf, sometimes font/otf [*3]
  • application/vnd.ms-opentype == font/otf

(4) These are rare types which might be sometimes used in the wild; MAY be supported:

  • application/x-font-ttf
  • application/x-font

*1 “Note that "font/sfnt" is an abstract type from which the (widely used in practice) "font/ttf" and "font/otf" types are conceptually derived. Use of "font/sfnt" is likely to be rare in practice, and might be confined to: Uncommon combinations such as "font/sfnt; layout=sil" that do not have a shorter type” [RFC8081]

*2 DEPRECATED in favor of font/sfnt [IANA] but still valid. -- “Contrary to the expectations of the W3C WebFonts WG, which developed Web Open Font Format (WOFF), the officially defined media types such as "application/font-woff" and "application/font-sfnt" see a very limited use” [RFC8081].

*3 An x- type, application/x-truetype-font, is technically valid [RFC2045]: “A media type value beginning with the characters "X-" is a private value, to be used by consenting systems by mutual agreement.” [RFC2046]. Haali started using it in his experimental patch to Gabest's code; an x- type exists exactly for such a situation. One should not misunderstand that application/x-truetype-font is something wrong, something non-standard. It was, and still is, a perfectly valid MIME type, although of course its usage is now discouraged, as the official type for TTF has been registered.

@mcr
Contributor

mcr commented Jun 19, 2021

Can you provide some advice for encoders?

Are there any common players that do not support font/ttf and font/otf?
Given that font/sfnt is rare, it seems like it ought never be used.
What about application/font-sfnt?

Point (2) concerns me. Implementations can check the extension, but they SHOULD NOT derive the type by examining the file contents. This is what Windows Outlook still does (27 years later), and the result is endless trojans in email that evade scanning, as the generic open call launches some Word macro in a file that presented itself as an image.

@Liisachan

I'd say, encoders should keep using the legacy types for the time being. Using font/ttf, you may feel happy knowing that you're rigorously following the standard, and in the long run, eventually we should do things in the standardized way. But currently, there are still some Windows users using players that do not fully support font/ttf (anything before October 2019 - MPC-BE 1.5.4 and before; MPC-HC 1.8.8 and before; LAV Filters 0.74.1 and before). Linux/Mac users should be okay, though.

MPC-HC is widely used, but if a user downloads its "latest" version from the official site or from sourceforge.net, they'll get an older version. Because of this, styled subtitles may break for a significant number of Windows users (maybe 10–20% of them?) when font/ttf is used.

Unfortunately, the current version of MKVToolnix (v58.0.0) does write font/sfnt by default for TTF. MKVToolnix also wrote application/font-sfnt at least on Mac in the past. So it's too late. It seems that libmagic is responsible for these unusual (although technically valid) MIME types.

Re: Point (2). By checking the file extension, some versions of LAV Filters & MPC-HC were able to load the font/ttf files correctly, even when that MIME type was not explicitly supported by them. That was actually helpful for end users (except that the implementation was slightly ad hoc: .ttf was loaded but .TTF was not). But I do understand your concern. First off, loading a file just because it says font/ttf is already dangerous. One can create a malicious MKV file, where an abnormally huge font file is attached (or so it claims). A naive player might try to read beyond EOF, or at least something unpleasant may occur. So a note about security considerations may be a good idea. On the other hand, there are legitimate CJK fonts larger than 20 MB (e.g. Microsoft YaHei).
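
As a rough sketch of the kind of check such a security note might suggest, a player could validate the declared attachment size before reading it. The function name, its parameters, and the 64 MiB default cap are all assumptions chosen for illustration; the only constraint taken from the comment above is that the cap must stay well above 20 MB so that large CJK fonts still load.

```python
def attachment_size_is_plausible(declared_size, bytes_remaining,
                                 max_font_size=64 * 1024 * 1024):
    """Return True if a font attachment's declared size looks safe to read.

    declared_size   -- size the attachment claims to have, in bytes
    bytes_remaining -- bytes actually left in the file from the data offset
    max_font_size   -- hypothetical upper bound; an arbitrary choice here
    """
    if declared_size <= 0:
        return False
    # Never trust a size that claims more data than the file still holds,
    # otherwise a naive reader would run past EOF.
    if declared_size > bytes_remaining:
        return False
    # Cap memory use while leaving room for large legitimate fonts.
    return declared_size <= max_font_size
```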

@mcr
Contributor

mcr commented Jun 19, 2021

About point (2), the concern is not file extensions; those are just metadata, equivalent to the MIME type.
The concern is that, after the file contents have been examined, Word is loaded rather than a font. That's the part that I want to make sure we recommend against.

So font/sfnt is among the MUST, so I guess it has to remain acceptable to write it.
But it seems like all of the rest (points 2, 3, 4) are not recommended ("MUST NOT"), and they exist just for backwards compatibility.

@Liisachan

Saying that a muxer MUST NOT use an x- type anymore would be too harsh. It MAY use it for backward compatibility and/or interoperability, given that the standard explicitly guarantees that an x- type is freely usable as long as there is a mutual agreement (between the writing app. and player, in this case). Afaik all existing players recognize the legacy types, so I'd say there is a mutual agreement. Legacy types should be phased out eventually, but the change should be gradual so that no one will be upset.

On Windows, when a player "loads" a font, it doesn't start any application; it just calls a function like AddFontResourceEx, which simply fails if the data is not a valid font. The attached font can be installed privately, not visible from other processes. This is quite different from an attachment to email, where a random application may start automatically if you click an icon.

Let's say, hypothetically, the player can handle TTF but cannot handle TTC, and let's say the MIME type is ambiguous or not reliable. So the player wants to know if it's TTF or TTC. One quick way is to check the extension and see if it's .ttc or not. Another way is to read the first 4 bytes of the attached data and see if it's 'ttcf' or not. These two are not so different. It's not like reading the first 4 bytes is intrinsically more dangerous. That said, you're right, an attached file should be treated carefully: an attacker may be able to create a malicious MKV to exploit a font-related security hole of a specific (poorly designed) player, though such an attack vector seems not very likely.
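
A minimal sketch of the 4-byte check mentioned above, in Python. The function name `sniff_sfnt_kind` is hypothetical. Only the 'ttcf' tag comes from this thread; the other tags ('OTTO', 0x00010000, 'true') are the usual sfnt version tags from the OpenType and TrueType specifications and are added here as an assumption about what a player might also want to recognize.

```python
def sniff_sfnt_kind(data):
    """Classify font data by its first four bytes; None if unrecognized."""
    tag = bytes(data[:4])
    if len(tag) < 4:
        return None
    if tag == b"ttcf":
        return "collection"  # font collection (.ttc)
    if tag == b"OTTO":
        return "otf"         # OpenType with CFF outlines
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "ttf"         # TrueType outlines
    return None
```

The check only compares bytes and never hands the data to another application, so an unrecognized attachment is simply rejected.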

@robUx4 robUx4 added the spec_main Main Matroska spec document target label Jun 20, 2021
@robUx4
Contributor

robUx4 commented Aug 24, 2021

I don't think the normative notion of MUST/SHOULD/MUST NOT/etc. applies to fonts. It would be like saying every Matroska implementation MUST support the h264 and mp3 codecs. It's up to each player to decide what they want to support. Fonts described here are even "extensions" of a subtitle codec. So if the player doesn't support these subtitle codecs, there's no reason to force it to support any of these MIME types.

IMO it's up to each subtitle codec to define what font format they want the player to support. In other words, it should go in the codec document, not the "main" Matroska spec.

@robUx4 robUx4 added spec_codecs Codec Matroska spec document target and removed spec_main Main Matroska spec document target labels Aug 24, 2021
@Liisachan

Liisachan commented Aug 24, 2021

You're right. As the first post says, “a player that supports embedded fonts for subtitles” should be careful about the backward compatibility, is all. It's NOT like every player should support embedded fonts. If players support fonts, then they are strongly recommended to support legacy MIME-types too. It's reasonable, isn't it?

The docs coming with the latest MKVToolnix still use legacy MIME-types too.

robUx4 added a commit that referenced this issue Sep 19, 2021
and what a writer should do

According to the findings from #518
@robUx4
Contributor

robUx4 commented Sep 19, 2021

I updated #115 to include the MIME types a player can expect and what a writer should use (new MIME, unless playback with old players is important).
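
The writer-side rule (new MIME type, unless playback with old players is important) might look like this in a muxer. This is a sketch under stated assumptions: `choose_font_mime` and the `prefer_legacy` flag are hypothetical names, and falling back to font/collection for .ttc in legacy mode is an assumption, since this thread names no legacy type for collections.

```python
import os

# Official types for new files.
MODERN = {
    ".ttf": "font/ttf",
    ".otf": "font/otf",
    ".ttc": "font/collection",
}

# Legacy types still recognized by older players.
LEGACY = {
    ".ttf": "application/x-truetype-font",
    ".otf": "application/vnd.ms-opentype",
}


def choose_font_mime(file_name, prefer_legacy=False):
    """Pick the MIME type to write for a font attachment, or None."""
    ext = os.path.splitext(file_name)[1].lower()
    if prefer_legacy and ext in LEGACY:
        return LEGACY[ext]
    # Assumption: no legacy type exists for .ttc, so use the official one.
    return MODERN.get(ext)
```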

In the end we can't just rely on the codec spec to tell how font attachments have to be used. There are too many fine details to deal with and they are unrelated to the codec itself.

The use of font attachments remains entirely optional, but it means the subtitle rendering might be incorrect. (I'm not sure VLC supports them, although it does read them)

@Liisachan

The use of font attachments remains entirely optional, but it means the subtitle rendering might be incorrect. (I'm not sure VLC supports them, although it does read them)

Exactly. It's important for a soft-subber to realize that font support is optional and one can't reliably control softsub rendering. It's like CSS + browser. Also, it's true that subtitles may not even be readable when the attached font is not loaded (e.g. when it's a minority language whose alphabet is not supported by the OS). The only surefire way to avoid this is hardsubbing.

VLC supports embedded fonts almost perfectly. I didn't check the source code, but if I'm guessing right, although it doesn't explicitly support application/font-sfnt, it can still correctly guess it's a font if the extension is .ttf.

@robUx4
Contributor

robUx4 commented Sep 20, 2021

I found the relevant code in VLC. The MIME type is only used to check whether the attachment is a font. application/octet-stream + file extension is not used.

It's up to freetype's FT_New_Memory_Face() code to detect the font type from the binary data.

@Liisachan

Here in libass (in VLC)
https://code.videolan.org/videolan/vlc/-/blob/master/modules/codec/libass.c#L181

  1. Look for the mimetype application/x-truetype-font
  2. If failed, look for extensions .ttf .otf .ttc

@robUx4
Contributor

robUx4 commented Sep 20, 2021

Yeah, it should be more consistent with the freetype one. Also the "extension to MIME type" conversion should be done in the matroska demuxer. The subtitle/text renderer should only have to deal with MIME types.

robUx4 added a commit that referenced this issue Oct 26, 2021
and what a writer should do

According to the findings from #518
@Liisachan

Please check my comment on dc90789

robUx4 added a commit that referenced this issue Jan 30, 2022
and what a writer should do

According to the findings from #518
@robUx4
Contributor

robUx4 commented Jan 30, 2022

Not sure why I tagged this as a codec thing... Anyway I think it's fixed by #115. We will need to mention in subtitle codecs the ones that make use of the fonts in the container.

@robUx4
Contributor

robUx4 commented Aug 7, 2022

Not sure why I tagged this as a codec thing

Because now it's up to each codec that uses fonts to mention it in its codec definition.

5 participants
@mcr @mbunkus @robUx4 @Liisachan and others