Add Support for Closed Captions to .MKV container #375

Open
sbshepherd opened this issue Apr 29, 2020 · 20 comments
Labels
codec mapping spec_codecs Codec Matroska spec document target

Comments

@sbshepherd

sbshepherd commented Apr 29, 2020

I would like to transcode existing .MXF video files to FFV1/.MKV for long-term preservation but the closed caption streams get stripped out because the .MKV container can’t contain them.

The codecs of my .MXF files differ, but the example I’ll use here is DNxHD. It contains six closed caption streams (different languages). Three are EIA-608 and three are EIA-708. These show in MediaInfo as “Text” streams, and they show in FFprobe as a single data stream with a data_type of “vbi_vanc_smpte_436M.” See screenshots below:

image
image

If it’s helpful, here is the video codec information for this file:
DNxHD_MXF

Ideally, these caption streams will carry over into the new .MKV file and be playable in a standard media player such as VLC. I should be able to turn on/off each language as the video plays.

@JeromeMartinez
Contributor

and they show in FFprobe as a single data stream with a data_type of “vbi_vanc_smpte_436M.”

FYI "vbi_vanc_smpte_436M" in FFmpeg is called "Ancillary Data" in MediaInfo (608/708 captions are muxed in the "vbi_vanc_smpte_436M" which is muxed in the MXF).

It contains six closed caption streams

For reference: 1 closed caption stream, format is CDP.
We could extract 608 from CDP, but 708 cannot stand alone (it needs CDP).
Here, I think we should convert CDP to "DTVCC Transport" (the transport layer of the 708 spec, with the same features as CDP). This stream would transport the 608 and 708 streams as in ATSC streams ("DTVCC Transport" is the content of the AVC or HEVC private element dedicated to captions).

Several steps here:

  • defining 708 transport layer in a MKV track (extension document)
  • implementing it in an encoder
  • implementing it in VLC
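
As a rough illustration of what such a transport track would carry: as far as I understand CEA-708 and ATSC A/53, both CDP and the DTVCC transport wrap the same 3-byte `cc_data` triplets, each tagged with a `cc_valid` bit and a 2-bit `cc_type` (0 and 1 for the two 608 fields, 2 and 3 for DTVCC packet data and packet start). The Python sketch below sorts such triplets by type. The function name and output keys are made up for this example, and CDP header/footer parsing is deliberately left out.

```python
from typing import Dict, List, Tuple


def split_cc_triplets(cc_data: bytes) -> Dict[str, List[Tuple[int, int]]]:
    """Sort 3-byte cc_data triplets into 608 field 1, 608 field 2 and 708 data.

    Assumed triplet layout: 5 marker bits, cc_valid (1 bit), cc_type (2 bits),
    followed by cc_data_1 and cc_data_2.
    """
    out: Dict[str, List[Tuple[int, int]]] = {
        "608_field1": [],
        "608_field2": [],
        "708": [],
    }
    for i in range(0, len(cc_data) - 2, 3):
        header, b1, b2 = cc_data[i], cc_data[i + 1], cc_data[i + 2]
        cc_valid = (header >> 2) & 0x01
        cc_type = header & 0x03
        if not cc_valid:
            continue  # padding triplet, carries no caption data
        if cc_type == 0:
            out["608_field1"].append((b1, b2))
        elif cc_type == 1:
            out["608_field2"].append((b1, b2))
        else:
            # cc_type 2 (packet data) and 3 (packet start) both belong to 708
            out["708"].append((b1, b2))
    return out
```

A lossless mapping would store the triplets untouched. The split above only shows why a single transport stream can feed both 608 and 708 decoders.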

@sbshepherd
Author

There is a sample file at the following location if one is needed for testing: https://archive.org/details/xdcam_sample_with_caption_track

@robUx4
Contributor

robUx4 commented May 24, 2020

From the point of view of Matroska, do we need another track type for closed captions (as the title suggests)? Or can they be put in subtitle tracks with the proper codec mapping?

@mbunkus
Contributor

mbunkus commented May 24, 2020

Personally I'd vote for "keep it subtitle, let CodecID speak for itself". We already integrate many different formats under the type "subtitle"; some are text-based, others are images. And closed captions fulfill largely the same role.

I don't consider their traditional way of transportation (embedded in the video track) to be relevant for our decision.

@JeromeMartinez
Contributor

I don't consider their traditional way of transportation (embedded in the video track) to be relevant for our decision.

The debate open/closed caption vs subtitle is not based on their traditional way of transportation, it is about the nature of the content, see for example the description of both in HTML or a long explanation about the "difference".

But IMO we could keep the "S_" prefix, as in practice there is so little difference.

@mbunkus
Contributor

mbunkus commented May 26, 2020

Well, what our track type "subtitles" transports can easily fill both roles, and it only depends on the content. Similar to how "audio" tracks can contain the whole dialog or only the director's comments. I don't see any reason to use a separate track type.

@JeromeMartinez
Contributor

I don't see any reason to use a separate track type.

No worry :), we are in sync here! (keeping "S_" prefix)

@mbunkus
Contributor

mbunkus commented May 26, 2020

It would definitely be good to have a type indicator orthogonal to the track type. We've talked about such a track header field several times already.

@dericed
Contributor

dericed commented May 27, 2020

Is it worthwhile to support these as block additional mappings? To store the raw bytes along their corresponding frame.

@mbunkus
Contributor

mbunkus commented May 27, 2020

What would the advantages be?

I'm pretty much against that. CCs are something you can turn on or off, they have metadata such as a track name and a language, and they basically act like a track. So let's make them a (separate) track rather than somehow part of another track.

@MikeChenMM

Hi, all, another instance of "I did it unofficially again" here.
MakeMKV transcodes closed captions from DVD to UTF-8 SRT. To do so, internally, it first extracts 608 data into a separate track "S_CC608/DVD" and then internally converts this track into "S_TEXT/UTF8". By default, "S_CC608/DVD" track is never written to actual MKV file. In theory one can change conversion profile to get raw S_CC608/DVD stream instead of converted copy.
I personally see very little benefit in supporting raw 608 streams - they are overwhelmingly generated from text subtitles, can be automatically converted to text subtitles and are essentially text subtitles. The code to convert 608/708 to text is GPL. Why bother?...

@sbshepherd
Author

This might be a silly question, but I wonder how the various options will affect potential broadcast of the content. If the broadcasting arm of our organization wants to use the .MKV with closed captions, will that be doable under any of these scenarios? I understand the differences between closed captions and subtitles are minimal in practice, but it occurs to me that the use cases for these files may not be solely web-based. They should be preservation-worthy, meaning I shouldn't lose functionality that was already in the original file. I can hear someone arguing "then why not keep it .MXF?" Answer: because .MXF is the problem we're trying to solve.

I don't know a lot about broadcasting, so maybe the idea of broadcasting an .MKV won't work regardless. If that's the case, then the file (with captions) would need to be capable of transcoding to a format that can maintain the captions for broadcast.

Am I asking too much? :)

@robUx4 robUx4 added the spec_codecs Codec Matroska spec document target label Mar 14, 2021
@robUx4
Contributor

robUx4 commented May 22, 2022

We have the new track flags since #447, namely FlagHearingImpaired, FlagVisualImpaired and FlagTextDescriptions.

Also the subtitle track is defined as

Subtitle or closed caption data to be rendered over the video track(s).

So apart from the actual 608 and 708 codec definitions, do we need anything else?

@dhouck

dhouck commented Mar 31, 2023

I personally see very little benefit in supporting raw 608 streams - they are overwhelmingly generated from text subtitles, can be automatically converted to text subtitles and are essentially text subtitles. The code to convert 608/708 to text is GPL. Why bother?...

Which code are you talking about? Iʼve seen multiple programs that convert closed captions to other subtitle formats, but they usually lose relevant information in at least some cases. Since people want to use Matroska for archival, there should be the option of storing the original data instead of lossily converting it.

One potential difficulty I see is that one closed captioning stream can have multiple logical tracks (for example, one language in CC1 and another in CC2), and Iʼm not sure if thereʼs any good way to handle that.

@JeromeMartinez
Contributor

but they usually lose relevant information in at least some cases.

To add some examples of what is lost in such a conversion: metadata is lost (program name, content advisory, network name, weather info, etc.), the exact timing of events may be lost too, especially in RollUp mode, and positioning/colors may also be lost (most converters don't care about them and put everything in a .srt without positioning). Converting back to CEA-608/708 is also a lot more difficult (and we would have to code it; AFAIK there is no such code yet).

@dericed
Contributor

dericed commented Mar 31, 2023

I reread this discussion and IIUC there seems to be rough consensus in:

  • storing 608/708 caption data as a subtitle track rather than side data on the video frames
  • storing 608/708 as-is rather than transforming to a more common subtitle format (such conversions are often lossy)

I agree with @sbshepherd's concern that a 608/mxf to 608/mkv conversion may have some loss in functionality, particularly in broadcast settings, but this is a chicken/egg issue. No one can add support for 608/mkv broadcast functionality until we have the specification written and ideally some sample files freely published.

In planning to store 608/708 data, I suggest also considering muxing in scc files as an input.
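
For reference, reading an SCC file as muxer input is fairly mechanical. A minimal Python sketch follows, assuming the usual Scenarist layout: a `Scenarist_SCC V1.0` header, then lines made of a timecode (with `;` before the frame count for drop-frame) followed by space-separated 4-hex-digit words, each word being one 608 byte pair. `parse_scc` is a hypothetical helper, not an existing tool's API.

```python
import re
from typing import List, Tuple

# timecode, whitespace, then the hex words for that event
SCC_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}[:;]\d{2})\s+(.+)$")


def parse_scc(text: str) -> List[Tuple[str, List[Tuple[int, int]]]]:
    """Return (timecode, [(byte1, byte2), ...]) for every caption line."""
    events = []
    for line in text.splitlines():
        match = SCC_LINE.match(line.strip())
        if not match:
            continue  # header line, blank lines, anything unrecognized
        timecode, words = match.groups()
        pairs = []
        for word in words.split():
            value = int(word, 16)
            pairs.append((value >> 8, value & 0xFF))  # bytes keep their parity bit
        events.append((timecode, pairs))
    return events
```

Each pair nominally belongs to one video frame, starting at the line's timecode, so a muxer could emit one block per frame or one per line.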

I'm curious to know more about @MikeChenMM's internally-defined S_CC608/DVD.

Perhaps we could document a few scenarios:

  • S_CC608, where the block stores the two octets of caption data. This is similar to how 608 captions are written into the VAUX header of DV or the c608 track of QuickTime.
  • S_CCS436M, which stores the SMPTE 436M values.
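
To make the "two octets" concrete: as I understand CEA-608, each octet is a 7-bit value plus an odd-parity bit in bit 7, and a block for something like S_CC608 would presumably keep the bytes exactly as transmitted. The Python sketch below only shows how a reader could validate and strip the parity. Both function names are invented for the example, and S_CC608 is still just a proposal in this thread.

```python
from typing import Optional, Tuple


def strip_odd_parity(byte: int) -> Optional[int]:
    """Return the 7-bit value, or None when the odd-parity check fails."""
    if bin(byte & 0xFF).count("1") % 2 != 1:
        return None
    return byte & 0x7F


def decode_pair(b1: int, b2: int) -> Tuple[Optional[int], Optional[int]]:
    """Decode one two-octet 608 payload into its 7-bit values."""
    return strip_odd_parity(b1), strip_odd_parity(b2)
```

For example, the pair `0x94 0x2C` decodes to `(0x14, 0x2C)`, and the null pair `0x80 0x80` decodes to `(0, 0)`.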

Though in each of these scenarios, would we need a CodecPrivate definition?

@JeromeMartinez
Contributor

Perhaps we could document a few scenarios

Also S_CC708 for extracting c708 from Ancillary data (or MOV).

Though in each of these scenarios, would we need a CodecPrivate definition?

They are both "streaming" formats so don't require any configuration data.

@MikeChenMM

MikeChenMM commented Mar 31, 2023 via email

@dhouck

dhouck commented Apr 4, 2023

Iʼd like to expand on the worry I mentioned before. Consider the attached SCC file (zipped so it can be uploaded to GitHub), which I made as an example for this YouTube video [1], which is a song with Italian lyrics but an official English translation. The one SCC file has data for both, although most software Iʼve used can only access the Italian track. (This technique of multiple languages in different channels is common in certain situations, although I created the specific file myself as an example for this bug report.)

Ideally, this one stream of byte pairs would decompose into two subtitle tracks (one Italian, the other English); I donʼt know if Matroska currently supports tracks sharing data like that but if not some other solution would need to be found, and I imagine requiring the muxer to figure out which track each byte pair relates to is not the best answer.
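
To give an idea of what "figuring out which track each byte pair relates to" involves for 608, here is a minimal Python sketch. It assumes field-1 data only and relies on my understanding that control codes whose first byte (parity stripped) is 0x10-0x17 select CC1, 0x18-0x1F select CC2, and text pairs belong to the most recently selected channel. `split_608_channels` and `NULL_PAIR` are hypothetical names. Pairs belonging to the other channel are replaced by null padding so each output keeps one pair per frame.

```python
from typing import Dict, Iterable, List, Optional, Tuple

NULL_PAIR = (0x80, 0x80)  # 608 null padding, parity bits included


def split_608_channels(
    pairs: Iterable[Tuple[int, int]],
) -> Dict[str, List[Tuple[int, int]]]:
    """Split field-1 byte pairs into frame-aligned CC1 and CC2 streams."""
    tracks: Dict[str, List[Tuple[int, int]]] = {"CC1": [], "CC2": []}
    current: Optional[str] = None  # unknown until the first control code
    for b1, b2 in pairs:
        c1, c2 = b1 & 0x7F, b2 & 0x7F
        if 0x10 <= c1 <= 0x1F:
            # bit 3 of a control code's first byte selects the data channel
            current = "CC2" if c1 & 0x08 else "CC1"
        is_null = c1 == 0 and c2 == 0
        for name, track in tracks.items():
            keep = name == current and not is_null
            track.append((b1, b2) if keep else NULL_PAIR)
    return tracks
```

This ignores field 2 (CC3/CC4, XDS) and does nothing for 708, where splitting services would also mean rebuilding packets.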

I donʼt know much about 708, but I think it exacerbates this issue by being able to carry more data in whatʼs still logically the same stream.

SognoDiVolareStayAtHomeChoir.zip

Footnotes

  1. Note that the video, and the SCC file that goes with it, are 25 FPS; some tools assume 29.97 FPS for SCC files but given the relationship between the caption data and the video frames it makes sense to match the frame rate.
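
The frame-rate assumption matters as soon as SCC timecodes are turned into Matroska timestamps. A small Python sketch of the conversion, treating the timecode as a non-drop-frame frame count (drop-frame compensation is deliberately not handled, and `scc_timecode_to_ms` is a hypothetical helper):

```python
def scc_timecode_to_ms(timecode: str, fps: float) -> int:
    """Convert HH:MM:SS:FF to milliseconds for a given real frame rate.

    The timecode is counted at the nominal integer rate (25 for 25 fps,
    30 for 29.97 fps) and then divided by the real frame rate.
    """
    hh, mm, ss, ff = (int(part) for part in timecode.replace(";", ":").split(":"))
    frames = ((hh * 60 + mm) * 60 + ss) * round(fps) + ff
    return round(frames * 1000 / fps)
```

With these assumptions, `00:00:10:12` maps to 10480 ms at 25 fps but to about 10410 ms at 30000/1001 fps, so a wrong guess drifts further out of sync as the file goes on.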

@robUx4
Contributor

robUx4 commented Oct 8, 2023

I donʼt know if Matroska currently supports tracks sharing data like that

No. And IMO it's not a feature we would want. If you remux the file and only want to keep the English version, you would need to know which part goes with what (or it could be hidden from the user). It would also be tricky to implement for players. They may have closed caption support and split the tracks accordingly. But when stored in Matroska, each language should have its own track.

That means that to mux these 2 tracks properly, you need to be able to parse the 708 data to generate two tracks that each keep one language. That probably means having a 708 "encoder" as well, if you have to use some 708 features. This is tricky, but not trickier than having to handle tracks whose content depends on another track in all players.

I think this use case is specific to 708 (and 608?), a modern format would not mix languages like that in a binary format.

7 participants