Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebVTT multi-line subtitles overlapping #10980

Open
1 task
sCoredump opened this issue Feb 12, 2023 · 6 comments
Open
1 task

WebVTT multi-line subtitles overlapping #10980

sCoredump opened this issue Feb 12, 2023 · 6 comments
Assignees
Labels

Comments

@sCoredump
Copy link

sCoredump commented Feb 12, 2023

ExoPlayer Version

2.18.2

Devices that reproduce the issue

  • Pixel 4a running Android 13
  • Fire HD 10 (9th generation) running Fire OS 7.3.2.7
  • Emulator (Tablet, API 30)

Devices that do not reproduce the issue

No response

Reproducible in the demo app?

Yes

Reproduction steps

Play any video using a WebVTT subtitle track that contains two (or more) cues with overlapping time codes, the first of them also spanning multiple lines of text.

The multiple lines can result from explicit newline characters or text that is too long to fit in a single line (or a combination of both).

Can also be reproduced by using my sample WebVTT file in the demo app:

[
  {
    "name": "Clear DASH",
    "samples": [
      {
        "name": "HD (MP4, H264)",
        "uri": "https://storage.googleapis.com/wvmedia/clear/h264/tears/tears.mpd",
        "subtitle_uri": "https://filebin.net/fuy39ich7mb93kwe/sample-multi-line-overlap.vtt",
        "subtitle_mime_type": "text/vtt"
      },
...

Expected result

The subtitle cues should not visually overlap.

Actual result

The subtitle cues overlap visually, with each simultaneously displayed cue only moving up one line, instead of however many line the previous cues already take up.

image

Media

I do not have an easily shareable sample video, so I created a sample WebVTT file that should demonstrate the issue.
sample-multi-line-overlap.zip

Bug Report

@tonihei
Copy link
Collaborator

tonihei commented Feb 15, 2023

@icbaker Can you take a look at this one to see if this is a known issue or feature gap?

@icbaker
Copy link
Collaborator

icbaker commented Feb 21, 2023

I can reproduce the problem.

When handling concurrent subtitles, we currently assume each one is only one line high (the -i below):

// Steps 4 - 10 of https://www.w3.org/TR/webvtt1/#cue-computed-line
// (steps 1 - 3 are handled by WebvttCueParser#computeLine(float, int))
Collections.sort(cuesWithUnsetLine, (c1, c2) -> Long.compare(c1.startTimeUs, c2.startTimeUs));
for (int i = 0; i < cuesWithUnsetLine.size(); i++) {
Cue cue = cuesWithUnsetLine.get(i).cue;
currentCues.add(cue.buildUpon().setLine((float) (-1 - i), Cue.LINE_TYPE_NUMBER).build());
}
return currentCues;

Unfortunately that logic is executing when we're parsing the cues, rather than showing them on screen - so we have no idea how big the viewport is, and therefore can't make allowances for long cues wrapping onto additional lines.

Now that I read this logic again, and the spec it's based off - I actually don't think we've quite correctly implemented the spec (but not in a way that would fix this problem, unfortunately). The spec says:

Let n be the number of text tracks whose text track mode is showing and that are in the media element’s list of text tracks before track.

But our code actually sets 'n' to be the number of cues that are currently showing in the current (only) text track. To strictly follow the spec, we should remove the incrementing completely (since we only have one text track). That will actually make the problem described here worse, because all simultaneous cues will now overlap at the bottom of the screen, not just multiline ones.

The WebVTT spec describes how overlapping cues should be handled when rendering in 7.2 step 10.14 (and onwards). Unfortunatley ExoPlayer's rendering logic is completely general, and not specific to different subtitle implementations.

We could change our rendering logic to detect and avoid overlapping cue boxes. This would handle both 'explicitly multiline' cues (i.e. with newline characters) and 'implicitly multiline' cues (i.e. because of wrapping long cues with no explicit newline characters). We would need to put this logic into CanvasSubtitleOutput (and ideally WebViewSubtitleOutput too). This would apply to all subtitle formats, not just WebVTT. Depending on how we do it, it might also introduce cases where a cue box 'jumps' to a new position on the screen when a new cue is shown. We may also find that somebody is relying on cue boxes being allowed to overlap (though this does seem unlikely, particularly if they're using something like WebVTT or TTML).

@nigelmegitt
Copy link

We may also find that somebody is relying on cue boxes being allowed to overlap (though this does seem unlikely, particularly if they're using something like WebVTT or TTML).

In general, TTML does permit overlapping regions, where different text can be in different regions; it even defines z-order for this case. However the IMSC profiles and the EBU-TT-D profile explicitly prohibit such overlaps so they're unlikely to be commonplace in subtitle and caption TTML documents.

@icbaker
Copy link
Collaborator

icbaker commented Feb 22, 2023

Changing ExoPlayer's rendering logic is quite involved and we don't have plans to look at this soon. If we 'fix' the WebVTT logic to precisely match the spec without changing the rendering logic then the problem described here will occur in many more cases - so I don't think we should do that.

@szaboa
Copy link
Contributor

szaboa commented Apr 4, 2024

@icbaker I was referring to this issue in my comment on #1177. Do you have any idea or could you give some directions how could we handle the overlapping vtt cues in our forked exo repo? Referring to this

We could change our rendering logic to detect and avoid overlapping cue boxes.

@icbaker
Copy link
Collaborator

icbaker commented Apr 10, 2024

The conclusion(s) I came to when I last looked into this:

  1. The WebVTT spec provides some clear defaults for lots of properties if they're not set.
  2. ExoPlayer currently resolves to these defaults quite 'early' as part of parsing the WebVTT text into Cue objects. This logic has no access to the screen output size, and limited/no access to cues that are displayed simultaneously.
  3. The WebVTT layout rules distinguish between 'property explicitly set to value X' and 'property set to X by default because it was absent'
  4. Therefore, the resolution in step (2), which throws away the 'unset' knowledge, is too early if we want to apply the WebVTT layout logic later once we know how large the viewport is, and what other cues are shown at the same time.

One way I thought to solve this was to either plumb more of the raw WebVTT source data somewhere inside the Cue object. Or maybe make the Cue object richer to distinguish between the two cases described above:

  • 'property explicitly set to value X'
  • 'property set to X by default because it was absent'

Separate concern, touched on above: ExoPlayer's subtitle layout logic which does know the screen size etc is currently completely format-agnostic, it just deals in Cue objects. It's unclear how to nicely incorporate WebVTT-specific rules into this logic.

Maybe (total guess), it would be "good enough" to make the 'Cue object richer' change described above, and implement the WebVTT layout rules for all text formats, on the understanding/assumption that it provides a reasonable collision-avoiding layout algorithm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants