
v4.0.0 Shaka Player cannot display WebVTT captions for HLS (either fMP4 or MPEG-TS); DASH is fine #4191

Closed
liuyang5832 opened this issue May 3, 2022 · 4 comments · Fixed by #4217 or #4235
Assignees
Labels
component: HLS The issue involves Apple's HLS manifest format component: WebVTT The issue involves WebVTT subtitles specifically priority: P1 Big impact or workaround impractical; resolve before feature release status: archived Archived and locked; will not be updated type: bug Something isn't working correctly
Milestone

Comments

@liuyang5832

liuyang5832 commented May 3, 2022

Have you read the FAQ and checked for duplicate open issues?
yes

What link can we use to reproduce this?
https://shaka-player-demo.appspot.com/demo/#audiolang=en-US;textlang=en-US;uilang=en-US;asset=https://storage.googleapis.com/livestream-demo-output/miltonliu-webvtt-shaka-4-0-0-test/manifest.m3u8;panel=CUSTOM%20CONTENT;build=uncompiled

What version of Shaka Player are you using?
v4.0.0-uncompiled

What browser and OS are you using?
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36

What did you do?
Simply played back a generated HLS manifest with the v4.0.0 Shaka Player, and the captions failed to display. This worked with v3.x.x; I retried with v3.3.2 and it still works there.

link to v3.3.2 version that displays the same content well:
https://v3-3-2-dot-shaka-player-demo.appspot.com/demo/#audiolang=en-US;textlang=en-US;uilang=en-US;asset=https://storage.googleapis.com/livestream-demo-output/miltonliu-webvtt-shaka-4-0-0-test/manifest_ts.m3u8;panel=CUSTOM%20CONTENT;build=uncompiled

What did you expect to happen?
The WebVTT captions should be displayed for HLS once the text track is selected.

What actually happened?
No captions were displayed.

@avelad avelad added type: bug Something isn't working correctly component: HLS The issue involves Apple's HLS manifest format priority: P1 Big impact or workaround impractical; resolve before feature release component: WebVTT The issue involves WebVTT subtitles specifically labels May 4, 2022
@avelad avelad added this to the v4.1 milestone May 4, 2022
@joeyparrish
Member

This may be related to X-TIMESTAMP-MAP and the use of sequence mode for the audio/video content. In this WebVTT content, I see:

WEBVTT
X-TIMESTAMP-MAP=LOCAL:01:00:00.000,MPEGTS:324000000

02:08:06.923 --> 02:08:07.157
- Not at all?

VTT timestamps at 1 hour map to main content at 324000000 / 90000 = 3600 seconds = 1 hour. So there is no relative offset.

However, the media timestamps are ignored in sequence mode. The first audio segment, for example, has an internal timestamp of 7686.952, or 2:08:06.952. This would align with the first subtitle, except that due to sequence mode, the first audio segment appears in the presentation timeline at ~0 instead.

Since we are not extracting timestamps from media, and X-TIMESTAMP-MAP relies on media timestamps, this system is broken.
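The arithmetic above can be sketched as a small helper. This is illustrative only, not Shaka's actual parser; the function name and regex are assumptions, and it handles only the `LOCAL:…,MPEGTS:…` attribute order shown in this sample.

```javascript
// Compute the cue offset implied by an X-TIMESTAMP-MAP header line.
// Hypothetical sketch; not Shaka Player's real implementation.
function parseTimestampMap(line) {
  // e.g. "X-TIMESTAMP-MAP=LOCAL:01:00:00.000,MPEGTS:324000000"
  const match = /LOCAL:(\d+):(\d+):(\d+\.\d+),MPEGTS:(\d+)/.exec(line);
  const [, hh, mm, ss, mpegts] = match;
  const localSeconds = Number(hh) * 3600 + Number(mm) * 60 + Number(ss);
  // MPEG-TS timestamps tick at 90 kHz.
  const mediaSeconds = Number(mpegts) / 90000;
  // Cues at localSeconds in the VTT file correspond to mediaSeconds in the
  // media timeline, so this difference is added to each cue time.
  return mediaSeconds - localSeconds;
}
```

For the header in this issue, the result is 0, which is why the cues only line up when the audio/video keep their internal media timestamps.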

@joeyparrish
Member

If we could perfectly emulate sequence mode for text, then the first text segment would appear at time 0, without regard for the timestamps in it. However, we don't know when a text segment "starts" from its contents. The segment could cover a 10-second period of time, but only have a cue appear at time 5. Or it could be completely empty. So the distance from the conceptual start of a text segment and the start of the first cue cannot be known from the contents of the text. (Unlike with audio and video segments, where there are no periods of time without samples.) Trying to offset the text timestamps back to 0 to align with audio & video won't work without additional information.

We could go back to extracting timestamps from media for HLS, but avoid the latency hit we took for this in v3. Instead, we could wait until the first segment is fetched anyway. We could still use sequence mode, but extract the timestamp of the very first segment we fetch. The difference between that timestamp and the startTime of that segment's SegmentReference could be used to align text segments.

The biggest problems with this are the complexity of format parsing and timestamp extraction, and support for containerless or packed audio streams, which don't have internal timestamps at all. (Though we could argue that X-TIMESTAMP-MAP only works with video or audio in an MP4 or TS container, and say anyone with a weird WebVTT+audio-only HLS stream just needs to align their subtitles to 0.)

It would be nice if we could get away with forcing the platform to extract timestamps for us. I don't know if this would work, but if we could dynamically set sequence mode on SourceBuffers, then we could always do something like this for the very first segment, without complicated parsers and without high startup latency:

  1. If this is the first segment:
     a. Set "segments" mode
     b. Append the first segment
     c. Check buffered to see what its timestamp was
     d. Clear the buffer
  2. Set "sequence" mode
  3. Set the timestamp offset
  4. Append the segment

@joeyparrish
Member

Looks like the trick to change modes works on desktop Chrome. Now to test it on all of our other supported platforms in the lab.

@joeyparrish
Member

Works on all other platforms, except Tizen 2 & 3, which don't support sequence mode at all, and are already excluded from our new HLS parser.

There are some tests which need updating, but the fix seems good.

@joeyparrish joeyparrish self-assigned this May 11, 2022
joeyparrish added a commit to joeyparrish/shaka-player that referenced this issue May 11, 2022
Since the transition to sequence mode for HLS in v4.0.0, VTT cue
timings were broken.  This is mainly because VTT cue timing in HLS is
meant to be based on an offset from the media timestamps, and we
generally don't know those now that we use sequence mode.

To fix it, this change uses MediaSource segment mode for the very
first video segment as a way to extract the timestamp, then clears the
buffer, switches to sequence mode, and appends it again.  This lets us
get the timing data we need, while avoiding major drawbacks of the
previous HLS implementation:
 - We don't need to fetch segments upfront (which is high latency)
 - We don't need to fetch segments twice (once for timestamps, and
   once again to buffer)
 - We don't need to maintain parsers (which were complex and limited
   the formats we could support)

Closes shaka-project#4191
joeyparrish added a commit that referenced this issue May 11, 2022
This was referenced May 11, 2022
joeyparrish added a commit that referenced this issue May 17, 2022
@github-actions github-actions bot added the status: archived Archived and locked; will not be updated label Jul 10, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 10, 2022