
VideoFrame should use display size, not coded size #4525

Open
2 tasks
kainino0x opened this issue Mar 19, 2024 · 18 comments
Labels: api WebGPU API · needs investigation (Requires an investigation into implementation constraints)
Milestone: Milestone 0

Comments

@kainino0x
Contributor

kainino0x commented Mar 19, 2024

copyExternalImageToTexture() currently uses the coded size of VideoFrame inputs.

Source type | Width | Height
VideoFrame | VideoFrame.codedWidth | VideoFrame.codedHeight

Coded size (codedWidth/codedHeight):

Width of the VideoFrame in pixels, potentially including non-visible padding, and prior to considering potential ratio adjustments.

Display size (displayWidth/displayHeight):

Width of the VideoFrame when displayed after applying aspect ratio adjustments.


  • While we're here, the table that describes the width/height for each source type needs to be moved so it applies to both copyExternalImageToTexture() and importExternalTexture().
  • Display size is what we want (it's what you see if you put a video on screen, and IIUC it should be the same as the "intrinsic" size of HTMLVideoElement for any given frame of a video). So we should use it instead.
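To make the two options concrete, here is a minimal sketch (not spec text; `frame` is a plain stand-in for a WebCodecs VideoFrame, and `copySize` is a hypothetical helper, not a real API):

```javascript
// Minimal sketch (not spec text): contrast the current behavior, which uses
// the coded size, with this issue's proposal to use the display size.
// "frame" is a plain object standing in for a WebCodecs VideoFrame.
function copySize(frame, useDisplaySize) {
  return useDisplaySize
    ? { width: frame.displayWidth, height: frame.displayHeight }
    : { width: frame.codedWidth, height: frame.codedHeight };
}

// A typical 1080p frame: coded 1920x1088 (rounded up to 16x16 macroblocks),
// display 1920x1080.
const frame1080p = { codedWidth: 1920, codedHeight: 1088, displayWidth: 1920, displayHeight: 1080 };
console.log(copySize(frame1080p, false)); // current spec: { width: 1920, height: 1088 }
console.log(copySize(frame1080p, true));  // proposed:     { width: 1920, height: 1080 }
```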
@kainino0x kainino0x added the tacit resolution queue Editors have agreed and intend to land if no feedback is given label Mar 19, 2024
@kainino0x kainino0x added this to the Milestone 0 milestone Mar 19, 2024
@kdashg
Contributor

kdashg commented Mar 27, 2024

That sounds correct from an expected behavior standpoint.
I'm somewhat nervous about this, especially for importExternalTexture (and its textureLoad operation, which I'm honestly surprised it has, given the complexity of the mappings between video data and its expected display/output).
I think it is probably fine but I am nervous about this area.

@greggman
Contributor

greggman commented Mar 27, 2024

I feel like I'd expect the coded size and if I want to display that coded size correctly then I'd apply the display size to CSS.

Otherwise it's lossy.

@kainino0x kainino0x removed the tacit resolution queue Editors have agreed and intend to land if no feedback is given label Apr 2, 2024
@kainino0x
Contributor Author

@greggman Not sure I understand. The coded size for a 1920x1080 video is typically 1920x1088 (120x68 16x16 macroblocks). The last half block of the last row is garbage data that is not part of the video and for most purposes doesn't exist. From the web's perspective I think it was totally invisible until WebCodecs came along.
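For illustration, the coded size can be derived by rounding the visible dimensions up to the codec's macroblock size. A sketch assuming 16x16 macroblocks (as in H.264; other codecs use different block sizes):

```javascript
// Sketch: coded dimensions round the visible size up to whole macroblocks.
// Assumes 16x16 macroblocks (H.264); other codecs may use different blocks.
function codedSize(visibleWidth, visibleHeight, block = 16) {
  const roundUp = (n) => Math.ceil(n / block) * block;
  return { width: roundUp(visibleWidth), height: roundUp(visibleHeight) };
}

// 1920x1080 -> 120x68 blocks -> 1920x1088; the extra 8 rows are padding.
console.log(codedSize(1920, 1080)); // { width: 1920, height: 1088 }
```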


Tangentially, while looking that up, I found some explanation of the aspect ratio thing that Kelsey remembered but didn't know the details of, here: w3c/webcodecs#26
(WebCodecs' VideoFrame has coded size, visible rect, and display size, but I don't know if that spec defines those three terms.)

Then, the question Kelsey raised would be, should the copyExternalImageToTexture() size and external_texture textureDimensions()/textureLoad() size match the visible rect, or the display size. I say it should be whatever WebGL's texImage2D() does - likely the display size, but I really don't know.

@kainino0x
Contributor Author

HTML is pretty clear that 2d canvas drawImage uses the "natural size" which must certainly be the display size. But I wouldn't be super confident that real WebGL implementations match that. @shaoboyan I wonder if you could find or create a video with non-square pixels so we can test this in WebGPU? I don't see one in the WebGL test resources, which means it needs one, too.

It seems to simply require the -aspect flag in ffmpeg to change the "Display Aspect Ratio":
https://superuser.com/a/738833
https://superuser.com/a/908257

@kainino0x kainino0x added the needs investigation Requires an investigation into implementation constraints label Apr 2, 2024
@shaoboyan
Contributor

shaoboyan commented Apr 2, 2024

@kainino0x
I did a quick try in current cts resources. By doing following commands:

// Generate four-colors-vp9-bt601-2-1.webm, mimeType: 'video/webm; codecs=vp9'
ffmpeg.exe -loop 1 -i .\four-colors.png -c:v libvpx-vp9 -pix_fmt yuv420p -frames 50 -colorspace smpte170m -color_primaries smpte170m -color_trc smpte170m -color_range tv -aspect 2:1 four-colors-vp9-bt601-2-1.webm

Dropping the video directly into a browser page, the chrome://media-internals page shows:

info | "FFmpegDemuxer: created video stream, config codec: vp9, profile: vp9 profile0, level: not available, alpha_mode: is_opaque, coded size: [320,240], visible rect: [0,0,320,240], natural size: [480,240], has extra data: false, encryption scheme: Unencrypted, rotation: 0°, flipped: 0, color space: {primaries:SMPTE170M, transfer:SMPTE170M, matrix:SMPTE170M, range:LIMITED}"

So, the info says coded size: [320,240], visible rect: [0,0,320,240], natural size: [480,240]. And the video plays stretched on screen.

Without -aspect 2:1:
[screenshot]

With -aspect 2:1:
[screenshot]
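The natural size in that log can be derived from the visible rect and the display aspect ratio set by -aspect. A sketch of the relationship (hedged: this keeps the height fixed and stretches the width, which matches this example, but an implementation could in principle scale either dimension; the function name is illustrative):

```javascript
// Sketch (illustrative names): derive the natural size from the visible rect
// and a display aspect ratio (DAR). Here the height is kept and the width is
// stretched so that width/height matches the DAR, as in the -aspect 2:1
// experiment above.
function naturalSize(visibleWidth, visibleHeight, darNum, darDen) {
  return { width: Math.round((visibleHeight * darNum) / darDen), height: visibleHeight };
}

// -aspect 2:1 on a 320x240 visible rect gives the 480x240 natural size
// reported by chrome://media-internals.
console.log(naturalSize(320, 240, 2, 1)); // { width: 480, height: 240 }
```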

@greggman
Contributor

greggman commented Apr 2, 2024

I definitely don't want garbage. I guess in my mind there are more sizes. I can imagine

  1. The number of pixels in the data (eg. 1920x1088)
  2. The number of valid pixels in the data (eg 1920x1080)
  3. The size to display the valid data (eg. 3000x2000 or maybe aspect ratio) - this could be anything? (does this exist? It seems to be ignored for images)

I'd expect to get 2, not 3?

@kainino0x
Contributor Author

Nice @shaoboyan, that'll be a useful test case. I'm also curious to see how it interacts with rotation (I'm guessing it applies before rotation, but not sure).

Before we land a decision here, I'd like to know the results of drawImage, texImage2D, copyExternalImageToTexture, and importExternalTexture in each browser's current implementations. I don't know what to expect, but it would be nice if they're all consistent (with the possible exception of textureDimensions and textureLoad on external_texture).

3. The size to display the valid data (eg. 3000x2000 or maybe aspect ratio) - this could be anything?

I'm unsure whether it's possible for the visible rect size and natural size to be very different, or if they have to be equal in one dimension like they are in Shaobo's test.

@shaoboyan
Contributor

shaoboyan commented Apr 3, 2024

@greggman
Per my understanding, the aspect attribute is part of the media stream/container metadata, and the stream is expected to be presented on screen with all metadata applied (e.g. the matrix).

  3. The size to display the valid data (eg. 3000x2000 or maybe aspect ratio) - this could be anything?

Per my experiment, I think situation 3 only happens when the aspect attribute is defined; otherwise we get a frame at its visible dimensions. And I think that behaviour (applying the aspect scale/downscale) is expected.

@shaoboyan
Contributor

shaoboyan commented Apr 3, 2024

@kainino0x @greggman @kdashg
I did experiments to "render" video frames in multiple ways. Here are the results on Chrome Canary (version 125.0.6396.0 (Official Build) canary (64-bit)).

I reused an existing uploading sample with some changes (e.g. removed the video size assignment), and I added canvas2d context rendering logic like this:

  const canvas = container.appendChild(document.createElement('canvas'));
  canvas.width = videos[0].videoWidth;
  canvas.height = videos[0].videoHeight;

  const context = canvas.getContext('2d');
  context.drawImage(videos[0], 0, 0);

Canvases are all sized to video.videoWidth × video.videoHeight. The HTMLVideoElement itself is not assigned any size.

I still use the cts resource. By executing the commands:

// Generate temp.mp4, the base for four-colors-h264-bt601-rotate-90.mp4; mimeType: 'video/mp4; codecs=avc1.4d400c'
ffmpeg.exe -loop 1 -i .\four-colors.png -c:v libx264 -pix_fmt yuv420p -frames 50 -colorspace smpte170m -color_primaries smpte170m -color_trc smpte170m -color_range tv -vf transpose=2 temp.mp4

I got a video frame with info (by chrome://media-internals):

"FFmpegDemuxer: created video stream, config codec: h264, profile: h264 high, level: not available, alpha_mode: is_opaque, coded size: [240,320], visible rect: [0,0,240,320], natural size: [240,320], has extra data: true, encryption scheme: Unencrypted, rotation: 0°, flipped: 0, color space: {primaries:SMPTE170M, transfer:SMPTE170M, matrix:SMPTE170M, range:LIMITED}"

According to the video info, coded size: [240,320], visible rect: [0,0,240,320], natural size: [240,320].

Querying the video element's size gives video.videoWidth = 240; video.videoHeight = 320.

Using multiple ways to 'draw' this frame:

WebGPU: ImportExternalTexture:
Screenshot 2024-04-03 161115

WebGPU: CopyEI2T:
Screenshot 2024-04-03 161128

WebGL2: TexImage2D:
Screenshot 2024-04-03 161138

WebGL2: TexStorage2D:
Screenshot 2024-04-03 161148

WebGL: TexImage2D:
Screenshot 2024-04-03 161158

Canvas2D: DrawImage:
Screenshot 2024-04-03 161211

VideoElement: Playback:
Screenshot 2024-04-03 161221

@shaoboyan
Contributor

shaoboyan commented Apr 3, 2024

(Cont'd)
Executing ffmpeg commands to get a 90-degree rotated video frame based on the above video frame:

ffmpeg.exe -display_rotation 270 -i temp.mp4 -c copy four-colors-h264-bt601-rotate-90.mp4

I got a video frame with info (by chrome://media-internals):

"FFmpegDemuxer: created video stream, config codec: h264, profile: h264 high, level: not available, alpha_mode: is_opaque, coded size: [240,320], visible rect: [0,0,240,320], natural size: [240,320], has extra data: true, encryption scheme: Unencrypted, rotation: 90°, flipped: 0, color space: {primaries:SMPTE170M, transfer:SMPTE170M, matrix:SMPTE170M, range:LIMITED}"

According to the video info, coded size: [240,320], visible rect: [0,0,240,320], natural size: [240,320].

Querying the video element's size gives video.videoWidth = 320; video.videoHeight = 240.
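A hedged sketch of the pattern observed here (modeling only these Chrome logs, not any spec): the demuxer reports natural size in coded orientation, while videoWidth/videoHeight reflect the rotation metadata, swapping dimensions for 90°/270°:

```javascript
// Sketch modeling the observed Chrome behavior: the demuxer's natural size is
// pre-rotation, and videoWidth/videoHeight swap dimensions for 90° and 270°.
function intrinsicSize(naturalWidth, naturalHeight, rotationDegrees) {
  const swapped = rotationDegrees % 180 !== 0;
  return swapped
    ? { width: naturalHeight, height: naturalWidth }
    : { width: naturalWidth, height: naturalHeight };
}

// Natural size [240,320] with rotation 90° yields videoWidth=320,
// videoHeight=240, matching the media-internals log above.
console.log(intrinsicSize(240, 320, 90)); // { width: 320, height: 240 }
```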

Using multiple ways to 'draw' this frame:
WebGPU: ImportExternalTexture:
webgpu-import

WebGPU: CopyEI2T:
webgpu-copyEI2T

WebGL2: TexImage2D:
webgl2-texImage2D

WebGL2: TexStorage2D:
webgl2-texStorage2D

WebGL: TexImage2D:
webgl-texImage2D

Canvas2D: DrawImage:

canvas2d-drawImage

VideoElement: Playback:
videoElement

@shaoboyan
Contributor

shaoboyan commented Apr 3, 2024

(Cont'd)
Executing ffmpeg commands to get a 90-degree rotated video frame combined with -aspect 2:1, based on the original video frame:

ffmpeg.exe -display_rotation 270 -i temp.mp4 -aspect 2:1 -c copy four-colors-h264-bt601-rotate-90-2-1.mp4

NOTE: FFmpeg has a warning: Overriding aspect ratio with stream copy may produce invalid files

I got a video frame with info (by chrome://media-internals):

"FFmpegDemuxer: created video stream, config codec: h264, profile: h264 high, level: not available, alpha_mode: is_opaque, coded size: [240,320], visible rect: [0,0,240,320], natural size: [640,320], has extra data: true, encryption scheme: Unencrypted, rotation: 90°, flipped: 0, color space: {primaries:SMPTE170M, transfer:SMPTE170M, matrix:SMPTE170M, range:LIMITED}"

According to the video info, coded size: [240,320], visible rect: [0,0,240,320], natural size: [640,320].

Querying the video element's size gives video.videoWidth = 320; video.videoHeight = 640.

Using multiple ways to 'draw' this frame:
WebGPU: ImportExternalTexture:
webgpu-import-2-1

WebGPU: CopyEI2T:
webgpu-copyEI2T-2-1

WebGL2: TexImage2D:
webgl2-texImage2D-2-1

WebGL2: TexStorage2D:
Failed, with console log: WebGL: INVALID_OPERATION: texSubImage2D: source sub-rectangle specified via pixel unpack parameters is invalid

WebGL: TexImage2D:
webgl-texImage2D-2-1

Canvas2D: DrawImage:
Canvas2D-DrawImage-2-1

VideoElement: Playback:
VideoElement-2-1

@shaoboyan
Contributor

shaoboyan commented Apr 3, 2024

(Cont'd)
Executing ffmpeg commands to get a 90-degree rotated video frame combined with -aspect 1:2, based on the original video frame:

ffmpeg.exe -display_rotation 270 -i temp.mp4  -aspect 1:2 -c copy four-colors-h264-bt601-rotate-90-1-2.mp4

NOTE: FFmpeg has a warning: Overriding aspect ratio with stream copy may produce invalid files

I got a video frame with info (by chrome://media-internals):

"FFmpegDemuxer: created video stream, config codec: h264, profile: h264 high, level: not available, alpha_mode: is_opaque, coded size: [240,320], visible rect: [0,0,240,320], natural size: [240,480], has extra data: true, encryption scheme: Unencrypted, rotation: 90°, flipped: 0, color space: {primaries:SMPTE170M, transfer:SMPTE170M, matrix:SMPTE170M, range:LIMITED}"

According to the video info, coded size: [240,320], visible rect: [0,0,240,320], natural size: [240,480].

Querying the video element's size gives video.videoWidth = 480; video.videoHeight = 240.

Using multiple ways to 'draw' this frame:
WebGPU: ImportExternalTexture:
WebGPU-import-1-2

WebGPU: CopyEI2T:
WebGPU-copyEI2T-1-2

WebGL2: TexImage2D:
WebGL2-TexImage2D-1-2

WebGL2: TexStorage2D:
Failed, with console log: WebGL: INVALID_OPERATION: texSubImage2D: source sub-rectangle specified via pixel unpack parameters is invalid

WebGL: TexImage2D:
WebGL-TexImage2D-1-2

Canvas2D: DrawImage:
Canvas2D-DrawImage-1-2

VideoElement: Playback:
VideoElement-1-2

@kainino0x
Contributor Author

kainino0x commented Apr 3, 2024

This is fantastic, thank you! Could you put the test pages up somewhere so they can be easily run in other browsers or in local builds? (like github pages, or send it to me and I can put it on github pages)

This is the most interesting outcome:

WebGL2: TexStorage2D:
Failed, with console log : WebGL: INVALID_OPERATION: texSubImage2D: source sub-rectangle specified via pixel unpack parameters is invalid

This sounds like the texSubImage2D call is treating the video as having a resolution that doesn't match the texture you've allocated with texStorage2D (which is the videoWidth/videoHeight, the intrinsic size). But also it might be a Chromium bug with the unpack parameters? Because video_webgl.js doesn't seem to use unpack parameters at all.

@shaoboyan
Contributor

Could you put the test pages up somewhere so they can be easily run in other browsers or in local builds? (like github pages, or send it to me and I can put it on github pages)

Sure, I'll send the sample after some code cleanup.

@shaoboyan
Contributor

@kainino0x Please refer to https://shaoboyan.github.io/webgpu-samples/?sample=webgl-texImage2D under the category “Video uploading through various APIs”, at the bottom of the page.

@kdashg
Contributor

kdashg commented Apr 8, 2024

Different frames of the video can technically have different sizes iirc.
Don't base future work off of what webgl specifies, because I certainly don't have confidence in it any further than "it seems to work for enough people that it's useful".
I believe it was a mistake to try to specify the resulting texture size requirements based on anything short of the actual decoded frame.
This class of uncertainties is why I pushed for importExternalTexture, because we can sidestep almost all this uncertainty.

@shaoboyan
Contributor

(Please correct me if I'm wrong.)
@kdashg @greggman @kainino0x
One thing I'd like to mention:
For HTMLVideoElement, developers have no way to get the coded size (a WebCodecs VideoFrame, however, does expose it). The only size-related attributes on HTMLVideoElement are videoWidth and videoHeight, and these are the so-called display size.

(Based on the above fact,) if developers want to create a texture and copy a video frame from an HTMLVideoElement into it, they can only use videoWidth and videoHeight.

@Kangz
Contributor

Kangz commented Apr 9, 2024

GPU Web WG 2024-03-27
* KG: coded size likely to be larger than display size. Likely to be rotated w.r.t. display size. Might have unusual scaling, etc.
* At WebGPU's level - with developer guarantees we're giving to users (video frames are simpler than reality) - think display size is correct. Was enough to make me nervous. Two things - if you're copying from this image to something, it's strange - it's more than a copy, more like a resolve operation. Gives you something different from the input as the output. And for external textures in WGSL, you can sample them with uber-sampler but also textureLoad them - saying you textureLoad but don't operate on the coded size makes me nervous. Not sure why we added textureLoad from external textures…
* GT: feels strange to not just get the data, but rather some interpolation of the data. And if you want to get back to the original data you can't because it's been manipulated. The data inside a video isn't necessarily color data. Like Kinect 3D data. If all info was there - here's what is needed to display it correctly - would be easier.
* KN: would be more advanced usage of WebCodecs' VideoFrame. VideoFrame can give you a lot of raw stuff about the frame. We can iterate on giving more metadata about the video frames. Thought it was more useful to get zero-copy with magic, then putting it off until we could do it without magic.
* KG: just as one example, coded size for 1080p video is often 1088 pixels tall. (rounded to 16x16 blocks)
* GT: is that stretched, or cropped?
* KG / KN: cropped.
* KG: there's also aspect ratio - e.g. something like a square video but it's stretched.
* GT: bug in WebGL - a 320x240 video showing as 256x256 in Chrome. Metadata we were treating differently.
* KG: it's challenging. Still, think displaySize is the right thing to use.

@kainino0x kainino0x added the api WebGPU API label Apr 30, 2024