How to think about the compositor in 2022 #77

Open
raphlinus opened this issue Jul 22, 2022 · 4 comments


@raphlinus
Owner

A followup to "The compositor is evil" and "Advice for the next dozen Rust GUIs".

Goals: modern appearance (transparency etc), cross-platform, ability to embed video/3d/other content, power-efficiency. Other assumptions: our process is fast (Rust), more interested in current than older platforms (drop X11 and Windows 7).

Historical bit: subwindows. On Windows this was the HWND. Every control had its own ("windowless" controls changed this). In the early days, it was effectively in-process, and window objects were 88 bytes (see "Windows are not cheap objects" for source). Downsides: primitive imaging model (opaque rectangle), lack of synchronization. Subwindows still exist but can be considered legacy.

Subtle point: HWNDs (and related concepts) were exposed, lots of knowledge. Compositor concepts are mostly hidden by abstraction layers, so knowledge is arcane.

Compositor concepts. Windows has DirectComposition and Windows.UI.Composition - same wine in different bottles, animation engine is different. mac has Core Animation. These are effectively identical. Tree of {windows: visuals, mac: layers}. Parameters include transforms, clipping, opacity. Also animation engine - these parameters can be functions of time. Some visuals/layers can be simple (solid colors), most come from some other source. One very important source is swapchains from GPU. Other important source is video.

  • Video: colorspace conversion and scaling can be done by GPU. Important for performance & power.
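To make the tree structure concrete, here is a minimal, platform-neutral sketch in Rust. All names (`Layer`, `Content`, `flatten`) are hypothetical and do not correspond to DirectComposition or Core Animation APIs; the transform is reduced to a translation and the content handles are placeholder ids.

```rust
/// Hypothetical content sources for a compositor node.
#[derive(Clone, Debug, PartialEq)]
pub enum Content {
    /// Simple content the compositor can fill itself.
    SolidColor([f32; 4]),
    /// Content produced elsewhere; the ids are placeholders, not real handles.
    Swapchain(u32),
    Video(u32),
}

/// Hypothetical node in a compositor tree (a "visual" or "layer").
#[derive(Clone, Debug)]
pub struct Layer {
    /// Translation relative to the parent (a full affine transform in practice).
    pub offset: (f32, f32),
    /// Optional clip rectangle in local coordinates: (x, y, width, height).
    pub clip: Option<(f32, f32, f32, f32)>,
    pub opacity: f32,
    pub content: Option<Content>,
    pub children: Vec<Layer>,
}

impl Layer {
    pub fn new(offset: (f32, f32), content: Option<Content>) -> Self {
        Layer { offset, clip: None, opacity: 1.0, content, children: Vec::new() }
    }
}

/// Walk the tree and return each content item with its absolute offset and
/// effective opacity (offsets add and opacity multiplies down the tree).
pub fn flatten(root: &Layer) -> Vec<(Content, (f32, f32), f32)> {
    fn walk(
        layer: &Layer,
        origin: (f32, f32),
        opacity: f32,
        out: &mut Vec<(Content, (f32, f32), f32)>,
    ) {
        let origin = (origin.0 + layer.offset.0, origin.1 + layer.offset.1);
        let opacity = opacity * layer.opacity;
        if let Some(content) = &layer.content {
            out.push((content.clone(), origin, opacity));
        }
        for child in &layer.children {
            walk(child, origin, opacity, out);
        }
    }
    let mut out = Vec::new();
    walk(root, (0.0, 0.0), 1.0, &mut out);
    out
}
```

The animation engine would correspond to making `offset`, `clip` and `opacity` functions of time rather than plain values; that part is left out here.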

Wayland is a little different, lower level. Tree of surfaces but visual effects much more limited - essentially just transform (limited forms of scaling). Even clip to rectangle is not supported in base Wayland protocol, but is present in viewporter extension.

Avoiding compositor latency

In usual case, compositor adds one frame of latency. As of Windows 11, Composition swapchain has a mode to bypass that; direct scanout by graphics hardware from app-provided swapchain. Probably most useful for games (intended to replace full-screen modes).

Synchronization

All compositors have "commit" concept: updates to tree are deferred until commit, then all are applied atomically.
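A rough model of the commit concept, as a sketch only: updates are queued and become visible together at commit. The `Tree`, `Update` and `LayerState` types are hypothetical; real compositors expose this through their own transaction or commit calls.

```rust
use std::collections::HashMap;

/// Hypothetical property update; real compositors have many more.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Update {
    SetOpacity(f32),
    SetOffset(f32, f32),
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct LayerState {
    pub opacity: f32,
    pub offset: (f32, f32),
}

#[derive(Default)]
pub struct Tree {
    /// State the compositor currently displays.
    committed: HashMap<u32, LayerState>,
    /// Updates queued since the last commit.
    pending: Vec<(u32, Update)>,
}

impl Tree {
    /// For brevity, adding a layer takes effect immediately in this sketch.
    pub fn add_layer(&mut self, id: u32) {
        self.committed.insert(id, LayerState { opacity: 1.0, offset: (0.0, 0.0) });
    }

    /// Queue an update; nothing visible changes yet.
    pub fn update(&mut self, id: u32, update: Update) {
        self.pending.push((id, update));
    }

    /// Apply all queued updates in one step.
    pub fn commit(&mut self) {
        for (id, update) in self.pending.drain(..) {
            if let Some(state) = self.committed.get_mut(&id) {
                match update {
                    Update::SetOpacity(o) => state.opacity = o,
                    Update::SetOffset(x, y) => state.offset = (x, y),
                }
            }
        }
    }

    pub fn visible(&self, id: u32) -> Option<LayerState> {
        self.committed.get(&id).copied()
    }
}
```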

Cross-process synchronization is trickier. Wayland has "client-side decorations" largely so the app process doesn't need to synchronize with the compositor process. (TODO: how is synchronization done with compositor-side decorations?). Windows doesn't solve the problem (Windows smooth resize still broken). mac will include window resize in transaction, but tricky to get timing right - need to do waitUntilScheduled to make sure the present is inside the transaction (see "Glitchless Metal Window Resizing").

Wayland has two modes: sync and desync. Latter makes sense for eg embedded video content. Only works in scrolling context when compositor can do clipping (viewporter extension, not in core).

For simple video playback, desync makes sense, but to handle proper scroll embedding with clipping (w/o viewporter extension), several things need to be true:

  • the subsurface is in sync mode (set_sync);
  • the embedded content needs to request animation frames from the host at a rate appropriate for the content (ie the video frame rate);
  • the host needs to provide clipping information for each frame (which the guest needs to respect);
  • the host needs to commit the toplevel surface at the end of the animation frame.
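The per-frame clipping information the host hands to the guest could be as simple as the intersection of the guest's rectangle with the scroll viewport, expressed in guest-local coordinates. The sketch below assumes integer pixel rectangles; `Rect` and `guest_clip` are hypothetical names, not part of any Wayland protocol.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Clip rectangle for the guest, in guest-local coordinates, or `None`
/// when the guest is entirely outside the viewport.
/// `guest` and `viewport` are both given in host (window) coordinates.
pub fn guest_clip(guest: Rect, viewport: Rect) -> Option<Rect> {
    let x0 = guest.x.max(viewport.x);
    let y0 = guest.y.max(viewport.y);
    let x1 = (guest.x + guest.w).min(viewport.x + viewport.w);
    let y1 = (guest.y + guest.h).min(viewport.y + viewport.h);
    if x1 <= x0 || y1 <= y0 {
        return None;
    }
    Some(Rect { x: x0 - guest.x, y: y0 - guest.y, w: x1 - x0, h: y1 - y0 })
}
```

A guest that receives `None` can skip rendering for that frame entirely.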

Very similar to what needs to happen if the host is doing compositing. Instead of binding the guest content to a Wayland subsurface, the guest provides it as a GPU buffer (ie vkImage). The host then composites it (which opens up a wide range of effects such as masks, blends, and blurs). Providing clip info to the guest is optional; the host can just ignore pixels outside the clip region when compositing.

Damage regions

mac still doesn't do it (TODO: still true?). Can fake using tiles and/or sticking decals over existing layers (latter only works when opaque).
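One way to picture the tile workaround: split the surface into fixed-size tiles and re-render only those a damage rectangle touches. This is a sketch under that assumption; `dirty_tiles` and the tile size are hypothetical, not taken from any platform API.

```rust
/// Tile indices (col, row) touched by a damage rectangle (x, y, w, h),
/// for square tiles of `tile` pixels. Returned in row-major order.
pub fn dirty_tiles(damage: (u32, u32, u32, u32), tile: u32) -> Vec<(u32, u32)> {
    assert!(tile > 0, "tile size must be nonzero");
    let (x, y, w, h) = damage;
    if w == 0 || h == 0 {
        return Vec::new();
    }
    let (col0, col1) = (x / tile, (x + w - 1) / tile);
    let (row0, row1) = (y / tile, (y + h - 1) / tile);
    let mut tiles = Vec::new();
    for row in row0..=row1 {
        for col in col0..=col1 {
            tiles.push((col, row));
        }
    }
    tiles
}
```

For example, with 256-pixel tiles a small damage rectangle near a tile corner can touch four tiles, while the rest of the surface is left alone.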

Vulkan has incremental present (VK_KHR_incremental_present). Doesn't seem to be implemented on Windows, is common on Android & Linux. On Windows, DXGI does support it (dirty rects passed to Present1).

TODO question: can Windows Vulkan feed a compositor, or is this subwindow only?

Using compositor for scrolling

Generally need to render in tiles, apply clip and transform in compositor, have some offscreen content. Scroll can be done by updating transforms, don't need to re-render.

  • Responsive scrolling on mac, similar thing on Windows
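A sketch of the tile bookkeeping for vertical scrolling, assuming fixed-height tile rows and a hypothetical `overscan` count of extra rows kept rendered offscreen. As long as the resident range stays the same, a scroll step only changes each tile's translation.

```rust
use std::ops::Range;

/// Rows of tiles that should be rendered for the current scroll position,
/// including `overscan` extra rows above and below the viewport.
pub fn resident_rows(
    scroll_y: f64,
    viewport_h: f64,
    tile_h: f64,
    overscan: u32,
    total_rows: u32,
) -> Range<u32> {
    assert!(tile_h > 0.0, "tile height must be positive");
    let top = scroll_y.max(0.0);
    let first = ((top / tile_h).floor() as u32).saturating_sub(overscan);
    let last = (((top + viewport_h) / tile_h).ceil() as u32).saturating_add(overscan);
    first..last.min(total_rows)
}

/// Vertical translation to set on a tile's compositor object.
pub fn tile_offset_y(row: u32, tile_h: f64, scroll_y: f64) -> f64 {
    row as f64 * tile_h - scroll_y
}
```

Scrolling from 300 to 310 pixels with 256-pixel rows leaves the resident range unchanged, so only the offsets are updated and nothing is re-rendered.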

Maybe rant about layering violation of trying to cut-through input to scrolling. Visible seams if that outruns the slow app. Trying to work around slow async is generally a source of complexity and degraded quality.

Sketch of implementation for UI

UI generates visual tree. For each node, make a decision whether it will be rendered in process or delegated to compositor. Some nodes will be video sources etc. In that case, either source directly updates a compositor object, or can provide content in a GPU texture. In latter case, need control: embedded content requests anim frames, is notified when paint cycle kicks off, there is synchronization (semaphore in Vulkan) so content can be read by consumer.

Thought: compositor is basically another 2D renderer, similar to one inside app, but with different tradeoffs. Many imaging operations the same (including path rendering on mac/windows), but compositors can't do text.

How to decide? Almost purely engineering considerations. When content is static, best to use compositor. For video content, use compositor unless constrained otherwise. On Wayland, weak imaging model may force host rendering (soft masks, blend modes, opacity, blur effect). Granularity should not be too fine - too many compositor objects will cause overhead (cite Register article?). Popup menus etc should probably use compositor (so they can exceed frame boundaries, cast shadows on window decorations etc). Scrolling may use compositor, otherwise simple UI should probably all be done in host. If trying to leverage low-latency paths, one big window (game-like) will probably work better, biases toward doing composition in host.
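The considerations above could be collapsed into a per-node decision function. This is only one possible encoding: the `Kind` categories, the `Context` fields and the object budget are hypothetical, and the ordering of the checks is a guess rather than a measured policy.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Kind {
    StaticContent,
    Video,
    Popup,
    Scroller,
    SimpleUi,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Target {
    Compositor,
    Host,
}

/// Hypothetical per-frame context for the decision.
pub struct Context {
    /// False where the compositor's imaging model is weak (e.g. Wayland).
    pub rich_compositor: bool,
    /// Compositor objects already allocated this frame.
    pub compositor_objects: usize,
    /// Assumed cap to keep granularity from getting too fine.
    pub object_budget: usize,
    /// True when using one big game-like window for low-latency paths.
    pub low_latency: bool,
}

pub fn decide(kind: Kind, needs_rich_effects: bool, cx: &Context) -> Target {
    if cx.low_latency {
        return Target::Host;
    }
    // Soft masks, blend modes, blur etc. may force host rendering.
    if needs_rich_effects && !cx.rich_compositor {
        return Target::Host;
    }
    if cx.compositor_objects >= cx.object_budget {
        return Target::Host;
    }
    match kind {
        Kind::StaticContent | Kind::Video | Kind::Popup | Kind::Scroller => Target::Compositor,
        Kind::SimpleUi => Target::Host,
    }
}
```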

Also part of tradeoff: GPU RAM usage by the intermediate layers.

Maybe talk about Flutter heuristic here? After 3 paint cycles with no dynamic update, render to texture and composite.
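The heuristic as stated here fits in a small counter. This sketch follows the one-sentence description above and is not a port of Flutter's actual code; `StabilityTracker` is a hypothetical name.

```rust
/// Tracks how many consecutive paint cycles a node has gone without a
/// dynamic update, and says when to render it to a texture and composite.
pub struct StabilityTracker {
    stable_cycles: u32,
    threshold: u32,
}

impl StabilityTracker {
    pub fn new(threshold: u32) -> Self {
        StabilityTracker { stable_cycles: 0, threshold }
    }

    /// Call once per paint cycle. Returns true when the node has been
    /// unchanged for at least `threshold` cycles in a row.
    pub fn on_paint(&mut self, changed: bool) -> bool {
        if changed {
            self.stable_cycles = 0;
        } else {
            self.stable_cycles = self.stable_cycles.saturating_add(1);
        }
        self.stable_cycles >= self.threshold
    }
}
```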

Probably talk about how compositor interface is IPC under the hood; objects crossing the wire are much more expensive than in-process. In Wayland case, this is documented and explicit. In Windows and mac, performance is black-box. Likely some of the work is in-process but hard to know for sure without doing careful performance evaluation. Building something real will require performance work starting with measurement.

Animation

Recommend not using compositor animation. Compute animation every frame. Facilitates updating the content based on input. Requires rendering thread to be fast (16ms; context switch is 10s of µs).
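A sketch of what computing animation every frame might look like: the animated value is a pure function of the frame's timestamp, so an input event can retarget it between any two frames. `Anim`, `value_at` and `retarget` are hypothetical names, and the smoothstep easing curve is an arbitrary choice.

```rust
/// Smoothstep easing on [0, 1].
pub fn ease_in_out(t: f64) -> f64 {
    let t = t.clamp(0.0, 1.0);
    t * t * (3.0 - 2.0 * t)
}

/// A single animated scalar, evaluated by the app on every frame.
/// Times are in seconds on any monotonic clock.
pub struct Anim {
    pub start_time: f64,
    pub duration: f64,
    pub from: f64,
    pub to: f64,
}

impl Anim {
    /// Value at timestamp `now`.
    pub fn value_at(&self, now: f64) -> f64 {
        if self.duration <= 0.0 {
            return self.to;
        }
        let t = (now - self.start_time) / self.duration;
        self.from + (self.to - self.from) * ease_in_out(t)
    }

    /// Redirect the animation toward a new target, starting from the
    /// value it has at `now`, so the position does not jump.
    pub fn retarget(&mut self, now: f64, new_to: f64) {
        self.from = self.value_at(now);
        self.start_time = now;
        self.to = new_to;
    }
}
```

Note that `retarget` preserves position but not velocity; a real implementation would likely want a curve that keeps both continuous.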

Downside of animation for scrolling:

Thought: if every frame is independent, then much less chance of glitching when dynamically reconfiguring the visual tree. Potentially could make host/compositor decision dynamically on each frame, even for same object.

Frame pacing

Each frame should target a specific timestamp. Should be able to downgrade from max display fps (increase smoothness, reduce power). Choose a time to kick off paint cycle - goal is to have very low probability of missing present deadline.
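A sketch of the arithmetic, assuming the app knows the time of a recent vsync, the refresh interval, and a lead time (estimated paint duration plus a safety margin). All names and parameters are hypothetical; times are in microseconds on a monotonic clock. A `divisor` greater than 1 runs at a fraction of the display rate.

```rust
/// Earliest presentation timestamp, aligned to the (possibly reduced)
/// frame cadence, that still leaves `lead_us` between now and present.
pub fn next_present_target(
    now_us: u64,
    last_vsync_us: u64,
    refresh_us: u64,
    divisor: u64,
    lead_us: u64,
) -> u64 {
    assert!(refresh_us > 0, "refresh interval must be nonzero");
    let frame_us = refresh_us * divisor.max(1);
    let earliest = now_us + lead_us;
    if earliest <= last_vsync_us {
        return last_vsync_us;
    }
    // Round up to a whole number of frame intervals past the known vsync.
    let k = (earliest - last_vsync_us + frame_us - 1) / frame_us;
    last_vsync_us + k * frame_us
}

/// Time at which to kick off the paint cycle for a given target.
pub fn kickoff_time(target_us: u64, lead_us: u64) -> u64 {
    target_us.saturating_sub(lead_us)
}
```

A larger lead lowers the probability of missing the deadline at the cost of latency; choosing it well requires measuring actual paint times.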

Exact timestamp of presentation important for eg synchronizing audio. All sources should be able to sync. Platform APIs have ways of querying present statistics.

@SethDusek

SethDusek commented Oct 18, 2022

Platform APIs have ways of querying present statistics.

I believe Vulkan has VK_GOOGLE_display_timing for this. AFAICT this is implemented on Windows and Android, but it never landed in Mesa (there was an experimental branch for it). Wayland has the presentation-time protocol (wp_presentation).

@xorgy

xorgy commented Oct 29, 2022

As of Windows 11, Composition swapchain has a mode to bypass that...

Also w/ Wayland and X11 compositing managers there is fullscreen unredirect for a similar purpose.

@xorgy

xorgy commented Oct 29, 2022

Maybe talk about Flutter heuristic here? After 3 paint cycles with no dynamic update, render to texture and composite.

This ties in with the issue of compositor scrolling: If the source of the compositor object is scrolling internally, then it looks "dynamic" to the top-level compositor, as much as a video at the display refresh rate or another continuously animated region.

Unlike most dynamic updates, scrolling is essentially trivial (non-overlapping blits), so making a texture for the scroll region seems a waste.

@raphlinus
Owner Author

Worth linking https://github.com/flutter/flutter/issues/59327 which is CPU usage to blink the cursor.
