Widgets don't compute correct width for Emoji sequences #64

bkahlert · 2022-07-20T19:50:04Z

If you render a Widget, e.g. a grid, emoji sequences like

- regional indicator letters that make up flags, e.g. 🇩🇪
- emojis with a skin tone modifier, e.g. 👨🏾‍🦱
- ZWJ (ZERO WIDTH JOINER) joined emojis, e.g. 👩‍👩‍👦‍👦

seem to be rendered with an incorrect width.
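Each of these sequences is one visible glyph built from several code points. The Kotlin sketch below spells them out code point by code point; `fromCodePoints` and `describeCodePoints` are hypothetical helpers written for this illustration, not Mordant APIs.

```kotlin
// Hypothetical helpers for illustration only; they are not part of Mordant.

/** Builds a string from Unicode code points (each may need two UTF-16 chars). */
fun fromCodePoints(vararg cps: Int): String =
    buildString { cps.forEach { appendCodePoint(it) } }

/** Lists the code points of [s] as "U+XXXX" labels. */
fun describeCodePoints(s: String): List<String> {
    val labels = mutableListOf<String>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)      // joins a surrogate pair into one code point
        labels.add("U+%04X".format(cp))
        i += Character.charCount(cp)   // advance by 1 or 2 UTF-16 chars
    }
    return labels
}

// The three kinds of sequences from the issue, spelled out explicitly.
val flagDE = fromCodePoints(0x1F1E9, 0x1F1EA)                              // REGIONAL INDICATOR D + E
val manMediumDarkCurly = fromCodePoints(0x1F468, 0x1F3FE, 0x200D, 0x1F9B1) // MAN + skin tone + ZWJ + CURLY HAIR
val family = fromCodePoints(0x1F469, 0x200D, 0x1F469, 0x200D, 0x1F466, 0x200D, 0x1F466) // WOMAN, WOMAN, BOY, BOY joined by ZWJ
```

For example, `describeCodePoints(manMediumDarkCurly)` returns `[U+1F468, U+1F3FE, U+200D, U+1F9B1]`: four code points (seven UTF-16 chars) for what a supporting terminal draws as a single glyph.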

Sample:

    Terminal().render(
        grid {
            cellBorders = NONE
            // `it` is a list of (text, maxWidth) pairs from the surrounding code
            it.forEachIndexed { i, (_, maxWidth) ->
                column(i) {
                    width = ColumnWidth.Fixed(maxWidth + this@BlockRenderer.style.layout.gap)
                    if (i > 0) padding(0, this@BlockRenderer.style.layout.gap)
                }
            }
            row {
                whitespace = PRE_LINE
                verticalAlign = TOP
                overflowWrap = BREAK_WORD
                cellsFrom(it.map { it.first.toString() + "🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦" })
            }
        }
    )

The code above renders as:

LOREM IPSUM DOLOR SIT AMET, CONSETETUR       LOREM IPSUM DOLOR SIT AMET,
SADIPSCING ELITR, SED DIAM NONUMY            CONSETETUR SADIPSCING ELITR,
EIRMOD.🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦                      SED DIAM NONUMY
                                             EIRMOD.🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦

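The words "SED DIAM NONUMY" in the second column appear too far to the left, presumably because the emoji sequences in the first column are counted as wider than they are actually drawn.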


ajalt · 2022-07-21T02:44:27Z

Oof, that's a challenging problem. Mordant already goes farther than a lot of terminal libraries and has its own wcwidth implementation, with tables I generate by parsing the Unicode standard. But it doesn't handle all of the elaborate emoji constructs. I'd absolutely accept a PR to handle more of these cases.
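A plain wcwidth approach assigns a width to each code point and sums the results. The sketch below shows why that works for combining sequences but overcounts emoji sequences. The ranges in `codePointWidth` are simplified assumptions for illustration, not Mordant's actual Unicode-derived tables, and all names are hypothetical.

```kotlin
// Hypothetical, heavily simplified wcwidth-style table; NOT Mordant's real data.
fun codePointWidth(cp: Int): Int = when (cp) {
    0x200D -> 0                 // ZERO WIDTH JOINER
    in 0x0300..0x036F -> 0      // combining diacritical marks
    in 0xFE00..0xFE0F -> 0      // variation selectors
    in 0x1F300..0x1FAFF -> 2    // most pictographic emoji, incl. skin tone modifiers
    else -> 1                   // everything else treated as narrow
}

/** Sums per-code-point widths, the way a plain wcswidth does. */
fun naiveWidth(s: String): Int {
    var total = 0
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        total += codePointWidth(cp)
        i += Character.charCount(cp)
    }
    return total
}

/** Builds a string from Unicode code points. */
fun fromCodePoints(vararg cps: Int): String =
    buildString { cps.forEach { appendCodePoint(it) } }
```

Under these assumptions, `naiveWidth(fromCodePoints(0x65, 0x301))` (e followed by COMBINING ACUTE ACCENT) is 1, matching what is drawn, while the man-with-skin-tone-and-curly-hair sequence sums to 6 and the four-person family to 8, even though a terminal that supports them typically draws each as one double-width glyph.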

bkahlert · 2022-07-21T07:57:26Z

If you assume that emoji sequences can always be displayed as a single emoji, you only need to detect those clusters.
They are called grapheme clusters, and I already integrated an appropriate break iterator here: https://github.com/bkahlert/kommons-debug/blob/master/src/commonMain/kotlin/com/bkahlert/kommons/Grapheme.kt#L25. It uses com.ibm.icu:icu4j:71.1 to split strings on the JVM and @stdlib/string-next-grapheme-cluster-break:0.0.8 for JavaScript.
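The cluster-based idea can be sketched in Kotlin for just the constructs in this issue. This is a deliberately incomplete stand-in for a full UAX #29 segmenter such as the ICU4J break iterator mentioned here; the function names and the rule that an emoji cluster occupies 2 cells are assumptions for illustration.

```kotlin
// Hypothetical, deliberately incomplete grapheme segmentation: it only knows the
// constructs from this issue. Real code should follow UAX #29 (e.g. ICU4J).

val ZWJ = 0x200D

fun isRegionalIndicator(cp: Int) = cp in 0x1F1E6..0x1F1FF
fun isSkinToneModifier(cp: Int) = cp in 0x1F3FB..0x1F3FF
fun isVariationSelector(cp: Int) = cp in 0xFE00..0xFE0F

fun codePointsOf(s: String): List<Int> {
    val result = mutableListOf<Int>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        result.add(cp)
        i += Character.charCount(cp)
    }
    return result
}

fun fromCodePoints(vararg cps: Int): String =
    buildString { cps.forEach { appendCodePoint(it) } }

/** Splits [s] into clusters: ZWJ joins neighbors, modifiers attach, flags pair up. */
fun simpleClusters(s: String): List<String> {
    val clusters = mutableListOf<StringBuilder>()
    var prev = -1   // previous code point, -1 at the start
    var riRun = 0   // length of the current run of regional indicators
    for (cp in codePointsOf(s)) {
        val attach = clusters.isNotEmpty() && (
            cp == ZWJ || isSkinToneModifier(cp) || isVariationSelector(cp) ||
                prev == ZWJ ||                                  // joined by a preceding ZWJ
                (isRegionalIndicator(cp) && riRun % 2 == 1)     // second half of a flag
            )
        if (attach) {
            clusters.last().appendCodePoint(cp)
        } else {
            clusters.add(StringBuilder().appendCodePoint(cp))
        }
        riRun = if (isRegionalIndicator(cp)) riRun + 1 else 0
        prev = cp
    }
    return clusters.map { it.toString() }
}

/** Assumes a cluster starting at or above U+1F000 is one double-width emoji. */
fun clusterWidth(cluster: String): Int = if (cluster.codePointAt(0) >= 0x1F000) 2 else 1

fun displayWidth(s: String): Int = simpleClusters(s).sumOf { clusterWidth(it) }
```

For the last line of the first column in the sample, "EIRMOD." followed by the three sequences, `displayWidth` returns 7 + 3 × 2 = 13 under these assumptions, instead of the larger per-code-point sum.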

ajalt · 2022-07-30T17:19:07Z

Unicode has a lot of rules for grapheme sequences beyond just emoji, many of which (e.g. combining character sequences) Mordant already handles correctly.

I added tables for the Emoji Sequences and Emoji ZWJ Sequences, but most typefaces don't implement all of the sequences, and even the ones that do use glyphs that aren't aligned to the cell grid for some of the emoji (like the larger family emojis). So there's no way to produce a perfect result, but this is probably the best we can do.
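One way such tables could be used is a longest-match scan: at each position, check whether a known sequence starts there and, if so, count it as one wide glyph. The sketch below is a hypothetical illustration with a three-entry table; it is not Mordant's implementation, and the width rules are simplified assumptions.

```kotlin
// Hypothetical sketch: a tiny hand-written stand-in for emoji sequence tables.
// Real tables hold thousands of sequences; this is not Mordant's code.

val knownSequences: List<List<Int>> = listOf(
    listOf(0x1F1E9, 0x1F1EA),                                           // flag DE
    listOf(0x1F468, 0x1F3FE, 0x200D, 0x1F9B1),                          // man, skin tone, curly hair
    listOf(0x1F469, 0x200D, 0x1F469, 0x200D, 0x1F466, 0x200D, 0x1F466), // family
).sortedByDescending { it.size } // try longer sequences first

/** Simplified per-code-point fallback widths (assumed, not real tables). */
fun codePointWidth(cp: Int): Int = when (cp) {
    0x200D -> 0
    in 0x0300..0x036F -> 0
    in 0x1F300..0x1FAFF -> 2
    else -> 1
}

fun codePointsOf(s: String): List<Int> {
    val result = mutableListOf<Int>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        result.add(cp)
        i += Character.charCount(cp)
    }
    return result
}

fun fromCodePoints(vararg cps: Int): String =
    buildString { cps.forEach { appendCodePoint(it) } }

fun tableWidth(s: String): Int {
    val cps = codePointsOf(s)
    var total = 0
    var i = 0
    while (i < cps.size) {
        // Longest table entry that matches at position i, if any.
        val match = knownSequences.firstOrNull { seq ->
            i + seq.size <= cps.size && cps.subList(i, i + seq.size) == seq
        }
        if (match != null) {
            total += 2          // assume a known sequence renders as one double-width glyph
            i += match.size
        } else {
            total += codePointWidth(cps[i])
            i += 1
        }
    }
    return total
}
```

A partial sequence such as WOMAN + ZWJ + WOMAN is not in this table, so it falls back to per-code-point widths (4 here). Conversely, a sequence the table lists but the font lacks is drawn wider than computed, which is why no table gives a perfect result.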

bkahlert · 2022-09-27T08:53:58Z

Sorry for not responding sooner; I don't have much time at the moment. But big thanks to you for implementing this feature!

ajalt mentioned this issue on Jul 30, 2022 in "Add cell width support for emoji sequences #68" (merged).

ajalt closed this as completed in #68 on Jul 30, 2022.
