Widgets don't compute correct width for Emoji sequences #64

Closed
bkahlert opened this issue Jul 20, 2022 · 4 comments · Fixed by #68

Comments

@bkahlert

If you render a Widget, e.g. a grid, emoji sequences like

  • regional indicator letters that make up flags, e.g. 🇩🇪
  • emojis with a skin tone modifier, e.g. 👨🏾‍🦱
  • ZWJ (ZERO WIDTH JOINER) joined emojis, e.g. 👩‍👩‍👦‍👦

seem to be rendered with an incorrect width.
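
To make the three kinds of sequence concrete, here is a minimal Kotlin/JVM sketch that lists the code points in each one and sums a deliberately naive per-code-point width. `fromCodePoints`, `codePoints` and `naiveWidth` are hypothetical helpers written for this illustration, not Mordant functions, and the "2 cells per non-ZWJ code point" rule is an assumption about how a simple width calculation might behave, not a description of Mordant's actual code.

```kotlin
// Illustrative sketch only; none of these helpers are part of Mordant's API.
const val ZWJ = 0x200D // ZERO WIDTH JOINER

// Builds a string from raw code points so the sequences stay readable in source.
fun fromCodePoints(vararg cps: Int): String =
    buildString { cps.forEach { appendCodePoint(it) } }

fun codePoints(s: String): List<Int> = s.codePoints().toArray().toList()

// Assumed naive rule: ZWJ takes 0 cells, every other code point takes 2 cells.
fun naiveWidth(s: String): Int = 2 * codePoints(s).count { it != ZWJ }

fun main() {
    val samples = listOf(
        // Two regional indicators: D + E
        "flag" to fromCodePoints(0x1F1E9, 0x1F1EA),
        // Man + medium-dark skin tone + ZWJ + curly hair
        "skin tone" to fromCodePoints(0x1F468, 0x1F3FE, ZWJ, 0x1F9B1),
        // Woman + ZWJ + woman + ZWJ + boy + ZWJ + boy
        "family" to fromCodePoints(0x1F469, ZWJ, 0x1F469, ZWJ, 0x1F466, ZWJ, 0x1F466),
    )
    for ((name, text) in samples) {
        val hex = codePoints(text).joinToString(" ") { "U+%04X".format(it) }
        println("$name: $hex, naive width ${naiveWidth(text)}")
    }
}
```

Under that assumed rule the three samples come out as 4, 6 and 8 cells, whereas a terminal that draws each sequence as a single glyph would typically use about 2 cells for each. A mismatch of this kind would be enough to throw off column alignment.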

Sample:

    Terminal().render(
        grid {
            cellBorders = NONE
            it.forEachIndexed { i, (_, maxWidth) ->
                column(i) {
                    width = ColumnWidth.Fixed(maxWidth + this@BlockRenderer.style.layout.gap)
                    if (i > 0) padding(0, this@BlockRenderer.style.layout.gap)
                }
            }
            row {
                whitespace = PRE_LINE
                verticalAlign = TOP
                overflowWrap = BREAK_WORD
                cellsFrom(it.map { it.first.toString() + "🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦" })
            }
        }
    )

The above code renders as:

LOREM IPSUM DOLOR SIT AMET, CONSETETUR       LOREM IPSUM DOLOR SIT AMET,
SADIPSCING ELITR, SED DIAM NONUMY            CONSETETUR SADIPSCING ELITR,
EIRMOD.🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦                      SED DIAM NONUMY
                                             EIRMOD.🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦

Here the words "SED DIAM NONUMY" in the second column are shifted too far to the left.

@ajalt
Owner

ajalt commented Jul 21, 2022

Oof, that's a challenging problem. Mordant already goes farther than a lot of terminal libraries and has its own wcwidth implementation that I parse from the Unicode standard. But it doesn't handle all of the elaborate emoji constructs. I'd absolutely accept a PR to handle more of these cases.

@bkahlert
Author

If you assume that emoji sequences can always be displayed as a single emoji, you only need to detect those clusters.
They are called grapheme clusters and I already integrated an appropriate break iterator here: https://github.com/bkahlert/kommons-debug/blob/master/src/commonMain/kotlin/com/bkahlert/kommons/Grapheme.kt#L25. It uses com.ibm.icu:icu4j:71.1 to split strings on the JVM and @stdlib/string-next-grapheme-cluster-break:0.0.8 for JavaScript.
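
As a rough illustration of the cluster idea, the following Kotlin/JVM sketch groups code points into emoji clusters by hand. It is a heavily simplified stand-in for the grapheme cluster rules in Unicode's text segmentation spec and is not what ICU4J or the linked `Grapheme.kt` does: it only knows about regional indicator pairs, skin tone modifiers, the emoji variation selector and ZWJ. `emojiClusters` and `clusterWidth` are hypothetical names, and the "clusters starting at U+1F000 or above take 2 cells" rule is an assumption for the example.

```kotlin
// Simplified sketch; a real implementation should use a full grapheme break iterator.
const val ZWJ = 0x200D // ZERO WIDTH JOINER

fun isRegionalIndicator(cp: Int): Boolean = cp in 0x1F1E6..0x1F1FF

// Skin tone modifiers and the emoji variation selector attach to the previous code point.
fun isExtender(cp: Int): Boolean = cp in 0x1F3FB..0x1F3FF || cp == 0xFE0F

fun emojiClusters(s: String): List<String> {
    val cps = s.codePoints().toArray()
    val clusters = mutableListOf<String>()
    var i = 0
    while (i < cps.size) {
        val start = i
        // A pair of regional indicators forms one flag; anything else starts with one code point.
        val isFlag = isRegionalIndicator(cps[i]) && i + 1 < cps.size && isRegionalIndicator(cps[i + 1])
        i += if (isFlag) 2 else 1
        // Absorb extenders, and for every ZWJ also absorb the code point it joins.
        var absorbing = true
        while (absorbing && i < cps.size) {
            if (isExtender(cps[i])) {
                i += 1
            } else if (cps[i] == ZWJ && i + 1 < cps.size) {
                i += 2
            } else {
                absorbing = false
            }
        }
        clusters += String(cps, start, i - start)
    }
    return clusters
}

// Assumed width rule for this sketch: emoji clusters take 2 cells, everything else 1.
fun clusterWidth(cluster: String): Int = if (cluster.codePointAt(0) >= 0x1F000) 2 else 1

fun main() {
    val text = "A\uD83C\uDDE9\uD83C\uDDEA" +                 // "A" + flag (D, E)
        "\uD83D\uDC68\uD83C\uDFFE\u200D\uD83E\uDDB1" +       // man, skin tone, ZWJ, curly hair
        "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66" // family
    val clusters = emojiClusters(text)
    println("clusters: ${clusters.size}, width: ${clusters.sumOf { clusterWidth(it) }}")
}
```

With the sample string this yields four clusters ("A" plus the three emoji) and a total width of 7 cells under the assumed rule. Detecting clusters is only half the job, since the width assigned to each cluster still depends on how the terminal and font actually draw it.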

@ajalt
Owner

ajalt commented Jul 30, 2022

Unicode has a lot of rules for grapheme sequences beyond just emoji, many of which (e.g. combining character sequences) Mordant already handles correctly.

I added tables for the Emoji Sequences and Emoji ZWJ sequences, but most typefaces don't implement all of the sequences, and even the ones that do all use non-cell-aligned glyphs for some of the emoji (like the larger family emojis). So there's no way to produce a perfect result, but this is probably the best we can do.
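
A table-driven approach of this kind could look roughly like the Kotlin/JVM sketch below. This is not the code from the fix in #68: the three-entry `knownSequences` set is a hand-picked stand-in for tables that would be generated from Unicode's `emoji-sequences.txt` and `emoji-zwj-sequences.txt` data files, and `seq`, `fallbackWidth` and `tableWidth` are hypothetical names. The fallback width rule is also an assumption made for the example.

```kotlin
// Sketch of longest-match lookup against a sequence table; not Mordant's implementation.
fun seq(vararg codePoints: Int): List<Int> = codePoints.toList()

// Tiny hand-picked table; a real one would be generated from the Unicode emoji data files.
val knownSequences: Set<List<Int>> = setOf(
    seq(0x1F1E9, 0x1F1EA),                                         // flag: D + E
    seq(0x1F468, 0x1F3FE, 0x200D, 0x1F9B1),                        // man, skin tone, curly hair
    seq(0x1F469, 0x200D, 0x1F469, 0x200D, 0x1F466, 0x200D, 0x1F466), // family
)
val maxSequenceLength: Int = knownSequences.maxOf { it.size }

// Assumed fallback for code points outside any known sequence.
fun fallbackWidth(cp: Int): Int = when {
    cp == 0x200D -> 0   // ZERO WIDTH JOINER
    cp >= 0x1F000 -> 2  // treat as a wide emoji
    else -> 1
}

fun tableWidth(s: String): Int {
    val all = s.codePoints().toArray().toList()
    var width = 0
    var i = 0
    while (i < all.size) {
        // Try the longest candidate first so a family is not split into shorter matches.
        val longest = minOf(maxSequenceLength, all.size - i)
        val matchLength = (longest downTo 2).firstOrNull { len ->
            all.subList(i, i + len) in knownSequences
        }
        if (matchLength != null) {
            width += 2 // a known sequence is assumed to render as one 2-cell glyph
            i += matchLength
        } else {
            width += fallbackWidth(all[i])
            i += 1
        }
    }
    return width
}

fun main() {
    val flag = "\uD83C\uDDE9\uD83C\uDDEA"
    val unknown = "\uD83D\uDC69\u200D\uD83D\uDC66" // woman + ZWJ + boy, not in the tiny table
    println(tableWidth("A$flag"))  // known flag sequence
    println(tableWidth(unknown))   // falls back to per-code-point widths
}
```

A sequence that is missing from the table falls back to per-code-point widths, which is the table-side counterpart of the caveat above: the result is only as good as the table and the font's coverage of it.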

[Image: Capture]

@ajalt closed this as completed in #68 on Jul 30, 2022
@bkahlert
Author

Sorry for not having responded. Don't have much time currently. But big thanks to you for implementing this feature!
