Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take surrogate pairs into account for caretPositions #306

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kalegd
Copy link

@kalegd kalegd commented Feb 22, 2024

For Issue #304

Adds a check to see if a glyph is a surrogate pair, in which case we set the same caret position for each char that makes up the glyph

In the issue you mentioned we might need to update selectionUtils. From a cursory glance I felt like it might not be necessary. It seems like getCaretAtPoint loops through the chars in order, so it should return the first char in a surrogate pair no matter what

@kalegd
Copy link
Author

kalegd commented Feb 22, 2024

A quick note on testing. I wasn't able to get the text example working prior to the changes, so I tested with the in-place editing example from the README

Copy link
Collaborator

@lojjic lojjic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Just one point of discussion about ligatures made up of surrogate-pair codepoints.

@@ -677,6 +689,25 @@ export function createTypesetter(resolveFonts, bidi) {
}
}

function fillSurrogateCaretPositions(caretPositions, charStartIndex, charCount) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be simplified to just a single caretPositions.copyWithin() call. (Edit: if this function stays; see other comment.)

fillSurrogateCaretPositions(caretPositions, prevCharIndex, charCount)
} else {
fillLigatureCaretPositions(caretPositions, prevCharIndex, charCount)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this may not be a strict if/else situation ... you could potentially have ligatures formed from codepoints that are surrogates. Maybe a better approach would be to modify fillLigatureCaretPositions to check for surrogates stepwise within its loop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, took a deeper look into things and I didn't realize the tons of other characters that are considered surrogate pairs that aren't emojis and could very likely be part of ligature substitutions. Thanks for calling this out

@kalegd
Copy link
Author

kalegd commented Mar 22, 2024

Sorry for disappearing like that! Was off tech-free cycling for a few weeks. Will take a look at your feedback Monday

@kalegd
Copy link
Author

kalegd commented Mar 25, 2024

Just looked at this again with fresh eyes today and have a thought I'd like to discuss before I redo the changes for this PR

I'm wondering if the current implementation of caret subdividing ligature substitutions is non-standard?

Example: "न्य" is 1 grapheme, but 3 code points. Despite being three code points, every text editor I've used has not subdivided the grapheme for the caret. The caret is either at the beginning of the whole grapheme or the end of the whole grapheme. Maybe what I wanted to do for surrogate pairs should actually be done for all graphemes? In this case we just update fillLigatureCaretPositions to use caretPositions.copyWithin() as you suggested

@lojjic
Copy link
Collaborator

lojjic commented Mar 25, 2024

It's definitely nonstandard how I've done it. The intent is to handle stylistic ligatures, like in many fonts where certain sequences of latin characters are replaced by a single streamlined glyph -- in those cases I wanted to allow selection/caret placement in between characters. A common example is "fi, ffi, fl, ffl" in Roboto.

I believe the fonts provide caret placement data for this sort of situation, but that's not currently parsed, hence the current simplistic approach.

I definitely see how this doesn't play nicely with glyph substitutions for non-Latin languages. I'm not sure if there's a simple way to distinguish the two scenarios so we can make both work? Would have to do some research.

Interestingly, in this text field I'm writing this comment in (Firefox on Mac) I'm able to place a caret in the middle of "न्य", but not in between all three codepoints - see image:
image

@kalegd
Copy link
Author

kalegd commented Mar 26, 2024

Ah thanks for clarifying, I wasn't aware of stylistic ligatures specified by fonts. The more you know :)

My next suggestion then is to use Intl.Segmenter since stylistic ligatures should still be parsed as separate graphemes by that. We'd need to wait until that hits Firefox though (it's been on Chrome and Safari for a few years now and is supposed to reach Firefox this year). With that we could check for graphemes within fillLigatureCaretPositions, and then adjust the caret subdivision accordingly in the case of a Font's stylistic ligature including a compound grapheme specified by the Segmenter. If there is no compound grapheme, everything should still work the way it currently does like your example of "fi, ffi, fl, ffl"

Or we leverage the caret data that is provided in font files, but I'm guessing that's a much more labor intensive PR?

I feel like using Intl.Segmenter is an easy to implement fallback that doesn't prevent using the Font's caret data in the future

@lojjic
Copy link
Collaborator

lojjic commented Mar 27, 2024

Intl.Segmenter is new to me, and an interesting possibility for a short term workaround.

I think, though, that I'm leaning toward just killing the current auto-spacing of interior carets and going with the single position copy. That will leave stylistic ligatures "broken" in the short term of course, but TBH that's more likely to motivate me to implement the true solution using the font-provided ligature caret positions, for which I've opened #308.

@kalegd
Copy link
Author

kalegd commented Mar 29, 2024

Sounds good to me. Would you prefer I make that change to copy position in this PR, or do you prefer to do this yourself separately as part of that new issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants