Replacing text emoji or not #278

Open
cvzi opened this issue Dec 7, 2023 · 2 comments

@cvzi
Contributor

cvzi commented Dec 7, 2023

Not really a bug, more an observation:

When using emoji.replace_emoji(str, '') to strip emoji from a string, it also replaces all emoji that are in text presentation. This might not be what the user expects.

For example:

> emoji.replace_emoji("pure emoji 😁 text variant © emoji variant ©️", "?")
pure emoji ? text variant ? emoji variant ?

So the © gets replaced, even though it is in its text variant. Someone trying to remove emoji from a string might not want to remove symbols like © ® ↔

However, several of these text emoji are rendered as text in one font and as emoji in another font.
For example, as I am writing this issue, the :right_arrow: \u27a1 ➡ is rendered as a text emoji in GitHub's text editor, but it will be displayed as an emoji when the issue appears online.

(it still can be forced to text with the text-variant selector: \u27a1\ufe0e ➡︎ )
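
As a small illustration using only Python's standard library, the forced-text sequence above consists of two codepoints: the arrow itself followed by the text variation selector. The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(s):
    """Return 'U+XXXX NAME' for every codepoint in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


# The arrow followed by the text variation selector (\ufe0e)
print(describe("\u27a1\ufe0e"))
# ['U+27A1 BLACK RIGHTWARDS ARROW', 'U+FE0E VARIATION SELECTOR-15']

# The emoji variation selector (\ufe0f) is a separate codepoint
print(describe("\ufe0f"))
# ['U+FE0F VARIATION SELECTOR-16']
```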

I don't see a solution to this, but the behaviour should be mentioned in the documentation.

One option for some users could be to replace emoji, but keep emoji with text-variant and force the text-variant by appending \uFE0E (text variant selector):

import emoji

def repl(e, d):
  if 'variant' in d and not e.endswith('\uFE0F'):
    # Emoji supports variants and emoji-variant (\uFE0F) is not selected
    if e.endswith('\uFE0E'):
      # Emoji is already in text-variant
      return e
    else:
      # Emoji is not in text-variant, add text-variant selector
      return e + '\uFE0E'
  else:
    # Emoji doesn't support variants, or emoji-variant is selected
    return ''


emoji.replace_emoji("smile 😁. copyright ©. Arrow-no-variant ➡.", repl)

Input is: "smile 😁. copyright ©. Arrow-no-variant ➡."
Output is: "smile . copyright ©︎. Arrow-no-variant ➡︎."

@lovetox
Contributor

lovetox commented Apr 21, 2024

I don't think you should open the door and base any decision on things a font might do or not. I think that's a losing battle.

As I understand it, the function is a helper tool for string manipulation. Presentation is on a different layer and should not be the business of this function.

As such, I think it's important that the method has well-defined behavior.

Now, replacing a copyright sign that was added in 1993, when I guess nobody knew the word emoji, without having the option to turn that off, is not what I would expect from an emoji function.

As I understand the problem, there were symbols (not emoji), and later, when they invented emoji, instead of giving the copyright sign a new emoji codepoint, they invented variant selectors.

My first idea would be

Add a boolean argument like replace_text_variants

Codepoint + Emoji Selector -> Replace always (not dependent on replace_text_variants)
Codepoint + Text Selector -> Only Replace if replace_text_variants=True
Codepoint -> Only Replace if replace_text_variants=True
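
A minimal sketch of the three rules above, written as a callback for emoji.replace_emoji. The flag replace_text_variants does not exist in the library; it is the hypothetical argument proposed here, and the helper names should_replace and make_repl are made up for illustration. The sketch assumes, like the snippet in the opening comment, that the callback receives the matched emoji (including any trailing selector) and its data dict, and that a 'variant' key in that dict marks an emoji that has a text variant.

```python
EMOJI_SELECTOR = "\uFE0F"  # variation selector-16: emoji presentation
TEXT_SELECTOR = "\uFE0E"   # variation selector-15: text presentation


def should_replace(chars, data, replace_text_variants=False):
    """Decide whether one matched emoji should be replaced.

    chars -- the matched emoji string
    data  -- its data dict; a 'variant' key is assumed to mean that
             the emoji also has a text variant
    replace_text_variants -- hypothetical flag, not a library argument
    """
    if chars.endswith(EMOJI_SELECTOR):
        # Codepoint + emoji selector: always replace
        return True
    if chars.endswith(TEXT_SELECTOR) or "variant" in data:
        # Codepoint + text selector, or bare codepoint with a text variant
        return replace_text_variants
    # Emoji without a text variant: always replace
    return True


def make_repl(replacement="", replace_text_variants=False):
    """Build a two-argument callback of the kind replace_emoji accepts."""
    def repl(chars, data):
        if should_replace(chars, data, replace_text_variants):
            return replacement
        return chars  # keep the symbol unchanged
    return repl
```

With the emoji package installed, this could be used as emoji.replace_emoji(text, make_repl('', replace_text_variants=False)).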

@lovetox
Contributor

lovetox commented May 17, 2024

I looked at the standard again. It clearly marks every codepoint that is an emoji, and according to the Unicode data set the copyright sign U+00A9 is marked as Emoji.

So even if a text variant selector is added, it's still an emoji; only the presentation differs, because fonts use another glyph/image.

But we can assume that most users are not experts on the Unicode standard regarding emoji, so they probably expect text variants not to be replaced.

I would still go for a boolean argument that leaves the choice to the user.
