Split chars of a string #134

Open
rsalmei opened this issue Sep 26, 2020 · 0 comments


rsalmei commented Sep 26, 2020

Hey man, I'm the author of alive-progress. I'm struggling to correctly support emojis in there (rsalmei/alive-progress#19), and I think this project could help me.

How can I split a string into its characters as they are displayed (grapheme clusters), including emojis of all kinds?

For example:

In [17]: [(x, hex(ord(x)), unicodedata.east_asian_width(x)) for x in 'a👩‍❤️‍💋‍👩a']
Out[17]:
[('a', '0x61', 'Na'),
 ('👩', '0x1f469', 'W'),
 ('\u200d', '0x200d', 'N'),
 ('❤', '0x2764', 'N'),
 ('️', '0xfe0f', 'A'),
 ('\u200d', '0x200d', 'N'),
 ('💋', '0x1f48b', 'W'),
 ('\u200d', '0x200d', 'N'),
 ('👩', '0x1f469', 'W'),
 ('a', '0x61', 'Na')]

How could I use your code to correctly detect the three characters in this string (the two 'a's and the kiss emoji between them)?
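For reference, here is a rough, stdlib-only sketch of what splitting into grapheme clusters means for the string above. It is a simplified approximation, not the full Unicode UAX #29 algorithm and not this project's API. The helper name split_graphemes and its rule set are assumptions for illustration: zero-width joiners, variation selectors, skin-tone modifiers, tag characters and combining marks attach to the previous character, whatever follows a ZWJ is joined to it, and regional indicators are paired into flags.

```python
from __future__ import annotations

import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def _extends(ch: str) -> bool:
    """True if ch attaches to the preceding character (simplified rules)."""
    cp = ord(ch)
    return (
        ch == ZWJ
        or 0xFE00 <= cp <= 0xFE0F          # variation selectors, incl. VS16 (emoji style)
        or 0x1F3FB <= cp <= 0x1F3FF        # emoji skin-tone modifiers
        or 0xE0020 <= cp <= 0xE007F        # tag characters (subdivision flags)
        or unicodedata.category(ch).startswith("M")  # combining marks (Mn, Mc, Me)
    )


def _is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def split_graphemes(text: str) -> list[str]:
    """Hypothetical helper: approximate grapheme clusters using only the stdlib."""
    clusters: list[str] = []
    for ch in text:
        if clusters:
            last = clusters[-1]
            # Extenders stick to the previous cluster; a ZWJ glues on the next char.
            if _extends(ch) or last.endswith(ZWJ):
                clusters[-1] += ch
                continue
            # Two regional indicators form one flag; a third starts a new flag.
            if _is_regional_indicator(ch) and len(last) == 1 and _is_regional_indicator(last):
                clusters[-1] += ch
                continue
        clusters.append(ch)
    return clusters
```

Called on the example string above, it returns three clusters: 'a', the whole ZWJ kiss sequence, and 'a'. Hangul jamo sequences, Indic conjuncts and the other cases covered by UAX #29 are not handled; the third-party regex module implements the full rules through its \X pattern.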

Update: I've seen there's a regexp function, emoji.get_emoji_regexp(), but it doesn't solve the problem yet:

In [4]: emoji.get_emoji_regexp().split('asd😀►✧☘️❤️ok')
Out[4]: ['asd', '😀', '►✧', '☘️', '', '❤️', 'ok']

It still doesn't split 'asd', '►✧' and 'ok' into individual characters, and I can't find a way to differentiate them from the other grapheme clusters. (GitHub doesn't correctly show ☘️ and ❤️ inside code blocks; they are the emoji presentation variants, followed by U+FE0F, not the text ones.)
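One way to tell the pieces apart without extra detection: since the output above includes the emojis themselves, the pattern evidently has a capturing group, and in that case Python's re.split() places the matches at the odd indices of the result and the text between them (possibly an empty string, like the '' above) at the even indices. A minimal sketch using only the standard re module; EMOJIS is a tiny hypothetical stand-in for a complete emoji pattern, and tokenize is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Hypothetical, tiny emoji list for illustration only; a real program would
# use a complete emoji pattern instead.
EMOJIS = [
    "\U0001F600",     # grinning face
    "\u2618\ufe0f",   # shamrock + VS16 (emoji presentation)
    "\u2618",         # shamrock (text presentation)
    "\u2764\ufe0f",   # heavy black heart + VS16
    "\u2764",         # heavy black heart
]

# Longest alternatives first, so "\u2618\ufe0f" wins over the bare "\u2618".
# The outer parentheses form the capturing group that makes re.split() keep matches.
EMOJI_RE = re.compile(
    "(" + "|".join(map(re.escape, sorted(EMOJIS, key=len, reverse=True))) + ")"
)


def tokenize(text: str) -> list[tuple[str, bool]]:
    """Return (piece, is_emoji) pairs, one per emoji or per other code point."""
    tokens: list[tuple[str, bool]] = []
    for i, part in enumerate(EMOJI_RE.split(text)):
        if i % 2 == 1:
            tokens.append((part, True))  # odd index: an emoji match
        else:
            # Even index: text between matches (possibly ""), split per code point.
            tokens.extend((ch, False) for ch in part)
    return tokens
```

On 'asd😀►✧☘️❤️ok' this yields ten tokens, with only 😀, ☘️ and ❤️ flagged as emojis, and the empty string between ☘️ and ❤️ simply disappears. Non-emoji runs are split per code point here, so a letter followed by a combining accent would still come out as two pieces.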
