andjc

Andj andjc

28 followers · 14 following

Melbourne, Australia

Pinned

enabling-languages/python-i18n Public

Random notes on Python internationalisation

Jupyter Notebook 16

enabling-languages/library-i18n Public

Exploration of internationalisation issues for libraries.

Jupyter Notebook 1

Grapheme tokenisation in Python

When working with tokenisation and break iterators, it is sometimes necessary to work at the character, syllable, line, or sentence level. Character-level tokenisation is an interesting case. By character, I mean a user-perceived unit of text, which the Unicode standard refers to as a grapheme (more formally, a grapheme cluster). The usual way I see developers handle character-level tokenisation of English is with a list comprehension or by converting the string to a list:

```py
>>> t1 = "transformation"
>>> [char for char in t1]
```
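To illustrate why the distinction between code points and graphemes matters, here is a minimal sketch using only the standard library. The helper names `code_points` and `simple_graphemes` and the sample string are hypothetical, chosen for illustration. The grapheme helper is a deliberate simplification: it only attaches combining marks to the preceding base character and does not implement the full Unicode text segmentation rules (emoji sequences, Hangul jamo, and spacing marks are not handled).

```python
import unicodedata


def code_points(text):
    """Split a string into individual Unicode code points (what list() does)."""
    return list(text)


def simple_graphemes(text):
    """Approximate grapheme clusters by attaching combining marks to their base.

    Simplification (assumption for this sketch): a character with a non-zero
    canonical combining class is glued to the previous cluster. This is not a
    full implementation of Unicode grapheme cluster boundaries.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # extend the previous cluster with the mark
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


english = "transformation"
# Illustrative sample: two vowels, each followed by U+0308 COMBINING DIAERESIS.
# Neither pair has a precomposed form, so each grapheme is two code points.
sample = "\u025b\u0308\u0254\u0308"

# For plain English text the two approaches agree.
print(code_points(english) == simple_graphemes(english))  # True

# For text with combining marks they diverge.
print(len(code_points(sample)))       # 4
print(len(simple_graphemes(sample)))  # 2
```

For production use, a library that implements the full segmentation rules is a better fit, for example the `\X` pattern in the third-party `regex` module, which matches extended grapheme clusters.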

enabling-languages/dinka Public

Dinka language resources

JavaScript 1

enabling-languages/nuer Public

Nuer language resources

Rich Text Format 1

enabling-languages/australian_indigenous Public

Keyboard layouts and web support for Aboriginal and Torres Strait Islander languages

3
