nota/split-graphemes

split-graphemes

Splits text written in combining scripts such as Thai and Khmer, as well as complex emoji, into an array of graphemes. Use this library in place of Array.from when you need graphemes rather than individual code points.

Installation

$ npm install split-graphemes

Examples

Emoji

// The emoji '👨‍👩‍👦‍👦' consists of four person emoji joined by Zero Width Joiners (ZWJ).
const codePoints = Array.from('👨‍👩‍👦‍👦') // ['👨', ZWJ, '👩', ZWJ, '👦', ZWJ, '👦']
// splitGraphemes interprets it as exactly one character.
const graphemes = splitGraphemes('👨‍👩‍👦‍👦') // ['👨‍👩‍👦‍👦']
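
The seven-element result from Array.from can be reproduced by spelling the emoji out with escape sequences. The sketch below does that and, for comparison only, segments the same string with the built-in Intl.Segmenter API. It does not use split-graphemes or describe its internals, the helper names toCodePoints and segmentWithIntl are hypothetical, and it assumes a runtime that supports Intl.Segmenter (for example a recent Node.js release).

```javascript
// Family emoji written with escapes so the Zero Width Joiners (U+200D) are visible.
const ZWJ = '\u200D'
const family = ['\u{1F468}', '\u{1F469}', '\u{1F466}', '\u{1F466}'].join(ZWJ)

// Hypothetical helper: split by code point, which is what Array.from does for strings.
function toCodePoints(str) {
  return Array.from(str)
}

// Hypothetical helper: split by grapheme cluster using the built-in Intl.Segmenter.
// Shown for comparison only; this is not how split-graphemes is implemented.
function segmentWithIntl(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' })
  return Array.from(segmenter.segment(str), (part) => part.segment)
}

console.log(toCodePoints(family).length)    // 7: four emoji plus three ZWJs
console.log(segmentWithIntl(family).length) // 1: the whole family is one grapheme
```

Results from Intl.Segmenter may differ from splitGraphemes for some scripts, so treat it as an illustration of the code point versus grapheme distinction rather than as a substitute.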

Khmer characters

Array.from('ប៉ុស្ដិ៍') // ['ប', '៉', 'ុ', 'ស', '្', 'ដ', 'ិ', '៍']
splitGraphemes('ប៉ុស្ដិ៍') // ['ប៉ុ', 'ស្ដិ៍']

Japanese NFD

splitGraphemes('ごん゙に゙ぢば') // ['ご', 'ん゙', 'に゙', 'ぢ', 'ば']
splitGraphemes('パピプペポ') // ['パ', 'ピ', 'プ', 'ペ', 'ポ']
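
NFD text needs grapheme-aware splitting because each voiced or semi-voiced kana decomposes into a base character plus a combining mark. The sketch below shows this with the standard String.prototype.normalize and Array.from only; the helper name countCodePoints is hypothetical and split-graphemes itself is not called.

```javascript
// 'が' (U+304C) is one code point in NFC; in NFD it becomes 'か' (U+304B) + U+3099.
const ga = '\u304C'
const gaNfd = ga.normalize('NFD')

// 'パ' (U+30D1) decomposes to 'ハ' (U+30CF) + U+309A in NFD.
const pa = '\u30D1'
const paNfd = pa.normalize('NFD')

// Hypothetical helper: number of code points, i.e. the length of Array.from(str).
function countCodePoints(str) {
  return Array.from(str).length
}

console.log(countCodePoints(ga))            // 1
console.log(countCodePoints(gaNfd))         // 2
console.log(countCodePoints(paNfd))         // 2
console.log(gaNfd.normalize('NFC') === ga)  // true
```

Splitting an NFD string with Array.from therefore separates the combining mark from its base kana, which is the case the examples above avoid.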

English

splitGraphemes('Hello') // ['H', 'e', 'l', 'l', 'o']

Supported ligature characters

The list of supported characters is available here.