Implementation of Unicode text segmentation according to Annex #29.

YohDeadfall/Yoh.Text.Segmentation

Provides extension methods for splitting strings at word boundaries, following the rules of Unicode Standard Annex #29. Grapheme cluster support is planned.

Features

Word iteration

var input = "The quick (“brown”) fox can’t jump 32.3 feet, right?";
var result = new List<string>();
foreach (var word in input.EnumerateWords())
    result.Add(word.ToString());
// This code iterates over words in the specified string and produces:
// The|quick|brown|fox|can’t|jump|32.3|feet|right

Word boundary iteration

var input = "The quick (“brown”) fox can’t jump 32.3 feet, right?";
var result = new List<string>();
foreach (var segment in input.EnumerateWordBoundaries())
    result.Add(segment.ToString());
// This code iterates over every segment between word boundaries (words, whitespace, and punctuation) and produces:
// The| |quick| |(|“|brown|”|)| |fox| |can’t| |jump| |32.3| |feet|,| |right|?
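The two outputs above are related: the word list looks like the boundary segments with whitespace and punctuation dropped. The sketch below shows one plausible way to express that filter. It does not call the library. The `segments` array is copied by hand from the boundary output, and the rule "a segment is a word if it contains a letter or digit" is an assumption for illustration, not necessarily how `EnumerateWords` decides.

```csharp
using System;
using System.Linq;

// Segments copied by hand from the word boundary output above.
string[] segments =
{
    "The", " ", "quick", " ", "(", "“", "brown", "”", ")", " ", "fox", " ",
    "can’t", " ", "jump", " ", "32.3", " ", "feet", ",", " ", "right", "?"
};

// Assumed rule (not taken from the library): keep a segment
// only if at least one of its characters is a letter or digit.
var words = segments.Where(s => s.Any(char.IsLetterOrDigit)).ToList();

Console.WriteLine(string.Join("|", words));
// The|quick|brown|fox|can’t|jump|32.3|feet|right
```

For this input the filter reproduces the word list from the first example. Other inputs may differ from what the library returns.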
