CLARIAH
Pinned
Repositories
- vocab-workers Public
- wp3-ucto Public Forked from LanguageMachines/ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps such as changing case that you can all use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …
-
- wp3-foliautils Public Forked from LanguageMachines/foliautils
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot
-
-
- wp3-folia Public Forked from proycon/folia
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are support, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processi…