aliiae/stopwords-tt

Tatar stopwords

This repository contains a list of stopwords for the Tatar language.
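A minimal sketch of how such a list might be used, assuming the stopwords are stored one per line in a UTF-8 text file. The helper names `load_stopwords` and `remove_stopwords` are hypothetical, and the three-word sample set is only an illustration, not the repository's actual list.

```python
import re
from pathlib import Path


def load_stopwords(path):
    """Read one stopword per line from a UTF-8 file, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


# Toy sample for illustration only; the real list is much larger.
sample_stopwords = {"диде", "һәм", "белән"}
filtered = remove_stopwords("Китап һәм дәфтәр белән килде, диде", sample_stopwords)
print(filtered)
```

With the real list, `sample_stopwords` would be replaced by the result of `load_stopwords` called on the path of the list file in a local checkout.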

Method

The list was constructed manually from word frequency distributions obtained from news texts. It consists mostly of function words (conjunctions, postpositions, interjections), together with pronouns, numerals, some high-frequency verbs such as "диде" ("said"), and a few parenthetical words.
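The frequency-based first step could be sketched roughly as follows. This is not the author's actual procedure, only an assumed illustration: the function name `top_candidates`, the simple regex tokenizer, and the cutoff `n` are all hypothetical, and the output is meant for manual review rather than direct use as a stopword list.

```python
import re
from collections import Counter


def top_candidates(documents, n=10):
    """Count word tokens across documents and return the n most frequent."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"\w+", doc.lower()))
    return counts.most_common(n)


# Placeholder corpus; in practice this would be a collection of news texts.
corpus = ["a b a", "a c b"]
candidates = top_candidates(corpus, n=2)
print(candidates)
```

High-frequency content words will also surface in such a ranking, which is one reason a manual pass over the candidates is still needed.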

Current count: 1006 wordforms (~300 unique lemmata).

Acknowledgments

Some rare function words were taken from Apertium. Additional surface wordforms were also generated automatically using Apertium.
