
jibay/stopwords

 
 


stopwords

Stopwords for various languages in JSON format. Per Wikipedia:

Stop words are words which are filtered out prior to, or after, processing of natural language data [...] these are some of the most common, short function words, such as the, is, at, which, and on.
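The filtering step in that definition can be sketched in a few lines of Python. This is a minimal illustration and not part of this repository. The hypothetical `remove_stopwords` helper takes already-tokenized text and a set of stopwords, and it keeps every token that is not in the set. The set used here contains only the example words from the quote, and the case-insensitive comparison is an assumption.

```python
# Minimal sketch: this stopword set holds only the example words from the
# quote above. It is not one of this repository's lists.
EXAMPLE_STOPWORDS = frozenset({"the", "is", "at", "which", "and", "on"})


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not stopwords, preserving their order.

    Matching is case-insensitive (an assumption; adjust for your data).
    """
    return [token for token in tokens if token.lower() not in stopwords]


if __name__ == "__main__":
    tokens = "The cat is sitting on the mat".split()
    print(remove_stopwords(tokens, EXAMPLE_STOPWORDS))
    # ['cat', 'sitting', 'mat']
```

Using a `frozenset` keeps each membership check constant-time on average, which matters when filtering large corpora against lists of several hundred words.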

You can load every language at once from stopwords-all.json (keyed by ISO 639-1 language code), or use the individual per-language files listed in the table below.
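Loading the combined file might look like the following Python sketch. It assumes that stopwords-all.json is a JSON object whose keys are ISO 639-1 codes and whose values are arrays of strings, and that the file sits in the working directory. The README only states that the file is keyed by language code. `parse_all_stopwords` and `load_all_stopwords` are hypothetical helper names.

```python
import json


def parse_all_stopwords(json_text):
    """Parse the contents of stopwords-all.json into {code: frozenset(words)}.

    Assumes a JSON object keyed by ISO 639-1 code whose values are arrays
    of stopword strings.
    """
    data = json.loads(json_text)
    return {code: frozenset(words) for code, words in data.items()}


def load_all_stopwords(path="stopwords-all.json"):
    """Read the combined file from disk (the default path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return parse_all_stopwords(f.read())


if __name__ == "__main__":
    # Inline sample in the assumed shape, so the sketch runs without the file.
    sample = '{"en": ["the", "is", "at"], "fr": ["le", "la", "est"]}'
    stopwords = parse_all_stopwords(sample)
    print(sorted(stopwords))  # ['en', 'fr']
    print("est" in stopwords["fr"])  # True
```

Reading the file with an explicit UTF-8 encoding matters here, because many of the lists (Chinese, Hindi, Russian, Thai, and others) contain non-ASCII characters.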

Languages

There are a total of 43 supported languages:

| Language | Stopword count | Filename |
|---|---|---|
| Arabic | 162 | ar.json |
| Armenian | 45 | hy.json |
| Basque | 98 | eu.json |
| Bengali | 116 | bn.json |
| Breton | 126 | br.json |
| Bulgarian | 259 | bg.json |
| Catalan | 218 | ca.json |
| Chinese | 542 | zh.json |
| Croatian | 179 | hr.json |
| Czech | 346 | cs.json |
| Danish | 101 | da.json |
| Dutch | 275 | nl.json |
| English | 570 | en.json |
| Esperanto | 173 | eo.json |
| Ewe | 35 | ee.json |
| Finnish | 772 | fi.json |
| French | 606 | fr.json |
| Galician | 160 | gl.json |
| German | 596 | de.json |
| Greek | 75 | el.json |
| Hebrew | 194 | he.json |
| Hindi | 225 | hi.json |
| Hungarian | 781 | hu.json |
| Indonesian | 355 | id.json |
| Irish | 109 | ga.json |
| Italian | 623 | it.json |
| Japanese | 109 | ja.json |
| Korean | 679 | ko.json |
| Latin | 49 | la.json |
| Latvian | 161 | lv.json |
| Marathi | 99 | mr.json |
| Norwegian | 172 | no.json |
| Persian | 332 | fa.json |
| Polish | 260 | pl.json |
| Portuguese | 408 | pt.json |
| Romanian | 282 | ro.json |
| Russian | 539 | ru.json |
| Slovak | 110 | sk.json |
| Slovenian | 446 | sl.json |
| Spanish | 577 | es.json |
| Swedish | 401 | sv.json |
| Thai | 115 | th.json |
| Turkish | 279 | tr.json |
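To load a single language, note that each filename in the table is the language's ISO 639-1 code followed by `.json`. The Python sketch below assumes that each per-language file holds a JSON array of strings and that the files sit in a local directory you pass in. `SUPPORTED_CODES` transcribes the Filename column of the table. `language_filename` and `load_language` are hypothetical helper names.

```python
import json
from pathlib import Path

# ISO 639-1 codes transcribed from the Filename column of the table above.
SUPPORTED_CODES = frozenset({
    "ar", "hy", "eu", "bn", "br", "bg", "ca", "zh", "hr", "cs", "da",
    "nl", "en", "eo", "ee", "fi", "fr", "gl", "de", "el", "he", "hi",
    "hu", "id", "ga", "it", "ja", "ko", "la", "lv", "mr", "no", "fa",
    "pl", "pt", "ro", "ru", "sk", "sl", "es", "sv", "th", "tr",
})


def language_filename(code):
    """Map an ISO 639-1 code to its stopword file name, e.g. 'en' -> 'en.json'."""
    code = code.lower()
    if code not in SUPPORTED_CODES:
        raise ValueError(f"no stopword list for language code {code!r}")
    return f"{code}.json"


def load_language(code, directory="."):
    """Load one language's stopwords as a frozenset.

    Assumes the file is a JSON array of strings stored in `directory`.
    """
    path = Path(directory) / language_filename(code)
    with path.open(encoding="utf-8") as f:
        return frozenset(json.load(f))
```

For example, `load_language("de", "path/to/stopwords")` would read `de.json` from that (hypothetical) directory. Checking the code against the table first produces a clear error for an unsupported language instead of a bare file-not-found error.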

Sources

License and Copyright

Copyright (c) 2014 Peter Graham, contributors. Released under the Apache-2.0 license.

About

Stopwords for 43 languages in JSON format
