GitHub - snowballstem/snowball: Snowball compiler and stemming algorithms

Snowball is a small string processing language for creating stemming algorithms for use in Information Retrieval, plus a collection of stemming algorithms implemented using it.

Snowball was originally designed and built by Martin Porter. Martin retired from development in 2014 and Snowball is now maintained as a community project. Martin originally chose the name Snowball as a tribute to SNOBOL, the excellent string handling language from the 1960s. It now also serves as a metaphor for how the project grows by gathering contributions over time.

The Snowball compiler translates a Snowball program into source code in another language - currently Ada, ISO C, C#, Go, Java, JavaScript, Object Pascal, Python and Rust are supported.

This repository contains the source code for the snowball compiler and the stemming algorithms. The snowball compiler is written in ISO C - you'll need a C compiler which supports C99 to build it (but the C code it generates should work with any ISO C compiler).

See https://snowballstem.org/ for more information about Snowball.

What is Stemming?

Stemming maps different forms of the same word to a common "stem" - for example, the English stemmer maps connection, connections, connective, connected, and connecting to connect. So a search for connected would also find documents which only have the other forms.
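As a rough illustration of why this helps search, here is a minimal Python sketch. It does not use the Snowball English algorithm: `toy_stem`, `TOY_SUFFIXES`, `build_index` and `search` are hypothetical names made up for this example, and the suffix list is chosen only so that the five example words above reduce to connect. The point it shows is that documents and queries are stemmed the same way, so a query for one form matches documents containing the others.

```python
# Toy illustration only: this is NOT the Snowball English algorithm.
# The suffix list is a hypothetical stand-in, chosen so that the five
# example words from the text all reduce to "connect".
TOY_SUFFIXES = ("ions", "ion", "ive", "ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix; a crude stand-in for a real stemmer."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids containing a word with that stem."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def search(index, query):
    """Stem the query word the same way as the indexed words, then look it up."""
    return sorted(index.get(toy_stem(query), set()))


docs = {
    1: "the connection dropped",
    2: "connecting two networks",
    3: "a connective tissue",
    4: "unrelated text",
}
index = build_index(docs)

# None of the documents contains the literal word "connected".
print(search(index, "connected"))  # prints [1, 2, 3]
```

A real deployment would replace `toy_stem` with a stemmer generated by the Snowball compiler for the language being indexed.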

The stem is often a word itself, but not always: text search systems, which are the intended field of use, don't require it. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem). Over-stemming is more problematic than under-stemming, so we tend not to stem in cases that are hard to resolve. If you want to always reduce words to a root form and/or get a root form which is itself a word, then Snowball's stemming algorithms likely aren't the right answer.
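The trade-off between over-stemming and under-stemming can be made concrete with a small Python sketch. Everything here is an assumption for illustration: `conflation_errors`, `cautious` and `aggressive` are hypothetical functions, neither stemmer is a Snowball algorithm, and the word groups are hand-picked. Words in the same group are taken to share a meaning; an under-stemming error is a same-group pair with different stems, and an over-stemming error is a cross-group pair with the same stem.

```python
from itertools import combinations


def conflation_errors(stem, groups):
    """Return (under, over) error counts for a stem function.

    groups: list of word lists; words within one list share a meaning.
    under: same-group pairs that get different stems.
    over: cross-group pairs that get the same stem.
    """
    under = sum(
        1 for group in groups
        for a, b in combinations(group, 2)
        if stem(a) != stem(b)
    )
    over = sum(
        1 for g1, g2 in combinations(groups, 2)
        for a in g1 for b in g2
        if stem(a) == stem(b)
    )
    return under, over


def cautious(word):
    """Hypothetical stemmer: strips only a trailing plural 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def aggressive(word):
    """Hypothetical stemmer: strips several suffixes with little restraint."""
    for suffix in ("ful", "ed", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


groups = [["awe", "awes"], ["awful"], ["connect", "connected"]]

print(conflation_errors(cautious, groups))    # prints (1, 0)
print(conflation_errors(aggressive, groups))  # prints (0, 2)
```

In this toy setup the cautious stemmer misses connect/connected, which costs some recall, while the aggressive one merges awe and awes with awful, so a search for one returns documents about the other.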

Latest commit: ojwb, english.sbl: Add 'evening' to exception2 (029b41c, Mar 17, 2025); 1,154 commits in history

Name	Last commit message	Last commit date
.github/workflows	java: Fix javac warnings in SnowballProgram.java	Mar 13, 2025
ada	[ada] Fix bugs with handling of length vs limit	Mar 17, 2025
algorithms	english.sbl: Add 'evening' to exception2	Mar 17, 2025
charsets	WIP	Mar 17, 2018
compiler	[ada] Fix bugs with handling of length vs limit	Mar 17, 2025
csharp	Clean up whitespace	Mar 16, 2025
doc	Clean up whitespace	Mar 16, 2025
examples	Improve stemword error message	Jan 7, 2025
go	go: Make negative hop work as documented	Nov 23, 2020
include	Document input should be lowercase with composed accents	Nov 7, 2023
java/org/tartarus/snowball	Clean up whitespace	Mar 16, 2025
javascript	Clean up whitespace	Mar 16, 2025
libstemmer	Clean up whitespace	Mar 16, 2025
pascal	Clean up whitespace	Mar 16, 2025
python	stemwords.py: Try to open input before output	Mar 14, 2025
runtime	Fix potential NULL dereference	Feb 14, 2025
rust	Clean up whitespace	Mar 16, 2025
tests	Improve comments in stemtest.c	Sep 25, 2023
.gitignore	.gitignore: Update	Oct 6, 2021
AUTHORS	Remove the cmake build system	Sep 3, 2015
CONTRIBUTING.rst	CONTRIBUTING.rst: Go into more detail	Feb 27, 2024
COPYING	Update COPYING to include my own contributions	Oct 3, 2019
GNUmakefile	GNUmakefile: Use ‘build’ module instead of calling setup.py directly	Mar 4, 2025
NEWS	Draft NEWS entry for 2.3.0	Sep 2, 2024
README.rst	Fix typo in README files	Mar 28, 2024
iconv.py	Use iconv by default	Sep 10, 2015