🪼 A Python library for approximate and phonetic matching of strings.

jamesturk/jellyfish

Overview

jellyfish is a library for approximate & phonetic matching of strings.

Source: https://github.com/jamesturk/jellyfish

Documentation: https://jamesturk.github.io/jellyfish/

Issues: https://github.com/jamesturk/jellyfish/issues

Included Algorithms

String comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaccard Index
  • Jaro Similarity
  • Jaro-Winkler Similarity
  • Match Rating Approach Comparison
  • Hamming Distance
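
To make the first entry in the list above concrete, here is a minimal pure-Python sketch of Levenshtein distance using the standard dynamic-programming recurrence. It is an illustration only, not jellyfish's own implementation, and the name `levenshtein_sketch` is hypothetical.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn `a` into `b` (illustrative only)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the empty prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current

    return previous[-1]


print(levenshtein_sketch("jellyfish", "smellyfish"))
```

For 'jellyfish' and 'smellyfish' the sketch counts one insertion and one substitution. In real code, prefer `jellyfish.levenshtein_distance` over a hand-rolled version.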

Phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex
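
As an illustration of what a phonetic encoding does, the following is a minimal pure-Python sketch of American Soundex. It follows the commonly published rules (keep the first letter, map consonants to digits, collapse adjacent duplicates, pad to four characters) and is not jellyfish's own implementation. The name `soundex_sketch` and the choice to ignore non-ASCII characters are assumptions made for this example.

```python
# Consonant groups and their Soundex digits. Vowels and H, W, Y have no code.
_SOUNDEX_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for _letter in _letters:
        _SOUNDEX_CODES[_letter] = _digit


def soundex_sketch(word: str) -> str:
    """Return a four-character Soundex code (illustrative only)."""
    # Assumption: anything outside A-Z is simply dropped.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    last_code = _SOUNDEX_CODES.get(letters[0], "")

    for letter in letters[1:]:
        if letter in "HW":
            # H and W do not separate two consonants with the same code.
            continue
        code = _SOUNDEX_CODES.get(letter, "")
        if code and code != last_code:
            result += code
            if len(result) == 4:
                break
        # A vowel resets last_code, so a repeated code after it counts again.
        last_code = code

    return result.ljust(4, "0")


print(soundex_sketch("Jellyfish"))
```

Names that sound alike, such as 'Robert' and 'Rupert', map to the same code, which is what makes these encodings useful for matching on pronunciation rather than spelling.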

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
2
>>> jellyfish.jaro_similarity('jellyfish', 'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
1

>>> jellyfish.metaphone('Jellyfish')
'JLFX'
>>> jellyfish.soundex('Jellyfish')
'J412'
>>> jellyfish.nysiis('Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex('Jellyfish')
'JLLFSH'
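
The `jaro_similarity` value in the example above can be checked by hand. The sketch below is a plain-Python rendering of the textbook Jaro formula, (m/|s1| + m/|s2| + (m - t)/m) / 3, where m is the number of matching characters and t the number of transpositions. It is an assumption-laden illustration, not the library's code, and `jaro_sketch` is a hypothetical name.

```python
def jaro_sketch(s1: str, s2: str) -> float:
    """Textbook Jaro similarity between two strings (illustrative only)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, char in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Walk both strings' matched characters in order; each out-of-order
    # pair counts as half a transposition.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


print(round(jaro_sketch("jellyfish", "smellyfish"), 4))
```

For 'jellyfish' and 'smellyfish' there are 8 matching characters and no transpositions, so the formula gives (8/9 + 8/10 + 8/8) / 3, roughly 0.8963, in line with the library output shown above.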