Skip to content

gtri/bk-tree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BK-tree

A Java BK-tree library.

BK-trees offer a simple index of elements in a metric space that allows for searching the tree for elements within a certain distance of the search query with sub-linear efficiency.

For example, a BK-tree with string elements and a metric like the Damerau–Levenshtein distance can serve as a fuzzy search index.

Example usage

import edu.gatech.gtri.bktree.*;
import edu.gatech.gtri.bktree.BkTreeSearcher.Match;
import java.util.Set;
// The Hamming distance is a simple metric that counts the number
// of positions on which the strings (of equal length) differ.
Metric<String> hammingDistance = new Metric<String>() {
    @Override
    public int distance(String x, String y) {

        if (x.length() != y.length())
            throw new IllegalArgumentException();

        int distance = 0;

        for (int i = 0; i < x.length(); i++)
            if (x.charAt(i) != y.charAt(i))
                distance++;

        return distance;
    }
};

MutableBkTree<String> bkTree = new MutableBkTree<>(hammingDistance);
bkTree.addAll("berets", "carrot", "egrets", "marmot", "packet", "pilots", "purist");

BkTreeSearcher<String> searcher = new BkTreeSearcher<>(bkTree);

Set<Match<? extends String>> matches = searcher.search("parrot", 2);

for (Match<? extends String> match : matches)
    System.out.println(String.format(
        "%s (distance %d)",
        match.getMatch(),
        match.getDistance()
    ));

// Output:
//   marmot (distance 2)
//   carrot (distance 1)

Release artifacts

This project's release artifacts are available in the Maven Central Repository.

<dependency>
  <groupId>edu.gatech.gtri.bk-tree</groupId>
  <artifactId>bk-tree</artifactId>
  <version>1.0</version>
</dependency>