lua-ahocorasick

This is a Lua binding to libahocorasick, part of MultiFast.

The Aho-Corasick algorithm searches a text for many strings at once in a single pass, which makes matching against large sets of strings fast and efficient.
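To give a feel for what the underlying library does, here is a minimal pure-Lua sketch of the Aho-Corasick idea: a trie of the search words plus failure links that let the scan fall back to the longest usable suffix instead of restarting. It is an illustration only, not the code of libahocorasick or of this binding; the names `build_trie` and `scan` are hypothetical.

```lua
-- Illustrative Aho-Corasick sketch in plain Lua (not the library's implementation).
-- Each node holds: next (byte -> node index), fail (node index),
-- and out (lengths of the words that end at this node).
local function build_trie(words)
  local nodes = { { next = {}, fail = 1, out = {} } } -- node 1 is the root
  for _, word in ipairs(words) do
    local cur = 1
    for i = 1, #word do
      local b = word:byte(i)
      if not nodes[cur].next[b] then
        nodes[#nodes + 1] = { next = {}, fail = 1, out = {} }
        nodes[cur].next[b] = #nodes
      end
      cur = nodes[cur].next[b]
    end
    table.insert(nodes[cur].out, #word)
  end

  -- Breadth-first pass: compute failure links and merge outputs.
  local queue, head = {}, 1
  for _, child in pairs(nodes[1].next) do
    queue[#queue + 1] = child -- root children keep fail = root
  end
  while head <= #queue do
    local cur = queue[head]
    head = head + 1
    for b, child in pairs(nodes[cur].next) do
      local f = nodes[cur].fail
      while f ~= 1 and not nodes[f].next[b] do
        f = nodes[f].fail
      end
      nodes[child].fail = nodes[f].next[b] or 1
      -- Words ending at the failure target also end here.
      for _, len in ipairs(nodes[nodes[child].fail].out) do
        table.insert(nodes[child].out, len)
      end
      queue[#queue + 1] = child
    end
  end
  return nodes
end

-- Scan the text once, calling on_match(first, last) for every occurrence.
local function scan(nodes, text, on_match)
  local cur = 1
  for i = 1, #text do
    local b = text:byte(i)
    while cur ~= 1 and not nodes[cur].next[b] do
      cur = nodes[cur].fail
    end
    cur = nodes[cur].next[b] or 1
    for _, len in ipairs(nodes[cur].out) do
      on_match(i - len + 1, i)
    end
  end
end

local trie = build_trie({ "some", "strings", "to", "search", "for" })
scan(trie, "A long string with some words in it to find.", function(first, last)
  print(first, last)
end)
```

The text is read exactly once regardless of how many words are in the set, which is the property that makes the approach attractive for large word lists.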

Usage

new_automata = require "lua-ahocorasick".new

First, build the automaton by adding each word you want to search for.

my_automata = new_automata()

my_automata:add("some")
my_automata:add("strings")
my_automata:add("to")
my_automata:add("search")
my_automata:add("for")

Finalize the automaton once all words have been added; after that it can be used:

my_automata:finalize()
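When the words already live in a table, the add-then-finalize steps can be wrapped in a small helper. The sketch below uses only the `new`, `add` and `finalize` calls shown above; the helper name `build_automaton` is hypothetical, and the constructor is passed in as a parameter so the helper does not depend on how the module is loaded.

```lua
-- Hypothetical helper: build and finalize an automaton from a list of words.
-- new_automata is the constructor, e.g. require("lua-ahocorasick").new
local function build_automaton(new_automata, words)
  local automaton = new_automata()
  for _, word in ipairs(words) do
    automaton:add(word)
  end
  automaton:finalize() -- all words must be added before this point
  return automaton
end

-- Usage, guarded so the snippet also loads when the module is not installed.
local ok, ahocorasick = pcall(require, "lua-ahocorasick")
if ok then
  local automaton = build_automaton(ahocorasick.new, { "some", "strings", "to", "search", "for" })
  print(automaton:search("a string with a word to find"))
end
```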

You can inspect the automaton (this prints to stdout):

my_automata:display()

Finding a needle in a haystack

Without a callback, search returns a boolean indicating whether a match was found.

found = my_automata:search("a string with a word to find")

Getting the matches

If search is called with a callback, the callback is invoked once for each match. Return true from your callback to stop searching.

local str = "A long string with some words in it to find."
did_break = my_automata:search(str, function(start_pos, end_pos, n_matches)
	print("match found at positions "..start_pos.." through "..end_pos..": "..str:sub(start_pos, end_pos))
	return nil -- Continue searching
end, false)

Output:

match found at positions 20 through 23: some
match found at positions 37 through 38: to
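Instead of printing, the callback can collect matches into a table, or return true to stop at the first hit. The following is a sketch built only on the `search` call shown above; the helper names `collect_matches` and `first_match` and the field names `first`, `last` and `word` are hypothetical.

```lua
-- Hypothetical helper: gather every match into a list of tables.
local function collect_matches(automaton, text)
  local matches = {}
  automaton:search(text, function(start_pos, end_pos)
    matches[#matches + 1] = {
      first = start_pos,
      last = end_pos,
      word = text:sub(start_pos, end_pos),
    }
    return nil -- keep searching
  end, false)
  return matches
end

-- Hypothetical helper: return only the first match, or nil if there is none.
local function first_match(automaton, text)
  local found
  automaton:search(text, function(start_pos, end_pos)
    found = { first = start_pos, last = end_pos, word = text:sub(start_pos, end_pos) }
    return true -- stop after the first match
  end, false)
  return found
end

-- Usage, guarded so the snippet also loads when the module is not installed.
local ok, ahocorasick = pcall(require, "lua-ahocorasick")
if ok then
  local automaton = ahocorasick.new()
  automaton:add("some")
  automaton:add("to")
  automaton:finalize()
  for _, m in ipairs(collect_matches(automaton, "A long string with some words in it to find.")) do
    print(m.first, m.last, m.word)
  end
end
```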

Streaming input

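If you pass true as the third argument to search, the search continues from where the previous call left off. Matches that span chunk boundaries are still found, and the reported positions count from the start of the whole stream rather than from the start of the current chunk.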

print("A")
my_automata:search("a series of stri", print)
print("B")
my_automata:search("ngs broken over multi", print, true)
print("C")
my_automata:search("ple packets in which t", print, true)
print("D")
my_automata:search("o search.", print, true)
print("E")

Output:

A
B
13	19	1
C
D
59	60	1
62	67	1
E
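The streaming pattern above can be wrapped in a loop over chunks. This sketch assumes that passing false for the first chunk behaves like omitting the argument, as the earlier examples suggest; the helper name `search_chunks` is hypothetical.

```lua
-- Hypothetical helper: feed a list of chunks through one continuous search.
-- The first chunk starts a fresh search; later chunks resume the previous state.
local function search_chunks(automaton, chunks, on_match)
  for i, chunk in ipairs(chunks) do
    automaton:search(chunk, on_match, i > 1)
  end
end

-- Usage, guarded so the snippet also loads when the module is not installed.
local ok, ahocorasick = pcall(require, "lua-ahocorasick")
if ok then
  local automaton = ahocorasick.new()
  automaton:add("strings")
  automaton:add("to")
  automaton:add("search")
  automaton:finalize()
  search_chunks(automaton, {
    "a series of stri",
    "ngs broken over multi",
    "ple packets in which t",
    "o search.",
  }, print)
end
```

Because the positions passed to the callback refer to the whole stream, they cannot be used directly with `chunk:sub`; keep the concatenated text or track chunk offsets if the matched substring is needed.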
