Skip to content

BenJoyenConseil/rmi

Repository files navigation

RMI

Go PkgGoDev codecov

A goland implementation of a RMI (Recursive Model Indexes), a Learned Index structure based on the research work by Kraska & al.

Fig 1 from the Case for Learned Index Structures

usage

Create an index and make lookups

// load the age column and parse values into float64 values
ageColumn := extractAgeColumn("data/people.csv")

// create an index over the age column
index := index.New(ageColumn)

// search an age and get back its line position inside the file people.csv
search, _ := strconv.ParseFloat(os.Args[1], 64)
lines, _ := index.Lookup(search)

the main.go file contains an example of a learned index overdata/people.csv age column.

It outputs :

$ cat data/people.csv
name,age,sex
jeanne,90,F
jean,23,M
Carlos,3,M
Carlotta,45,F
Miguel,1,M
Martine,1.5,F
Georgette,23,F

$ go run main.go 23
2020/11/15 20:29:56 People who are 23 years old are located at [8 3] inside data/people.csv 

This is the plot showing the approximation (the linear regression), the cumulative distribution function for each value, and the current age's value (the Keys of the index) :

Fig 2 the LearnedIndex over people.csv

features

  • A simple linear regression model learning the CDF of a float64 array
  • A learned index structure fitted on keys of a collection
  • Finding rows id on a CSV file
  • Return a list of lines matching the key
  • Use max + min error bounding elements to search quickly
  • Benchmarks InMemory LearnedIndex against InMem BinarySearch
  • Store offset lines and a primary key index
  • Store the sortedTable
  • CLI to create indexes over CSV
  • Benchmarks Learned against BinarySearchTree
  • A two layer recursive index
  • Learn on integer
  • Index is persistent and durable (on hard drive)
  • A sort algorythm using learned structure
  • Learning on string type ?

related works