xlr8

Fast cosine similarity for Python

Installing the package

Clone the repository.
Run pip install -e . from the root directory of the cloned repository.

Optional installation

For an additional speedup on large matrix multiplications, you can install the following optional dependencies:

First, sparse_dot via pip install sparse-dot-mkl.
Then, Intel MKL via conda install -c intel mkl.

If a warning states that your Intel MKL version is outdated, your system is probably still detecting an old MKL .so or .dll file. Find that old .so or .dll file and rename it to something else, then rename the updated .so or .dll to the appropriate filename. If the problem persists, try pip install mkl.
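To see whether the optional dependency is visible to Python before running xlr8, a small check like the one below can help. It is a sketch, not part of xlr8: the helper name optional_backends_available is hypothetical, and it only checks that the sparse_dot_mkl module can be found, without loading MKL. np.show_config() prints the BLAS/LAPACK libraries NumPy itself was built against, which is separate from the MKL used through sparse_dot.

```python
import importlib.util

import numpy as np


def optional_backends_available():
    """Hypothetical helper: report whether the optional speedup package is importable.

    Assumption: pip install sparse-dot-mkl provides the module `sparse_dot_mkl`.
    find_spec only locates the module; it does not import it or load MKL.
    """
    return {
        "sparse_dot_mkl": importlib.util.find_spec("sparse_dot_mkl") is not None,
    }


if __name__ == "__main__":
    print(optional_backends_available())
    # Prints the BLAS/LAPACK build configuration of NumPy (e.g. MKL or OpenBLAS).
    np.show_config()
```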

Usage

Using the cosine similarity function is straightforward.

from xlr8.similarity import cosine_similarity
import numpy as np

# Two random 1,000 x 1,000 input matrices
A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)

# Cosine similarity between A and B
cosine_similarity(A, B)
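For reference, the quantity being computed reduces to normalizing rows and doing one matrix multiplication, which is why faster matrix multiplication speeds up cosine similarity. The sketch below is plain NumPy, not xlr8's implementation; the function name cosine_similarity_reference is hypothetical, and it assumes scikit-learn-style semantics, where entry (i, j) compares row i of A with row j of B.

```python
import numpy as np


def cosine_similarity_reference(A, B):
    """Hypothetical reference: pairwise cosine similarity between rows of A and rows of B.

    Assumes scikit-learn-style semantics: entry (i, j) is the cosine of the
    angle between row i of A and row j of B.
    """
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    # L2 norm of every row; zero rows get norm 1 so they map to all-zero similarities.
    A_norms = np.linalg.norm(A, axis=1, keepdims=True)
    B_norms = np.linalg.norm(B, axis=1, keepdims=True)
    A_norms[A_norms == 0] = 1.0
    B_norms[B_norms == 0] = 1.0
    # A single matrix multiplication yields all pairwise dot products of unit rows.
    return (A / A_norms) @ (B / B_norms).T


A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)
S = cosine_similarity_reference(A, B)
print(S.shape)  # (1000, 1000)
```

Since the heavy lifting is the final matrix product, swapping in a faster BLAS or an approximate multiplication changes the speed of the whole computation.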

Benchmarking

To benchmark xlr8's different modes for performing cosine similarity, run python tests/benchmark.py <dimension size> from the main directory of the repository.

Here's an example run with matrices A and B both of size 10,000 x 10,000:

$ python tests/benchmark.py 10000
scikit-learn cosine similarity speed in seconds: 14.492997799999998
xlr8 default cosine similarity speed in seconds: 15.422745399999997
xlr8 float cosine similarity speed in seconds: 9.0765971
xlr8 approximated cosine similarity speed in seconds: 16.5568625
xlr8 approximated float cosine similarity speed in seconds: 8.802123799999997
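The output above does not say what the "float" modes change. One plausible reading, which is only an assumption here, is that they compute in single precision (float32) instead of double precision (float64), so each value takes half the memory. The sketch below times a NumPy normalize-and-multiply cosine similarity at both precisions. It is not tests/benchmark.py, the helper names normalize_rows and time_cosine are hypothetical, and its timings will not match the figures above.

```python
import time

import numpy as np


def normalize_rows(X):
    """Scale each row of X to unit L2 norm, keeping X's dtype (zero rows stay zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True).astype(X.dtype, copy=False)
    norms[norms == 0] = 1
    return X / norms


def time_cosine(n, dtype):
    """Hypothetical harness: time a cosine similarity of two random n x n matrices."""
    rng = np.random.default_rng(0)
    A = rng.random((n, n)).astype(dtype)
    B = rng.random((n, n)).astype(dtype)
    start = time.perf_counter()
    S = normalize_rows(A) @ normalize_rows(B).T
    elapsed = time.perf_counter() - start
    return elapsed, S


if __name__ == "__main__":
    n = 1000  # matrix dimension; the benchmark above used 10,000
    for dtype in (np.float64, np.float32):
        elapsed, _ = time_cosine(n, dtype)
        print(f"{np.dtype(dtype).name} cosine similarity time in seconds: {elapsed:.4f}")
```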

Usage in a natural language processing task

You can also test the library on a document similarity task. For a smaller number of documents, scikit-learn's cosine_similarity is recommended.

$ python document_similarity.py
xlr8 (Intel MKL) document similarity speed in seconds: 0.06536109999999873
scikit-learn document similarity speed in seconds: 0.1037038999999993
xlr8 (default BLAS) document similarity speed in seconds: 13.5424935
Did scikit-learn and xlr8 find the same 'most similar document'? True
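The internals of document_similarity.py are not shown here. To illustrate the task itself, the sketch below finds a document's most similar neighbor by cosine similarity over a plain bag-of-words count matrix. The whitespace tokenizer, the helper names bag_of_words and most_similar, and the use of raw counts rather than weighted features are all assumptions for illustration. Real corpora give large, mostly-zero document-term matrices, which is plausibly where a sparse MKL backend helps most.

```python
import numpy as np


def bag_of_words(docs):
    """Hypothetical helper: document-term count matrix with a naive whitespace tokenizer."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: j for j, word in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, tokens in enumerate(tokenized):
        for word in tokens:
            X[i, index[word]] += 1
    return X


def most_similar(X, query):
    """Hypothetical helper: index of the row most cosine-similar to row `query`, excluding itself."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1
    X_unit = X / norms
    scores = X_unit @ X_unit[query]
    scores[query] = -np.inf  # never report the query document itself
    return int(np.argmax(scores))


docs = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "the cat sat on a mat",
    "stock prices rose sharply today",
]
X = bag_of_words(docs)
print(most_similar(X, 0))  # 2
```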

Approximation

This repository implements the uniform approximate matrix multiplication method described in the paper by Drineas, Kannan, and Mahoney [1].
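In that scheme, the product AB (with A of size m x n and B of size n x p) is written as a sum of n outer products, column k of A times row k of B, and only c of them, sampled at random with replacement, are evaluated. With uniform sampling probabilities 1/n, each sampled column and row is scaled by sqrt(n / c), so the estimate CR is unbiased (its expectation is AB) and costs O(mcp) instead of O(mnp). The sketch below is a NumPy rendering of that uniform case under these assumptions. It is not xlr8's code, and the function name approx_matmul_uniform and its parameter c are illustrative.

```python
import numpy as np


def approx_matmul_uniform(A, B, c, rng=None):
    """Hypothetical sketch: approximate A @ B from c uniformly sampled column/row pairs.

    Uniform-probability case of the Monte Carlo scheme of Drineas, Kannan &
    Mahoney (2006): indices are drawn i.i.d. with p_k = 1/n and each sampled
    column of A and row of B is scaled by 1 / sqrt(c * p_k) = sqrt(n / c),
    which makes E[C @ R] = A @ B.
    """
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    n = A.shape[1]
    if B.shape[0] != n:
        raise ValueError("inner dimensions of A and B must match")
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, n, size=c)  # c i.i.d. uniform indices in [0, n)
    scale = np.sqrt(n / c)            # 1 / sqrt(c * p_k) with p_k = 1/n
    C = A[:, idx] * scale             # m x c: sampled, rescaled columns of A
    R = B[idx, :] * scale             # c x p: matching rescaled rows of B
    return C @ R


rng = np.random.default_rng(0)
A = rng.random((200, 2000))
B = rng.random((2000, 300))
exact = A @ B
approx = approx_matmul_uniform(A, B, c=500, rng=rng)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative Frobenius error: {rel_err:.3f}")
```

Larger c lowers the error at the cost of more work, so c acts as the speed/accuracy dial.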

[1] Drineas, P., Kannan, R., & Mahoney, M. W. (2006). Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication. SIAM Journal on Computing, 36(1), 132-157.

Ethereal-AI/xlr8
