The SQL/Ibis powered sklearn of record linkage
Updated May 30, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining datasets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
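As a rough illustration of joining two sources that share no common identifier, the sketch below compares every pair of records on normalized string similarity and keeps pairs above a cutoff. It uses only the Python standard library and is not the method of any project listed here; the function names, the field names (name, city), the sample records, and the 0.85 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, fields=("name", "city"), threshold=0.85):
    """Return (left_index, right_index, score) for pairs scoring at or above threshold.

    The score is the unweighted mean similarity over the compared fields.
    Every pair is compared, so this is only practical for small inputs.
    """
    links = []
    for i, left_rec in enumerate(left):
        for j, right_rec in enumerate(right):
            score = sum(similarity(left_rec[f], right_rec[f]) for f in fields) / len(fields)
            if score >= threshold:
                links.append((i, j, round(score, 3)))
    return links


# Hypothetical sample data: two sources with no shared key.
customers = [
    {"name": "Jonathan Smith", "city": "New York"},
    {"name": "Maria Garcia", "city": "Madrid"},
]
subscribers = [
    {"name": "Jonathon  Smith", "city": "new york"},
    {"name": "Li Wei", "city": "Beijing"},
]

links = link_records(customers, subscribers)
print(links)
```

With this toy data, only the first customer and the first subscriber are linked. Comparing all pairs grows quadratically with input size, which is why practical tools typically add a blocking step to limit the candidate pairs and often use a probabilistic model in place of a fixed threshold.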
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbours algorithms.
Example scripts for generating data with Gecko
Backend (Docker & API) for matchID project
PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.
Spark RDD with Lucene's query and entity linkage capabilities
Supplementary code for "Class ratio and its implications for reproducibility and performance in record linkage" presented at The Pacific-Asia Conference on Knowledge Discovery and Data Mining 2024.
Interpretable metadata for the results of NHS England record linkage
An exploration of generalizable approaches to unsupervised entity matching for use in linking tabular public energy data sources.
🔎 Finds fuzzy matches between datasets
🔎 Finds fuzzy matches between CSV files
Fast, accurate, open-source geocoding in Python
Python library for the generation and mutation of realistic personal identification data at scale
LinkOrgs: An R package for linking records on organizations using half a billion open-collaborated records from LinkedIn
Record linkage - simple, flexible, efficient.
🕸️ Little helper for handling entity clusters
Created by Halbert L. Dunn
Released 1946