
Support for "fuzzy" record linking with column aliases or URIs #25

Open
tmpks opened this issue Nov 24, 2020 · 0 comments
Labels: enhancement (New feature or request), question (Further information is requested)

tmpks (Contributor) commented Nov 24, 2020

Re this issue in the metadata_schema repo:
@RobinL "Extending the schema to allow 'fuzzy' relationships between disparate tables - standard names": moj-analytical-services/metadata_schema#2

I would like to be able to automatically detect fields that may be useful for fuzzy record linking.

For instance, we may be able to join two datasets from completely different source databases on fields like dob, first_name etc.

However, in general these fields will have different names.

This may involve:

  • Extending the schema to allow such fields to be identified
  • Agreeing on a standard alias for these fields, so that the column name in the dataset can be translated into the standardised version (e.g. `dob` and `birthdate` might both be standardised to `date_of_birth`)
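A minimal sketch of the second bullet, assuming a hand-maintained alias table. The dictionary `STANDARD_NAMES` and the helpers `standardise` and `candidate_linking_fields` are hypothetical; which aliases belong in the real list, and who maintains it, is exactly what this issue leaves open.

```python
# Hypothetical alias table: maps lower-cased non-standard column names to an
# agreed standard name. The real list and its maintainer are still undecided.
STANDARD_NAMES = {
    "dob": "date_of_birth",
    "birthdate": "date_of_birth",
    "forename": "first_name",
    "firstname": "first_name",
}


def standardise(column_name: str) -> str:
    """Return the standard name for a column, or the lower-cased name if no alias is known."""
    key = column_name.strip().lower()
    return STANDARD_NAMES.get(key, key)


def candidate_linking_fields(cols_a, cols_b):
    """Standard names present in both tables, i.e. fields that could be used for record linking."""
    std_a = {standardise(c) for c in cols_a}
    std_b = {standardise(c) for c in cols_b}
    return sorted(std_a & std_b)


print(candidate_linking_fields(["DOB", "first_name", "case_id"], ["birthdate", "FirstName", "offence"]))
# ['date_of_birth', 'first_name']
```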

We've added an `alias` property for columns, so if a column has a non-standard name, the alias can supply the standard one. What we don't yet know is who would maintain a definitive list of standard names, or where it would live.
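If the standard name lives in each column's own `alias` property, detection can read it straight from the metadata. The sketch below assumes columns are dicts with a required `name` key and an optional `alias` key; that shape, the function names, and the example tables are assumptions for illustration, not the schema's confirmed API.

```python
from typing import Dict, List, Tuple


def standard_name(column: dict) -> str:
    # Assumption: a column looks like {"name": "dob", "alias": "date_of_birth"};
    # "alias" is optional and, when present, holds the standardised name.
    return column.get("alias") or column["name"]


def linkable_columns(table_a: List[dict], table_b: List[dict]) -> Dict[str, Tuple[str, str]]:
    """Map each standard name present in both tables to the pair of
    actual column names (name in table_a, name in table_b)."""
    by_std_a = {standard_name(c): c["name"] for c in table_a}
    by_std_b = {standard_name(c): c["name"] for c in table_b}
    shared = sorted(by_std_a.keys() & by_std_b.keys())
    return {std: (by_std_a[std], by_std_b[std]) for std in shared}


# Hypothetical column lists for two tables from different source databases.
people = [
    {"name": "dob", "alias": "date_of_birth"},
    {"name": "first_name"},
    {"name": "prison_number"},
]
defendants = [
    {"name": "birthdate", "alias": "date_of_birth"},
    {"name": "forename", "alias": "first_name"},
    {"name": "case_id"},
]
print(linkable_columns(people, defendants))
# {'date_of_birth': ('dob', 'birthdate'), 'first_name': ('first_name', 'forename')}
```

A column with no alias falls back to its own name, so tables that already use standard names link without extra metadata. If two columns in one table resolve to the same standard name, this sketch keeps the last one.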

An idea in the README that we haven't acted on yet, and that might (?) facilitate linking, is specifying some kind of URI for columns, such as `<repo>/metadata/<folder>/<table_name>/<column_name>` or `<databasename>:<tablename>:<columnname>`.
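One way the URI idea could help: if each column has a stable identifier, a registry mapping identifiers to standard names can be inverted to list every column, across databases, that shares a standard name. This is a sketch only; it assumes the `<databasename>:<tablename>:<columnname>` form with no colons inside the names, and the database/table names and `registry` contents are made up.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ColumnRef(NamedTuple):
    database: str
    table: str
    column: str


def to_uri(ref: ColumnRef) -> str:
    """Build a <databasename>:<tablename>:<columnname> identifier."""
    return f"{ref.database}:{ref.table}:{ref.column}"


def from_uri(uri: str) -> ColumnRef:
    """Parse a <databasename>:<tablename>:<columnname> identifier (assumes no ':' inside names)."""
    parts = uri.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <databasename>:<tablename>:<columnname>, got {uri!r}")
    return ColumnRef(*parts)


def repo_path(repo: str, folder: str, table: str, column: str) -> str:
    """Build the alternative <repo>/metadata/<folder>/<table_name>/<column_name> form."""
    return "/".join([repo, "metadata", folder, table, column])


def group_by_standard_name(uri_to_standard: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a (hypothetical) URI -> standard-name registry so that all columns
    sharing a standard name, from any database, are listed together."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri, standard in sorted(uri_to_standard.items()):
        groups[standard].append(uri)
    return dict(groups)


# Hypothetical registry entries; these databases and tables are illustrative only.
registry = {
    "crown_court:defendants:dob": "date_of_birth",
    "prisons:people:birthdate": "date_of_birth",
    "prisons:people:forename": "first_name",
}
print(group_by_standard_name(registry))
```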

isichei added the question and enhancement labels Nov 24, 2020