DataAnonymiser

This software anonymises data in text files and CSV-like files by removing various kinds of personally identifiable information. Each removed item is replaced with a generic placeholder text chosen according to the type of data removed.

English and Russian are currently supported. Russian text is handled whether it is written in Cyrillic or in Latin characters.

The language is detected automatically. For CSV-like files, the language of each cell is detected separately, so multi-language CSV-like files are supported as well.
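Per-cell detection can be pictured with the short sketch below. It is only an illustration: the function names are hypothetical, and the Cyrillic-counting heuristic is a stand-in for whatever detector the tool actually uses. In particular, this heuristic cannot recognise Russian written in Latin characters, which the tool itself supports.

```python
import csv
import io


def guess_language(text: str) -> str:
    """Crude illustrative heuristic: mostly Cyrillic letters -> 'ru', otherwise 'en'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04FF")
    return "ru" if cyrillic * 2 > len(letters) else "en"


def detect_cell_languages(csv_text: str) -> list[list[str]]:
    """Return one language guess per cell, row by row."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[guess_language(cell) for cell in row] for row in reader]


sample = "name,comment\nJohn Smith,Привет из Москвы\n"
print(detect_cell_languages(sample))
# [['en', 'en'], ['en', 'ru']]
```

Because each cell gets its own guess, one row can mix English and Russian cells without one language forcing the choice for the whole file.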

Example input and output files

Example input and output copied to an annotated PDF file: Anonymisation example 1.pdf

Example input and output file pairs are available for the TXT and CSV file formats in English, and for the TXT file format in Russian in both the Cyrillic and Latin alphabets.

How it works

This software uses a combination of Named Entity Recognition (NER) and regular expressions to find the personally identifiable information it replaces.
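The two techniques can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the patterns, the placeholder strings, and both function names are assumptions. The NER step is represented only by a list of already-found entity spans, since the model the tool uses is not stated here.

```python
import re

# Hypothetical patterns and placeholders, for illustration only.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s()-]{7,}\d"), "[PHONE]"),
]


def anonymise_with_regex(text: str) -> str:
    """Replace every pattern match with the placeholder for its data type."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def anonymise_entities(text: str, entities: list[tuple[int, int, str]]) -> str:
    """Replace (start, end, label) spans, as a NER model might report them.

    Spans are processed from right to left so that earlier offsets stay valid.
    """
    for start, end, label in sorted(entities, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


print(anonymise_with_regex("Mail john.doe@example.com or call +1 (555) 123-4567."))
# Mail [EMAIL] or call [PHONE].
print(anonymise_entities("Alice met Bob", [(0, 5, "PERSON"), (10, 13, "PERSON")]))
# [PERSON] met [PERSON]
```

Regular expressions suit data with a fixed shape, such as e-mail addresses and phone numbers, while NER covers free-form items such as names that have no fixed pattern.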

Usage

The configuration options can be found in the file Anonymiser.ini. The script is run as follows:

python Recogniser.py "input_file.txt"|"input_file.csv"|"input_file.tsv" ["output_file.txt"|"output_file.csv"|"output_file.tsv"]

User-provided files are expected to be in the same folder as the main Python script unless an absolute path is given. If the script is run without arguments, the sample files in the data folder are used. If an input file name is given without an output file name, the output file name is derived as the input file name, followed by _anonymised, followed by the input file's extension. For example, input_file.txt becomes input_file_anonymised.txt.
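The default output name rule can be expressed as a small sketch. The helper name default_output_path is hypothetical, and the script's own implementation may differ in its details.

```python
from pathlib import Path


def default_output_path(input_path: str) -> Path:
    """Insert '_anonymised' between the file stem and its extension."""
    p = Path(input_path)
    return p.with_name(p.stem + "_anonymised" + p.suffix)


print(default_output_path("input_file.txt").name)
# input_file_anonymised.txt
print(default_output_path("input_file.csv").name)
# input_file_anonymised.csv
```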

Current project state

Ready to use and under active development.

