Data Pipeline

This repository contains a data pipeline for processing medical imaging data. It includes modules for anonymizing DICOM files, encrypting patient IDs, extracting metadata, and processing the data. The pipeline is designed to be extensible, so users can customize and expand its functionality to fit specific project requirements, and it is built with scalability and performance in mind for handling large volumes of medical imaging data. Its modular design promotes code reuse and eases maintenance and future enhancements.

The pipeline provides the following key functionality:

Anonymization Module: Anonymizes DICOM files by removing sensitive, identifiable patient attributes, safeguarding patient privacy in support of regulatory compliance.
Encryption Module: Encrypts patient IDs as an extra layer of security, so that these identifiers remain confidential and inaccessible to unauthorized parties.
Metadata Extraction: Parses DICOM headers to retrieve metadata attributes embedded in the imaging data, such as imaging parameters and acquisition details.
Data Processing: Orchestrates the sequential execution of preprocessing, analysis, and transformation operations on the medical imaging data.
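
As a rough illustration of the anonymization and encryption steps described above, the following sketch strips patient-related attributes from a header and replaces the patient ID with a keyed digest. It is not the repository's implementation. The header is modeled as a plain dictionary rather than a parsed DICOM file, the names `SENSITIVE_KEYWORDS`, `pseudonymize_id`, and `anonymize_header` are hypothetical, and HMAC-SHA256 is used as a stand-in for whatever scheme encryption.py applies. Note that a keyed hash is one-way, so it pseudonymizes the ID rather than encrypting it reversibly.

```python
import hashlib
import hmac

# Hypothetical subset of patient-related DICOM keywords to drop (an assumption,
# not the list used by the repository's anonymizer.py).
SENSITIVE_KEYWORDS = {"PatientName", "PatientBirthDate", "PatientAddress"}


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Return a stable, keyed SHA-256 digest of the patient ID (one-way)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def anonymize_header(header: dict, secret_key: bytes) -> dict:
    """Return a copy of the header without sensitive attributes.

    The PatientID, if present, is replaced by its keyed digest so that
    records from the same patient can still be linked.
    """
    cleaned = {k: v for k, v in header.items() if k not in SENSITIVE_KEYWORDS}
    if "PatientID" in cleaned:
        cleaned["PatientID"] = pseudonymize_id(str(cleaned["PatientID"]), secret_key)
    return cleaned


if __name__ == "__main__":
    # Made-up example values for demonstration only.
    example = {"PatientName": "Doe^Jane", "PatientID": "12345", "Modality": "MG"}
    print(anonymize_header(example, secret_key=b"demo-key"))
```

Because the digest depends on the secret key, the same patient ID always maps to the same pseudonym within a project, while someone without the key cannot recompute the mapping.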

Together, these modules provide a framework for managing medical imaging data. Whether the task is anonymizing patient information, encrypting identifiers, extracting metadata, or processing imaging data, the pipeline is intended to meet the demands of medical and biomedical imaging workflows (10.1007/s10278-021-00522-6). Its modular architecture is meant to ease integration into existing healthcare systems and allows customization for specific use cases and requirements.

Modules

anonymizer.py: Module for anonymizing DICOM files by removing patient-related information and renaming them according to a specified format.
encryption.py: Module for encrypting patient IDs.
extractor.py: Module for extracting metadata from DICOM files.
main.py: Main script for executing the data processing pipeline.
processor.py: Module for processing medical imaging data.
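
To make the division of labor among these modules more concrete, here is a minimal sketch of how a main script might chain stages in sequence. The stage functions and the `run_pipeline` helper are hypothetical placeholders, not the actual interfaces of anonymizer.py, extractor.py, or processor.py, and records are modeled as plain dictionaries rather than DICOM files.

```python
from typing import Callable, Iterable, List, Sequence

Record = dict
Stage = Callable[[Record], Record]


def drop_patient_name(record: Record) -> Record:
    """Hypothetical anonymization stage: remove the patient's name."""
    return {k: v for k, v in record.items() if k != "PatientName"}


def extract_metadata(record: Record) -> Record:
    """Hypothetical extraction stage: keep only a few selected attributes."""
    wanted = ("PatientID", "Modality", "StudyDate")
    return {k: record[k] for k in wanted if k in record}


def run_pipeline(records: Iterable[Record], stages: Sequence[Stage]) -> List[Record]:
    """Apply each stage, in order, to every record and collect the results."""
    results = []
    for record in records:
        for stage in stages:
            record = stage(record)
        results.append(record)
    return results


if __name__ == "__main__":
    # Made-up example record for demonstration only.
    records = [{"PatientName": "Doe^Jane", "PatientID": "1", "Modality": "MG"}]
    print(run_pipeline(records, [drop_patient_name, extract_metadata]))
```

Keeping each stage as a function that takes and returns a record is one way to let stages be reordered, removed, or replaced independently.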

Usage

To use the data pipeline, follow these steps:

Clone the repository:

git clone https://github.com/MIMBCD-UI/data-pipeline.git

Install the required dependencies listed in requirements.txt, preferably inside a virtual environment:

cd data-pipeline
pip install -r requirements.txt

Run the main script to execute the data processing pipeline:

python main.py

Contributing

Contributions are welcome! If you'd like to contribute to this project, please fork the repository and submit a pull request with your proposed changes.

License

This project is licensed under the MIT License.

Team

Our team brings together people who share ideas and a common purpose, which helps us produce better work. This section lists the people who have been important to this repository, along with their respective links.

Authors

Francisco Maria Calisto [ Academic Website | ResearchGate | GitHub | Twitter | LinkedIn ]
Diogo Araújo
Carlos Santiago [ ResearchGate ]
Catarina Barata
Jacinto C. Nascimento [ ResearchGate ]

Promoters

João Fernandes [ ResearchGate ]
Margarida Morais [ ResearchGate ]
João Maria Abrantes [ ResearchGate ]
Nuno Nunes [ ResearchGate ]

Companions

Hugo Lencastre
Nádia Mourão
Miguel Bastos
Pedro Diogo
João Bernardo
Madalena Pedreira
Mauro Machado
Bruno Dias
Bruno Oliveira
Luís Ribeiro Gomes

Acknowledgements

This work was partially supported by national funds through FCT via both the UID/EEA/50009/2013 and LARSyS - FCT Project 2022.04485.PTDC (MIA-BREAST) projects hosted by IST, as well as the BL89/2017-IST-ID and PD/BD/150629/2020 grants. We are indebted to those who gave their time and expertise to evaluate our work and who, among others, continue to provide crucial information for the BreastScreening project.

Supporting

Our organization is a non-profit. Even so, our activity has many needs, from infrastructure to services, and we rely on contributions of time and other help to support our team and projects.

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers! 🙏 [Become a backer]
