Automated Knowledge Graph construction for protein-protein interaction networks and metadata

chrisammon3000/bioNX

License: MIT

bioNX: Automated Knowledge Graph construction of PPI networks

Automate the construction of a Knowledge Graph containing interactions for any given gene.

About the Project

The exponential growth of biological data makes it hard to integrate new knowledge and turn it into actionable insights. The bioNX project demonstrates one way to do this integration: it automatically builds a Knowledge Graph of protein-protein interaction (PPI) networks in Neo4j.

A graph database makes it possible to explore the context of, and relationships in, the data with graph algorithms for tasks such as:

  • Community detection
  • Centrality to measure the importance of a node
  • Prediction of properties based on similarity
  • Prediction of undiscovered relationships
  • Finding the shortest paths between nodes
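bioNX runs these analyses inside Neo4j, but the two simplest ones are easy to illustrate outside it. The sketch below computes degree centrality (a node's number of interaction partners divided by n - 1) and one shortest path via breadth-first search on a tiny undirected network. The edge list uses placeholder names and is hypothetical, not bioNX data; the function names are likewise illustrative.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical toy network with placeholder gene names (not real interactions).
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D"), ("A", "C")]

if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print(shortest_path(adj, "A", "D"))
    print(degree_centrality(adj))
```

Breadth-first search is enough here because interaction edges are unweighted; Neo4j's own path and centrality functions cover weighted and larger-scale cases.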

Project Status: Active

bioNX is a work in progress.

Current Data Sources

  • BioGRID - primary data source for PPIs
  • HGNC - Gene nomenclature reference
  • PubMed - Literature
  • UniProt - Protein properties (pending implementation)
  • Entrez - Gene properties (pending implementation)
  • GO (Gene Ontology) - Gene properties (pending implementation)

Prerequisites

  1. BioGRID Access Key
  2. Neo4j

Getting Started

1. Installation

Clone repo:

git clone https://github.com/abk7777/bioNX

Install Python libraries:

cd bioNX
pip install -r requirements.txt

Update the .env file with the correct values:

BIOGRID_ACCESS_KEY=<BIOGRID_ACCESS_KEY>
NEO4J_USERNAME=<NEO4J_USERNAME>
NEO4J_PASSWORD=<NEO4J_PASSWORD>
NEO4J_BOLT_URL=bolt://localhost:7687
NEO4J_HOME=<NEO4J_HOME>
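To catch configuration mistakes before running the notebook, the sketch below reads a .env file using only the standard library and reports which of the variables above are missing, empty, or still set to a <PLACEHOLDER>. The helper names (parse_env_text, missing_settings) are hypothetical; the project itself may load these values differently, for example with a dedicated dotenv package.

```python
from pathlib import Path

# The variables listed in the .env template above.
REQUIRED_KEYS = (
    "BIOGRID_ACCESS_KEY",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_BOLT_URL",
    "NEO4J_HOME",
)


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs and passwords may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values):
    """Return required keys that are absent, empty, or still <PLACEHOLDER>s."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print(missing_settings(parse_env_text(env_path.read_text())) or "All settings present")
    else:
        print("No .env file found in the current directory")
```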

From the bioNX project root (you are already there after the previous step), make the data directory:

mkdir -p data/clean/

2. Run Jupyter Notebook

Start Jupyter Notebook:

cd notebooks/ && \
jupyter notebook

Open the notebook 0.1-biogrid-data.ipynb and run all of its cells. This writes a file named biogrid_ppi_data.csv to the import directory under $NEO4J_HOME and places a copy in the data/clean/ directory for easy access.
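The export step amounts to writing one CSV into Neo4j's import directory and keeping a second copy in the project. A minimal sketch of that pattern with the standard library follows; the function name write_ppi_csv is hypothetical, and the actual columns of biogrid_ppi_data.csv are defined by the notebook, so any column names you pass in are up to you.

```python
import csv
import shutil
from pathlib import Path


def write_ppi_csv(rows, neo4j_home, project_root, filename="biogrid_ppi_data.csv"):
    """Write rows to <neo4j_home>/import/ and copy the file to <project_root>/data/clean/.

    `rows` is a non-empty list of dicts that all share the same keys.
    """
    if not rows:
        raise ValueError("no rows to write")

    import_dir = Path(neo4j_home) / "import"
    clean_dir = Path(project_root) / "data" / "clean"
    import_dir.mkdir(parents=True, exist_ok=True)
    clean_dir.mkdir(parents=True, exist_ok=True)

    # Neo4j's LOAD CSV reads from the import directory by default.
    target = import_dir / filename
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # Keep an identical copy next to the project for quick inspection.
    copy = clean_dir / filename
    shutil.copyfile(target, copy)
    return target, copy
```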

To choose a gene, update the gene parameter in the Select Gene section. API requests are throttled to 10 per second, so use the limit parameter to cap the number of results; otherwise fetching the data can take a very long time.
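A limit of 10 requests per second can be enforced on the client side by keeping successive calls at least 0.1 seconds apart. The class below is a minimal sketch of that idea; RequestThrottle is a hypothetical name, not code taken from the notebook, and the clock and sleep functions are injectable only so the behaviour is easy to test.

```python
import time


class RequestThrottle:
    """Keep successive calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next request may be sent, then record its start time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Call throttle.wait() immediately before each request. Throttling only spreads the requests out; it does not reduce their number, which is why capping results with the limit parameter still matters.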

3. Load the graph in Neo4j

The simplest way to load the graph is to copy the contents of the neo4j/load.cyp script into the Neo4j Browser and run it.
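As an alternative to pasting, the script could be run from Python with the official neo4j driver (pip install neo4j). This is a sketch, not part of the repository: it assumes the statements in load.cyp are separated by semicolons, and its splitter only understands quoted strings and // line comments. The function names split_cypher_statements and run_cypher_file are hypothetical.

```python
from pathlib import Path


def split_cypher_statements(script):
    """Split a Cypher script on semicolons that sit outside quotes and // comments."""
    statements, current = [], []
    quote = None
    i = 0
    while i < len(script):
        ch = script[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(script):
                # Keep escaped characters (e.g. \' inside a string) verbatim.
                current.append(script[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif script.startswith("//", i):
            # Skip a line comment up to (not including) the newline.
            end = script.find("\n", i)
            i = len(script) if end == -1 else end
            continue
        elif ch == ";":
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_cypher_file(path, uri, user, password):
    """Run every statement in a .cyp file as its own auto-commit query."""
    from neo4j import GraphDatabase  # requires: pip install neo4j

    statements = split_cypher_statements(Path(path).read_text())
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for statement in statements:
                session.run(statement).consume()
    return len(statements)
```

For example, run_cypher_file("neo4j/load.cyp", "bolt://localhost:7687", user, password) would run the script against a local database. Browser-only commands (lines starting with a colon, such as :param) are not Cypher and would fail through the driver.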

Usage

Example Cypher query that returns the genes interacting with MTHFR, their Interaction nodes, and the authors of PubMed article 26186194, in which both genes are mentioned:

MATCH (gene1:Gene { name: 'MTHFR' })-[:INTERACTS_WITH]-(gene2:Gene),
(gene1)-[:MENTIONED_IN]->(article:Article { pubmed_id:"26186194" })<-[:MENTIONED_IN]-(gene2), 
(article)<-[:PUBLISHED]-(author:Author), 
(gene1)-[:INTERACTOR_IN]->(interaction:Interaction)<-[:INTERACTOR_IN]-(gene2)
RETURN gene1, gene2, author, article, interaction;
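The same query can be issued from Python with the neo4j driver. In this sketch the gene name and PubMed ID are passed as query parameters ($gene, $pubmed_id) instead of being written into the query text, which lets Neo4j cache the query plan and avoids building Cypher by string concatenation. The function and constant names are hypothetical, and, like the query above, it assumes pubmed_id is stored as a string.

```python
# Same pattern as the query above, with the literals replaced by parameters.
GENE_ARTICLE_QUERY = """
MATCH (gene1:Gene {name: $gene})-[:INTERACTS_WITH]-(gene2:Gene),
      (gene1)-[:MENTIONED_IN]->(article:Article {pubmed_id: $pubmed_id})<-[:MENTIONED_IN]-(gene2),
      (article)<-[:PUBLISHED]-(author:Author),
      (gene1)-[:INTERACTOR_IN]->(interaction:Interaction)<-[:INTERACTOR_IN]-(gene2)
RETURN gene1, gene2, author, article, interaction
"""


def fetch_gene_article_subgraph(uri, user, password, gene, pubmed_id):
    """Run the parameterized query and return each record as a plain dict."""
    from neo4j import GraphDatabase  # requires: pip install neo4j

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            result = session.run(GENE_ARTICLE_QUERY, gene=gene, pubmed_id=pubmed_id)
            # Consume the result before the session closes.
            return [record.data() for record in result]
```

Calling fetch_gene_article_subgraph("bolt://localhost:7687", user, password, "MTHFR", "26186194") corresponds to the query above.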

MTHFR Graph in Neo4j

Roadmap

Current Functionality

Running load.cyp in Neo4j produces a graph with the following schema:

  • (Gene)-[:INTERACTOR_IN]->(Interaction)
  • (Gene)-[:INTERACTS_WITH]-(Gene)
  • (Interaction)-[:MENTIONED_IN]->(Article)
  • (Gene)-[:MENTIONED_IN]->(Article)
  • (Author)-[:PUBLISHED]->(Article)
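The schema above can also be written down as data, which is handy for sanity-checking relationships before creating them. The sketch below encodes the five patterns as (start label, relationship type, end label, directed) tuples and treats INTERACTS_WITH as undirected, matching how the schema lists it without an arrow. SCHEMA and is_allowed are hypothetical names; load.cyp does not necessarily perform such a check.

```python
# (start label, relationship type, end label, directed?)
SCHEMA = {
    ("Gene", "INTERACTOR_IN", "Interaction", True),
    ("Gene", "INTERACTS_WITH", "Gene", False),
    ("Interaction", "MENTIONED_IN", "Article", True),
    ("Gene", "MENTIONED_IN", "Article", True),
    ("Author", "PUBLISHED", "Article", True),
}


def is_allowed(start_label, rel_type, end_label):
    """Return True if the relationship matches one of the schema patterns."""
    for start, rel, end, directed in SCHEMA:
        if rel != rel_type:
            continue
        if (start, end) == (start_label, end_label):
            return True
        # Undirected patterns may be written in either direction.
        if not directed and (end, start) == (start_label, end_label):
            return True
    return False
```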

Neo4j Screenshot

Future Implementations

See the open issues for a list of proposed features (and known issues). Planned work currently includes:

  • Expand graph schema with nodes for:
    • Protein complexes
    • Cofactors
    • RNAs
    • KEGG Pathways
    • Post-translational modifications
    • Chromosome loci
    • Subcellular location
    • Tissue
    • Organ
    • Disease Condition

Please feel free to include suggestions for things like:

  • Nodes, relationships and properties
  • Data sources
  • Functionality and features
  • Bug fixes

Built With

Contributing

While the project is still getting off the ground, please feel free to start a discussion in the open issues.

Contact

Gregory Lindsey - @abk7x4 - gclindsey@gmail.com

Project Link: https://github.com/abk7777/bioNX
