This library is an implementation of the Code Property Graph described in the paper "Modeling and Discovering Vulnerabilities with Code Property Graphs" by Fabian Yamaguchi et al.
A code property graph is a data structure designed for mining large codebases for recurring programming patterns. It merges a program's abstract syntax tree, control flow graph, and program dependence graph into a single graph, which can be loaded into a graph database and queried for properties of the code. Code property graphs are intended to be language-agnostic and scalable, which makes them a strong choice for code representation.
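To make the idea concrete, here is a minimal sketch of a property graph built from Python source with the standard library `ast` module. It is not this library's implementation: the `PropertyGraph` class, `build_graph`, and `nodes_of_type` are hypothetical names chosen for illustration, and the control-flow edges are a deliberate simplification that only links consecutive top-level statements.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """A tiny property graph: nodes carry properties, edges carry a label."""
    nodes: dict = field(default_factory=dict)  # node id -> property dict
    edges: list = field(default_factory=list)  # (source id, target id, label)

    def add_node(self, **props) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = props
        return node_id

    def add_edge(self, src: int, dst: int, label: str) -> None:
        self.edges.append((src, dst, label))


def build_graph(source: str) -> PropertyGraph:
    """Build AST edges for the whole tree and CFG edges between
    consecutive top-level statements (straight-line code only)."""
    graph = PropertyGraph()
    tree = ast.parse(source)

    def visit(node: ast.AST) -> int:
        # One graph node per syntax node, with its type and line as properties.
        node_id = graph.add_node(type=type(node).__name__,
                                 lineno=getattr(node, "lineno", None))
        for child in ast.iter_child_nodes(node):
            graph.add_edge(node_id, visit(child), "AST")
        return node_id

    root = graph.add_node(type="Module", lineno=None)
    previous = None
    for stmt in tree.body:
        stmt_id = visit(stmt)
        graph.add_edge(root, stmt_id, "AST")
        if previous is not None:
            # Simplification: execution falls through to the next statement.
            graph.add_edge(previous, stmt_id, "CFG")
        previous = stmt_id
    return graph


def nodes_of_type(graph: PropertyGraph, type_name: str) -> list:
    """Query the graph: ids of all nodes whose 'type' property matches."""
    return [i for i, props in graph.nodes.items() if props["type"] == type_name]


if __name__ == "__main__":
    g = build_graph("a = 1\nb = 2\nprint(a + b)\n")
    print("binary operations at nodes:", nodes_of_type(g, "BinOp"))
    print("control-flow edges:", [e for e in g.edges if e[2] == "CFG"])
```

A real code property graph also needs branch-aware control flow and data dependence edges, which this sketch leaves out.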
Requires:
python==3.9.12
pip3
pip install codepropertygraph
from codepropertygraph import CPG
PATH = r'C:\Users\ExampleUser\Projects\portfolio'  # raw string, so '\U' is not parsed as a Unicode escape
code_cpg = CPG(PATH)
print(code_cpg.files.count)
> 1
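Windows paths are easy to get wrong in Python string literals, so it can help to validate the project path before handing it to `CPG`. The following is a small sketch using only `pathlib`; the helper names `resolve_project_path` and `count_python_files` are hypothetical and not part of this library. It also assumes, without confirmation from the library, that `CPG` expects a string path to a directory and that `files.count` counts Python source files.

```python
from pathlib import Path


def resolve_project_path(raw: str) -> str:
    """Expand '~', make the path absolute, and check it is an existing directory."""
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"Project directory not found: {path}")
    return str(path)


def count_python_files(root: str) -> int:
    """Count .py files under root, recursively, as an independent sanity check."""
    return sum(1 for p in Path(root).rglob("*.py") if p.is_file())


if __name__ == "__main__":
    # Uses the current directory so the sketch runs anywhere.
    project = resolve_project_path(".")
    print(project, count_python_files(project))
    # Assumed usage with this library (not verified here):
    #   code_cpg = CPG(project)
```

If the number printed here differs from `code_cpg.files.count`, that may simply mean the library counts files differently than this sketch assumes.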
- Download Neo4j Desktop v4.4.5 to create and manage graph databases, local or remote, from your desktop. If the latest version has changed, use this link to download the version used for development.
- Create a new project and a new local graph database as shown below. It might take a few moments to finish loading.
- Start the database. Make sure the database is active before moving on to the Installation and Running the application sections.
Screenshots: Starting the database | Active Database
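One quick way to confirm the database is active from a script is to check that its Bolt port accepts TCP connections. The sketch below uses only the standard library and assumes the Neo4j default Bolt port, 7687, on localhost; your instance may be configured differently. The function name `is_port_open` is hypothetical. A successful connection only shows that something is listening on the port, not that credentials are valid.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 7687,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    status = "reachable" if is_port_open() else "not reachable"
    print(f"Bolt port 7687 on localhost is {status}")
```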
If you would like to use OrientDB, here are the instructions.
To install the repository, you need to clone it and run it inside a virtual environment. Running main.py generates a Code Property Graph of the simple addition script inside examples/ and saves it to output/.
git clone https://github.com/markgacoka/codepropertygraph.git
cd codepropertygraph
conda create --name codepropertygraph python=3.8
conda activate codepropertygraph
pip install -r requirements.txt
python main.py
Note: Tested only on Windows 10 and 11.
pytest tests
-- OR --
python tests/test_example.py
First-time contributors should read the CONTRIBUTING page.