JarSift

Setup

This project requires a running MariaDB database. Provide its connection details in a config.properties file at the root of the project. An empty database must exist before you start; running the database initialisation procedure creates one.

The config.properties file should be based on the config.properties.example template found in the project root.

Similarly, rename the .env.example file to .env and populate it with the respective values.

Lastly, rename the my-custom.cnf.example file to my-custom.cnf and adjust its settings to your environment.
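
The three template files above can be put in place in one pass. The following is a minimal sketch, not part of the repository: copy_template is a hypothetical helper, and it copies rather than renames so that the .example templates stay available for reference. An existing file is left untouched.

```shell
# Copy "<name>.example" to "<name>" unless "<name>" already exists.
# Hypothetical helper for illustration; not shipped with JarSift.
copy_template() {
    target="$1"
    template="${target}.example"
    if [ ! -f "$template" ]; then
        echo "missing template: $template" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        echo "keeping existing $target"
        return 0
    fi
    cp "$template" "$target" && echo "created $target"
}

# Run from the project root:
#   for f in config.properties .env my-custom.cnf; do copy_template "$f"; done
```

The copied files still contain the template values, so each one has to be edited by hand afterwards.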

Execution

There are two key processes in the execution of the project: Corpus Creation and Inference.

Corpus Creation

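To create the corpus, first create the paths files used to seed the database. The following command writes the path of every .jar file under the given directory to jar_files.txt and the path of every .pom file to pom_files.txt: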

find /path/to/your/local/.m2/repo \( -name "*.jar" -fprint jar_files.txt \) -o \( -name "*.pom" -fprint pom_files.txt \)
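
The -fprint action is not part of the POSIX find specification (GNU find provides it), so the command above may not work with every find implementation. As a hedged alternative, the sketch below should produce the same two lists using one find invocation per extension, and reports how many paths each file contains. list_maven_paths is a hypothetical name; the repository path is passed as its only argument.

```shell
# Portable variant of the paths-file step: writes jar_files.txt and
# pom_files.txt into the current directory using plain redirection.
# Hypothetical helper for illustration; not shipped with JarSift.
list_maven_paths() {
    repo="$1"
    if [ ! -d "$repo" ]; then
        echo "not a directory: $repo" >&2
        return 1
    fi
    # Two passes over the tree instead of one; slower, but needs no -fprint.
    find "$repo" -name "*.jar" > jar_files.txt
    find "$repo" -name "*.pom" > pom_files.txt
    for f in jar_files.txt pom_files.txt; do
        echo "$f: $(wc -l < "$f" | tr -d ' ') paths"
    done
}

# Example:
#   list_maven_paths /path/to/your/local/.m2/repo
```

A count of zero for jar_files.txt usually means the wrong directory was given, which is worth catching before seeding the database.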

After the paths files have been created, follow the steps below to seed the database:

1. Run docker compose up db.
2. Wait for the internal database initialisation to complete.
3. Once completed, you can terminate the container.
4. Fill in the PATHS_FILE environment variable in the docker-compose.yml file or the .env file with the path to the jar_files.txt file created earlier.
5. Proceed by running docker compose up.

It's crucial to follow this sequence. Running docker compose up before the database has been initialised may cause the application to fail, because the database is not yet ready to accept connections.
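
Setting PATHS_FILE in the .env file can be scripted. The sketch below assumes the .env file consists of plain KEY=value lines; set_env_var is a hypothetical helper that replaces an existing assignment of the key or appends a new one. The docker compose commands are shown only as comments, since each step has to finish before the next one starts.

```shell
# Set KEY=value in an env file (default: .env), replacing any existing
# assignment of the same key. Hypothetical helper; not shipped with JarSift.
set_env_var() {
    key="$1"
    value="$2"
    file="${3:-.env}"
    touch "$file"
    # grep -v exits non-zero when nothing is left to print, hence "|| true".
    grep -v "^${key}=" "$file" > "${file}.tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

# Seeding sequence, as described in the steps above:
#   docker compose up db          # wait for initialisation, then stop it
#   set_env_var PATHS_FILE /path/to/jar_files.txt
#   docker compose up
```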

Inference

Inference requires a running MongoDB instance seeded with the necessary data, which can be found in the data directory. To seed the MongoDB database:

# Create the MongoDB container
docker compose up mongodb

# You may use the existing all.zip file, or retrieve the latest data by running the following command (ensure you have gsutil installed)
gsutil cp gs://osv-vulnerabilities/Maven/all.zip .

# preferably in a venv
cd util
pip install -r requirements.txt
python import.py all.zip extracted

Before running inference, ensure that:

- The corpus database is running and seeded with the necessary data.
- The MongoDB instance is running, reachable, and seeded with the necessary data.
- The appropriate connection credentials are set in config.properties.

For verification, execute the following command from the project root:

sh run_inference.sh <path_to_jar>
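
To analyse several JARs in one go, the command above can be wrapped in a loop. This is a sketch under the assumption that run_inference.sh accepts one JAR path per invocation, as shown above; run_inference_on_dir is a hypothetical name and is meant to be called from the project root.

```shell
# Run run_inference.sh once for every .jar file directly inside a directory.
# Hypothetical helper for illustration; not shipped with JarSift.
run_inference_on_dir() {
    dir="$1"
    found=0
    for jar in "$dir"/*.jar; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$jar" ] || continue
        found=1
        echo "== $jar"
        sh run_inference.sh "$jar"
    done
    if [ "$found" -eq 0 ]; then
        echo "no JAR files in $dir" >&2
        return 1
    fi
}

# Example:
#   run_inference_on_dir /path/to/jars
```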

Evaluation

Evaluation requires the corpus database to be running and seeded with the necessary data.

To generate the evaluation data, execute the following command from the project root:

sh run_generator.sh <jars per config> <max dependencies per jar>

This generates the uber-JARs and their respective metadata, then runs the evaluation process and writes the results to the evaluation directory.

If you have already generated the evaluation data and wish to re-run the evaluation process, execute the following command from the project root:

sh run_evaluation.sh <evaluation data directory>

Cornul11/JarSift
