cuckooml

I am prototyping some ideas for the Honeynet Project (Google Summer of Code 2016). The malware analysis reports from Cuckoo Sandbox that I used are from here.

I tried out 2 strategies:

Idea 1 from proposal: Behavioral profile creation based on OS operations, using the behavior/summary and virustotal sections of the reports
Idea 2 from proposal: Behavioral profile creation based on the number of API calls, weighted with TF-IDF, using the behavior/api section of the reports
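To make Idea 2 concrete, here is a minimal sketch of TF-IDF weighting over per-sample API call counts. It is not the code in core.py: the `tfidf` helper, the smoothed IDF variant, the L2 normalization, and the toy API call lists are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf(samples):
    """Turn lists of API call names (one list per sample) into TF-IDF rows.

    Hypothetical helper: the weighting choices below are assumptions,
    not necessarily what this repository does.
    """
    counts = [Counter(calls) for calls in samples]
    vocab = sorted({name for c in counts for name in c})
    n = len(samples)
    # Document frequency: in how many samples each API call appears.
    df = {name: sum(1 for c in counts if name in c) for name in vocab}
    # Smoothed inverse document frequency; rarer calls get larger weights.
    idf = {name: math.log((1 + n) / (1 + df[name])) + 1 for name in vocab}
    rows = []
    for c in counts:
        row = [c[name] * idf[name] for name in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # L2-normalize so that samples with many calls do not dominate.
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows


# Toy input (illustrative only, not taken from real reports).
samples = [
    ["NtCreateFile", "NtWriteFile", "NtWriteFile"],
    ["NtCreateFile", "RegOpenKeyExW"],
    ["NtCreateFile", "NtWriteFile"],
]
vocab, rows = tfidf(samples)
```

In the second toy sample, `RegOpenKeyExW` ends up with a larger weight than `NtCreateFile` even though both occur once, because it appears in fewer samples.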

For clustering, I used KMeans and varied the number of clusters from 2 to 7.
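A sweep over cluster counts like this could look roughly as follows with scikit-learn. This is a sketch under assumptions: the `sweep_k` helper, the silhouette score as the comparison metric, the fixed random seed, and the synthetic 2-D data are all illustrative and not taken from cluster.py.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sweep_k(X, k_values=range(2, 8), seed=0):
    """Fit KMeans for each k and return {k: mean silhouette score}.

    Hypothetical helper; range(2, 8) covers k = 2..7.
    """
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in for a feature matrix: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = sweep_k(X)
best_k = max(scores, key=scores.get)
```

The silhouette score lies between -1 and 1, with higher values indicating tighter, better-separated clusters, so the k with the highest score is a reasonable first candidate.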

Preliminary results for both the OS operations approach and the API calls approach are in the images directory.

From the initial analysis, it looks like 3 or 4 clusters produce the most significant results.

Example

An example of the Silhouette Metric applied to KMeans clustering (in 3 clusters) using API calls for feature extraction:

How to run

Requirements: python3, scikit-learn, numpy, matplotlib

python3 demo.py --method-type <method_type> --data-dir <data_dir_path>

where --method-type can be api_calls or os_operations (default)
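For readers who want to see how these two flags fit together, here is a hedged sketch of an argparse setup that accepts them. The actual parsing in demo.py is not shown here, so `build_parser`, the help strings, making `--data-dir` required, and the `reports/` path are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags described above."""
    parser = argparse.ArgumentParser(
        description="Cluster Cuckoo Sandbox reports (illustrative sketch)."
    )
    parser.add_argument(
        "--method-type",
        choices=["api_calls", "os_operations"],
        default="os_operations",  # default as stated above
        help="feature extraction strategy",
    )
    parser.add_argument(
        "--data-dir",
        required=True,  # assumption: the real script may define a default
        help="directory containing the report files",
    )
    return parser


# Parse an example argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--data-dir", "reports/"])
```

Here `args.method_type` falls back to `os_operations` because the flag was omitted, and `args.data_dir` holds the path that was passed in.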
