Skip to content

alexandremuzio/cuckooml

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cuckooml

I am prototyping some ideas for the the Honeynet Project (Google Summer Of Code 2016). The malware analysis reports from cuckoo sandbox I used are from here.

I tried out 2 strategies:

  • Idea 1 from proposal: Behavioral profile creation based on OS operations - used sections behavior/summary and virustotal from the reports
  • Idea 2 from proposal: Behavioral profile creation based on number of API Calls and using Tf-idf - used section behavior/api from the reports

For clustering, I used KMeans and varied the number of clusters from 2 to 7.

Check out the images directory for the preliminary results on the OS Operations approach and images directory for the API approach.

From the initial analysis, it looks like 3/4 clusters produce the most significant results.

Example

An example of the Silhouette Metric applied to KMeans clustering (in 3 clusters) using API calls for feature extraction:

alt text

How to run

Requirementes: python3 scikit-learn, numpy, matlabplotlib

python3 demo.py --method-type <method_type> --data-dir <data_dir_path>

where --method-type can be api_calls or os_operations (default)

About

No description or website provided.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages