SecurityDataScience

Apache Spark 2.2.0 (PySpark) exercises for the Security Data Science course. The exercises are based on the KDD 99 competition.

KDD 99 Competition

The competition task was to build a network intrusion detector: a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections.

This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

You can find more information at http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
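The "bad" versus "good" framing of the task can be sketched as a small labelling step. This is a minimal illustration, not code from the course notebooks: the function name `to_binary_label` is hypothetical, and it assumes that normal traffic is labelled `normal` (with a trailing period in the original KDD 99 files) and that every other label names an attack.

```python
def to_binary_label(raw_label: str) -> str:
    """Collapse a KDD-style connection label into 'normal' or 'attack'."""
    # Assumption: the original KDD 99 files end each label with a period
    # ("normal."), so strip it to treat both label styles the same way.
    cleaned = raw_label.strip().rstrip(".").lower()
    return "normal" if cleaned == "normal" else "attack"


# Illustrative labels only, not rows read from the data set.
for label in ["normal.", "smurf.", "neptune", "normal"]:
    print(label, "->", to_binary_label(label))
```

A detector trained on the collapsed labels answers only "attack or not"; keeping the original label instead would turn the same data into a multi-class problem.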

The NSL-KDD Data Set

The data set used in the course is not the original KDD 99 competition data set but the NSL-KDD data set.

The NSL-KDD data set has the following advantages over the original KDD data set:

It does not include redundant records in the training set, so the classifiers will not be biased towards more frequent records.
There are no duplicate records in the proposed test sets; therefore, the performance of the learners is not biased by methods that have better detection rates on the frequent records.
The number of selected records from each difficulty level group is inversely proportional to the percentage of records in the original KDD data set. As a result, the classification rates of distinct machine learning methods vary over a wider range, which allows a more accurate evaluation of different learning techniques.
The number of records in the training and test sets is reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.
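The effect of redundant records mentioned above can be seen on a toy example. The sketch below is only an illustration in plain Python: the records are made-up tuples, not real NSL-KDD rows, and `drop_duplicates` is a hypothetical helper. In PySpark, `DataFrame.dropDuplicates()` performs the equivalent step on a DataFrame.

```python
from collections import Counter


def drop_duplicates(records):
    """Return records with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


# Toy (protocol, service, label) tuples; hypothetical values for illustration.
records = [
    ("icmp", "ecr_i", "attack"),
    ("icmp", "ecr_i", "attack"),
    ("icmp", "ecr_i", "attack"),
    ("tcp", "http", "normal"),
    ("tcp", "private", "attack"),
]

unique_records = drop_duplicates(records)

# Compare how often each label appears before and after deduplication.
print(Counter(r[-1] for r in records))
print(Counter(r[-1] for r in unique_records))
```

In this toy input a single repeated record accounts for three of the five rows. After deduplication it counts once, so a model or metric is no longer dominated by that one pattern.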

How to run the exercises

The course environment is based on Docker. Go to the folder where you downloaded the project and run the command for your platform (it takes several minutes):

On macOS and Linux:

docker run -it --rm -p 8888:8888 -v "$(pwd)":/home/jovyan/work jupyter/all-spark-notebook:4ebeb1f2d154

On Windows (PowerShell):

docker run -it --rm -p 8888:8888 -v ${PWD}:/home/jovyan/work jupyter/all-spark-notebook:4ebeb1f2d154
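The two commands differ only in how the shell expands the current directory. As an optional alternative, the same arguments can be assembled in Python, which avoids shell quoting altogether. This is a sketch under assumptions: `build_docker_command` is a hypothetical helper, not part of the project, and it only prints the argument list rather than starting the container.

```python
import os

# Image name and tag taken from the commands above.
IMAGE = "jupyter/all-spark-notebook:4ebeb1f2d154"


def build_docker_command(project_dir):
    """Assemble the docker run arguments that mount project_dir into the container."""
    # Bind-mount the absolute project path onto the notebook work directory.
    mount = f"{os.path.abspath(project_dir)}:/home/jovyan/work"
    return ["docker", "run", "-it", "--rm", "-p", "8888:8888", "-v", mount, IMAGE]


# Build the command for the current directory and show it.
command = build_docker_command(os.getcwd())
print(command)
```

To start the container from a terminal, the list could be passed to `subprocess.run(command, check=True)`. Because the arguments are passed as a list, no quoting of the path is needed on any platform.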

