GitHub - hajarmerbouh/Cybersecurity-Knowledge-graph: The repository consists of : Construction of the corpus, Named Entity Recognition , Relationship Extraction , Construction of knowledge graph using py2neo , Analysis on the constructed KG with cypher commands

The repository contains the following tasks : Construction of the corpus, Named Entity Recognition , Relationship Extraction , Construction of knowledge graph using py2neo , Analysis on the constructed KG with cypher query

Conceptual model of the knowledge graph :

Overall design of the solution :

In general, it is essential to build beforehand an idea on the usefulness and the types of information that we need to access at the level of our knowledge graph, given that the field of cybersecurity is wide. This will make it easier for us to choose the documents on which we will base the creation of the graph.

First of all, we will have to build the corpus: we can either base ourselves on a single collection of reports bearing the same type of information, or expand the field of knowledge by adding other types of information, from various sources, deemed to be useful during the analysis, while ensuring the homogeneity of the corpus built. Before starting the step of extracting the entities, it will have to carry out a preprocessing on the documents contained in the corpus in order to prepare the data or the data set that will be suitable for the application of the REN approach.

By trying different models and different types of architecture, we will have a better chance of obtaining an efficient model that allows the extraction of entities. After obtaining the entities, we will need to know the context in which they are located. In order to contextualize the entities, we will first have to prepare the data so that it is adequate for the task of extracting the relationships and then to train a model that can predict the existing relationships between the entities as accurately as possible.

Once we succeed in having a model that performs well in recognizing named entities and another model that performs well in extracting relationships. We can extract the triples from a collection of reports: entities and relations that contextualize them, and then build our knowledge graph.

Since the objective is not only based on the construction of the graph but rather on its usefulness rather on its usefulness and efficiency in countering a cyber-attack. An analysis step is added towards the end, which consists of analyzing and querying the knowledge graph constructed. The figure above illustrates the approach followed.

Detailed design of the solution :

Model Performance for the NER task

Model Performance for the Relation-Extraction task

Small Overview of the constructed graph

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Data processing		Data processing
KG analysis		KG analysis
KG construction		KG construction
NER		NER
Relation extraction		Relation extraction
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data processing

Data processing

KG analysis

KG analysis

KG construction

KG construction

NER

NER

Relation extraction

Relation extraction

README.md

README.md

Repository files navigation

About

Releases

Packages

Languages

hajarmerbouh/Cybersecurity-Knowledge-graph

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

Stars

Watchers

Forks

Languages