This repository has been archived by the owner on Feb 20, 2022. It is now read-only.
SemNet: a tool for building semantic networks from web pages

rsmeral/semnet

SemNet

This project is not maintained.


A modular and extensible framework for building domain-specific semantic networks. Its main purpose is the automated collection of unstructured or semi-structured data from web resources and the transformation of that data into a machine-readable representation – a semantic network.

SemNet is the name for a group of processors for the PipedObjectProcessor, the underlying data-processing framework. Together, these processors serve the purpose of SemNet: crawling the web (HTMLCrawler), extracting information (scrapers), optionally mapping terms between vocabularies (StatementMapper), and persisting the information (SesameWriter).

Piped Object Processor

Piped object processor (POP) is the name given to the lowest layer of the implementation. It is a construct inspired by the Chain of Responsibility design pattern. The POP is based on the notion of processing chains, where information flows from the input to the output through an arbitrary number of object processors. Each processor may transform the information it receives or emit new pieces of information based on it. Only discrete pieces of information are exchanged, not continuous data streams. Information is encapsulated in containers simply called objects, since POP is based on Java, where the top-level element in the type hierarchy is Object. Any Java class may serve as a container.
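The processing-chain idea described above can be illustrated with a minimal sketch. This is not the actual PipedObjectProcessor API: the names ObjectProcessor, ProcessingChain and PopSketch are hypothetical and were chosen only for this example, which assumes Java 10 or newer. Each processor receives one discrete object and may emit zero or more objects to the next stage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

// Hypothetical interface for illustration; not SemNet's real API.
interface ObjectProcessor {
    // Receives one discrete object and may emit any number of objects downstream.
    void process(Object input, Consumer<Object> emit);
}

// Hypothetical chain: passes objects through the stages in order.
final class ProcessingChain {
    private final List<ObjectProcessor> stages;

    ProcessingChain(List<ObjectProcessor> stages) {
        this.stages = List.copyOf(stages);
    }

    // Feeds one object into the first stage and returns what leaves the last stage.
    List<Object> run(Object input) {
        List<Object> current = List.of(input);
        for (ObjectProcessor stage : stages) {
            List<Object> next = new ArrayList<>();
            for (Object o : current) {
                stage.process(o, next::add);
            }
            current = next;
        }
        return current;
    }
}

class PopSketch {
    static List<Object> demo() {
        // Emits several new objects for one received object.
        ObjectProcessor splitter = (in, emit) -> {
            for (String word : ((String) in).split("\\s+")) {
                emit.accept(word);
            }
        };
        // Transforms each received object into exactly one new object.
        ObjectProcessor upperCaser =
                (in, emit) -> emit.accept(((String) in).toUpperCase(Locale.ROOT));

        return new ProcessingChain(List.of(splitter, upperCaser)).run("semantic network");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [SEMANTIC, NETWORK]
    }
}
```

Because the containers are plain Object instances, any Java class can travel through the chain in this sketch. The casts to String are a simplification for the example; a real processor would likely check the type of each object before handling it.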

ArtNet

ArtNet is a semantic network of works of art created using SemNet. It contains data collected from ČSFD.cz and DatabazeKnih.cz during May 2011, comprising

  • 244 000 movies,
  • 58 000 actors/directors,
  • 73 000 books,
  • 23 000 literary authors,

and millions of relationships. Entities in ArtNet are instances of WordNet "classes" (synsets).


Created in 2011 as a bachelor's thesis at FI MUNI.
