A largely incomplete but hopefully useful list of links to datasets for relational learning and inductive logic programming. No guarantees on availability.

joschout/RelationalDatasets

Relational Datasets

A largely incomplete but hopefully useful list of links to datasets for relational learning and inductive logic programming. No guarantees on availability.

Classic ILP datasets

A list of datasets per source.

  • The CVUT Prague Relational Dataset Repository: a large collection of ILP datasets, stored as MariaDB (SQL) databases.

    Motl, Jan, and Oliver Schulte. "The CTU Prague Relational Learning Repository." arXiv preprint arXiv:1511.03086 (2015).

  • ACE data mining system data sets: nine ILP datasets in Quinlan's FOIL format, together with scripts to convert them into ACE format (see README.txt in the ZIP). These were used in:

    Jan Struyf, Jesse Davis and David Page, An efficient approximation to lookahead in relational learners. In J. Fürnkranz, T. Scheffer and M. Spiliopoulou, editors, Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Proceedings. Lecture Notes in Artificial Intelligence, volume 4212, pages 775-782, Springer, 2006.

    • Muta188
    • Muta230
    • Financial
    • Sisyphus A
    • Sisyphus B
    • UWCSE
    • Yeast
    • Carcinogenesis
    • Bongard
  • Alchemy

    • Animals
    • CiteSeer
    • Cora
    • Epinions
    • IMDB
    • Kinships
    • Nations
    • Protein Interaction
    • Radish Robot Mapping - Tutorial
    • UMLS
    • UW-CSE
    • WebKB
  • ILP Datasets: in SQL format

    • Carcinogenesis
    • Financial
    • Trains
    • Mutagenesis
    • Imdb
    • IMDB Top/Bottom Movies
  • Stephen Muggleton's data set directory:

    • Trains
    • alzheimers
    • carcinogenesis
    • chess
    • e_coli
    • mesh
    • more_chess
    • mutagenesis
    • proteins
    • satellite
    • suramin
    • utube
  • Sriraam's StARLinGLAB data sets:

    • Toy Father
    • Toy Cancer
    • IMDB
    • Cora
    • UW-CSE
    • WebKB
    • CiteSeer
    • Boston Housing
    • Drug-Drug Interactions
  • GILPS:

    • alzheimers
    • carcinogenesis
    • dsstox
    • metabolism
    • mutagenesis
    • pyrimidines
    • trains
  • BayesBase: Datasets posted in 3 formats: (i) as a MySQL dump for a relational schema, (ii) in the WILL format, similar to the Aleph ILP input format, (iii) in the .db format of Markov Logic Networks as implemented in the Alchemy system.

    • unielwin
    • Mutagenesis_std
    • MovieLens_std
    • MovieLens_TQ(1M)
    • Financial_std
    • Mondial_std
    • UW_std
    • imdb_MovieLens
    • Hepatitis_std
    • Cont_PLG_TM (Continuous database)
  • LINQS - Statistical Relational Learning Group

    • Social Spammer
    • Drug-Target Interaction
    • Stance Classification
    • CiteSeer for Document Classification
    • CiteSeer for Entity Resolution
    • Cora
    • ArXiv
    • PubMed Diabetes
    • WebKB
    • Terrorists
    • Terrorist Attacks
  • kLog datasets as Prolog files:

    • WebKB: Originally developed by M. Craven et al. (1998). The version available here is a direct conversion to Prolog of the data available at the Alchemy website.
    • Internet Movie Database: Data extracted from this database has been used in a number of relational learning papers. The version available here was downloaded from the IMDb website and converted into SQL using the procedure described in http://imdbpy.sourceforge.net/docs/README.sqldb.txt. A subset of the tuples was then converted into a Prolog file.
    • UW-CSE: The data set originally developed at the University of Washington for demonstrating the capabilities of Markov logic networks. The version available here is a direct conversion to Prolog of the data available at the Alchemy website.
    • Bursi: This data set contains 4,337 molecules labeled according to mutagenicity (2,401 mutagens and 1,936 nonmutagens). Originally developed by Kazius et al. (2005), it has been used in a number of machine learning papers, especially those studying graph kernels.
    • Biodegradability: This is an older data set of chemical structures containing 328 compounds labeled by their half-life for aerobic aqueous biodegradation (a regression task).
  • Weka Proper - RELAGGS

  • MLnet
    Contains some ILP datasets, among others. Note: the link goes to an Internet Archive Wayback Machine snapshot.
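
Several entries above mention datasets that were converted from the Alchemy .db format into Prolog files. The snippet below is a minimal sketch of what such a conversion might look like, not the procedure actually used for any of the listed datasets. It assumes each evidence line holds one ground atom such as `advisedBy(Person265, Person168)`, that `//` starts a comment, and that a leading `!` marks a false atom. The function names `quote_atom` and `db_line_to_prolog` are hypothetical.

```python
import re

# Assumed shape of one evidence line: optional "!" prefix, predicate name,
# then comma-separated constants in parentheses.
ATOM_RE = re.compile(r'^(!?)\s*(\w+)\((.*)\)$')


def quote_atom(name: str) -> str:
    """Return a Prolog-safe atom: bare if it already looks like a plain
    lowercase atom, otherwise wrapped in single quotes."""
    if re.fullmatch(r'[a-z][A-Za-z0-9_]*', name):
        return name
    # Inside a quoted Prolog atom, double backslashes and single quotes.
    escaped = name.replace('\\', '\\\\').replace("'", "''")
    return "'" + escaped + "'"


def db_line_to_prolog(line: str):
    """Convert one evidence line into a Prolog fact string.

    Returns None for blank lines, comment lines, and negated atoms.
    Simplification: constants containing "//" or "," are not handled.
    """
    line = line.split('//', 1)[0].strip()
    if not line:
        return None
    match = ATOM_RE.match(line)
    if match is None:
        raise ValueError(f"not a ground atom: {line!r}")
    negated, predicate, args = match.groups()
    if negated:
        # This sketch keeps positive facts only.
        return None
    terms = [quote_atom(arg.strip().strip('"')) for arg in args.split(',')]
    # Prolog predicate names must start with a lowercase letter.
    predicate = predicate[0].lower() + predicate[1:]
    return f"{predicate}({', '.join(terms)})."


if __name__ == "__main__":
    print(db_line_to_prolog("advisedBy(Person265, Person168)"))
    # prints: advisedBy('Person265', 'Person168').
```

Constants are quoted because a term that starts with an uppercase letter would otherwise be read as a variable by Prolog. Dropping negated atoms is a design choice of this sketch; a real conversion may need to keep them, depending on the learner.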

Other links:
