rdftools

rdftools is a Python wrapper around a number of RDF-related tools:

  • RDF parsers / serializers
  • VoID utilities
  • LUBM generator
  • etc.

Important Notes

This software is the product of research carried out at the University of Zurich and comes with no warranty whatsoever. Have fun!

TODOs

  • The project is not documented (yet)

How to Compile/Install the Project

Ensure that libraptor2 v2.0.13+ and cityhash are installed on your system (either using the package manager of the OS or compiled from source).
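
A quick way to see whether the dynamic loader can find both libraries is sketched below. The short names "raptor2" and "cityhash" are assumptions about how the shared libraries are named on your system, and the check does not verify the libraptor2 version; a library that is only installed as a static archive will also show up as not found.

```python
from ctypes.util import find_library

# Assumed short library names: "raptor2" for libraptor2, "cityhash" for CityHash.
REQUIRED_LIBS = ("raptor2", "cityhash")


def check_native_libs(names=REQUIRED_LIBS):
    """Map each library name to what the loader found, or None if missing."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in check_native_libs().items():
        print(f"{name}: {location if location else 'NOT FOUND'}")
```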

You can install rdftools in two ways: 1) manually (install the requirements first), or 2) automatically with pip.

Manual installation:

$ git clone https://github.com/cosminbasca/rdftools
$ cd rdftools
$ python setup.py install

Install the project with pip:

$ pip install git+https://github.com/cosminbasca/rdftools

Also have a look at the build.sh, clean.sh and test.sh scripts included in the codebase.

To include the latest JVM RDF tools, update to the latest version of jvmrdftools and create an assembly:

$ sbt compile assembly

Copy the resulting jar from the target folder to the lib folder inside the rdftools.tools.jvmrdftools module, then reinstall the Python package.

The tools

To find out what a tool does, simply supply the --help command line argument to any of the tools.

Available tools:

  • rdfconvert, convert RDF files from source format to a destination format using the libraptor2 C RDF parser
usage: rdfconvert [-h] [--clear] [--dst_format DST_FORMAT]
                  [--buffer_size BUFFER_SIZE] [--version]
                  SOURCE

rdftools v0.9.2, rdf converter, based on libraptor2

positional arguments:
  SOURCE                the source file or location (of files) to be converted

optional arguments:
  -h, --help            show this help message and exit
  --clear               clear the original files (delete) - this action is
                        permanent, use with caution!
  --dst_format DST_FORMAT
                        the destination format to convert to. Supported
                        parsers: ['rdfxml', 'ntriples', 'turtle', 'trig',
                        'guess', 'rss-tag-soup', 'rdfa', 'nquads', 'grddl'].
                        Supported serializers ['rdfxml', 'rdfxml-abbrev',
                        'turtle', 'ntriples', 'rss-1.0', 'dot', 'html',
                        'json', 'atom', 'nquads'].
  --buffer_size BUFFER_SIZE
                        the buffer size in Mb of the input buffer (the parser
                        will only parse XX Mb at a time)
  --version             the current version
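
If you want to drive rdfconvert from a Python script, one option is to call the command line tool through subprocess. The sketch below only uses the flags listed in the help text above; rdfconvert_cmd and run_rdfconvert are hypothetical helper names, not part of rdftools, and the tool is assumed to be on your PATH.

```python
import subprocess

# Serializer names copied from the rdfconvert help text above.
SERIALIZERS = ["rdfxml", "rdfxml-abbrev", "turtle", "ntriples", "rss-1.0",
               "dot", "html", "json", "atom", "nquads"]


def rdfconvert_cmd(source, dst_format="ntriples", buffer_size=None, clear=False):
    """Build the argument list for an rdfconvert call (hypothetical helper)."""
    if dst_format not in SERIALIZERS:
        raise ValueError(f"unsupported destination format: {dst_format}")
    cmd = ["rdfconvert", "--dst_format", dst_format]
    if buffer_size is not None:
        cmd += ["--buffer_size", str(buffer_size)]  # input buffer size in Mb
    if clear:
        cmd.append("--clear")  # deletes the original files permanently
    cmd.append(source)
    return cmd


def run_rdfconvert(source, **options):
    """Run rdfconvert; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(rdfconvert_cmd(source, **options), check=True)
```

For example, run_rdfconvert("data/", dst_format="turtle", buffer_size=64) would convert the files under data/ to Turtle while parsing 64 Mb at a time.
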
  • rdfconvert2, convert RDF files from source format to a destination format using the rdf2rdf Java RDF parser
usage: rdfconvert2 [-h] [--clear] [--dst_format DST_FORMAT]
                   [--workers WORKERS] [--version]
                   SOURCE

rdftools v0.9.2, rdf converter (2), makes use of rdf2rdf bundled - requires
java

positional arguments:
  SOURCE                the source file or location (of files) to be converted

optional arguments:
  -h, --help            show this help message and exit
  --clear               clear the original files (delete) - this action is
                        permanent, use with caution!
  --dst_format DST_FORMAT
                        the destination format to convert to
  --workers WORKERS     the number of workers (default -1 : all cpus)
  --version             the current version
  • rdfencode, encode an ntriples file to a binary format (each S, P, O string is hashed with 64-bit cityhash)
usage: rdfencode [-h] [--version] SOURCE

rdftools v0.9.2, encode the RDF file(s)

positional arguments:
  SOURCE      the source file or location (of files) to be encoded

optional arguments:
  -h, --help  show this help message and exit
  --version   the current version
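
To make the idea behind the encoding concrete, here is a minimal sketch that hashes each term of a triple to a 64-bit integer. It is an illustration only: it uses BLAKE2b from the standard library as a stand-in for CityHash64, and the 24-byte little-endian record layout is an assumption, not the actual rdfencode output format.

```python
import hashlib
import struct


def hash64(term):
    """64-bit hash of an RDF term. Stand-in: BLAKE2b, not CityHash64."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return struct.unpack("<Q", digest)[0]


def split_triple(line):
    """Split one N-Triples line into subject, predicate and object strings."""
    subject, predicate, rest = line.strip().split(None, 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        obj = obj[:-1].rstrip()  # drop the terminating " ."
    return subject, predicate, obj


def encode_triple(line):
    """Pack S, P, O hashes as three unsigned 64-bit integers (assumed layout)."""
    return struct.pack("<3Q", *(hash64(term) for term in split_triple(line)))
```
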
  • genlubm, generate a LUBM dataset (in parallel)
usage: genlubm [-h] [--univ UNIV] [--index INDEX] [--seed SEED]
               [--ontology ONTOLOGY] [--workers WORKERS] [--version]
               OUTPUT

rdftools v0.9.2, lubm dataset generator wrapper (bundled) - requires java

positional arguments:
  OUTPUT               the location in which to save the generated
                       distributions

optional arguments:
  -h, --help           show this help message and exit
  --univ UNIV          number of universities to generate
  --index INDEX        start university
  --seed SEED          the seed
  --ontology ONTOLOGY  the lubm ontology
  --workers WORKERS    the number of workers (default -1 : all cpus)
  --version            the current version
  • genlubmdistro, generate a LUBM dataset (in parallel) and distribute the universities over N sites according to the specified distribution
usage: genlubmdistro [-h] [--distro DISTRO] [--univ UNIV] [--index INDEX]
                     [--seed SEED] [--ontology ONTOLOGY] [--pdist PDIST]
                     [--sites SITES] [--clean] [--workers WORKERS] [--version]
                     OUTPUT

rdftools v0.9.4, lubm dataset generator wrapper (bundled) - requires java

positional arguments:
  OUTPUT               the location in which to save the generated
                       distributions

optional arguments:
  -h, --help           show this help message and exit
  --distro DISTRO      the distibution to use, valid values are ['seedprop',
                       'uni2many', 'horizontal', 'uni2one']
  --univ UNIV          number of universities to generate
  --index INDEX        start university
  --seed SEED          the seed
  --ontology ONTOLOGY  the lubm ontology
  --pdist PDIST        the probabilities used for the uni2many distribution,
                       valid choices are ['3S', '7S', '5S'] or file with
                       probabilities split by line
  --sites SITES        the number of sites
  --clean              delete the generated universities
  --workers WORKERS    the number of workers (default -1 : all cpus)
  --version            the current version
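
The --pdist option also accepts a file with one probability per line. The sketch below writes such a file and builds a matching genlubmdistro call. write_pdist and genlubmdistro_cmd are hypothetical helpers, and the check that the probabilities sum to 1 is an assumption about what the tool expects.

```python
import math

# Distribution names copied from the genlubmdistro help text above.
DISTROS = ["seedprop", "uni2many", "horizontal", "uni2one"]


def write_pdist(path, probabilities):
    """Write one probability per line (the 'split by line' layout)."""
    # Assumption: the probabilities are expected to sum to 1.
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities should sum to 1")
    with open(path, "w") as handle:
        handle.write("\n".join(str(p) for p in probabilities) + "\n")
    return path


def genlubmdistro_cmd(output, distro, univ, sites, pdist=None, clean=False):
    """Build the argument list for a genlubmdistro call (hypothetical helper)."""
    if distro not in DISTROS:
        raise ValueError(f"unknown distribution: {distro}")
    cmd = ["genlubmdistro", "--distro", distro,
           "--univ", str(univ), "--sites", str(sites)]
    if pdist is not None:
        cmd += ["--pdist", pdist]  # '3S', '5S', '7S' or a path to a file
    if clean:
        cmd.append("--clean")  # delete the generated universities
    cmd.append(output)
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the tool is installed.
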
  • genvoid, generate VoID statistics from the source file
usage: genvoid [-h] [--version] SOURCE

rdftools v0.9.2, generate void statistics for RDF source file

positional arguments:
  SOURCE      the source file to be analized

optional arguments:
  -h, --help  show this help message and exit
  --version   the current version
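
Which statistics genvoid emits is not listed here. As an illustration of the kind of numbers a VoID description typically carries, the sketch below counts triples and distinct subjects, properties and objects in N-Triples input. The function name void_stats and the choice of these four properties are assumptions, not the tool's actual output.

```python
def void_stats(lines):
    """Count basic VoID-style statistics over an iterable of N-Triples lines."""
    subjects, properties, objects = set(), set(), set()
    triples = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the terminating " ."
        triples += 1
        subjects.add(subject)
        properties.add(predicate)
        objects.add(obj)
    return {
        "void:triples": triples,
        "void:distinctSubjects": len(subjects),
        "void:properties": len(properties),
        "void:distinctObjects": len(objects),
    }
```

With a file on disk you could call it as void_stats(open("data.nt")), since a file object iterates line by line.
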
  • genvoid2, generate VoID statistics from the RDF source file, using the nxparser VoID exporter
usage: genvoid2 [-h] [--dataset_id DATASET_ID] [--use_nx] [--version] SOURCE

rdftools v0.9.2, generate a VoiD descriptor using the nxparser java package

positional arguments:
  SOURCE                the source file to be analized

optional arguments:
  -h, --help            show this help message and exit
  --dataset_id DATASET_ID
                        dataset id
  --use_nx              if true (default false) use the nx parser builtin void
                        generator
  --version             the current version
  • ntround, round all numeric literals (typed or untyped) in an ntriples file with the given precision
usage: ntround [-h] [--prefix PREFIX] [--precision PRECISION] [--version] PATH

rdftools v0.9.2, rounds ntriple files in a folder, (rounds the floating point literals)

positional arguments:
  PATH                  location of the indexes

optional arguments:
  -h, --help            show this help message and exit
  --prefix PREFIX       the prefix used for files that are transformed, cannot
                        be the enpty string!
  --precision PRECISION
                        the precision to round to, if 0, floating point
                        numbers are rounded to long
  --version             the current version
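
The rounding itself can be pictured with a small regular expression over one N-Triples line. This is a sketch of the idea, not the ntround implementation: it only handles plain decimal literals in object position, keeps any datatype IRI unchanged, and round_literal is a hypothetical name.

```python
import re

# A quoted decimal literal in object position, optionally typed with ^^<iri>.
FLOAT_LITERAL = re.compile(r'"(-?\d+\.\d+)"(\^\^<[^>]+>)?(?=\s*\.\s*$)')


def round_literal(line, precision=2):
    """Round the floating point literal of one N-Triples line, if present."""
    def replace(match):
        value = round(float(match.group(1)), precision)
        if precision == 0:
            text = str(int(value))  # precision 0: round to an integer
        else:
            text = f"{value:.{precision}f}"
        return f'"{text}"{match.group(2) or ""}'
    return FLOAT_LITERAL.sub(replace, line)
```

Lines without a matching literal, such as triples whose object is an IRI, are returned unchanged.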

Thanks a lot to
