see Jabberwocky site for in-depth explanation and working scenarios (including test files)
Jabberwocky is a toolkit for ontologies. Since we all know ontologies are "nonsense". Not enough tools existsing utilise the power of ontologies. Don't hesitate to create an issue
or pull request
(see guidelines first).
See setup.py
in your local copy for version number | or Releases
:
- v1.0.0.0 [29/06/2020]
- v2.0.0.0 [10/05/2021]
- includes
spacy PhraseMatcher
- own synonym tags
- plot output for tf-idf
- includes
$ git clone https://github.com/sap218/jabberwocky
$ cd jabberwocky
$ python3 setup.py install --user
note: if you are using a virtual environment you can avoid --user
$ pip3 install click BeautifulSoup4 scikit-learn pandas lxml pytest spacy matplotlib
or after installing, use the requirements.txt
file:
$ pip3 install -r requirements.txt
command | description |
---|---|
bandersnatch |
extract synonyms from an RDF/XML syntax OWL ontology |
catch |
extract elements / sentences of text using key words |
bite |
run statistical tf-idf for important words from text |
arise |
adding / updating new synonyms to an ontology |
jabberwocky
works with the OWL
ontology format: RDF/XML
- for example, well-known biomedical ontologies such as doid.owl
, hpo.owl
, and uberon.owl
will all work, plus your own created.
for examples of Jabberwocky's commands in use, please see the site.
OR to run the automated tests (in the cloned directory):
$ git submodule init
$ git submodule update
$ tox
bandersnatch
curates synonyms for a list of key terms / or words of interest from an ontology of your choice, you provide a list of ontology synonym tags. note: it is recommended your list of keywords are exactly the classes from your chosen ontology (all in lowercase).
$ jab-bandersnatch -o hpo.owl -s ontology_synonym_tags.txt -k words_of_interest.txt
catch
essentially "catches" key elements / sentences from textual data using a .json
of key terms and their synonyms, you can use the outcome from bandersnatch
. A user will also provide a .txt
or .json
of the text data. note: if a .json
of text data is provided, you need specify the parameter for the field that contains the textual data to process.
$ jab-catch -k label_with_synonyms.json -t facebook_posts.json -p user-comment -i inner-user-comment-reply
bite
runs a tf-idf statistical analysis: searching for important terms in a text corpus. a user can use a list of key terms to remove from the text in order to avoid being in the statistical model - meaning other terms may be ranked higher. note: again with catch
, if you provide a .json
of text data, you need specify the field that contains the textual data to process. Using -g True
means you'll get a bar plot of the (default) 30-top terms.
$ jab-bite -k label_with_synonyms.json -t twitter_posts.txt -g True
arise
inserts synonyms in an ontology: you define these synonyms (e.g. "exact", "broad", "related", or "narrow") - these new synonyms may be based on the tf-idf statistical analysis from bite
.
$ jab-arise -o pocketmonsters.owl -f tfidf_new_synonyms.tsv
the poem "Jabberwocky" written by Lewis Carrol is described as a "nonsense" poem.
Contributors - thank you!
- @majensen for setting up automated testing w/
pytest
- see pull request #13 for more details
Citing
@article{Pendleton2020,
doi = {10.21105/joss.02168},
url = {https://doi.org/10.21105/joss.02168},
year = {2020},
publisher = {The Open Journal},
volume = {5},
number = {51},
pages = {2168},
author = {Samantha C. Pendleton and Georgios V. Gkoutos},
title = {Jabberwocky: an ontology-aware toolkit for manipulating text},
journal = {Journal of Open Source Software}
}
You can combine these commands together to form a process of steps of ontology synonym development and text analysis - see the SCENARIO for a working example of this process.