Jabberwocky

see Jabberwocky site for in-depth explanation and working scenarios (including test files)

Jabberwocky is a toolkit for ontologies. Since we all know ontologies are "nonsense". Not enough tools existsing utilise the power of ontologies. Don't hesitate to create an issue or pull request (see guidelines first).

Version

See setup.py in your local copy for version number | or Releases:

v1.0.0.0 [29/06/2020]
v2.0.0.0 [10/05/2021]
- includes spacy PhraseMatcher
- own synonym tags
- plot output for tf-idf

Install

$ git clone https://github.com/sap218/jabberwocky
$ cd jabberwocky
$ python3 setup.py install --user

note: if you are using a virtual environment you can avoid --user

Prerequisites

$ pip3 install click BeautifulSoup4 scikit-learn pandas lxml pytest spacy matplotlib

or after installing, use the requirements.txt file:

$ pip3 install -r requirements.txt

Elements

command	description
`bandersnatch`	extract synonyms from an RDF/XML syntax `OWL` ontology
`catch`	extract elements / sentences of text using key words
`bite`	run statistical tf-idf for important words from text
`arise`	adding / updating new synonyms to an ontology

Ontology formats

jabberwocky works with the OWL ontology format: RDF/XML - for example, well-known biomedical ontologies such as doid.owl, hpo.owl, and uberon.owl will all work, plus your own created.

Examples

for examples of Jabberwocky's commands in use, please see the site.

OR to run the automated tests (in the cloned directory):

$ git submodule init
$ git submodule update
$ tox

bandersnatch

bandersnatch curates synonyms for a list of key terms / or words of interest from an ontology of your choice, you provide a list of ontology synonym tags. note: it is recommended your list of keywords are exactly the classes from your chosen ontology (all in lowercase).

$ jab-bandersnatch -o hpo.owl -s ontology_synonym_tags.txt -k words_of_interest.txt

catch

catch essentially "catches" key elements / sentences from textual data using a .json of key terms and their synonyms, you can use the outcome from bandersnatch. A user will also provide a .txt or .json of the text data. note: if a .json of text data is provided, you need specify the parameter for the field that contains the textual data to process.

$ jab-catch -k label_with_synonyms.json -t facebook_posts.json -p user-comment -i inner-user-comment-reply

bite

bite runs a tf-idf statistical analysis: searching for important terms in a text corpus. a user can use a list of key terms to remove from the text in order to avoid being in the statistical model - meaning other terms may be ranked higher. note: again with catch, if you provide a .json of text data, you need specify the field that contains the textual data to process. Using -g True means you'll get a bar plot of the (default) 30-top terms.

$ jab-bite -k label_with_synonyms.json -t twitter_posts.txt -g True

arise

arise inserts synonyms in an ontology: you define these synonyms (e.g. "exact", "broad", "related", or "narrow") - these new synonyms may be based on the tf-idf statistical analysis from bite.

$ jab-arise -o pocketmonsters.owl -f tfidf_new_synonyms.tsv

Thanks! 🐉

the poem "Jabberwocky" written by Lewis Carrol is described as a "nonsense" poem.

Contributors - thank you!

@majensen for setting up automated testing w/ pytest - see pull request #13 for more details

Citing

@article{Pendleton2020,
  doi = {10.21105/joss.02168},
  url = {https://doi.org/10.21105/joss.02168},
  year = {2020},
  publisher = {The Open Journal},
  volume = {5},
  number = {51},
  pages = {2168},
  author = {Samantha C. Pendleton and Georgios V. Gkoutos},
  title = {Jabberwocky: an ontology-aware toolkit for manipulating text},
  journal = {Journal of Open Source Software}
}

ONE LAST THING...

You can combine these commands together to form a process of steps of ontology synonym development and text analysis - see the SCENARIO for a working example of this process.

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
arise		arise
bandersnatch		bandersnatch
bite		bite
catch		catch
docs		docs
images		images
test_files		test_files
tests		tests
.gitignore		.gitignore
.gitmodules		.gitmodules
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
paper.bib		paper.bib
paper.md		paper.md
requirements.txt		requirements.txt
setup.py		setup.py
tox.ini		tox.ini

License

sap218/jabberwocky

Folders and files

Latest commit

History

Repository files navigation

Jabberwocky

Version

Install

Prerequisites

Elements

Ontology formats

Examples

bandersnatch

catch

bite

arise

Thanks! 🐉

ONE LAST THING...

About

Topics

Resources

License

Stars

Watchers

Forks

Languages