
asreview/semantic-clusters


ASReview Semantic Clustering

This repository contains the Semantic Clustering plugin for ASReview. It applies several techniques (SciBERT, PCA, t-SNE, k-means and a custom Cluster Optimizer) to an ASReviewData object to group records by semantic similarity. The end result is an interactive dashboard.
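The PCA, t-SNE and k-means steps can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the plugin's code. Synthetic 768-dimensional vectors from make_blobs stand in for SciBERT embeddings. k-means is assumed to run on the 2-D t-SNE output, as the order of the techniques suggests. Choosing k by silhouette score is a swapped-in stand-in for the custom Cluster Optimizer, whose method is not described here. The function cluster_embeddings and its parameters are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score


def cluster_embeddings(embeddings, max_k=8, seed=0):
    """Hypothetical helper: reduce embeddings to 2-D and cluster them.

    embeddings: array of shape (n_records, n_features), e.g. one
    transformer vector per record. Returns (coords_2d, labels, best_k).
    """
    n_samples, n_features = embeddings.shape

    # Step 1: PCA compresses the high-dimensional vectors first.
    n_components = min(50, n_samples, n_features)
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(embeddings)

    # Step 2: t-SNE projects the PCA output to 2-D for plotting.
    # Perplexity must stay below the number of samples.
    perplexity = min(30.0, (n_samples - 1) / 3)
    coords = TSNE(n_components=2, perplexity=perplexity,
                  random_state=seed).fit_transform(reduced)

    # Step 3 (assumption): try k = 2..max_k and keep the k with the best
    # silhouette score, standing in for the custom Cluster Optimizer.
    max_k = min(max_k, n_samples - 1)
    best_k, best_score, best_labels = None, float("-inf"), None
    for k in range(2, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(coords)
        score = silhouette_score(coords, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return coords, best_labels, best_k


# Synthetic stand-in for SciBERT embeddings: 90 records, 768 features, 3 groups.
X, _ = make_blobs(n_samples=90, n_features=768, centers=3, random_state=0)
coords, labels, k = cluster_embeddings(X)
print(f"chose k={k}; 2-D coordinates shape: {coords.shape}")
```

Running PCA before t-SNE is a common way to make t-SNE faster and less sensitive to noise on very high-dimensional input. Clustering the 2-D coordinates keeps the clusters consistent with what the dashboard plots.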

Installation

The package is called semantic_clustering and can be installed from the downloaded repository folder with:

pip install .

or directly from GitHub with:

python -m pip install git+https://github.com/asreview/semantic-clusters.git

Commands

For help use:

asreview semantic_clustering -h
asreview semantic_clustering --help

Other options are:

asreview semantic_clustering -f <input> -o <output.csv>
asreview semantic_clustering --filepath <input> --output <output.csv>
asreview semantic_clustering -a <output.csv>
asreview semantic_clustering --app <output.csv>
asreview semantic_clustering -v
asreview semantic_clustering --version
asreview semantic_clustering --transformer <model>

Usage

The semantic clustering functionality is implemented as a subcommand extension of the asreview command line tool. The following commands can be run:

Processing

In the processing phase, a dataset is processed and clustered for use in the interactive interface. The following options are available:

asreview semantic_clustering -f <input.csv or url> -o <output_file.csv>

The -f (--filepath) option selects the input to process, and the results are stored in the file specified with -o (--output).

The semantic_clustering command loads the input as an ASReviewData object, so it can handle local files, URLs and benchmark datasets:

asreview semantic_clustering -f benchmark:van_de_schoot_2017 -o output.csv
asreview semantic_clustering -f van_de_Schoot_2017.csv -o output.csv

If no output file is specified, the results are written to output.csv.
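To process several inputs from a script, the documented flags can be assembled programmatically. The Python sketch below uses only -f, -o and --transformer as described in this README. The helper names build_clustering_command and run_clustering and the file name benchmark_output.csv are hypothetical. run_clustering assumes the asreview command is on your PATH.

```python
import shlex
import subprocess


def build_clustering_command(source, output=None, transformer=None):
    """Hypothetical helper: assemble an `asreview semantic_clustering` call.

    source: file path, URL or benchmark name (e.g. "benchmark:van_de_schoot_2017").
    output: results CSV; when None, -o is omitted and the tool uses output.csv.
    transformer: optional model name passed with --transformer.
    """
    cmd = ["asreview", "semantic_clustering", "-f", source]
    if output is not None:
        cmd += ["-o", output]
    if transformer is not None:
        cmd += ["--transformer", transformer]
    return cmd


def run_clustering(source, output=None, transformer=None):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_clustering_command(source, output, transformer), check=True)


# Show the commands for two inputs without executing them.
for source, output in [("benchmark:van_de_schoot_2017", "benchmark_output.csv"),
                       ("van_de_Schoot_2017.csv", None)]:
    print(shlex.join(build_clustering_command(source, output)))
```

Building the argument list separately keeps it inspectable and avoids shell quoting issues, because subprocess.run receives a list rather than a single string.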

Transformer

Semantic Clustering uses the allenai/scibert_scivocab_uncased transformer model by default. To use a different model, pass its name with the --transformer <model> option:

asreview semantic_clustering -f benchmark:van_de_schoot_2017 -o <output_file.csv> --transformer bert-base-uncased

Other pretrained transformer models, such as bert-base-uncased above, should work as well; many more are available.
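The sketch below illustrates what swapping the model changes. It uses the Hugging Face transformers library to turn each text into a single vector. It is an assumption-laden illustration, not the plugin's code. The README does not say how token vectors are pooled, so mean pooling over non-padding tokens is used here. The function names embed_texts and mean_pool are hypothetical. embed_texts needs transformers and torch installed and downloads the model on first use. mean_pool only needs NumPy.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per text, ignoring padding positions.

    token_embeddings: (batch, tokens, hidden); attention_mask: (batch, tokens).
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def embed_texts(texts, model_name="allenai/scibert_scivocab_uncased"):
    """Return one vector per text (assumes `transformers` and `torch`)."""
    import torch  # imported lazily so mean_pool works without them
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    return mean_pool(output.last_hidden_state.numpy(),
                     batch["attention_mask"].numpy())


# Tiny check of the pooling: the padded third token is ignored.
tokens = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # prints [[2. 2.]]
```

For example, embed_texts(titles, model_name="bert-base-uncased") mirrors the --transformer bert-base-uncased call above. Different models produce different vectors, and possibly different vector sizes, so the resulting clusters can differ.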

Dashboard

The dashboard server is also started from the command line. The following command starts a Dash server in the console and visualizes the processed file:

asreview semantic_clustering -a output.csv
asreview semantic_clustering --app output.csv

Once the server is running, open http://127.0.0.1:8050/ in your browser.
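When launching the dashboard from a script, you may want to wait until the Dash server responds before opening the browser. The Python sketch below uses only the standard library and assumes the default address http://127.0.0.1:8050/ mentioned above. The helper names wait_for_dashboard and launch_dashboard are hypothetical. launch_dashboard assumes asreview is on your PATH.

```python
import subprocess
import time
import urllib.request
import webbrowser

DASHBOARD_URL = "http://127.0.0.1:8050/"


def wait_for_dashboard(url=DASHBOARD_URL, timeout=30.0, interval=0.5):
    """Poll url until it answers with HTTP 200 or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and HTTP errors all derive from
            # OSError: the server is not ready yet, so try again.
            pass
        time.sleep(interval)
    return False


def launch_dashboard(csv_path):
    """Start the dashboard for a processed file and open it once it is up."""
    server = subprocess.Popen(["asreview", "semantic_clustering", "-a", csv_path])
    if wait_for_dashboard():
        webbrowser.open(DASHBOARD_URL)
    return server  # call server.terminate() to stop the dashboard
```

For example, server = launch_dashboard("output.csv") starts the server in the background, and server.terminate() stops it.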

License

MIT license

Contact

Got ideas for improvement? For any questions or remarks, please send an email to asreview@uu.nl.