Civic CrowdAnalytics

Civic CrowdAnalytics (CCA) is a data analytics tool that applies Natural Language Processing (NLP) and Machine Learning (ML) techniques, such as concept extraction, idea classification, and sentiment analysis, to make sense of crowdsourced civic input. The tool automatically organizes contributions into executive summaries and visualizations that are easy to comprehend, searchable, and interrelated. CCA is based on the scientific publication Civic CrowdAnalytics: Making sense of crowdsourced civic input with big data tools.

Civic CrowdAnalytics features a simple user interface for submitting an unstructured dataset for analysis. The user can choose, for example, to organize ideas by pre-defined categories, visualize the frequency of recurring concepts, and sort related comments by sentiment. The tool displays the results in both tabular summaries and interactive visualizations, which users can search and manipulate. Users can also export the results in various formats, such as CSV, PNG, JPEG, SVG, or PDF.

Screenshots

Motivation

Civic technologies are currently bottlenecked by a common need for more effective processing of citizen contributions. Civic CrowdAnalytics provides a solution. By using innovative NLP and ML techniques, the tool automates the analysis and synthesis of key aspects of crowdsourced civic input. This automation will dramatically accelerate and improve the standard data management features that Civic Backoffice will also provide.

Features

In its first version, the tool supports the following analytics features:

Classification: This feature organizes the data into main categories and subcategories using well-known classifiers, such as Naive Bayes, Decision Tree, Random Forest, and Support Vector Machine. To train the classifier, the user first codes part of the dataset by labeling its main categories and subcategories, and then lets one of the algorithms categorize the rest of the data. Texts written in any language supported by the NLTK library can be classified;
Concept Extraction: Expressions and words are extracted from the data and displayed by frequency. Concept extraction provides lists of key terms and phrases, distributed by occurrence, which can then be further analyzed using statistical and qualitative methods. The user can specify a list of domain-specific words that should be excluded from the analysis. The tool extracts expressions of up to three words;
Sentiment Analysis: The data is classified as positive, negative, or neutral according to established sentiment values of words and expressions. For example, words such as reduce, remove, and problem indicate a negative sentiment, whereas increase, resolve, and good indicate a positive sentiment. So far, CCA supports sentiment analysis of texts written in four languages: English, Spanish, Portuguese, and French. For English texts, CCA uses the VADER algorithm from the NLTK toolkit. For Spanish, CCA has its own implementation based on ML-SentiCon. For the remaining languages, the feature first translates the text into English using the Python package Googletrans and then applies the VADER algorithm;
Text Similarity: This feature clusters similar texts together. It tokenizes and stems the text, uses TF-IDF to transform the set of texts into a Vector Space Model, and finally applies the K-means algorithm to group texts represented by similar TF-IDF vectors.
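
To make the Concept Extraction feature more concrete, here is a minimal sketch of frequency-based extraction of expressions of up to three words, with a user-supplied exclusion list. It is an illustration only and not CCA's actual implementation: the function name extract_concepts, its parameters, and the simple regular-expression tokenizer are assumptions made for this example.

```python
import re
from collections import Counter


def extract_concepts(texts, excluded_words=(), max_words=3, top=10):
    """Count expressions of 1..max_words consecutive words across all texts.

    Hypothetical helper for illustration; not part of the CCA code base.
    """
    excluded = {word.lower() for word in excluded_words}
    counts = Counter()
    for text in texts:
        # Naive tokenizer: lowercase, then keep runs of word characters.
        tokens = re.findall(r"\w+", text.lower())
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                ngram = tokens[i:i + n]
                # Skip any expression that contains an excluded word.
                if excluded.intersection(ngram):
                    continue
                counts[" ".join(ngram)] += 1
    # Most frequent expressions first.
    return counts.most_common(top)


comments = [
    "Fix the bike lanes downtown",
    "More bike lanes please",
    "The bike lanes are unsafe",
]
for expression, frequency in extract_concepts(
    comments, excluded_words=["the", "are", "more", "please"], top=3
):
    print(expression, frequency)
```

A real pipeline would likely use a language-aware tokenizer and stop-word list (for example from NLTK) instead of the regular expression used here.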

Installation

Backend Installation

Clone the repository: git clone https://github.com/ParticipaPY/civic-crowdanalytics.git
Go to the directory civic-crowdanalytics
Create a virtual environment: virtualenv env
Activate the virtual environment: source env/bin/activate
Go to the directory backoffice
Run pip install -r requirements.txt to install the dependencies. If an error occurs during the installation, it may be due to one of the following: a) the package python-dev is missing; b) the package libmysqlclient-dev is missing; c) the environment variables LC_ALL and/or LC_CTYPE are not defined or do not have a valid value
Create a MySQL database. Make sure the database collation is set to UTF-8
Rename the file backoffice/backoffice/settings.py.example to backoffice/backoffice/settings.py
Set the database configuration parameters in backoffice/backoffice/settings.py

DATABASES = {
    ...
        'NAME': '',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    ...
}

Run python manage.py migrate to set up the database schema
Run python manage.py loaddata data.json to load the configuration data
Run python manage.py createsuperuser to create an admin user
Start the Django server: python manage.py runserver 0:8000
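
As a rough illustration of the database configuration step above, a filled-in DATABASES setting might look like the sketch below. Every value is a placeholder, and the 'default' key and the 'ENGINE' entry are assumptions based on Django's standard settings layout rather than on this project's settings.py.example. The helper missing_database_settings is also hypothetical; it only flags fields left empty before you run the migrations.

```python
# Placeholder values for illustration only; replace them with your own.
DATABASES = {
    'default': {  # assumed: Django's standard alias for the main database
        'ENGINE': 'django.db.backends.mysql',  # Django's built-in MySQL backend
        'NAME': 'crowdanalytics',   # hypothetical database name
        'USER': 'cca_user',         # hypothetical MySQL user
        'PASSWORD': 'change-me',
        'HOST': 'localhost',
        'PORT': '3306',             # MySQL's default port
    }
}


def missing_database_settings(databases, required=('NAME', 'USER')):
    """Return the required keys of the default database that are empty or absent.

    HOST and PORT are not checked because Django falls back to its
    defaults when they are empty strings.
    """
    config = databases.get('default', {})
    return [key for key in required if not config.get(key)]


print(missing_database_settings(DATABASES))
```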

Frontend Installation

Install Node.js (version higher than 0.10.32) and update npm (version higher than 2.1.8). See here for an installation guide
Go to the directory civic-crowdanalytics/frontoffice
Install the project's dependencies by running npm install
Set the backend server URL and the Django username and password in frontoffice/src/Backend.vue

baseURL: 'http://localhost:8000/api',
username: '',
password: '',
