GlobalGiving Depth

Problem Statement: GlobalGiving's network consists of many organizations based in the US, along with some nonprofits in other countries. There are hundreds of thousands more organizations that GlobalGiving knows of but for which it may lack information on the type of work they do. Given an NGO's website, it is possible to discern and characterize that NGO's work automatically using statistics, natural language processing, and machine learning.

Repo Description: This repo contains our approaches to characterizing the work of NGOs. These approaches fall into a few categories:

Classification (see /classification folder for code, details, and examples)

Using machine learning classifiers, we can feed in text from an NGO's website and predict, with reasonable accuracy, which categories that NGO may fall into. The classifiers provided here are a Stochastic Gradient Descent classifier and a Bag of Words classifier.
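As a rough illustration of the classification idea, the sketch below combines the two techniques named above: bag-of-words features feeding a linear classifier trained with stochastic gradient descent, using scikit-learn. This is an assumption about how such a classifier could be assembled, not the repo's actual code; `build_classifier`, the example texts, and the category names are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline


def build_classifier(random_state=0):
    """Bag-of-words features fed into a linear classifier trained with SGD.

    Hypothetical helper for illustration; not the repo's actual API.
    """
    return make_pipeline(
        # Turn each document into a vector of word counts, dropping English stop words.
        CountVectorizer(stop_words="english"),
        # Linear SVM (hinge loss) fit with stochastic gradient descent.
        SGDClassifier(loss="hinge", random_state=random_state),
    )


# Made-up training data: website text paired with a single category label.
train_texts = [
    "We build wells and provide clean drinking water to rural villages.",
    "Our volunteers dig boreholes and repair water pumps.",
    "We fund scholarships and school supplies for girls.",
    "Our teachers run literacy classes for children.",
]
train_labels = ["water", "water", "education", "education"]

clf = build_classifier().fit(train_texts, train_labels)
print(clf.predict(["We repair broken water pumps in villages."]))
```

This sketch assigns one category per NGO. If an NGO can belong to several categories at once, the labels would need a multi-label setup, for example scikit-learn's `OneVsRestClassifier` together with `MultiLabelBinarizer`.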

Clustering (see /clustering folder for code, details, and examples)

GlobalGiving's existing categorization scheme serves its current purposes well, but a scheme derived from differences in the language NGOs use on their websites would be more useful for identifying and characterizing unknown NGOs. The clustering algorithms provided here are a K-Means implementation using document embeddings and an implementation of Latent Dirichlet Allocation (LDA).

Processing (see /processing folder for code, details, and examples)

How we obtain and process the data is just as important as how we classify or cluster it. For this project, we used an HTML parser built on the BeautifulSoup library to pull clean, filtered text from NGO websites.
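The kind of text extraction described above can be sketched with BeautifulSoup as follows. The list of tags treated as non-text (`NON_TEXT_TAGS`) and the helper name `extract_text` are assumptions for illustration; the repo's parser may filter more aggressively.

```python
from bs4 import BeautifulSoup

# Assumed list of tags whose contents are not page text; the repo's parser may differ.
NON_TEXT_TAGS = ["script", "style", "noscript"]


def extract_text(html):
    """Return the visible text of an HTML page as one whitespace-normalized string."""
    soup = BeautifulSoup(html, "html.parser")
    # Calling soup(...) is shorthand for soup.find_all(...).
    for tag in soup(NON_TEXT_TAGS):
        tag.decompose()  # remove the tag and everything inside it
    text = soup.get_text(separator=" ", strip=True)
    return " ".join(text.split())  # collapse runs of whitespace


page = """
<html><head><style>p { color: red; }</style><script>var x = 1;</script></head>
<body><h1>Clean Water</h1><p>We build   wells.</p></body></html>
"""
print(extract_text(page))
```

Using the built-in `"html.parser"` backend avoids an extra dependency; `lxml` is a faster drop-in if it is installed.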

Past approaches

Refer to the wiki to read about other approaches that were tried and abandoned.

Getting Started

Installation

This project was built in Python 3.7. Dependencies can be installed into a virtual environment from the requirements.txt file using pipenv:

pipenv install -r requirements.txt

For LDA: use the NLTK Downloader to obtain "stopwords", "wordnet" (the corpus that NLTK's WordNetLemmatizer relies on), and any other required resources from the Natural Language Toolkit. For more information on the NLTK Downloader, please refer to the NLTK documentation.
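A small helper like the one below can fetch the NLTK data only when it is missing, so it is safe to run repeatedly. The mapping in `REQUIRED_RESOURCES` covers just the resources named above and is an assumption; the LDA notebooks may need others. `ensure_nltk_resources` is a hypothetical name.

```python
import nltk

# NLTK package id -> path checked by nltk.data.find. Assumed from the resources
# named above; the LDA notebooks may require additional ones.
REQUIRED_RESOURCES = {
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
}


def ensure_nltk_resources(resources=REQUIRED_RESOURCES):
    """Download any NLTK resource that is not installed yet; return the ids fetched."""
    fetched = []
    for package, path in resources.items():
        try:
            nltk.data.find(path)  # raises LookupError if the resource is missing
        except LookupError:
            nltk.download(package, quiet=True)
            fetched.append(package)
    return fetched
```

Calling `ensure_nltk_resources()` once inside the pipenv environment, before running the LDA notebooks, downloads whatever is missing to NLTK's default data directory.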

Usage

Each subfolder (classification, clustering, processing) has Jupyter notebooks with examples of code usage. Refer to the wiki for detailed function documentation.

Team

Product Manager - Josh Burke (@JoshBurke)
Technical Lead - Aryn Harmon (@achcello)

Software Devs

Jacqueline Osborn (@jackieo5023)
Lam Tran (@Lam7150)
Eugenia Chen (@Polarpi)
Prashant Pokhriyal (@psp2)

License

This project is licensed under the terms of the MIT license.

hack4impact-uiuc/globalgiving-depth