Simple classification engine for government/municipality documents built with TensorFlow

jharting/praguehacks2016-categorizer

Categorizer (a PragueHacks 2016 project)

Simple classification engine for government/municipality documents built with TensorFlow. Documents are tagged based on the occurrence of certain words and other characteristics of each document.
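To illustrate what "occurrence of certain words" could look like as classifier input, here is a minimal sketch that turns a document into a vector of word counts. The vocabulary, the function name `word_occurrence_features`, and the tokenization are assumptions made for illustration; the README does not describe how the project actually builds its feature vectors.

```python
import re
from collections import Counter

# Hypothetical vocabulary; the project's real feature words are not listed here.
VOCABULARY = ["budget", "tender", "permit", "zoning"]


def word_occurrence_features(text, vocabulary=VOCABULARY):
    """Return one count per vocabulary word, in vocabulary order."""
    # Lowercase the text and split it into alphanumeric tokens.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for words that never occur.
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    sample = "Tender for the city budget: the budget committee approved the tender."
    print(word_occurrence_features(sample))
```

Each position in the resulting list corresponds to one vocabulary word, so every document maps to a vector of the same length regardless of its size.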

This project is a prototype for

Built during Prague Hacks 2016

Setup

Requirements:

sudo pip install numpy
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.10.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL

Run

Prepare data:

  1. copy tagged content files to ./input
  2. copy feature vector to features.csv
  3. export CATS=`cat cats.txt`
  4. bash generate-all.sh features.csv $CATS
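Because `$CATS` is expanded unquoted in step 4, the shell splits its value on whitespace, so each category name reaches `generate-all.sh` as a separate argument. The Python sketch below mimics that splitting. It assumes `cats.txt` holds whitespace-separated category names, which this README does not state, and the sample names and the helper `split_categories` are hypothetical.

```python
def split_categories(cats_text):
    """Split the contents of cats.txt roughly the way unquoted shell expansion does."""
    # str.split() with no argument splits on any run of whitespace,
    # including newlines, and drops empty strings.
    return cats_text.split()


if __name__ == "__main__":
    sample = "transport\nculture  budget\n"  # hypothetical category names
    print(split_categories(sample))
```

One consequence of this splitting is that a category name containing a space would arrive as two separate arguments.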

Train DNN

  1. python train.py $CATS

Run classification on new data

  1. python predict.py features.csv $CATS output.csv
