Skip to content

A CampaignLab 2024 Hackathon project to scrape and process OpenElections political campaigning leaflets and convert them into structured JSON files for further analysis using GPT-4 Vision.

License

Notifications You must be signed in to change notification settings

CampaignLab/uk_elections_leaflets

 
 

Repository files navigation

OpenElections Leaflet Scraper and Parser

Project Logo

Description

This repository contains code and data for accomplishing the following:

  1. Scraping campaign leaflets data from the Open Elections leaflet archive.
  2. Sending the images for each leaflet to OpenAI's GPT4-Vision (via an API) in order to parse it into a JSON structure.
    • The JSON is structured to take the images of the leaflets and put them into an interpretable structure containing information the candidate's name, their key policies, the content they mention regarding key issues, contact details and more.
  3. Cleaning and verifying the JSON files obtained from OpenAI's API.

This work builds on the Open Elections leaflet archive (Milazzo, C., Trumm, S., Townsley, J. 2020. OpenElections Leaflet Data, 2010-2019. Nottingham, UK.) which in turn builds on data gathered through Democracy Club.

This project was put together for Campaign Lab as part of one of their Winter Hack Nights fairly quickly using some guess-work and trial-and-error combined with input from ChatGPT. All faults mine.

Table of Contents

Installation

Each of the subdirectories contain requirements.txt files for creating Python environments which will enable running the code for each part of the project.

Usage

More details to come soon!

Contributing

More details to come soon!

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

You can reach me on Twitter: @thicknavyrain

About

A CampaignLab 2024 Hackathon project to scrape and process OpenElections political campaigning leaflets and convert them into structured JSON files for further analysis using GPT-4 Vision.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 99.4%
  • Python 0.6%