Facebook Data Miner

Facebook-Data-Miner provides a set of tools that can help you analyze the data that Facebook has on you.

The vision is to support data extraction, data analysis, and data visualization through any of its interfaces.

All computation happens on your machine, so no data gets sent to remote servers or third parties.

Prerequisites

As of now the package has only been tested on Linux; however, with pipenv it should be easy to set the application up on Windows as well.

Python

The application was tested on Debian 10 with Python v3.8.3. You will need at least Python 3.8, since the code relies on features introduced in that version.

To download Python refer to the official Python distribution page.
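
Once Python is installed, you can quickly confirm that your interpreter is recent enough with a one-liner like this (just a sanity check, not part of the project):

import sys
assert sys.version_info >= (3, 8), f"Python 3.8+ is required, found {sys.version}"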

Your Facebook data

This package works by analyzing your Facebook data, so you have to download it.

Please refer to Facebook's help page on downloading your information in order to do so.

IMPORTANT NOTE: you have to set Facebook's language to English (US) at the time you request your data. You can of course revert this change later.

To use this software, you will only need the absolute path of the downloaded zip file.

You have to change the DATA_PATH variable in the configuration.yml file.

NOTE: facebook-data-miner will extract your zip file into the same directory. This may require several gigabytes of free space, depending on the size of the original data.
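
If you want to sanity-check your configuration, a minimal sketch like the one below (not part of the project; it assumes DATA_PATH is a top-level key in configuration.yml and that PyYAML is available) reads the file and verifies that the path exists:

from pathlib import Path
import yaml

# Load the project configuration and look up the zip file's path
with open("configuration.yml") as f:
    config = yaml.safe_load(f)

data_path = Path(config["DATA_PATH"])
print(f"DATA_PATH is set to: {data_path}")
if not data_path.exists():
    print("Warning: this path does not exist, double-check configuration.yml")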

This repository

Clone this repository by either using SSH:

git clone git@github.com:tardigrde/facebook-data-miner.git

or HTTPS:

git clone https://github.com/tardigrde/facebook-data-miner.git

Dependencies

This project uses pipenv for dependency and virtual environment management.

Install it by typing:

pip install --user pipenv

In the project root (where the Pipfile is located), run:

pipenv install --dev

Make sure you run the application inside this environment, for example by entering it with pipenv shell or by prefixing commands with pipenv run.

Lint

With the makefile:

make lint

Run tests

With the makefile:

make test

Tests also have to be run inside the pipenv environment.

Usage

The app has both a CLI and an API. For now, the API is the preferred way to use the app: there is no database yet, so only the API keeps your Facebook data in memory between operations. The CLI works, but it is slow.

Jupyter notebook

I wrote two Jupyter notebooks to showcase the capabilities and features of the API and the CLI. The notebooks contain lots of comments to help you understand how the app is built, what kind of information you can access, and how to access it.

For this you have to start a Jupyter server. As mentioned in the notebooks, you have to set the PYTHONPATH environment variable before starting it:

export PYTHONPATH="$PWD"

Then type the following in your terminal if you want to use Jupyter Notebook:

jupyter notebook

or, for JupyterLab:

jupyter lab

Select notebooks/API.ipynb (or notebooks/CLI.ipynb) and start executing the cells.

The API

As already described in the notebook, the entry point is the App class in miner/app.py. For now its docstrings are the only documentation.

Call it from a script (after you set the data path) like:

from miner.app import App
app = App()
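
Since the docstrings are currently the only documentation, one simple way to explore what App offers is Python's built-in help():

help(App)    # prints the class docstring and lists its public methods
help(app)    # same, but on the instance you just created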

The CLI

The command-line interface has a lot of depth, as shown in notebooks/CLI.ipynb, but it is slow, because the data that gets read in does not stay in RAM between commands.

For running the CLI:

export PYTHONPATH="$PWD"
python ./miner/app.py --help

Contribution

Help is more than welcome. There is still a long way to go until v1.0.0.

Ideas are welcome too. Feel free to open a new issue.