Federated XGBoost

Introduction

Federated XGBoost is a gradient boosting library for the federated setting, based on the popular XGBoost project. In addition to offering the same efficiency, flexibility, and portability that vanilla XGBoost provides, Federated XGBoost enables multiple parties to jointly compute a model while keeping their data on site, avoiding the need for central data storage.

This project is no longer actively maintained.

Installation

  1. Clone this repository and its submodules.
git clone --recursive https://github.com/mc2-project/federated-xgboost.git
  2. Install Federated XGBoost dependencies.
sudo apt-get install cmake libmbedtls-dev
pip3 install numpy grpcio grpcio-tools
  3. Build Federated XGBoost.
cd federated-xgboost
mkdir build
cd build
cmake ..
make
  4. Install the Python package.
cd python-package
sudo python3 setup.py install

Quickstart

This quickstart uses the tutorial located in demo/basic. In this tutorial, each of the two parties in the federation starts an RPC server on port 50051 to listen for the aggregator. The aggregator sends invitations to all parties to join the computation. Once all parties have accepted the invitation, training commences: the training script demo.py is run.

The implementation currently requires that each party's training data be at the same location on its machine, i.e., have the same file path everywhere, and that the aggregator also have a copy of the training data.

  1. Modify hosts.config to contain the IP addresses of all parties in the federation. Each line in hosts.config has the following format:
<ip_addr>:<port>

For the purposes of this demo, <port> should be 50051.
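
For example, a two-party federation would have two lines (the addresses below are placeholders from the documentation range, not real hosts):

192.0.2.1:50051
198.51.100.2:50051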

  2. This demo uses data from the Higgs boson dataset. The demo/data/ directory contains four training data files: hb_train_1.csv, hb_train_2.csv, hb_train_3.csv, and hb_train_4.csv. At each party, rename a different training data file to hb_train.csv.
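
For example, from the repository root, the second party might run:

mv demo/data/hb_train_2.csv demo/data/hb_train.csv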

  3. Start the RPC server at each party.

python3 serve.py
  4. At the aggregator, send invitations to all parties. The --num-workers value should match the number of parties in the federation (here, two).
dmlc-core/tracker/dmlc-submit --cluster rpc --num-workers 2 --host-file hosts.config --worker-memory 4g /path/to/federated-xgboost/demo/basic/demo.py

Each party should receive an invitation in its console:

Request from aggregator [ipv4:172.31.27.60:50432] to start federated training session:
Please enter 'Y' to confirm or 'N' to reject.
Join session? [Y/N]:
  5. Once all parties enter Y, training begins.
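
For reference, the introduction notes that Federated XGBoost preserves the vanilla XGBoost interface, so demo.py follows the familiar XGBoost training flow. The sketch below is an illustrative stand-in, not the actual demo script: the parameter values and the assumption that the label sits in the first CSV column are ours. See demo/basic/demo.py for the real code.

import xgboost as xgb

# Illustrative sketch only; see demo/basic/demo.py for the actual script.
# Assumes the vanilla XGBoost Python API and a label in the first CSV column.
dtrain = xgb.DMatrix("hb_train.csv?format=csv&label_column=0")
params = {"max_depth": 3, "objective": "binary:logistic"}  # example values
bst = xgb.train(params, dtrain, num_boost_round=10)
print(bst.eval(dtrain))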