Framework for developing Actor-Critic deep RL algorithms (A3C, A2C, PPO, GAE, etc.) in different environments (OpenAI's Gym, Rogue, Sentiment Analysis, Car Controller, etc.) with continuous and discrete action spaces.

This software supports several deep RL and HRL algorithms (A3C, A2C, PPO, GAE, etc.) in different environments (OpenAI's Gym, Rogue, Sentiment Analysis, Car Controller, etc.) with both continuous and discrete action spaces. More details can be found in my Computer Science master's thesis: Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments.

Actor-Critic methods form a large family of RL algorithms. In this work we focus primarily on:

  • The Actor-Critic paradigm (a minimal sketch follows this list)
  • Hierarchical networks
  • Experience Replay
  • Intrinsic rewards for exploration
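
To ground the terminology, the core of the Actor-Critic paradigm is a policy (the actor) updated in the direction suggested by a learned value function (the critic). The following is a minimal, framework-independent sketch in plain Python/NumPy; all names are illustrative and not taken from this code-base.

import numpy as np

def actor_critic_losses(log_prob, value, reward_to_go):
    # The critic's prediction error is the advantage: how much better the
    # observed return was compared to what the critic expected.
    advantage = reward_to_go - value
    # Actor: push up the log-probability of actions with positive advantage.
    policy_loss = -log_prob * advantage
    # Critic: regress the value estimate towards the observed return.
    value_loss = 0.5 * advantage ** 2
    return policy_loss, value_loss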

Our goal is to perform Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments. To do so, we need a framework for easily experimenting with RL algorithms on different types of problems. In May 2017, OpenAI published an interesting repository of RL baselines, and it still maintains it with continuous improvements and updates. That repository is probably the best choice for testing the performance of already existing RL algorithms with very minimal changes, but it is hard to read and modify (at least for the author). We therefore chose as the code-base for our experiments the open-source A3C algorithm, built on TensorFlow 1.13.1, that accompanies our latest conference paper, Crawling in Rogue's dungeons with (partitioned) A3C, mainly because we already had experience with it and know the details of its inner mechanisms. However, even this code-base is not generic and abstract enough for our goals (for example, it is built for Rogue only), so we made the following changes to it:

  • We created a single configuration file in the framework root directory, for configuring and combining with ease (in a single place) all the framework's features, algorithms, methods, environments, etc. (including those mentioned in the following points of this list).
  • Added support for all the Atari games available in the OpenAI Gym repository.
  • Created a new environment for Sentiment Analysis.
  • Created a new environment for Car Controller.
  • Added support for A2C.
  • Added Experience Replay and Prioritized Experience Replay.
  • Added Count-Based Exploration.
  • Added PPO and PVO (a sketch of PPO's clipped objective follows this list).
  • In many OpenAI baselines, the vanilla policy and value gradients have been slightly modified to perform a reduce-mean instead of a reduce-sum, because this reduces numerical errors when training with huge batches. We therefore added support for both mean-based and sum-based losses (see the sketch after this list).
  • Added GAE (Generalized Advantage Estimation; see the sketch after this list).
  • Added support for all the gradient optimizers supported by TensorFlow 1.10.1: Adam, Adagrad, RMSProp, ProximalAdagrad, etc.
  • Added support for global gradient norm clipping and learning rate decay using some of the decay functions supported by TensorFlow 1.10.1: exponential decay, inverse time decay, natural exp decay (see the sketch after this list).
  • Added different generic hierarchical structures based on the Options Framework for partitioning the state space using:
    • K-Means clustering
    • Reinforcement Learning
  • Made it possible to create and use new neural network architectures by simply extending the base one. By default, the base neural network allows network layers to be shared between the elements of the hierarchy (parent and siblings).
  • To simplify the analysis of experiments, a mechanism for intuitive graphical visualization is needed. We implemented an automatic system for generating GIFs of all episode observations, and an automatic plotting system for showing training and testing statistics. The plotting system can also be used to easily compare different experiments (each experiment is drawn in a different color in the plot).
  • For the Rogue environment, we implemented a debugging mechanism that visualizes (also inside episode GIFs) a heatmap of an agent's value function.
  • Added variations of the auxiliary techniques described in Reinforcement Learning with Unsupervised Auxiliary Tasks: Reward Prediction.
  • Added support for continuous control.
  • Added support for multi-action control.
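
As an illustration of the PPO change mentioned above, the clipped surrogate policy loss can be sketched as follows in TensorFlow 1.x; the tensor names and the clip range value are illustrative, not this framework's actual defaults.

import tensorflow as tf

def ppo_clipped_policy_loss(new_log_prob, old_log_prob, advantage, clip_range=0.2):
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = tf.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = tf.clip_by_value(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantage
    # The element-wise minimum gives a pessimistic bound, discouraging overly large policy updates.
    return -tf.reduce_mean(tf.minimum(unclipped, clipped))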
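The mean-based versus sum-based loss option reduces to the choice of the final reduction over the batch; a minimal sketch with illustrative names:

import tensorflow as tf

def reduce_policy_loss(log_prob, advantage, reduction="mean"):
    per_sample_loss = -log_prob * advantage
    if reduction == "mean":
        # Averaging keeps the loss scale independent of the batch size,
        # which limits numerical errors with very large batches.
        return tf.reduce_mean(per_sample_loss)
    # Summing reproduces the vanilla policy-gradient formulation.
    return tf.reduce_sum(per_sample_loss)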
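Generalized Advantage Estimation is a short backward recursion over the collected rewards and value estimates; a plain-Python sketch under the usual definitions (gamma is the discount factor, lam the GAE parameter; the default values here are illustrative):

import numpy as np

def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    # values holds one estimate per step; bootstrap_value estimates the value of the final state.
    values = np.append(values, bootstrap_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        gae = delta + gamma * lam * gae  # exponentially weighted sum of TD errors
        advantages[t] = gae
    return advantages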
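Finally, global gradient norm clipping and learning rate decay are standard TensorFlow 1.x primitives; the sketch below shows one common way to wire them together (the loss, the optimizer choice, and all numeric values are placeholders, not this framework's defaults):

import tensorflow as tf

# Placeholder scalar loss, standing in for the actual actor-critic loss.
weights = tf.get_variable("weights", shape=[10])
loss = tf.reduce_mean(tf.square(weights))

global_step = tf.train.get_or_create_global_step()
# Exponential decay: learning_rate * decay_rate ** (global_step / decay_steps).
learning_rate = tf.train.exponential_decay(
    learning_rate=7e-4, global_step=global_step,
    decay_steps=10000, decay_rate=0.96)
optimizer = tf.train.RMSPropOptimizer(learning_rate)

grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
# Rescale all gradients together so that their global L2 norm never exceeds the threshold.
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=40.0)
train_op = optimizer.apply_gradients(list(zip(clipped_grads, variables)), global_step=global_step)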

The software used for the Rogue environment is a fork of:

The software used for the Sentiment Analysis environment is a fork of:

This project has been tested on Debian 9 and macOS Mojave 10.14 with Python 3.7. The setup.sh script installs the necessary dependencies. The setup script may fail to install Rogue; if that happens, please run Rogue/build_with_no_monsters.sh to build Rogue without monsters, or Rogue/build_with_monsters.sh to build it with monsters.

The train.sh script starts the training. The test.sh script evaluates the trained agent using the weights from the most recent checkpoint. In A3C/options.py you can edit the default algorithm settings and select one of the following environments:

Dependencies

Dependencies shared by all environments:

Video-games environment dependencies:

Sentiment Analysis environment dependencies:

Before running the setup.sh script, you have to install: virtualenv, python3-dev, python3-pip and make. For more details, please read the related paper.

Citation

Please use the following BibTeX entry:

@mastersthesis{amslaurea16718,
	author    = "Francesco Sovrano",
	title     = "Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments",
	school    = "Università di Bologna",
	year      = "2018",
	url       = "http://amslaurea.unibo.it/16718/",
}
