Framework for developing Actor-Critic deep RL algorithms (A3C, A2C, PPO, GAE, etc.) in different environments (OpenAI's Gym, Rogue, Sentiment Analysis, Car Controller, etc.) with continuous and discrete action spaces.

This software supports several deep RL and HRL algorithms (A3C, A2C, PPO, GAE, etc.) in different environments (OpenAI's Gym, Rogue, Sentiment Analysis, Car Controller, etc.) with both continuous and discrete action spaces. More details can be found in my Computer Science master's thesis: Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments.

Actor-Critic methods form a large family of RL algorithms. In this work we focus primarily on:

  • The Actor-Critic paradigm (a minimal sketch follows this list)
  • Hierarchical networks
  • Experience Replay
  • Intrinsic rewards for exploration
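
To ground the terminology, the core of the Actor-Critic paradigm is a policy (the actor) updated in the direction suggested by a learned value function (the critic). The following is a minimal, framework-independent sketch in plain Python/NumPy; all names are illustrative and not taken from this code-base.

import numpy as np

def actor_critic_losses(log_prob, value, reward_to_go):
    # The critic's prediction error is the advantage: how much better the
    # observed return was compared to what the critic expected.
    advantage = reward_to_go - value
    # Actor: push up the log-probability of actions with positive advantage.
    policy_loss = -log_prob * advantage
    # Critic: regress the value estimate towards the observed return.
    value_loss = 0.5 * advantage ** 2
    return policy_loss, value_loss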

Our goal is to perform Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments. To do so, we need a framework for easily experimenting with RL algorithms on different types of problems. In May 2017, OpenAI published an interesting repository of RL baselines, and it still maintains it with continuous improvements and updates. That repository is probably the best choice for testing the performance of already existing RL algorithms with very minimal changes, but it is hard to read and modify (at least for the author). We therefore chose as the code-base for our experiments the open-source A3C algorithm, built on TensorFlow 1.13.1, that accompanies our latest conference paper, Crawling in Rogue's dungeons with (partitioned) A3C, mainly because we already had experience with it and know the details of its inner mechanisms. However, even this code-base is not generic and abstract enough for our goals (for example, it is built for Rogue only), so we made the following changes to it:

  • We created a single configuration file in the framework root directory, for configuring and combining with ease (in a single place) all the framework's features, algorithms, methods, environments, etc. (including those mentioned in the following points of this list).
  • Added support for all the Atari games available in the OpenAI Gym repository.
  • Created a new environment for Sentiment Analysis.
  • Created a new environment for Car Controller.
  • Added support for A2C.
  • Added Experience Replay and Prioritized Experience Replay.
  • Added Count-Based Exploration.
  • Added PPO and PVO (a sketch of PPO's clipped objective follows this list).
  • In many OpenAI baselines, the vanilla policy and value gradients have been slightly modified to perform a reduce-mean instead of a reduce-sum, because this reduces numerical errors when training with huge batches. We therefore added support for both mean-based and sum-based losses (see the sketch after this list).
  • Added GAE (Generalized Advantage Estimation; see the sketch after this list).
  • Added support for all the gradient optimizers supported by TensorFlow 1.10.1: Adam, Adagrad, RMSProp, ProximalAdagrad, etc.
  • Added support for global gradient norm clipping and learning rate decay using some of the decay functions supported by TensorFlow 1.10.1: exponential decay, inverse time decay, natural exp decay (see the sketch after this list).
  • Added different generic hierarchical structures based on the Options Framework for partitioning the state space using:
    • K-Means clustering
    • Reinforcement Learning
  • Made it possible to create and use new neural network architectures by simply extending the base one. By default, the base neural network allows network layers to be shared between the elements of the hierarchy (parent and siblings).
  • To simplify the analysis of experiments, a mechanism for intuitive graphical visualization is needed. We implemented an automatic system for generating GIFs of all episode observations, and an automatic plotting system for showing training and testing statistics. The plotting system can also be used to easily compare different experiments (each experiment is drawn in a different color in the plot).
  • For the Rogue environment, we implemented a debugging mechanism that visualizes (also inside episode GIFs) a heatmap of an agent's value function.
  • Added variations of the auxiliary techniques described in Reinforcement Learning with Unsupervised Auxiliary Tasks: Reward Prediction.
  • Added support for continuous control.
  • Added support for multi-action control.
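
As an illustration of the PPO change mentioned above, the clipped surrogate policy loss can be sketched as follows in TensorFlow 1.x; the tensor names and the clip range value are illustrative, not this framework's actual defaults.

import tensorflow as tf

def ppo_clipped_policy_loss(new_log_prob, old_log_prob, advantage, clip_range=0.2):
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = tf.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = tf.clip_by_value(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantage
    # The element-wise minimum gives a pessimistic bound, discouraging overly large policy updates.
    return -tf.reduce_mean(tf.minimum(unclipped, clipped))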
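The mean-based versus sum-based loss option reduces to the choice of the final reduction over the batch; a minimal sketch with illustrative names:

import tensorflow as tf

def reduce_policy_loss(log_prob, advantage, reduction="mean"):
    per_sample_loss = -log_prob * advantage
    if reduction == "mean":
        # Averaging keeps the loss scale independent of the batch size,
        # which limits numerical errors with very large batches.
        return tf.reduce_mean(per_sample_loss)
    # Summing reproduces the vanilla policy-gradient formulation.
    return tf.reduce_sum(per_sample_loss)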
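Generalized Advantage Estimation is a short backward recursion over the collected rewards and value estimates; a plain-Python sketch under the usual definitions (gamma is the discount factor, lam the GAE parameter; the default values here are illustrative):

import numpy as np

def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    # values holds one estimate per step; bootstrap_value estimates the value of the final state.
    values = np.append(values, bootstrap_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        gae = delta + gamma * lam * gae  # exponentially weighted sum of TD errors
        advantages[t] = gae
    return advantages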
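Finally, global gradient norm clipping and learning rate decay are standard TensorFlow 1.x primitives; the sketch below shows one common way to wire them together (the loss, the optimizer choice, and all numeric values are placeholders, not this framework's defaults):

import tensorflow as tf

# Placeholder scalar loss, standing in for the actual actor-critic loss.
weights = tf.get_variable("weights", shape=[10])
loss = tf.reduce_mean(tf.square(weights))

global_step = tf.train.get_or_create_global_step()
# Exponential decay: learning_rate * decay_rate ** (global_step / decay_steps).
learning_rate = tf.train.exponential_decay(
    learning_rate=7e-4, global_step=global_step,
    decay_steps=10000, decay_rate=0.96)
optimizer = tf.train.RMSPropOptimizer(learning_rate)

grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
# Rescale all gradients together so that their global L2 norm never exceeds the threshold.
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=40.0)
train_op = optimizer.apply_gradients(list(zip(clipped_grads, variables)), global_step=global_step)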

The software used for the Rogue environment is a fork of:

The software used for the Sentiment Analysis environment is a fork of:

This project has been tested on Debian 9 and macOS Mojave 10.14 with Python 3.7. The setup.sh script installs the necessary dependencies. The setup script may fail to install Rogue; if that happens, please run Rogue/build_with_no_monsters.sh to build Rogue without monsters, or Rogue/build_with_monsters.sh to build it with monsters.

The train.sh script starts the training. The test.sh script evaluates the trained agent using the weights from the most recent checkpoint. In A3C/options.py you can edit the default algorithm settings and select one of the following environments:

Dependencies

Dependencies shared by all environments:

Video-games environment dependencies:

Sentiment Analysis environment dependencies:

Before running the setup.sh script, you have to install: virtualenv, python3-dev, python3-pip and make. For more details, please read the related paper.

Citation

Please use the following BibTeX entry:

@mastersthesis{amslaurea16718,
	author    = "Francesco Sovrano",
	title     = "Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments",
	school    = "Università di Bologna",
	year      = "2018",
	url       = "http://amslaurea.unibo.it/16718/",
}
