
bsy-dqn-atari

A deep Q-network (DQN) for the OpenAI Gym Atari domain. bsy-dqn-atari learns to play Atari games from raw pixels at or above human level.

Demo recordings: Breakout (864), Pong (21), Space Invaders (3230), Robotank (72).

About

bsy-dqn-atari combines the algorithms published in three reinforcement learning papers: Human-Level Control Through Deep Reinforcement Learning, Deep Reinforcement Learning with Double Q-learning, and Prioritized Experience Replay. The software is written in Python and uses Keras with the tensorflow-gpu 1.8 backend.
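
For reference, the Double Q-learning piece of that combination can be summarized with a short sketch: the online network selects the next action and the target network evaluates it. The model names, GAMMA, and batch layout below are illustrative, not the repository's actual identifiers.

```python
import numpy as np

GAMMA = 0.99  # discount factor (illustrative value)

def double_dqn_targets(online_model, target_model,
                       states, actions, rewards, next_states, dones):
    """Compute Double DQN TD targets for a batch.

    online_model picks the greedy next action; target_model scores it
    (van Hasselt et al.). actions is an int array, dones is 0/1 floats.
    """
    q_current = online_model.predict(states)                        # (batch, num_actions)
    next_actions = np.argmax(online_model.predict(next_states), axis=1)
    q_next_target = target_model.predict(next_states)

    targets = q_current.copy()
    batch_idx = np.arange(states.shape[0])
    targets[batch_idx, actions] = (
        rewards + (1.0 - dones) * GAMMA * q_next_target[batch_idx, next_actions]
    )
    return targets
```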

Implementation Differences

The three papers referenced above train on batches of 32 samples; bsy-dqn-atari trains on batches of 64 samples. The speed, capacity, and bandwidth of graphics cards have increased significantly in the past few years, and increasing the batch size raised scores without a noticeable performance penalty on an NVIDIA GTX 1080.

This implementation uses Huber loss to clip error gradients rather than clipping the error term. In the original Nature DQN paper, the authors note that "We also found it helpful to clip the error term from the update [...] to be between -1 and 1." There are two ways this can be interpreted: (a) clamp the difference between the actual and expected reward to the range [-1, 1], or (b) clip the error gradient so that it falls within [-1, 1]. The correct interpretation has been debated in the ML community (1, 2, 3), but DeepMind's code shows that they used the former and clipped the error term itself. bsy-dqn-atari, however, clips the gradient by using Huber loss, which empirically works better, makes sense mathematically, and is the prevailing community standard.
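
To make the gradient-clipping interpretation concrete, here is a minimal Huber loss sketch (delta = 1) written against the TensorFlow 1.x API. The function is illustrative rather than the repository's exact code: for |error| <= 1 the loss is quadratic, so the gradient equals the error; for |error| > 1 the loss is linear, so the gradient magnitude is capped at 1.

```python
import tensorflow as tf

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails,
    which bounds the gradient magnitude by delta."""
    error = y_true - y_pred
    abs_error = tf.abs(error)
    quadratic = tf.minimum(abs_error, delta)   # quadratic portion, capped at delta
    linear = abs_error - quadratic             # remainder handled linearly
    return tf.reduce_mean(0.5 * tf.square(quadratic) + delta * linear)
```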

Importance sampling (IS) weights are multiplied by the error before applying Huber loss. This differs from TensorFlow's native huber_loss implementation and from OpenAI's DQN implementation. The change resulted in a large increase in score and makes sense mathematically. With this implementation, errors exceeding 1, when scaled down by the IS weights, may fall within the squared-loss region, thereby resulting in a smaller weight adjustment. In TensorFlow's and OpenAI's implementations, errors exceeding 1 always fall into the linear-loss region (the derivative is always 1). Note that the three referenced papers do not use Huber loss, but rather mean squared error with error clipping.
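
A sketch of that ordering, again with illustrative names (td_error, is_weights, and the helper itself are not the repository's identifiers). The closing comment reflects how tf.losses.huber_loss applies its weights argument after the per-element loss is computed.

```python
import tensorflow as tf

def weighted_huber_loss(td_error, is_weights, delta=1.0):
    """Scale the raw TD error by the IS weight *before* Huber loss.

    A large error with a small weight can then land in the quadratic
    region, producing a gradient smaller than 1.
    """
    weighted_error = is_weights * td_error     # weight first ...
    abs_error = tf.abs(weighted_error)         # ... then apply Huber
    quadratic = tf.minimum(abs_error, delta)
    linear = abs_error - quadratic
    return tf.reduce_mean(0.5 * tf.square(quadratic) + delta * linear)

# By contrast, tf.losses.huber_loss(labels, predictions, weights=is_weights)
# multiplies the weight into the already-computed loss, so any |error| > 1
# stays in the linear region regardless of its weight.
```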

The scoreboard is not clipped out of image samples. In the original DQN implementation, images are preprocessed as follows:

  1. Samples are taken every four frames. This is called frame skipping, and it speeds up training.
  2. The maximum pixel values of the current and previous frames are taken to prevent artifacts caused by flickering (i.e. sprites that only show up every other frame).
  3. The image is converted to grayscale. This reduces its size by a factor of three, because only one channel is needed instead of three (red, green, and blue).
  4. The image is scaled down by a factor of two (to 110x84), and an 84x84 square is then clipped from that.
  5. The last four images are stacked together to produce a sample, so that velocity can be determined.

bsy-dqn-atari differs in step 4: images are scaled directly to 84x84 rather than clipping out a square.
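
A rough sketch of the resulting per-frame pipeline, assuming OpenCV and NumPy (function and constant names are illustrative; frame skipping and the four-frame stack from steps 1 and 5 are assumed to be handled elsewhere):

```python
import numpy as np
import cv2

FRAME_SIZE = 84  # final square frame size

def preprocess(frame, prev_frame):
    # Step 2: max over the current and previous RGB frames to remove sprite flicker.
    maxed = np.maximum(frame, prev_frame)
    # Step 3: grayscale, one channel instead of three.
    gray = cv2.cvtColor(maxed, cv2.COLOR_RGB2GRAY)
    # Step 4 (modified): scale directly to 84x84, no intermediate 110x84 crop.
    return cv2.resize(gray, (FRAME_SIZE, FRAME_SIZE), interpolation=cv2.INTER_AREA)
```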

Performance

| Game | High Score | 100-Game Average High | Training Average (1 life) | Random Seed | Test Command | Version (Tag) |
|------|------------|-----------------------|---------------------------|-------------|--------------|---------------|
| Breakout | 864 | 486 | 403 | 0 | `python3 ./test.py BreakoutNoFrameskip-v4 asset/pretrained-model/BreakoutNoFrameskip-v4__2018_07_01__08_10.avg.h5` | 2.0.1 |
| Pong | 21 | 20.79 | 20 | 0 | `python3 ./test.py PongNoFrameskip-v4 asset/pretrained-model/PongNoFrameskip-v4__2018_08_16__06_06.h5` | 2.0.1 |
| Space Invaders | 3230 | 2772 | 2340 | 0 | `python3 ./test.py SpaceInvadersNoFrameskip-v4 asset/pretrained-model/SpaceInvadersNoFrameskip-v4__2018_08_22__17_51.avg.h5` | 2.0.1 |
| Robotank | 72 | 53 | 33.97 | 0 | `python3 ./test.py RobotankNoFrameskip-v4 ./asset/pretrained-model/RobotankNoFrameskip-v4__2018_09_14__09_20.avg.h5` | 2.0.1 |
