Are the space invaders deterministic or stochastic?

Google DeepMind achieved human-level performance on 49 Atari games using the Arcade Learning Environment (ALE). In this article, I discuss the efficiency of the mechanisms used by DeepMind and OpenAI to inject stochasticity into the ALE.
This repository contains the code I used to reproduce this performance on Breakout and Space Invaders with the exact same network architecture (DQN).
I also shared two notebooks. The first is an in-depth explanation of the algorithm I used. The second is an explanation of the two soft policies I implemented: e-greedy and softmax.
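
As a rough illustration of the two soft policies, and of how they relate to the Q-Learning and expected SARSA targets, here is a minimal standard-library sketch. It is not the code from the notebooks: the function names, the `temperature` parameter, and the sample Q-values are assumptions made for this example only.

```python
import math
import random


def e_greedy_probs(q_values, epsilon):
    """Action probabilities under an e-greedy policy.

    Every action gets epsilon / n; the greedy action gets the remaining 1 - epsilon on top.
    """
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def softmax_probs(q_values, temperature=1.0):
    """Action probabilities under a softmax (Boltzmann) policy.

    `temperature` is a hypothetical knob: higher values flatten the distribution.
    """
    scaled = [q / temperature for q in q_values]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng=random):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def q_learning_target(reward, next_q, gamma, done):
    """Q-Learning bootstraps from the best next action."""
    return reward if done else reward + gamma * max(next_q)


def expected_sarsa_target(reward, next_q, next_probs, gamma, done):
    """Expected SARSA bootstraps from the policy-weighted average of next Q-values."""
    if done:
        return reward
    expected = sum(p * q for p, q in zip(next_probs, next_q))
    return reward + gamma * expected


# Made-up Q-values for a three-action state.
q_values = [1.0, 3.0, 2.0]
greedy_mix = e_greedy_probs(q_values, epsilon=0.3)
boltzmann = softmax_probs(q_values, temperature=1.0)
action = sample_action(boltzmann, random.Random(0))
target = expected_sarsa_target(1.0, q_values, boltzmann, gamma=0.9, done=False)
```

With `epsilon` set to 0 the e-greedy distribution puts all its mass on the greedy action, so the expected SARSA target coincides with the Q-Learning target.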

To run your own experiments, modify the global hyperparameters at the beginning of each file. Additionally, I used argparse for a few settings:

--new to create a new model
--name name_of_the_model to use an existing model
--env name_of_the_environment to set the OpenAI Gym environment
--render to render the games
--target to set the device on which the TensorFlow operations are executed. Use -1 if you don’t have a GPU.
--debug to see where the TensorFlow operations are executed
--policy to select a different policy or algorithm. The default is Q-Learning with an e-greedy policy. The other options are “sarsa”, which uses expected SARSA with an e-greedy policy, and “softmax”, which uses expected SARSA with a softmax policy.
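
For orientation, the flags listed above could be declared with argparse roughly as follows. This is a sketch, not the parser from the scripts: the argument types, the default values, and the `build_parser` helper are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags described above."""
    parser = argparse.ArgumentParser(description="DQN experiments on OpenAI Gym environments")
    parser.add_argument("--new", action="store_true", help="create a new model")
    parser.add_argument("--name", type=str, default=None, help="name of an existing model to use")
    parser.add_argument("--env", type=str, default=None, help="OpenAI Gym environment name")
    parser.add_argument("--render", action="store_true", help="render the games")
    # Assumed to be an integer device index, with -1 meaning "no GPU".
    parser.add_argument("--target", type=int, default=-1, help="device index, -1 without a GPU")
    parser.add_argument("--debug", action="store_true", help="show where operations are executed")
    # Leaving --policy unset stands for the default: Q-Learning with an e-greedy policy.
    parser.add_argument("--policy", type=str, default=None, choices=["sarsa", "softmax"],
                        help="alternative policy or algorithm")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--new", "--target", "-1", "--env", "BreakoutDeterministic-v4"]
)
```

argparse reads `-1` as a negative number rather than as a flag here because the parser defines no option that looks like a negative number.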

Examples:

python GYM_BREAKOUT.py --new --target -1 --env BreakoutDeterministic-v4

python GYM_SPACE_INVADERS.py --new --target 1 --env SpaceInvaders-v4 --render

python GYM_SPACE_INVADERS.py --new --target -1 --env SpaceInvaders-v4 --render --policy softmax
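
The environment names in these commands differ in how they repeat actions. As I understand the Gym naming convention (worth checking against the Gym documentation for your version), the `Deterministic-v4` variants repeat each action for a fixed number of frames, while the plain `-v4` variants sample the repeat count from {2, 3, 4} at every step. The sketch below only illustrates that idea with a toy step function; `repeat_action` and `toy_step` are hypothetical names and nothing here calls Gym itself.

```python
import random


def repeat_action(step_fn, action, frameskip, rng):
    """Apply `action` for several frames and sum the rewards.

    frameskip: an int for a fixed repeat count, or a (low, high) pair to
    sample the count uniformly from low..high-1 on every call.
    """
    if isinstance(frameskip, int):
        n_frames = frameskip
    else:
        low, high = frameskip
        n_frames = rng.randrange(low, high)
    total_reward = 0.0
    for _ in range(n_frames):
        total_reward += step_fn(action)
    return total_reward, n_frames


def toy_step(action):
    """Stand-in for one emulator frame: always returns a reward of 1."""
    return 1.0


rng = random.Random(0)
# Fixed frame skip: the same action always covers the same number of frames.
fixed_counts = [repeat_action(toy_step, 0, 4, rng)[1] for _ in range(5)]
# Sampled frame skip: the number of frames varies from call to call.
sampled_counts = [repeat_action(toy_step, 0, (2, 5), rng)[1] for _ in range(5)]
```

With a sampled frame skip, the same sequence of agent actions can lead to different emulator states, which is one way to inject stochasticity into an otherwise deterministic game.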

You can compare your experiments with the TensorBoard runs I added to this repository. The hyperparameters of these experiments are listed in the Methods chapter of the article.

Repository files:

tf_logs/run
Breakout_explained.ipynb
GYM_ACROBOT.py
GYM_BREAKOUT.py
GYM_SPACE_INVADERS.py
GymBreakout-20200906194629-15893-427.0_small.gif
LICENSE
README.md
e_greedy_and_softmax_explained.ipynb

NicMaq/Reinforcement-Learning
