DRL-ExampleCode

Implementation code written while learning deep reinforcement learning.

Environment configuration

git clone https://github.com/DarriusL/DRL-ExampleCode.git
cd DRL-ExampleCode
conda env create -f env.yml
conda activate dev

Framework file structure

├── .gitignore
├── agent
│	├── algorithm
│	│	├── actor_critic.py
│	│	├── alg_util.py
│	│	├── base.py
│	│	├── dqn.py
│	│	├── reinforce.py
│	│	├── sarsa.py
│	│	└── __init__.py
│	├── memory
│	│	├── base.py
│	│	├── offpolicy.py
│	│	├── onpolicy.py
│	│	└── __init__.py
│	└── net
│		├── base.py
│		├── mlp.py
│		├── net_util.py
│		└── __init__.py
├── config
│	├── a2c
│	│	└── a2c_nstep_cartpole_on.json
│	├── dqn
│	│	├── doubledqn_cartpole_off.json
│	│	├── doubledqn_cartpole_per.json
│	│	├── dqn_cartpole_off.json
│	│	└── targetdqn_cartpole_off.json
│	├── lab_cfg.json
│	├── reinforce
│	│	├── reinforce_cartpole_mc.json
│	│	├── reinforce_entropyreg_cartpole_mc.json
│	│	└── reinforce_entropyreg_cartpole_nstep.json
│	└── sarsa
│		├── sarsa_cartpole_mc.json
│		└── sarsa_cartpole_nstep.json
├── env
│	├── base.py
│	├── openai_gym.py
│	└── __init__.py
├── env.yml
├── executor.py
├── lib
│	├── callback.py
│	├── glb_var.py
│	├── json_util.py
│	└── util.py
├── LICENSE
├── README.md
└── room
	├── system
	│	├── base.py
	│	├── offpolicy.py
	│	├── onpolicy.py
	│	└── __init__.py
	└── work.py

Command

usage

usage: executor.py [-h] [--config CONFIG] [--mode MODE] [--dev DEV]

option

options:
  -h, --help            show this help message and exit
  --config CONFIG, -cfg CONFIG
                        Path of configuration.
  --mode MODE           Mode of operation. (train/test)
  --dev DEV             Enable code debugging
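
For orientation, the interface above maps onto a standard argparse setup. The following is a minimal sketch under assumptions (the actual parser in executor.py may differ; in particular the defaults, the choices for --mode, and the type of --dev are guesses here):

import argparse

# Minimal sketch of the documented CLI; defaults, choices, and the --dev type are assumptions.
parser = argparse.ArgumentParser(prog='executor.py')
parser.add_argument('--config', '-cfg', type=str, help='Path of configuration.')
parser.add_argument('--mode', type=str, default='train', choices=['train', 'test'],
                    help='Mode of operation. (train/test)')
parser.add_argument('--dev', type=str, default=None, help='Enable code debugging.')
args = parser.parse_args()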

quick start

  • reinforce
python executor.py -cfg='./config/reinforce/reinforce_cartpole_mc.json' --mode='train'
python executor.py -cfg='./config/reinforce/reinforce_entropyreg_cartpole_mc.json' --mode='train'
python executor.py -cfg='./config/reinforce/reinforce_entropyreg_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./cache/data/reinforce/cartpole/[-opt-]/config.json' --mode='test'

python executor.py -cfg='./config/reinforce/reinforce_mountaincar_mc.json' --mode='train'
  • sarsa
python executor.py -cfg='./config/sarsa/sarsa_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./config/sarsa/sarsa_cartpole_mc.json' --mode='train'
python executor.py -cfg='./cache/data/sarsa/cartpole/[-opt-]/config.json' --mode='test'
  • dqn
python executor.py -cfg='./config/dqn/dqn_cartpole_off.json' --mode='train'
python executor.py -cfg='./config/dqn/targetdqn_cartpole_off.json' --mode='train'
python executor.py -cfg='./config/dqn/doubledqn_cartpole_off.json' --mode='train'
python executor.py -cfg='./config/dqn/doubledqn_cartpole_per.json' --mode='train'

python executor.py -cfg='./config/dqn/dqn_mountaincar_off.json' --mode='train'
  • a2c
python executor.py -cfg='./config/a2c/a2c_shared_nstep_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./config/a2c/a2c_shared_mc_cartpole_mc.json' --mode='train'
python executor.py -cfg='./config/a2c/a2c_unshared_gae_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./cache/data/a2c/cartpole/[-opt-]/config.json' --mode='test'

python executor.py -cfg='./config/a2c/a2c_shared_nstep_mountaincar_onbatch.json' --mode='train'

python executor.py -cfg='./config/a2c/a2c_unshared_gae_pong_onbatch.json' --mode='train'
  • ppo

Note: Using the n-step advantage estimate with A2C (PPO) may drive the model parameters to NaN due to vanishing or exploding gradients, so these models are restricted to GAE advantage calculation (a GAE sketch is included after the quick-start commands below).

python executor.py -cfg='./config/ppo/reinforce_ppo_cartpole_mc.json' --mode='train'
python executor.py -cfg='./config/ppo/reinforce_ppo_cartpole_onbatch.json' --mode='train'

python executor.py -cfg='./config/ppo/a2c_ppo_shared_gae_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./config/ppo/a2c_ppo_unshared_gae_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./cache/data/ppo_a2c/cartpole/[-opt-]/config.json' --mode='test'

python executor.py -cfg='./config/ppo/a3c_ppo_shared_gae_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./config/ppo/a3c_ppo_unshared_gae_cartpole_onbatch.json' --mode='train'
  • a3c
python executor.py -cfg='./config/a3c/a3c_shared_nstep_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./config/a3c/a3c_unshared_gae_cartpole_onbatch.json' --mode='train'
python executor.py -cfg='./cache/data/a3c/cartpole/shared_nstep_t100000/config.json' --mode='test'
  • acktr

Results are currently poor; this will be improved in the future.

python executor.py -cfg='./config/acktr/acktr_shared_nstep_cartpole_onbatch.json' --mode='train'
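
As mentioned in the ppo note above, the A2C/PPO configurations rely on GAE rather than the n-step advantage. For illustration, here is a minimal, self-contained GAE sketch in plain NumPy; it is not this repository's actual implementation (which lives under agent/algorithm/), and the names and shapes are illustrative assumptions:

import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    # rewards: shape (T,), reward r_t at each step
    # values:  shape (T+1,), value estimates V(s_0)..V(s_T); the last entry bootstraps the tail
    # dones:   shape (T,), 1.0 where the episode terminated at step t
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * (1 - done_t) * A_{t+1},
    # where delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        last_adv = delta + gamma * lam * not_done * last_adv
        advantages[t] = last_adv
    returns = advantages + values[:-1]  # targets for the critic/value loss
    return advantages, returns

Because lam interpolates between the one-step TD error (lam = 0) and the Monte Carlo return (lam = 1), GAE is usually better conditioned numerically than a long n-step advantage, which is consistent with the note above.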

References

[1] Graesser, L., & Keng, W. L. (2019). Foundations of Deep Reinforcement Learning: Theory and Practice in Python. Addison-Wesley Professional.

[2] Kengz. (2019). SLM-Lab: Modular Deep Reinforcement Learning framework in PyTorch [GitHub repository]. https://github.com/kengz/SLM-Lab

[3] Kostrikov, I. (2018). pytorch-a3c: PyTorch implementation of A3C [GitHub repository]. https://github.com/ikostrikov/pytorch-a3c