Vehicular-Game-Theory

Link to paper

Code for my thesis involving autonomous vehicle security using game theory and reinforcement learning.

environment visualization

Files of interest

Anything in the ./old/ folder is likely outdated, but might give insight into the development process.

Responses from the Ray RLlib Slack regarding my PPO agent not learning

Ensuring that the step-wise rewards are between -1 and 1 was very effective at improving learning. Here's the full response from the Slack for more info:

  1. I personally would not use curiosity exploration until after I got a baseline version working first.
  2. Does your observation include information about which row it is deciding an action for? You are not using frame stacking or an RNN policy in your config. A pure feedforward policy could not learn to keep track of that information on its own.
  3. You want to keep your rewards small, preferably between 0 and 1. Also, I have found that I get much better results if I return an instantaneous reward rather than a cumulative one on each step. So, for example, +/- for the square that was just turned on. Then at the end you can supply a terminal reward for the overall score. The reason is that target values will be based on the return, which is a discounted sum of all the rewards in that episode. That adds a lot of extra counting of rewards and makes it harder to distinguish good actions from bad ones (credit assignment).
  4. RLlib's implementation of value clipping will clip if the value loss is greater than 10. I would bet, given your rewards, that almost all of your experience steps are being clipped. When a step is clipped, it will not be learned on. You can set vf_clip_param to float("inf") to disable it and see if that makes a difference. Watch out for NaNs during training if you do that, though.
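
For reference, here is a minimal sketch of applying that last suggestion. It assumes Ray RLlib's PPOConfig API and uses CartPole-v1 as a stand-in environment; the actual training notebooks in this repo may use a different Ray version and configuration.

    from ray.rllib.algorithms.ppo import PPOConfig

    # Sketch only: disable value-function clipping so that steps with large
    # value losses are still learned on, per point 4 above.
    config = (
        PPOConfig()
        .environment("CartPole-v1")  # stand-in for the platoon environment
        .training(
            gamma=0.99,
            vf_clip_param=float("inf"),  # disable clipping; watch for NaNs
        )
    )

    algo = config.build()
    for _ in range(10):
        result = algo.train()  # inspect the result dict for reward metrics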

Lessons learned from this experience

  • Python dependency management is evil
  • Python dataclasses are lit
  • Ray RLlib is a great tool for reinforcement learning
  • Canvas is easier than pygame for RL environment rendering
    • Plays nicer when in remote Jupyter notebooks
  • Reward shaping for RL is hard
  • PPO rewards should be between -1 and 1 (see the sketch after this list)
  • Matplotlib makes it very easy to visualize data
  • Jupyter notebooks are great for prototyping, but it became necessary to move the environment into a separate file to keep things clean
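
To illustrate the reward-range lesson, here is a small sketch of a reward-scaling wrapper. It assumes the Gymnasium API and a known bound on the raw step reward; it is not the exact shaping used in this repo.

    import gymnasium as gym

    class ScaleReward(gym.RewardWrapper):
        """Hypothetical wrapper that squashes step-wise rewards into [-1, 1]."""

        def __init__(self, env, max_abs_reward: float):
            super().__init__(env)
            self.max_abs_reward = max_abs_reward  # assumed bound on |raw reward|

        def reward(self, reward):
            # Scale by the assumed bound, then clip as a safety net.
            scaled = reward / self.max_abs_reward
            return max(-1.0, min(1.0, scaled))

    # Usage: wrap the environment before handing it to the PPO trainer.
    env = ScaleReward(gym.make("CartPole-v1"), max_abs_reward=1.0)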
