Compatible to OpenAI Gym? #4

Open
AI-Guru opened this issue Jan 6, 2019 · 9 comments

Comments

@AI-Guru

AI-Guru commented Jan 6, 2019

Namaste!

Great work! I really like it!

Question: Is DeepTraffic compatible with OpenAI Gym? I remember Nvidia writing about it in a 2017 blog article:
https://blogs.nvidia.com/blog/2017/07/07/deeptraffic-how-an-mit-simulation-game-uses-deep-learning-to-reduce-gridlock/

Best,
Tristan

@lexfridman
Owner

lexfridman commented Jan 6, 2019

We did implement OpenAI Gym compatibility, so that you could train the agent on your own machine, but we never released it. The challenge is that we want that code to then be submitted and evaluated in an automated way, so that it can be considered for the leaderboard. There are ways to do this, and it's something I'm hoping to do in 2019. Help would be appreciated, especially ideas on how the full pipeline could be set up. Alternatively, we're considering a totally new Deep RL competition that is designed from the beginning to allow both in-browser and offline training. I see us doing the latter, and continuing to use DeepTraffic as an accessible education tool.

@AI-Guru
Author

AI-Guru commented Jan 7, 2019

I see! Thanks for the clarification!

@Shmuma

Shmuma commented Jan 25, 2019

@AI-Guru JFYI, I've implemented a more or less accurate Gym version of the DeepTraffic environment.
I'm going to open-source it after the competition. It could probably be merged into Gym as well.

@AI-Guru
Author

AI-Guru commented Jan 25, 2019

Thank you so very much!

@jackft
Collaborator

jackft commented Jan 25, 2019

@AI-Guru @Shmuma, if either of you have questions regarding implementation details, I can provide some guidance or answers.

@Shmuma

Shmuma commented Feb 2, 2019

@jackft I have a couple of questions about the implementation; it would be nice to have answers without reverse-engineering the JS code :)

First one: how is the input to the network formed? I see an input shape of

var num_inputs = (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
var network_size = num_inputs * temporal_window + num_actions * temporal_window + num_inputs;

But it is not clear how observations are flattened into the vector, or in which order the temporal history is appended to the final observation. Ideally, there would be a description like this:

  • input offset 0..num_inputs-1: row-wise observations for the current timestep
  • offset num_inputs..2*num_inputs-1: row-wise observations for the previous timestep
  • etc
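
For reference, the two size formulas quoted above can be evaluated as in the following sketch. The function name `inputSizes` and the parameter values in the usage line are illustrative assumptions, not the competition defaults.

```javascript
// Sizes of the per-frame observation and of the full network input,
// following the two formulas quoted in this comment.
// `inputSizes` is a hypothetical helper; the values passed below are made up.
function inputSizes(lanesSide, patchesAhead, patchesBehind, temporalWindow, numActions) {
  // Visible lanes (own lane plus lanesSide on each side) times visible patches.
  const numInputs = (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
  // temporalWindow past observations, temporalWindow past action vectors,
  // plus the current observation.
  const networkSize =
    numInputs * temporalWindow + numActions * temporalWindow + numInputs;
  return { numInputs, networkSize };
}

const sizes = inputSizes(1, 10, 0, 3, 5);
console.log(sizes); // { numInputs: 30, networkSize: 135 }
```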

Second question: how is occupancy calculated? I guess every cell has a height of 10 pixels, but when is a cell considered occupied? When a single pixel is occupied, or when more than half of the cell is overlapped by a car?

Third question: car dynamics. How is speed calculated? Does a car change its effective speed immediately when the car in front of it changes lanes, or does it happen smoothly? How do acceleration and braking affect speed?

I guess there could be lots of questions, but I'm trying to build an accurate Python version of the environment, which I believe could be useful in future launches of the competition. On my side, I promise to open-source it for everyone's benefit, now or later :).

@jackft
Collaborator

jackft commented Feb 5, 2019

@Shmuma
First, the input is formed by creating a 1D vector from the 2D occupancy grid: loop through the grid in row-major order, making sure you only loop over the portion of the full map that the car can observe. ConvNetJS handles the temporal window; it merely appends to the input (1) a previous state and (2) a one-hot vector representation of the action taken in a previous state.
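
A minimal sketch of that flattening, under stated assumptions: the helper names (`flattenPatch`, `oneHot`, `buildNetInput`) are hypothetical, the grid is indexed as `grid[row][col]`, and the ordering "current state first, then (state, action) pairs, most recent first" is an assumption that should be checked against ConvNetJS's deepqlearn code. Any scaling ConvNetJS may apply to the one-hot action vector is not modeled here.

```javascript
// Flatten the observable patch of a 2D occupancy grid in row-major order.
// The patch covers rows [rowStart, rowEnd) and columns [colStart, colEnd).
function flattenPatch(grid, rowStart, rowEnd, colStart, colEnd) {
  const out = [];
  for (let r = rowStart; r < rowEnd; r++) {
    for (let c = colStart; c < colEnd; c++) {
      out.push(grid[r][c]);
    }
  }
  return out;
}

// One-hot encoding of an action index.
function oneHot(index, size) {
  const v = new Array(size).fill(0);
  v[index] = 1;
  return v;
}

// Assumed layout: current state, then one (state, one-hot action) pair
// per history entry, in the order the entries are given.
function buildNetInput(current, history, numActions) {
  let input = current.slice();
  for (const step of history) {
    input = input.concat(step.state, oneHot(step.action, numActions));
  }
  return input;
}

const grid = [
  [0, 1, 0],
  [0, 0, 1],
  [1, 0, 0],
];
const current = flattenPatch(grid, 0, 2, 1, 3); // rows 0..1, columns 1..2
const input = buildNetInput(current, [{ state: [0, 0, 0, 1], action: 2 }], 5);
console.log(current, input.length); // [ 1, 0, 0, 1 ] 13
```

With a temporal window of `k` entries in `history`, the resulting length matches the `network_size` formula quoted earlier in the thread: `num_inputs * k + num_actions * k + num_inputs`.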

Second, occupancy is calculated by looping through all the cars and adding them to the occupancy grid. We map the car coordinates to cell coordinates and then take the floor of the cell coordinates: the result is a pair of indices.
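
The mapping described here could be sketched as follows. The cell dimensions, the `{x, y}` car representation, and the helper names are assumptions for illustration; the sketch marks only the single cell that a car's coordinates fall into and stores `1` there, whereas the real simulator may mark several cells per car or store a different value.

```javascript
// Map continuous car coordinates to integer cell indices by taking the floor.
function toCell(x, y, cellWidth, cellHeight) {
  return [Math.floor(x / cellWidth), Math.floor(y / cellHeight)];
}

// Build an occupancy grid (grid[row][col]) by looping over all cars.
// Cars that fall outside the grid are ignored.
function buildOccupancy(cars, numCols, numRows, cellWidth, cellHeight) {
  const grid = Array.from({ length: numRows }, () => new Array(numCols).fill(0));
  for (const car of cars) {
    const [col, row] = toCell(car.x, car.y, cellWidth, cellHeight);
    if (row >= 0 && row < numRows && col >= 0 && col < numCols) {
      grid[row][col] = 1;
    }
  }
  return grid;
}

// Hypothetical sizes: 3 lanes of width 20, 5 patches of height 10.
const occupancy = buildOccupancy([{ x: 25, y: 37 }, { x: 5, y: 0 }], 3, 5, 20, 10);
console.log(occupancy[3][1], occupancy[0][0]); // 1 1
```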

Third, there are several parts to this question.
Longitudinal velocity is calculated as speed_factor * max_speed. When accelerating or decelerating, we increment or decrement the speed_factor (it should always stay within [0, 1]).
When changing lanes, we maintain a target lane. If the car's target lane changes, it moves over gradually. A car can only make decisions every N frames. A lane change takes N frames.
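
A rough sketch of these dynamics, assuming several details the description does not specify: the `Car` class and its method names are hypothetical, the increment size is passed in by the caller, the initial `speedFactor` of 0.5 is arbitrary, and the gradual lane change is modeled as linear interpolation over N frames.

```javascript
function clamp01(v) {
  return Math.min(1, Math.max(0, v));
}

class Car {
  constructor(lane, maxSpeed, framesPerDecision, speedFactor = 0.5) {
    this.maxSpeed = maxSpeed;
    this.framesPerDecision = framesPerDecision; // the "N" from the description
    this.speedFactor = clamp01(speedFactor);
    this.targetLane = lane;
    this.lanePos = lane; // fractional while a lane change is in progress
    this.framesLeft = 0;
  }

  // Longitudinal velocity: speed_factor * max_speed.
  speed() {
    return this.speedFactor * this.maxSpeed;
  }

  accelerate(step) {
    this.speedFactor = clamp01(this.speedFactor + step);
  }

  decelerate(step) {
    this.speedFactor = clamp01(this.speedFactor - step);
  }

  // Changing the target lane starts a lane change that lasts N frames.
  setTargetLane(lane) {
    if (lane !== this.targetLane) {
      this.targetLane = lane;
      this.framesLeft = this.framesPerDecision;
    }
  }

  // Advance one frame, moving linearly toward the target lane (assumed).
  tick() {
    if (this.framesLeft > 0) {
      this.lanePos += (this.targetLane - this.lanePos) / this.framesLeft;
      this.framesLeft -= 1;
      if (this.framesLeft === 0) this.lanePos = this.targetLane;
    }
  }
}

const car = new Car(2, 100, 4);
car.accelerate(0.7); // 0.5 + 0.7 is clamped to 1
car.setTargetLane(3);
car.tick();
car.tick(); // halfway through the lane change
console.log(car.speed(), car.lanePos); // 100 2.5
```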

@AI-Guru
Author

AI-Guru commented Apr 23, 2019

@Shmuma, any news? :D

@Shmuma

Shmuma commented Apr 24, 2019

@AI-Guru Yep, sorry for the delay :)

Here is my repo with a Gym-compatible DeepTraffic environment: https://github.com/Shmuma/deep-traffic-2019

It includes the environment with tests, training and playing code, and a not fully finished PyTorch -> JS converter for the trained model.

This project is not finished (sadly); I was distracted by other things. The next steps should be finishing the JS export utility and making sure the Gym environment's dynamics are close to those of the native code.
