
Why can't DQN learn to switch bitrate? #1

Open
howfars opened this issue Oct 19, 2020 · 2 comments

Comments


howfars commented Oct 19, 2020

Hi godka,
Thanks for sharing your DQN-ABR project. I have a few questions.
I'm new to the RL and ABR fields. I trained and tested the DQN-based ABR algorithm from this project and evaluated the trained model with the pensieve/test code, but I found it performs worse than the simple buffer-based (BB) algorithm.
Here are some result figures.
The first is the episode average reward curve: the average episode reward ends up negative and never reaches 0.
[figure: episode average reward curve]
The second is the total reward on the test dataset, comparing the DQN algorithm with the BB algorithm; DQN is worse than BB.
[figure: total test reward, DQN vs. BB]
The third shows how the DQN algorithm chooses the bitrate on each test trace. I found that DQN cannot learn to switch the bitrate but always prefers one particular bitrate, which makes the QoE poor.
[figures: DQN bitrate selections per test trace]
I also compared it with the A3C algorithm from Pensieve and found that the A3C-based algorithm performs better than the DQN-based one: it learns to switch the bitrate according to buffer occupancy and bandwidth, and its QoE metric is also better than the BB algorithm's. The picture below shows how A3C chooses the bitrate.
[figure: A3C bitrate selections per test trace]
So I wonder: why does DQN fail to learn to switch the bitrate and instead stick to one particular bitrate? Is DQN unsuitable as an ABR algorithm?
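
For reference, the per-chunk reward used by the pensieve test code is a linear QoE of roughly the following form; this is a minimal sketch with the commonly used constants, which may differ from this repo's exact values:

```python
# Minimal sketch of a Pensieve-style linear QoE reward per chunk.
# Constants are the commonly used defaults (e.g., REBUF_PENALTY = 4.3 for QoE_lin);
# the actual repo may use different values.

VIDEO_BIT_RATE = [300, 750, 1200, 1850, 2850, 4300]  # available bitrate levels (kbps)
REBUF_PENALTY = 4.3    # penalty per second of rebuffering
SMOOTH_PENALTY = 1.0   # penalty per Mbps of bitrate change between chunks

def linear_qoe(bit_rate, last_bit_rate, rebuf_sec):
    """Reward for one chunk: quality - rebuffering penalty - smoothness penalty."""
    quality = VIDEO_BIT_RATE[bit_rate] / 1000.0  # Mbps
    smoothness = abs(VIDEO_BIT_RATE[bit_rate] - VIDEO_BIT_RATE[last_bit_rate]) / 1000.0
    return quality - REBUF_PENALTY * rebuf_sec - SMOOTH_PENALTY * smoothness

# Example: picking 1200 kbps after 2850 kbps with 0.5 s of rebuffering gives
# 1.2 - 4.3 * 0.5 - 1.65 = -2.6, so rebuffering alone can easily pull the
# episode-average reward below zero.
```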


godka commented Oct 19, 2020

Hi,

First things first, welcome, and we look forward to your contributions to the ABR field!

As for whether DQN is suitable for ABR or not, IMHO the short answer is: DQN can handle the ABR task too.

This repo was written about 2 years ago and is currently deprecated.
The stable version of DQN for the ABR task is here: https://github.com/godka/Pensieve-PPO/tree/dqn/src. We have implemented several state-of-the-art RL algorithms there, such as A2C and PPO, as well as off-policy RL algorithms (e.g., Double-DQN).

For more details about the DQN implementation, please refer to the page below. Note that we employ Double-DQN rather than vanilla DQN.

https://github.com/godka/Pensieve-PPO/blob/b4d28bae9bc34e27905b23e88da5b22b7203ce9a/src/dqn.py#L16
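
As a quick illustration of the difference, here is a minimal sketch of how the Double-DQN target is computed; this is plain NumPy with illustrative names, not the actual code in dqn.py:

```python
import numpy as np

# Minimal Double-DQN target sketch (illustrative; not the repo's dqn.py).
# The online network selects the next action, the target network evaluates it;
# vanilla DQN would instead take the max over the target network's own Q-values.

def double_dqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    """q_online_next, q_target_next: [batch, num_actions] Q-values for s_{t+1}."""
    best_action = np.argmax(q_online_next, axis=1)   # action chosen by online net
    batch_idx = np.arange(q_target_next.shape[0])
    next_q = q_target_next[batch_idx, best_action]   # value taken from target net
    return reward + gamma * (1.0 - done) * next_q

# Vanilla DQN target, for comparison:
#   reward + gamma * (1 - done) * np.max(q_target_next, axis=1)
```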

In terms of the results, we would claim that DQN-ABR rivals A2C-ABR (i.e., Pensieve-like), while it underperforms PPO-ABR.
The training curves are shown below, in which red is dual-PPO, blue is DQN, and orange is Double-DQN.

[figure: dual-PPO vs. DQN training curves]

[figure: DQN vs. Double-DQN training curves]

Please feel free to let me know if you have any questions.

Best,
Tianchi.


howfars commented Oct 19, 2020

Thank you for your reply, it's very helpful!
