
Optimizer / Visitor / Mapper confusion, no documentation #1755

Open
benbaarber opened this issue May 11, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments

@benbaarber (Contributor) commented May 11, 2024

I am working on a reinforcement learning crate (rl) and am using Burn to implement a deep Q-network example. I am trying to use the AdamW optimizer, but the documentation around using these optimizers directly is very unclear. I have read chapter 6 of the Burn book and looked through the docs, but I am still confused as to why AdamWConfig::init() returns impl Optimizer instead of the concrete AdamW struct. What is the purpose of these optimizer structs if the configs don't initialize them and you can't construct one manually?

Overall I'm really happy with the Burn experience and want to contribute at some point in the future, but I've noticed that when I get into the weeds of a very custom use case, the documentation is unclear or missing.

In this case, I am forced to add another generic on my SnakeDQN struct and all implementations even though I know exactly which optimizer it will be using. I am not aware of a better workaround if I want to keep my current paradigm of holding initialized modules in the SnakeDQN struct.

// Note: `B` is assumed to be a backend type alias defined elsewhere in the
// crate (e.g. an autodiff backend), and `DEVICE` a lazily initialized device.
pub struct SnakeDQN<'a, O: Optimizer<Model<B>, B>> {
    env: &'a mut GrassyField<FIELD_SIZE>,
    policy_net: Model<B>,
    target_net: Model<B>,
    memory: ReplayMemory<GrassyField<FIELD_SIZE>>,
    loss: HuberLoss<B>,
    optimizer: O,
    exploration: EpsilonGreedy,
    gamma: f32,
    tau: f32,
    lr: f32,
    episode: u32,
}

impl<'a, O> SnakeDQN<'a, O>
where 
    O: Optimizer<Model<B>, B>,
{
    pub fn new(
        env: &'a mut GrassyField<FIELD_SIZE>,
        model_config: ModelConfig,
        loss_config: HuberLossConfig,
        optim_config: AdamWConfig,
        exploration: EpsilonGreedy,
    ) -> Self {
        Self {
            env,
            policy_net: model_config.init(&*DEVICE),
            target_net: model_config.init(&*DEVICE),
            memory: ReplayMemory::new(50000),
            loss: loss_config.init(&*DEVICE),
            optimizer: optim_config.init(),
            exploration,
            gamma: 0.86,
            tau: 2.7e-2,
            lr: 3.58e-3,
            episode: 0,
        }
    }
}
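The constraint described above can be reproduced in a few lines of plain Rust with no Burn dependency (all names here are hypothetical): a method returning `impl Trait` hides the concrete type, so any struct that stores the returned value must itself take a generic parameter.

```rust
// Minimal illustration (not Burn code) of why `init() -> impl Trait`
// forces a generic parameter on the holding struct.
trait Optimizer {
    fn step(&mut self) -> u32;
}

struct CountingOptimizer {
    steps: u32,
}

impl Optimizer for CountingOptimizer {
    fn step(&mut self) -> u32 {
        self.steps += 1;
        self.steps
    }
}

struct Config;

impl Config {
    // Returning `impl Optimizer` hides `CountingOptimizer`,
    // so the caller cannot name the concrete type.
    fn init(&self) -> impl Optimizer {
        CountingOptimizer { steps: 0 }
    }
}

// The holder must be generic over O, mirroring SnakeDQN<'a, O>.
struct Holder<O: Optimizer> {
    optimizer: O,
}

fn main() {
    let mut h = Holder { optimizer: Config.init() };
    assert_eq!(h.optimizer.step(), 1);
    assert_eq!(h.optimizer.step(), 2);
}
```

If `init` instead returned the concrete struct, `Holder` could name that type directly and drop the generic, which is what the issue is asking for.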
@antimora added the documentation label May 12, 2024
@antimora (Collaborator) commented

CC @laggui @nathanielsimard

@benbaarber (Author) commented

I'm having another issue as well: I'm trying to implement the soft update of the target network in a deep Q-learning environment. See the PyTorch equivalent:

pnsd, tnsd = self.policy_net.state_dict(), self.target_net.state_dict()

for key in pnsd:
  tnsd[key] = pnsd[key] * self.hp["tau"] + tnsd[key] * (1 - self.hp["tau"])

self.target_net.load_state_dict(tnsd)
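The per-parameter update above is just a linear interpolation: each target parameter becomes tau * policy + (1 - tau) * target. A stdlib-only Rust sketch of that formula (hypothetical helper, not Burn API; in Burn the same arithmetic would run per-tensor inside a mapper):

```rust
// Soft update: interpolate each target parameter toward the
// corresponding policy parameter by factor tau.
fn soft_update(policy: &[f32], target: &mut [f32], tau: f32) {
    for (t, p) in target.iter_mut().zip(policy) {
        *t = tau * p + (1.0 - tau) * *t;
    }
}

fn main() {
    let policy = [1.0_f32, 1.0];
    let mut target = [0.0_f32, 2.0];
    soft_update(&policy, &mut target, 0.5);
    // Halfway between the two sets of parameters.
    assert_eq!(target, [0.5, 1.5]);
}
```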

I see that burn has the concepts of visitors and mappers, which seemed to be the best way to implement this. However, the documentation around visitors, mappers, and generally accessing the parameters of a model is either missing or very hard to find. Even this small section in the book is out of date with the trait definitions. There do not appear to be any examples of the intended way to use these traits either.

I feel like Burn has tons of potential to be THE go-to machine learning framework in Rust, but the lack of clear documentation and examples is holding it back. I really think documenting existing code should be a higher priority than adding new features at this point, and I would be happy to help if someone can answer my questions and clear this up for me.

@benbaarber changed the title from "Optimizer confusion, no documentation" to "Optimizer / Visitor / Mapper confusion, no documentation" May 12, 2024
@nathanielsimard (Member) commented

@benbaarber Thanks for the PR that made the optimizer concrete. Hope this solves your problem regarding holding an optimizer in your struct. Regarding the visitor and mapper, I think it's a very nice candidate for a new advanced section in the book. Adding more docs on the trait wouldn't be a bad idea either.

For your specific problem, I think you would need a mapper, similar to how our optimizers are actually implemented and update the model's parameters. See this mapper as a reference. Let me know if it helps, and don't hesitate to ask your questions on the Discord; we are a bit more responsive there!
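The mapper shape being described can be sketched in plain Rust (illustrative names only; Burn's actual trait operates on `Tensor<B, D>` values and is driven by the module's map method, with details that may differ by version):

```rust
// Simplified stand-in for the mapper idea, using Vec<f32> in place
// of tensors: a mapper receives each parameter and returns its
// replacement, the way an optimizer's update step rewrites parameters.
trait ParamMapper {
    fn map(&mut self, name: &str, param: Vec<f32>) -> Vec<f32>;
}

// A mapper that scales every parameter it visits.
struct Scale(f32);

impl ParamMapper for Scale {
    fn map(&mut self, _name: &str, param: Vec<f32>) -> Vec<f32> {
        param.into_iter().map(|p| p * self.0).collect()
    }
}

fn main() {
    let mut mapper = Scale(0.5);
    let updated = mapper.map("weight", vec![2.0, 4.0]);
    assert_eq!(updated, vec![1.0, 2.0]);
}
```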

Right now, we aren't really prioritizing more features, but rather improving performance. Feel free to open issues in areas where Burn would benefit from more documentation and examples; we may prioritize it!

@benbaarber (Author) commented May 14, 2024

@nathanielsimard Thanks for the response

> For your specific problem, I think you would need a mapper, similar to how our optimizers are actually implemented and update the model's parameters. See this mapper as a reference.

This helps a lot, thanks.

> Right now, we aren't really prioritizing more features, but rather improving performance. Feel free to open issues in areas where Burn would benefit from more documentation and examples; we may prioritize it!

Sounds good. I understand the Visitor/Mapper paradigm a bit better now after looking around the codebase, though I think some really basic examples would go a long way toward helping newcomers (like me) pick up these features quickly. At first I was confused about where to store the parameters after visiting them; then I was confused about using a Mapper to adjust the parameters of the target model as a function of both itself and the policy model. I will try to figure that out today, and I will definitely ask in the Discord if I get stuck.

Thank you again for the help
