
Problematic derivative of Tensor::abs and Huber loss #1441

Closed
WorldSEnder opened this issue Mar 9, 2024 · 4 comments · Fixed by #1444
Labels: bug (Something isn't working)

Comments

WorldSEnder (Contributor) commented Mar 9, 2024

Implementing the Huber loss requires comparing the absolute value of the residual against some threshold kappa: the loss is quadratic inside this bound and linear in the absolute value outside of it. An example implementation would be:

fn huber_loss<const D: usize, B: Backend>(kappa: f32, delta: Tensor<B, D>) -> Tensor<B, D> {
    let u = delta.clone().abs();
    // Mask of positions where the residual falls inside the quadratic region.
    let is_small = u.clone().lower_elem(kappa);
    // Quadratic branch: 0.5 * delta^2
    let inside = delta.powf_scalar(2.).mul_scalar(0.5);
    // Linear branch: kappa * |delta| - kappa^2 / 2
    let outside = u.mul_scalar(kappa).sub_scalar((kappa * kappa) / 2.);
    // Use the quadratic branch where the mask is set, the linear branch elsewhere.
    outside.mask_where(is_small, inside)
}
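
For reference, the function above computes the standard per-element Huber loss:

$$
L_\kappa(\delta) = \begin{cases} \tfrac{1}{2}\,\delta^2 & \text{if } |\delta| < \kappa \\ \kappa\,|\delta| - \tfrac{\kappa^2}{2} & \text{otherwise} \end{cases}
$$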

Using this implementation leads to NaN values showing up after a few steps of training. This is probably connected to the gradient computation in Tensor::abs, but since the problematic small values should be masked out and use the (perfectly fine) gradient from the quadratic term instead, I'm not sure how that is happening.

WorldSEnder (Contributor, Author) commented Mar 9, 2024

Okay, so the gradients from the loss do get propagated to the inside and outside terms correctly, i.e. where delta is small the gradients are assigned to inside, and the remaining ones to outside. The problematic part is the backward pass of the abs in u: it multiplies the incoming gradients with delta / abs(delta) (which is supposed to implement a sign function, I suppose?). Where delta is actually (close enough to) 0, this computes 0 * NaN (incoming gradient times the result of the division), which backprops NaN instead of the 0 expected from a masked-out gradient contribution.
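
To see the mechanism in isolation, here is a minimal sketch with plain f32 values (no burn types involved), just the IEEE-754 arithmetic that the backward pass ends up doing at a masked-out position:

fn main() {
    let delta: f32 = 0.0; // a residual inside the quadratic region, so the abs branch is masked out
    let incoming_grad: f32 = 0.0; // the gradient routed to the masked-out abs branch is zero
    let state = delta / delta.abs(); // 0.0 / 0.0 == NaN
    let grad = incoming_grad * state; // 0.0 * NaN == NaN, not 0.0
    assert!(state.is_nan());
    assert!(grad.is_nan()); // the NaN survives the multiplication by zero and poisons the gradients
}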

I suppose a sign primitive, used instead of delta / abs(delta), would solve the issue. The current gradient preparation looks like this:

// state saved for the backward pass of abs: delta / |delta|, which is NaN where delta == 0
let output = B::float_abs(tensor.clone());
let state = B::float_div(tensor, output);
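
For illustration, the saved state with a sign primitive could look roughly like the following (the float_sign name here is hypothetical; the actual operator landed later via #1446). Since sign(0) == 0, the state is finite everywhere, and the zero incoming gradient at masked-out positions stays zero:

// hypothetical sketch: replace the 0/0-prone division with a sign primitive
let output = B::float_abs(tensor.clone());
let state = B::float_sign(tensor); // sign(0) == 0, so 0 * state == 0 instead of NaN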

@WorldSEnder WorldSEnder changed the title Problematic derivative of Tensor::abs Problematic derivative of Tensor::abs and Huber loss Mar 9, 2024
@antimora antimora added the bug Something isn't working label Mar 9, 2024
antimora (Collaborator) commented Mar 9, 2024

Linking Sign OP issue here: #522

WorldSEnder mentioned this issue Mar 10, 2024
antimora (Collaborator) commented:

Submitted sign tensor operator: #1446

antimora (Collaborator) commented:

Sign tensor op PR is merged.
