
Problematic derivative of Tensor::abs and Huber loss #1441

Closed
WorldSEnder opened this issue Mar 9, 2024 · 4 comments · Fixed by #1444
Labels: bug (Something isn't working)

Comments

WorldSEnder (Contributor) commented Mar 9, 2024

Implementing the Huber loss requires comparing the absolute value of the residual against some threshold kappa: the loss is quadratic inside this bound and linear in the absolute value outside of it. An example implementation would be:

fn huber_loss<const D: usize, B: Backend>(kappa: f32, delta: Tensor<B, D>) -> Tensor<B, D> {
    let u = delta.clone().abs();
    // Mask of positions where the residual falls inside the quadratic region.
    let is_small = u.clone().lower_elem(kappa);
    // Quadratic branch: 0.5 * delta^2
    let inside = delta.powf_scalar(2.).mul_scalar(0.5);
    // Linear branch: kappa * |delta| - kappa^2 / 2
    let outside = u.mul_scalar(kappa).sub_scalar((kappa * kappa) / 2.);
    // Use the quadratic branch where the mask is set, the linear branch elsewhere.
    outside.mask_where(is_small, inside)
}
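
For reference, the function above computes the standard per-element Huber loss:

$$
L_\kappa(\delta) = \begin{cases} \tfrac{1}{2}\,\delta^2 & \text{if } |\delta| < \kappa \\ \kappa\,|\delta| - \tfrac{\kappa^2}{2} & \text{otherwise} \end{cases}
$$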

Using this implementation leads to NaN values showing up after a few steps of training. This is probably connected to the gradient computation in Tensor::abs, but since the problematic small values should be masked out and use the (perfectly fine) gradient from the quadratic term instead, I'm not sure how that is happening.

WorldSEnder (Contributor, Author) commented Mar 9, 2024

Okay, so the gradients from the loss do get propagated to the inside and outside terms correctly, i.e. where delta is small the gradients are assigned to inside, and the remaining ones to outside. The problematic part is the backward pass of the abs in u: it multiplies the incoming gradients with delta / abs(delta) (which is supposed to implement a sign function, I suppose?). Where delta is actually (close enough to) 0, this computes 0 * NaN (incoming gradient times the result of the division), which backprops NaN instead of the 0 expected from a masked-out gradient contribution.
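
To see the mechanism in isolation, here is a minimal sketch with plain f32 values (no burn types involved), just the IEEE-754 arithmetic that the backward pass ends up doing at a masked-out position:

fn main() {
    let delta: f32 = 0.0; // a residual inside the quadratic region, so the abs branch is masked out
    let incoming_grad: f32 = 0.0; // the gradient routed to the masked-out abs branch is zero
    let state = delta / delta.abs(); // 0.0 / 0.0 == NaN
    let grad = incoming_grad * state; // 0.0 * NaN == NaN, not 0.0
    assert!(state.is_nan());
    assert!(grad.is_nan()); // the NaN survives the multiplication by zero and poisons the gradients
}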

I suppose a sign primitive, used instead of delta / abs(delta), would solve the issue. The current gradient preparation looks like this:

// state saved for the backward pass of abs: delta / |delta|, which is NaN where delta == 0
let output = B::float_abs(tensor.clone());
let state = B::float_div(tensor, output);
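
For illustration, the saved state with a sign primitive could look roughly like the following (the float_sign name here is hypothetical; the actual operator landed later via #1446). Since sign(0) == 0, the state is finite everywhere, and the zero incoming gradient at masked-out positions stays zero:

// hypothetical sketch: replace the 0/0-prone division with a sign primitive
let output = B::float_abs(tensor.clone());
let state = B::float_sign(tensor); // sign(0) == 0, so 0 * state == 0 instead of NaN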

@WorldSEnder WorldSEnder changed the title Problematic derivative of Tensor::abs Problematic derivative of Tensor::abs and Huber loss Mar 9, 2024
@antimora antimora added the bug Something isn't working label Mar 9, 2024
antimora (Collaborator) commented Mar 9, 2024

Linking Sign OP issue here: #522

WorldSEnder mentioned this issue Mar 10, 2024
antimora (Collaborator) commented:

Submitted sign tensor operator: #1446

antimora (Collaborator) commented:

Sign tensor op PR is merged.
