Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Computed Variance Can be Negative #26

Open
NAThompson opened this issue Jan 19, 2019 · 4 comments
Open

Computed Variance Can be Negative #26

NAThompson opened this issue Jan 19, 2019 · 4 comments

Comments

@NAThompson
Copy link

This is an example from Golub's paper:

#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>

template<class Real>
void golub_example()
{
    std::random_device rd{};
    auto seed = rd();
    std::mt19937 gen(seed);
    std::normal_distribution<Real> dis(500.0, 0.01);

    std::vector<Real> v(50000);
    for(size_t i = 0; i < v.size(); ++i)
    {
        v[i] = dis(gen);
    }

    accumulator_set<Real, stats<tag::variance(lazy)>> acc;
    acc = std::for_each(v.begin(), v.end(), acc);
    std::cout << "Variance = " << variance(acc) << "\n";
    std::cout << "Seed = " << seed << "\n";
}

int main()
{
   golub_example<float>();
}

Output:

Variance (accumulator) = -79.9219
Seed = 1925235287
@NAThompson
Copy link
Author

NAThompson commented Jun 29, 2019

@CromwellEnage : I saw on the mailing list that you are now looking to maintain this library; could you fix this bug? The fix is in Higham's book Accuracy and Stability of Numerical Algorithms, first chapter.

@CromwellEnage
Copy link
Contributor

@NAThompson, I'll look into it and let you know when I or someone else have implemented a solution.

@NAThompson
Copy link
Author

@CromwellEnage : This also implements the correct algorithm.

@antoine79
Copy link

antoine79 commented Apr 17, 2024

@NAThompson , variance(lazy) by definition is lazy and computes variance as the difference between the two accumulators. This issue can be closed in my opinion as variance(immediate) exists already and behaves very closely to the algorithm that you point out.
See:

#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>

using namespace boost::accumulators;

template<class Real>
void golub_example()
{
    std::mt19937 gen(1925235287);
    int N = 50000;
    std::normal_distribution<Real> dis(500.0, 0.01);

    std::vector<Real> v(N);
    for(size_t i = 0; i < v.size(); ++i)
    {
        v[i] = dis(gen);
    }

    accumulator_set<Real, stats<tag::variance(immediate)>> acc;
    acc = std::for_each(v.begin(), v.end(), acc);
    std::cout << "Variance = " << variance(acc) << "\n";
}

int main()
{
   golub_example<float>();
}

Output:

Variance = 0.000100213

Now, if N=500e3 instead of 50e3, then we have another numeric problem as the variance becomes 0.7 instead of 1e-4. Another algorithm becomes necessary, like the shifted algorithm where variance is computed by subtracting from each value the value of the first data point. I have a proposal if that's of interest to anyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants