Misleading documentation for boost::accumulators::tag::variance #43

Open
EricBackus opened this issue May 11, 2020 · 0 comments

@EricBackus

The first half of the documentation for tag::variance is correct: it shows the recursive formulas used to update the variance and the mean. These formulas are exact, and they are what the Boost code implements (a variant of Welford's method for computing variance).
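The exact recursion can be sketched with the standard library alone. This is a minimal sketch, not Boost's code: the class `RunningVariance` and the helper `two_pass_variance` are hypothetical names, and it assumes the documented update is σ_n² = ((n-1)/n)·σ_{n-1}² + (x_n - μ_n)²/(n-1), with σ_n² the population variance (division by n). The mean is updated first, and the deviation is then taken from the *updated* mean, which is the order this recursion requires.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical running accumulator sketching the exact recursion:
//   mu_n      = mu_{n-1} + (x_n - mu_{n-1}) / n
//   sigma_n^2 = (n-1)/n * sigma_{n-1}^2 + (x_n - mu_n)^2 / (n-1)
// sigma_n^2 is the population variance (division by n).
class RunningVariance {
public:
    void push(double x) {
        ++n_;
        const double n = static_cast<double>(n_);
        mean_ += (x - mean_) / n;            // update the mean first
        if (n_ > 1) {
            const double d = x - mean_;      // deviation from the updated mean
            var_ = var_ * (n - 1.0) / n + d * d / (n - 1.0);
        }
    }
    std::size_t count() const { return n_; }
    double mean() const { return mean_; }
    double variance() const { return var_; } // 0 until two samples are seen

private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double var_ = 0.0;
};

// Reference: textbook two-pass population variance.
double two_pass_variance(const std::vector<double>& xs) {
    assert(!xs.empty());
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= static_cast<double>(xs.size());
    double ss = 0.0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    return ss / static_cast<double>(xs.size());
}
```

Feeding the samples {2, 4, 4, 4, 5, 5, 7, 9} one at a time into `RunningVariance` gives mean 5 and variance 4, the same as `two_pass_variance`, up to rounding.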

But the second half of the documentation is, at best, misleading. The second half starts with "A simplification can be obtained by the approximate recursion...", then gives some formulas, and ends with "However, for small n the difference can be non-negligible." This is misleading because this "simplification" is not, in fact, used by the boost code. And that's a good thing, because this "simplification" would produce incorrect results, and would not actually be any simpler. As far as I can tell, the second half of the documentation should be eliminated entirely.
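To illustrate why such a "simplification" would be wrong rather than simpler, the sketch below runs the same recursion with either the exact weight 1/(n-1) on the newest term or a 1/n weight. The 1/n form is an assumption about what an approximate recursion of this kind looks like, used here only for comparison; `recursive_variance` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Runs the recursive variance update over xs.
// exact == true : newest term weighted by 1/(n-1) (the exact recursion).
// exact == false: newest term weighted by 1/n (hypothetical "simplified"
//                 variant, included only to show the error it introduces).
double recursive_variance(const std::vector<double>& xs, bool exact) {
    assert(!xs.empty());
    double mean = 0.0, var = 0.0;
    std::size_t count = 0;
    for (double x : xs) {
        ++count;
        const double n = static_cast<double>(count);
        mean += (x - mean) / n;
        if (count > 1) {
            const double d = x - mean;
            var = var * (n - 1.0) / n + d * d / (exact ? n - 1.0 : n);
        }
    }
    return var;
}
```

For the two samples {1, 2}, the exact version returns 0.25 (the population variance), while the 1/n version returns 0.125. Both variants perform the same operations per sample, so the 1/n form saves nothing; it only mis-weights the newest term, and that error is largest when n is small.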

The current documentation leads some people to believe, incorrectly, that boost::accumulators::tag::lazy_variance is sometimes preferable. For example, in a comment on a Stack Overflow question, one person claims:

Note, that tag::variance calculates variance by an approximate formula. tag::variance(lazy) calculates by an exact formula, specifically: second moment - squared mean which will produce incorrect result if variance is very small because of rounding errors. It can actually produce negative variance. – panda-34 Dec 7 '15 at 13:36

This person incorrectly thinks that tag::variance is only approximate.
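A short derivation shows that the recursion is an algebraic identity, not an approximation. It assumes σ_n² denotes the population variance of x_1, …, x_n, and it uses Welford's sum-of-squares update M_n = M_{n-1} + (x_n - μ_{n-1})(x_n - μ_n), where M_n = n·σ_n².

$$
\mu_n = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}
\quad\Longrightarrow\quad
x_n - \mu_{n-1} = \frac{n}{n-1}\,(x_n - \mu_n), \qquad n \ge 2
$$

$$
n\,\sigma_n^2 = (n-1)\,\sigma_{n-1}^2 + \frac{n}{n-1}\,(x_n - \mu_n)^2
\quad\Longrightarrow\quad
\sigma_n^2 = \frac{n-1}{n}\,\sigma_{n-1}^2 + \frac{1}{n-1}\,(x_n - \mu_n)^2
$$

No step drops a term, so the only error in the recursive update is ordinary floating-point rounding.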

As far as I can tell, the only reason to use tag::lazy_variance is if speed is your only consideration, since tag::lazy_variance is roughly twice as fast as tag::variance. But both are quite fast, so this is unlikely to be a consideration for most users.
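The quoted comment is right that a "second moment minus squared mean" formula suffers from cancellation when the variance is small relative to the mean. The sketch below assumes the lazy variant works that way, as the comment describes, and compares it with the recursive update on three samples near 10^9 whose true population variance is 2/3. Function names are hypothetical; this is not Boost's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "lazy" formula as described in the quoted comment:
// variance = mean(x^2) - mean(x)^2. Cheap per sample (two running sums),
// but it subtracts two nearly equal large numbers when |mean| >> stddev.
double lazy_variance(const std::vector<double>& xs) {
    assert(!xs.empty());
    double sum = 0.0, sum_sq = 0.0;
    for (double x : xs) {
        sum += x;
        sum_sq += x * x;
    }
    const double n = static_cast<double>(xs.size());
    const double mean = sum / n;
    return sum_sq / n - mean * mean;
}

// The exact recursive update, repeated so this block is self-contained.
double recursive_variance(const std::vector<double>& xs) {
    assert(!xs.empty());
    double mean = 0.0, var = 0.0;
    std::size_t count = 0;
    for (double x : xs) {
        ++count;
        const double n = static_cast<double>(count);
        mean += (x - mean) / n;              // update the mean first
        if (count > 1) {
            const double d = x - mean;       // deviation from the updated mean
            var = var * (n - 1.0) / n + d * d / (n - 1.0);
        }
    }
    return var;
}
```

With IEEE-754 doubles and the samples {1e9, 1e9 + 1, 1e9 + 2}, the lazy formula loses essentially all precision (depending on how the compiler rounds, it can come out as zero or even negative), while the recursive update stays very close to 2/3.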
