Skip to content

C++ and Python implementations of the parallel version of Welford's online algorithm for descriptive stats

License

Notifications You must be signed in to change notification settings

petergaultney/One-Pass-Parallel-Descriptive-Statistics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Contained are a tested/working Python and C++ implementation of the parallel version of Welford's algorithm for calculating simple descriptive statistics in a single pass over the data. https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

The advantage of this approach is twofold: Not only can you calculate these statistics as your data points arrive, for, e.g., a live dashboard, but you can actually distribute your calculation across multiple 'groups', whether phyiscal hardware (servers) or time spans (e.g. in 5 minute 'windows'), and merge those groups at arbitrary times to get the overall statistics for your full population.

In the long run, this provides the advantage of allowing reduced data storage, as you can aggregate and store the data in whatever granularity you need, allowing you to trade off storage, compute, and precision within a large parameter space.

About

C++ and Python implementations of the parallel version of Welford's online algorithm for descriptive stats

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published