sig-fig histogram

Exponential histograms provide a way of measuring large ranges of values with a bounded number of buckets. By sizing buckets exponentially, small and large values alike may be measured with bounded error. Most exponential histograms use base-2 bucket sizes. This post introduces the sig-fig histogram, which uses base-10 bucket sizes.

This is a significant change because it enables the histogram to provide the same percentage error across the spectrum of values. Keeping a fixed number of significant figures of precision bounds the error to 10%, 1%, or even 0.1%. An additional benefit is that the buckets of the histogram precisely match the tick marks on a log-10 axis in common plotting software.
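
To make the tick-mark correspondence concrete, here is a minimal sketch that enumerates bucket lower bounds. It assumes a boundary is any integer whose leading `sig_figs` decimal digits are kept and whose remaining digits are zero; `bucket_boundaries` is a hypothetical helper written for this post, not the crate's API, and it is only meant for small inputs.

```rust
/// Hypothetical helper (not the crate's API): bucket lower bounds covering
/// `decades` powers of ten when `sig_figs` leading decimal digits are kept.
pub fn bucket_boundaries(sig_figs: u32, decades: u32) -> Vec<u64> {
    assert!(sig_figs >= 1, "need at least one significant figure");
    let lo = 10u64.pow(sig_figs - 1); // smallest mantissa, e.g. 1 or 10
    let hi = 10u64.pow(sig_figs); // one past the largest mantissa, e.g. 10 or 100
    let mut out = Vec::new();
    for d in 0..decades {
        let scale = 10u64.pow(d);
        // Every mantissa, shifted up by `d` decimal places.
        for m in lo..hi {
            out.push(m * scale);
        }
    }
    out
}
```

With `sig_figs = 1` and two decades this yields 1, 2, ..., 9, 10, 20, ..., 90, the positions where a log-10 axis conventionally draws its major and minor ticks.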

My desiderata for the sig-fig histogram are:

  • Bounded inaccuracy: The amount of error between a point and the bucket it’s placed into should be limited to a percentage throughout the histogram’s range.
  • Algebraic: Two histograms can be added together or subtracted from one another, and the addition is associative.
  • Simple to use: The sig-fig histogram is easy to incorporate wherever values are measured.

There’s nothing wrong in an absolute sense with conventional power-of-two exponential histograms. They can meet the same goals with the right implementation, but the bounds on the inaccuracy are not as straightforward as with the sig-fig histogram. With one significant figure, there’s 10% inaccuracy in the results. With two significant figures, 1% inaccuracy. Three significant figures take the inaccuracy down to 0.1% and four to just 0.01%. It’s easy to reason about and matches human intuition.
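
As a rough illustration of the bucketing idea, the sketch below maps an integer value to the lower bound of its bucket by keeping only the leading `sig_figs` decimal digits. This truncation convention is an assumption made for the example, and `bucket_lower_bound` is a hypothetical name rather than the crate's API. Under this convention, a value is never further from its bucket's lower bound than one unit in the last retained digit.

```rust
/// Hypothetical helper (not the crate's API): lower bound of the bucket that
/// `value` falls into when only `sig_figs` leading decimal digits are kept.
pub fn bucket_lower_bound(value: u64, sig_figs: u32) -> u64 {
    assert!(sig_figs >= 1, "need at least one significant figure");
    // 10^sig_figs; if it overflows u64, every u64 already fits in sig_figs digits.
    let limit = match 10u64.checked_pow(sig_figs) {
        Some(limit) => limit,
        None => return value,
    };
    // Grow the scale until value / scale has at most `sig_figs` digits.
    let mut scale = 1u64;
    while value / scale >= limit {
        scale *= 10;
    }
    // Zero out everything below the retained digits.
    (value / scale) * scale
}
```

For example, `bucket_lower_bound(1234, 2)` returns `1200`, while `bucket_lower_bound(87, 3)` returns `87` unchanged because the value already fits in three digits.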

It also displays nicely.

As with traditional histograms, sig-fig histograms build on top of counters to record the witnessed values in different buckets. Any two histograms can be added together bucket-wise to get a new histogram that precisely captures the sum of the two distributions. Less intuitively, two recordings of the same histogram over time can be subtracted from one another to capture precisely what happened in the interval between the two observations. For this reason, the histograms are algebraic. They compose nicely.
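
The bucket-wise addition and subtraction described above can be sketched as follows. `Counts`, `observe`, `plus`, and `minus` are hypothetical names chosen for this illustration, and the map-of-counters representation is an assumption; the crate's own types and methods may look quite different.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a histogram: bucket lower bound -> count.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Counts(pub BTreeMap<u64, u64>);

impl Counts {
    /// Record one observation in the given bucket.
    pub fn observe(&mut self, bucket: u64) {
        *self.0.entry(bucket).or_insert(0) += 1;
    }

    /// Bucket-wise sum of two histograms.
    pub fn plus(&self, other: &Counts) -> Counts {
        let mut out = self.clone();
        for (&bucket, &n) in &other.0 {
            if n > 0 {
                *out.0.entry(bucket).or_insert(0) += n;
            }
        }
        out
    }

    /// Bucket-wise difference `self - earlier`. Returns `None` when `earlier`
    /// holds a count that `self` lacks, i.e. it is not an earlier snapshot.
    pub fn minus(&self, earlier: &Counts) -> Option<Counts> {
        let mut out = self.clone();
        for (&bucket, &n) in &earlier.0 {
            if n == 0 {
                continue;
            }
            let slot = out.0.get_mut(&bucket)?;
            *slot = slot.checked_sub(n)?;
        }
        // Drop buckets that saw no new observations in the interval.
        out.0.retain(|_, n| *n > 0);
        Some(out)
    }
}
```

Taking a snapshot, recording more values, and calling `minus` with the snapshot gives exactly the observations made in between; adding that difference back to the snapshot with `plus` reproduces the later histogram.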

The sig-fig histogram is available for Rust on crates.io.