Many conventional statistical methods employ the Normal approximation to the Binomial distribution (see Binomial → Normal → Wilson), either explicitly or buried in formulae.
The well-known Gaussian population interval (1) is inverted to obtain (2).
Gaussian interval (E⁻, E⁺) ≡ P ± z√(P(1 – P)/n), (1)
where n represents the size of the sample, and z the two-tailed critical value for the Normal distribution at an error level α, more properly written z_{α/2}. The standard deviation of the population proportion P is S = √(P(1 – P)/n), so we could abbreviate the above to (E⁻, E⁺) ≡ P ± zS.
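As a minimal sketch of Equation (1), the Python below computes the Gaussian interval about a population proportion P. The function names `z_critical` and `gaussian_interval` are illustrative assumptions, not taken from the paper, and α = 0.05 is assumed as the default error level.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(alpha=0.05):
    """Two-tailed critical value z_{alpha/2} of the standard Normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def gaussian_interval(P, n, alpha=0.05):
    """Return (E_lower, E_upper) = P -/+ z * S, where S = sqrt(P(1 - P) / n)."""
    z = z_critical(alpha)
    S = sqrt(P * (1 - P) / n)  # standard deviation of the proportion P
    return P - z * S, P + z * S


# Illustrative values only: P = 0.5, n = 100
lower, upper = gaussian_interval(0.5, 100)
print(f"z = {z_critical():.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

With P = 0.5 and n = 100, S = 0.05, so the interval is roughly 0.5 ± 0.098.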
When these methods require us to calculate a confidence interval about an observed proportion, p, we must invert the Normal formula using the Wilson score interval formula (Equation (2)).
Wilson score interval (w⁻, w⁺) ≡ [p + z²/(2n) ± z√(p(1 – p)/n + z²/(4n²))] / [1 + z²/n]. (2)
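Equation (2) can be sketched in Python along the following lines. This is an illustration under stated assumptions rather than a reference implementation: the name `wilson_interval` and the default α = 0.05 are my own choices, and the example values are arbitrary.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(p, n, alpha=0.05):
    """Return the Wilson score interval (w_lower, w_upper) about an observed proportion p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    centre = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom


# Illustrative values only: p = 0.1 observed in a sample of n = 20
w_lower, w_upper = wilson_interval(0.1, 20)
print(f"Wilson interval = ({w_lower:.4f}, {w_upper:.4f})")
```

Unlike the Gaussian interval, the result is asymmetric about p and stays within [0, 1], which is easiest to see for small n and p close to 0 or 1.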
In a 2013 paper for JQL (Wallis 2013a), I referred to this inversion principle as the ‘interval equality principle’. This means that if (2) is calculated for p = E⁻ (the Gaussian lower bound of P), then the upper bound that results, w⁺, will equal P. Similarly, for p = E⁺ (the Gaussian upper bound of P), the Wilson lower bound, w⁻, will equal P.
We might write this relationship as
p ≡ GaussianLower(WilsonUpper(p)), or
P ≡ WilsonLower(GaussianUpper(P)), etc. (3)
where we have functions E⁻ = GaussianLower(P), w⁺ = WilsonUpper(p), etc.
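The relationship in (3) can be checked numerically. The sketch below assumes α = 0.05 and uses snake_case counterparts of the function names above (`gaussian_lower`, `wilson_upper`, etc.); the test values P = 0.3, p = 0.2 and n = 50 are arbitrary.

```python
from math import isclose, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-tailed critical value, assuming alpha = 0.05


def gaussian_lower(P, n, z=Z):
    return P - z * sqrt(P * (1 - P) / n)


def gaussian_upper(P, n, z=Z):
    return P + z * sqrt(P * (1 - P) / n)


def _wilson(p, n, z, sign):
    # Equation (2), with sign = -1 for the lower bound and +1 for the upper bound
    centre = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + sign * spread) / (1 + z * z / n)


def wilson_lower(p, n, z=Z):
    return _wilson(p, n, z, -1)


def wilson_upper(p, n, z=Z):
    return _wilson(p, n, z, +1)


P, p, n = 0.3, 0.2, 50
# Starting from P: the Wilson bound about a Gaussian bound recovers P
assert isclose(wilson_upper(gaussian_lower(P, n), n), P)
assert isclose(wilson_lower(gaussian_upper(P, n), n), P)
# Starting from p: the Gaussian bound about a Wilson bound recovers p
assert isclose(gaussian_lower(wilson_upper(p, n), n), p)
assert isclose(gaussian_upper(wilson_lower(p, n), n), p)
print("interval equality holds for the tested values")
```

The equality holds because the Wilson bounds are the two roots in P of (p – P)² = z²P(1 – P)/n, which is the same equation that defines the Gaussian bounds when solved for p.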
In the paper, I performed a series of computational evaluations of the performance of different interval calculations, following in the footsteps of more notable predecessors. Comparison with the analogous interval calculated directly from the Binomial distribution showed that a continuity-corrected version of the Wilson score interval performed accurately.