Introduction
Elsewhere on this blog we have discussed the distribution of values predicted by confidence intervals, referred to more formally as probability density functions (pdfs).
When we plot a confidence interval, we find the point(s) nearest to our observed value at which the true population value could lie and still be considered significantly different from that observation (at a given error level).
But we can also plot the distribution of the interval function by varying the error level α from 1 (every difference is significant) to almost zero (asymptotically: nothing is significant). We can then see which expected values are more likely than others given one further piece of information:
- The upper and lower bounds are computed independently, so the plot effectively assumes that an observed property p is as likely to be greater than the expected property P as less than it, except at the boundary, where one distribution becomes empty.
We can then assess the overall shape, observing how these values tail off, a process that is much more instructive than arguing about ‘p-values’.
We can also see something else.
Most traditional discussions of confidence intervals assume that intervals are approximately Normal, an assumption Wallis (2021: 297) calls the ‘Normal fallacy’. This conceptual error has dogged discussion of confidence intervals in the statistics literature, and deeply affects how people reason about intervals.
In my book, I point out that even the simplest interval about the single proportion p cannot be Normal. Instead we discover that it is profoundly shaped by the boundaries of the probabilistic range P = [0, 1]. Elsewhere on this blog, I have developed the implications of this argument and plotted more distributions.
I show the shape of the distribution for the Wilson score interval based on p (Wilson 1927) and other related distributions. Only when n is large and p central does this distribution approximate to the Normal.
In this blog post we will refer to the interval p ∈ (w–, w+) in terms of a functional notation. Wallis (2021: 111) proposes two Wilson functions with three parameters.
lower bound w– = WilsonLower(p, n, α/2) = p′ – e′,
upper bound w+ = WilsonUpper(p, n, α/2) = p′ + e′, (1)
where
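\[
p' = \frac{p + z_{\alpha/2}^2 / 2n}{1 + z_{\alpha/2}^2 / n}, \quad \text{and} \quad e' = \frac{z_{\alpha/2} \sqrt{p(1 - p)/n + z_{\alpha/2}^2 / 4n^2}}{1 + z_{\alpha/2}^2 / n}.
\]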
Here n is the sample size, p = f/n is the observed proportion and α/2 is the error level for each tail. Each bound should be treated separately, although they converge at p when α = 1.
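To make Equation (1) concrete, here is a minimal Python sketch of the two Wilson functions. It is an illustration rather than the code used for the plots on this blog: the names `wilson_lower`, `wilson_upper` and the helper `sweep_alpha` are my own, and I assume that z for a tail of size α/2 can be taken from the standard library's `statistics.NormalDist` (Python 3.8 or later).

```python
from math import sqrt
from statistics import NormalDist


def _wilson_terms(p, n, alpha_half):
    """Return (p', e') from Equation (1) for a single tail of size alpha_half."""
    # Critical value of the standard Normal distribution for this tail.
    # At alpha_half = 0.5 (alpha = 1) this is zero.
    z = NormalDist().inv_cdf(1 - alpha_half)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre, spread


def wilson_lower(p, n, alpha_half):
    """Lower bound w- = p' - e'."""
    centre, spread = _wilson_terms(p, n, alpha_half)
    return centre - spread


def wilson_upper(p, n, alpha_half):
    """Upper bound w+ = p' + e'."""
    centre, spread = _wilson_terms(p, n, alpha_half)
    return centre + spread


def sweep_alpha(p, n, alphas):
    """Hypothetical helper: (alpha, w-, w+) for each two-tailed error level alpha."""
    return [
        (a, wilson_lower(p, n, a / 2), wilson_upper(p, n, a / 2))
        for a in alphas
    ]


if __name__ == "__main__":
    # Trace how the bounds move away from p as alpha shrinks from 1 towards 0.
    for alpha, lower, upper in sweep_alpha(0.1, 10, [1.0, 0.5, 0.2, 0.05, 0.01]):
        print(f"alpha={alpha:<5} w-={lower:.4f}  w+={upper:.4f}")
```

For p = 0.1, n = 10 and α = 0.05, the bounds come out at roughly (0.018, 0.404). The upper bound is much further from p than the lower one, which is the boundary effect described above. At α = 1 both bounds collapse onto p.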