In Plotting the Wilson distribution (Wallis 2018), I showed how it is possible to plot the distribution of the Wilson interval for all values of α. This exercise is revealing in a number of ways.
First, it shows the relationship between
- the Normal distribution of probable Binomial observations about the population, ideal or given proportion P, and
- the corresponding distribution of probable values of P about an observed Binomial proportion, p, (referred to as the Wilson distribution, as it is based on the Wilson score interval).
Over the last few years I have become convinced that approaching statistical understanding from the perspective of the tangible observation p is more instructive and straightforward to conceptualise than approaching it (as is traditional) from the imaginary ‘true value’ in the population, P. In particular, whenever you conduct an experiment you want to know how reliable your results are (or to put it an other way, what range of values you might reasonably expect were you to repeat your experiment) — not just if it is statistically significantly different from some arbitrary number, P!
Second, and as a result, just as it is possible to see the closeness of fit between the Binomial and the Normal distribution, through this exercise we can visualise the inverse relationship between Normal and Wilson distributions. We can see immediately that it is a fallacy to assume that the distribution of probable values about p is Normal, although numerous statistics books still quote ‘Wald’-type intervals and many methods operate on this assumption. (I am intermittently amused by plots of otherwise sophisticated modelling algorithms with impossibly symmetric intervals in probability space.)
Third, I showed in the paper that ‘the Wilson distribution’ is properly understood as two distributions: the distribution of probable values of P below and above p. If we employ a continuity-correction, the two distributions become clearly distinct.
This issue sometimes throws people. Compare:
- the most probable location of P,
- the most probable location of P if we know that P < p (lower interval),
- the most probable location of P if we know that P > p (upper interval).
Wilson distributions correspond to (2) and (3) above, obtained by finding the roots of the Normal approximation. See Wallis (2013). The sum, or mean, of these is not (1), as becomes clearer when we plot other related distributions.
There are a number of other interesting and important conclusions from this work, including that the logit Wilson interval is in fact almost Normal, except for p = 0 or 1.
In this post I want to briefly comment on some recent computational work I conducted in preparation for my forthcoming book (Wallis, in press). This involves plotting the Clopper-Pearson distribution. Continue reading “Plotting the Clopper-Pearson distribution”