Plotting the Clopper-Pearson distribution


In Plotting the Wilson distribution (Wallis 2018), I showed how it is possible to plot the distribution of the Wilson interval for all values of α. This exercise is revealing in a number of ways.

First, it shows the relationship between

  1. the Normal distribution of probable Binomial observations about the population, ideal or given proportion P, and
  2. the corresponding distribution of probable values of P about an observed Binomial proportion, p, (referred to as the Wilson distribution, as it is based on the Wilson score interval).
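The relationship between these two distributions can be checked numerically. The following is a minimal Python sketch, not code from the paper: it computes the Wilson score interval from the standard formula, then confirms that each bound is a value of P whose Normal interval has p exactly on its edge. The function name `wilson_interval` and the example values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) about an observed proportion p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


# Illustrative values: 6 cases out of 20, so p = 0.3.
p, n, alpha = 0.3, 20, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = wilson_interval(p, n, alpha)

# If P sat at the Wilson lower bound, the upper edge of its Normal
# interval would land on p, and vice versa for the Wilson upper bound.
normal_upper_edge = lower + z * sqrt(lower * (1 - lower) / n)
normal_lower_edge = upper - z * sqrt(upper * (1 - upper) / n)
print(lower, upper, normal_upper_edge, normal_lower_edge)
```

Repeating the calculation for a range of α values gives the series of bounds from which a plot of the distribution could be built.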

Over the last few years I have become convinced that approaching statistical understanding from the perspective of the tangible observation p is more instructive and straightforward to conceptualise than approaching it (as is traditional) from the imaginary ‘true value’ in the population, P. In particular, whenever you conduct an experiment you want to know how reliable your results are (or, to put it another way, what range of values you might reasonably expect were you to repeat your experiment), not just whether they are statistically significantly different from some arbitrary number, P!

Second, and as a result, just as it is possible to see the closeness of fit between the Binomial and the Normal distribution, through this exercise we can visualise the inverse relationship between Normal and Wilson distributions. We can see immediately that it is a fallacy to assume that the distribution of probable values about p is Normal, although numerous statistics books still quote ‘Wald’-type intervals and many methods operate on this assumption. (I am intermittently amused by plots of otherwise sophisticated modelling algorithms with impossibly symmetric intervals in probability space.)
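A small numerical comparison makes the point about symmetric intervals concrete. This is a hedged sketch using the textbook ‘Wald’ and Wilson formulae; the function names and the choice of p = 0.05, n = 20 are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p, n, alpha=0.05):
    """'Wald' interval: a Normal interval wrongly centred on p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    e = z * sqrt(p * (1 - p) / n)
    return p - e, p + e


def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval about p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


p, n = 0.05, 20  # one case observed out of twenty
wald = wald_interval(p, n)
wilson = wilson_interval(p, n)
print("Wald:  ", wald)    # symmetric about p; lower bound falls below zero
print("Wilson:", wilson)  # stays inside [0, 1]; longer on the upper side
```

Near the edges of the probability scale the symmetric interval strays outside [0, 1], whereas the Wilson interval is squeezed on the side closer to the boundary.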

Third, I showed in the paper that ‘the Wilson distribution’ is properly understood as two distributions: the distribution of probable values of P below and above p. If we employ a continuity-correction, the two distributions become clearly distinct.
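One way to see why a continuity correction separates the two distributions is sketched below. It assumes the correction is applied by moving p outward by 1/(2n) before computing each Wilson bound, which is a common formulation but is not spelled out in this excerpt. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower(p, n, z):
    """Lower root of the Wilson score equation."""
    return (2 * n * p + z * z - z * sqrt(z * z + 4 * n * p * (1 - p))) / (2 * (n + z * z))


def wilson_upper(p, n, z):
    """Upper root of the Wilson score equation."""
    return (2 * n * p + z * z + z * sqrt(z * z + 4 * n * p * (1 - p))) / (2 * (n + z * z))


def wilson_cc(p, n, z):
    """Continuity-corrected bounds: shift p by 1/(2n) away from each bound."""
    shift = 1 / (2 * n)
    return wilson_lower(max(0.0, p - shift), n, z), wilson_upper(min(1.0, p + shift), n, z)


p, n = 0.3, 20
z = NormalDist().inv_cdf(0.975)  # alpha = 0.05

lower, upper = wilson_lower(p, n, z), wilson_upper(p, n, z)
lower_cc, upper_cc = wilson_cc(p, n, z)
print(lower, upper)        # uncorrected interval
print(lower_cc, upper_cc)  # corrected interval is wider on both sides

# As alpha approaches 1 (z approaches 0) the uncorrected bounds meet at p,
# but the corrected bounds stay 1/n apart: two separate distributions.
print(wilson_cc(p, n, 0.0))
```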

This issue sometimes throws people. Compare:

  1. the most probable location of P,
  2. the most probable location of P if we know that P < p (lower interval),
  3. the most probable location of P if we know that P > p (upper interval).

Wilson distributions correspond to (2) and (3) above, obtained by finding the roots of the Normal approximation. See Wallis (2013). The sum, or mean, of these is not (1), as becomes clearer when we plot other related distributions.

There are a number of other interesting and important conclusions from this work, including that the logit Wilson interval is in fact almost Normal, except for p = 0 or 1.
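A quick check of the logit claim is sketched below, under the assumption that ‘logit’ means the usual log-odds transform, log(x / (1 − x)). It only compares the interval bounds at a single α, which is a much weaker property than the whole distribution being near-Normal, but it shows the asymmetry disappearing on the logit scale. Names and example values are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist


def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval about p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


def logit(x):
    """Log-odds transform; undefined at x = 0 or 1."""
    return log(x / (1 - x))


p, n = 0.3, 20
lower, upper = wilson_interval(p, n)

# Distances from p to each bound on the probability scale (unequal)...
print(p - lower, upper - p)
# ...and on the logit scale (equal to within rounding in this example).
below = logit(p) - logit(lower)
above = logit(upper) - logit(p)
print(below, above)
```

The transform cannot be applied when p is 0 or 1, which is consistent with the exception noted above.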

In this post I want to briefly comment on some recent computational work I conducted in preparation for my forthcoming book (Wallis, in press). This involves plotting the Clopper-Pearson distribution.
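For readers who want to experiment, the Clopper-Pearson interval can be obtained by searching for the values of P at which the exact Binomial tail probability equals α/2. The sketch below uses simple bisection with the Python standard library. It is an illustration of that definition, not the computation used for the book, and the helper names are hypothetical.

```python
from math import comb


def binom_upper_tail(f, n, P):
    """Exact Binomial probability of observing f or more cases out of n."""
    return sum(comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(f, n + 1))


def bisect_increasing(func, target, iterations=60):
    """Find x in [0, 1] with func(x) = target, for func increasing in x."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(f, n, alpha=0.05):
    """'Exact' interval about p = f / n from the Binomial tails."""
    # Lower bound: P(X >= f) = alpha / 2.
    lower = 0.0 if f == 0 else bisect_increasing(
        lambda P: binom_upper_tail(f, n, P), alpha / 2)
    # Upper bound: P(X <= f) = alpha / 2, i.e. P(X >= f + 1) = 1 - alpha / 2.
    upper = 1.0 if f == n else bisect_increasing(
        lambda P: binom_upper_tail(f + 1, n, P), 1 - alpha / 2)
    return lower, upper


# Bounds for a range of alpha: the raw material for plotting a distribution.
for alpha in (0.5, 0.1, 0.05, 0.01):
    print(alpha, clopper_pearson(5, 10, alpha))
```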

Further evaluation of Binomial confidence intervals


Wallis (2013) provides an account of an empirical evaluation of Binomial confidence intervals and contingency test formulae. The main take-home message of that article was that it is possible to evaluate statistical methods objectively and provide advice to researchers that is based on an objective computational assessment.

In this article we take that evaluation further by re-weighting estimates of error using Binomial and Fisher weighting, which is equivalent to an ‘exhaustive Monte-Carlo simulation’. We also develop an argument concerning key attributes of difference intervals: that we are concerned not merely with when differences are zero (conventionally equivalent to a significance test) but also with accurate estimation when the difference may be non-zero (necessary for plotting data and comparing differences).
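The general idea of Binomial weighting can be sketched as follows. For a given population proportion P and sample size n, every possible observed frequency f is enumerated, an interval is computed about p = f / n, and the cases where the interval excludes P are summed, each weighted by the exact Binomial probability of observing f. This is an assumption-laden illustration of the principle, not the procedure used in the paper, and it does not cover Fisher weighting. All function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(p, n, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    e = z * sqrt(p * (1 - p) / n)
    return p - e, p + e


def wilson_interval(p, n, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


def weighted_error_rate(interval, P, n, alpha=0.05):
    """Binomial-weighted rate at which the interval about p excludes P."""
    total = 0.0
    for f in range(n + 1):
        weight = comb(n, f) * P**f * (1 - P) ** (n - f)  # chance of seeing f
        lo, hi = interval(f / n, n, alpha)
        if not (lo <= P <= hi):
            total += weight
    return total


P, n = 0.1, 20
wald_rate = weighted_error_rate(wald_interval, P, n)
wilson_rate = weighted_error_rate(wilson_interval, P, n)
print(wald_rate, wilson_rate)  # compare each against the nominal alpha = 0.05
```

Because every outcome is enumerated and weighted exactly, no random sampling is needed, which is the sense in which such a calculation stands in for an exhaustive simulation.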

1. Introduction

All statistical procedures may be evaluated in terms of the rate of two distinct types of error.

  • Type I errors (false positives): this is evidence of so-called ‘radical’ or ‘anti-conservative’ behaviour, i.e. rejecting null hypotheses which should not have been rejected, and
  • Type II errors (false negatives): this is evidence of ‘conservative’ behaviour, i.e. retaining or failing to reject null hypotheses unnecessarily.

It is customary to treat these errors separately because the consequences of rejecting and retaining a null hypothesis are qualitatively distinct.

Deconstructing the chi-square


Elsewhere in this blog we introduce the concept of statistical significance by considering the reliability of a single sampled observation of a Binomial proportion: an estimate of the probability of selecting an item in the future. This allows us to develop an understanding of the likely distribution of what the true value of that probability in the population might be. In short, were we to make future observations of that item, we could expect that each sampled probability would be found within a particular range – a confidence interval – a fixed proportion of times, such as 1 in 20 or 1 in 100. This ‘fixed proportion’ is termed the ‘error level’ because we predict that the true value will be outside the range 1 in 20 or 1 in 100 times.

This process of inferring about future observations is termed ‘inferential statistics’. Our approach is to build our understanding in a series of stages based on confidence intervals about the single proportion. Here we will approach the same question by deconstructing the chi-square test.

A core idea of statistical inference is this: randomness is a fact of life. If you sample the same phenomenon multiple times, drawing on different data each time, it is unlikely that the observation will be identical, or – to put it in terms of an observed sample – it is unlikely that the mean value of the observation will be the same. But you are more likely than not to find the new mean near the original mean, and the larger the size of your sample, the more reliable your estimate will be. This, in essence, is the Central Limit Theorem.
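The effect of sample size can be illustrated with an exact Binomial calculation. The sketch below computes the chance that a sample proportion falls within 0.05 of a population value for three sample sizes. The value P = 0.3, the tolerance of 0.05 and the function name are all assumptions chosen for illustration.

```python
from math import comb


def prob_within(P, n, eps=0.05):
    """Exact chance that a sample proportion f / n lies within eps of P."""
    total = 0.0
    for f in range(n + 1):
        if abs(f / n - P) <= eps + 1e-12:  # small tolerance for float rounding
            total += comb(n, f) * P**f * (1 - P) ** (n - f)
    return total


# The probability of landing near P grows as the sample gets larger.
results = {n: prob_within(0.3, n) for n in (20, 100, 500)}
print(results)
```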

This principle applies to the central tendency of data, usually the arithmetic mean, but occasionally a median. It does not concern outliers: extreme but rare events (which, by the way, you should include, and not delete, from your data).

We are mainly concerned with Binomial or Multinomial proportions, i.e. the fraction of cases sampled which have a particular property. A Binomial proportion is a statement about the sample, a simple fraction p = f / n. But it is also the sample mean probability of selecting a value. Suppose we selected a random case from the sample. In the absence of any other knowledge about that case, the average chance that X = x₁ is also p.

The same principle applies to the mean of Real or Integer values, for which one might use Welch’s or Student’s t test, and the median rank of Ordinal data, for which a Mann-Whitney U test may be appropriate.

With this in mind, we can form an understanding of significance, or to be precise, significant difference. The ‘difference’ referred to here is the difference between an uncertain observed value and a predicted or known population value, d = p – P, or the difference between two uncertain observed values, d = p₂ – p₁. The first of these differences is found in a single-sample z test, the second in a two-sample z test. See Wallis (2013b).
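Both tests can be sketched in a few lines of Python. Each function returns the observed difference, the critical threshold and whether the difference exceeds it. The single-sample version uses the Normal standard deviation at P. The two-sample version assumes independent samples and a pooled probability estimate, which is one standard formulation and may not be the only one discussed in Wallis (2013b). Function names and example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def single_sample_z(p, P, n, alpha=0.05):
    """Compare an observed p with a given population value P."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    s = sqrt(P * (1 - P) / n)  # standard deviation at P
    d = p - P
    threshold = z_crit * s
    return d, threshold, abs(d) > threshold


def two_sample_z(p1, n1, p2, n2, alpha=0.05):
    """Compare two observed proportions from independent samples."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled probability estimate
    s = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    d = p2 - p1
    threshold = z_crit * s
    return d, threshold, abs(d) > threshold


print(single_sample_z(0.6, 0.3, 20))    # large difference from P
print(single_sample_z(0.4, 0.3, 20))    # small difference from P
print(two_sample_z(0.2, 50, 0.5, 50))   # two samples far apart
print(two_sample_z(0.3, 20, 0.4, 20))   # two samples close together
```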

Figure 1. The single-sample population z test. The statistical model assumes that future unobserved samples are Normally distributed, centred on the population mean P. Distance d is compared with a critical threshold, zα/2 · S, to carry out the test.

A significance test is created by comparing an observed difference with a second element, a critical threshold extrapolated from the underlying statistical model of variation.