Plotting the Clopper-Pearson distribution


In Plotting the Wilson distribution (Wallis 2018), I showed how it is possible to plot the distribution of the Wilson interval for all values of α. This exercise is revealing in a number of ways.

First, it shows the relationship between

  1. the Normal distribution of probable Binomial observations about the population, ideal or given value P, and
  2. the corresponding distribution of probable values of P about an observed Binomial proportion, p (referred to as the Wilson distribution, as it is based on the Wilson score interval).

Over the last few years I have become convinced that approaching statistical understanding from the perspective of the tangible observation p is more instructive and straightforward to conceptualise than approaching it (as is traditional) from the imaginary ‘true value’ in the population, P. In particular, whenever you conduct an experiment you want to know how reliable your results are (or, to put it another way, what range of values you might reasonably expect were you to repeat your experiment), not just whether they are statistically significantly different from some arbitrary number, P!

Second, and as a result, just as it is possible to see the closeness of fit between the Binomial and the Normal distribution, through this exercise we can visualise the inverse relationship between Normal and Wilson distributions. We can see immediately that it is a fallacy to assume that the distribution of probable values about p is Normal, although numerous statistics books still quote ‘Wald’-type intervals and many methods operate on this assumption. (I am intermittently amused by plots of otherwise sophisticated modelling algorithms with impossibly symmetric intervals in probability space.)

Third, I showed in the paper that ‘the Wilson distribution’ is properly understood as two distributions: the distribution of probable values of P below p and the distribution of probable values of P above p. If we employ a continuity correction, the two distributions become clearly distinct.

This issue sometimes throws people. Compare:

  1. the most probable location of P,
  2. the most probable location of P if we know that P < p (lower interval),
  3. the most probable location of P if we know that P > p (upper interval).

Wilson distributions correspond to (2) and (3) above, obtained by finding the roots of the Normal approximation. See Wallis (2013). The sum, or mean, of these is not (1), as becomes clearer when we plot other related distributions.
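To make the ‘roots of the Normal approximation’ concrete, here is a minimal sketch of the standard Wilson score interval in Python. The function name `wilson_interval` and its parameters are my own illustrative choices, not taken from the paper. The lower bound w⁻ is the value of P whose Normal upper bound lands exactly on p, and the upper bound w⁺ is the value of P whose Normal lower bound lands on p.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) about an observed proportion p of n trials.

    Illustrative sketch: no continuity correction is applied.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value of z
    centre = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom


if __name__ == "__main__":
    lower, upper = wilson_interval(0.3, 20)
    print(lower, upper)  # the interval is not symmetric about p = 0.3
```

Repeating the calculation over a range of α values traces out the lower and upper bounds as functions of α, which is the raw material for plotting the two distributions.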

There are a number of other interesting and important conclusions from this work, including that the logit Wilson interval is in fact almost Normal, except for p = 0 or 1.

In this post I want to briefly comment on some recent computational work I conducted in preparation for my forthcoming book (Wallis, in press). This involves plotting the Clopper-Pearson distribution.
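For readers unfamiliar with it, the Clopper-Pearson interval is the ‘exact’ interval obtained by searching the Binomial distribution itself rather than a Normal approximation. The sketch below is a generic illustration and not the computation used for the book: the names `binom_tail_ge` and `clopper_pearson` are hypothetical, and a simple bisection search stands in for whatever search method one prefers. The lower bound is the P at which observing x or more successes has probability α/2, and the upper bound is the P at which observing x or fewer has probability α/2.

```python
from math import comb


def binom_tail_ge(x, n, P):
    """Probability of x or more successes in n trials at population rate P."""
    return sum(comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(x, n + 1))


def clopper_pearson(x, n, alpha=0.05, tol=1e-12):
    """Clopper-Pearson interval for x successes out of n, found by bisection."""

    def solve(f, target):
        # f is increasing in P on [0, 1]; bisect until f(P) = target
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= x | P) = alpha / 2
    lower = 0.0 if x == 0 else solve(lambda P: binom_tail_ge(x, n, P), alpha / 2)
    # Upper bound: P(X <= x | P) = alpha / 2, i.e. P(X >= x + 1 | P) = 1 - alpha / 2
    upper = 1.0 if x == n else solve(
        lambda P: binom_tail_ge(x + 1, n, P), 1 - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    print(clopper_pearson(6, 20))
```

As with the Wilson case, evaluating the bounds over a range of α values gives the points needed to plot a distribution.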

Further evaluation of Binomial confidence intervals


Wallis (2013) provides an account of an empirical evaluation of Binomial confidence intervals and contingency test formulae. The main take-home message of that article was that it is possible to evaluate statistical methods objectively and provide advice to researchers that is based on an objective computational assessment.

In this article we take that evaluation further by re-weighting estimates of error using Binomial and Fisher weighting, which is equivalent to an ‘exhaustive Monte-Carlo simulation’. We also develop an argument concerning key attributes of difference intervals: that we are concerned not merely with when differences are zero (conventionally equivalent to a significance test) but also with accurate estimation when the difference may be non-zero (necessary for plotting data and comparing differences).

1. Introduction

All statistical procedures may be evaluated in terms of the rate of two distinct types of error.

  • Type I errors (false positives): this is evidence of so-called ‘radical’ or ‘anti-conservative’ behaviour, i.e. rejecting null hypotheses which should not have been rejected, and
  • Type II errors (false negatives): this is evidence of ‘conservative’ behaviour, i.e. retaining or failing to reject null hypotheses unnecessarily.

It is customary to treat these errors separately because the consequences of rejecting and retaining a null hypothesis are qualitatively distinct.
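As a rough illustration of what an exhaustive, Binomial-weighted assessment of Type I errors can look like, the sketch below enumerates every possible outcome x out of n, weights each by its Binomial probability at a known P, and sums the weights of those outcomes whose interval wrongly excludes P. This is one simple reading of the idea and is not necessarily the procedure used in the paper. The function names are hypothetical and the Wilson score interval is used only as an example of a method under test.

```python
from math import comb, sqrt
from statistics import NormalDist


def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval about p (example method under evaluation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom


def type_i_rate(P, n, alpha=0.05):
    """Binomial-weighted rate at which the interval about p = x/n excludes P."""
    rate = 0.0
    for x in range(n + 1):
        lower, upper = wilson_interval(x / n, n, alpha)
        if P < lower or P > upper:
            # this outcome would be a false positive; add its probability
            rate += comb(n, x) * P**x * (1 - P) ** (n - x)
    return rate


if __name__ == "__main__":
    print(type_i_rate(0.5, 10))  # compare against the nominal alpha = 0.05
```

Because every outcome is visited and weighted exactly, no random sampling is needed, which is the sense in which such a sum stands in for a Monte-Carlo simulation run to exhaustion.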

φ intervals by inverted Liebetrau Gaussian s(φ)


Experimenting with deriving accurate 2 × 2 φ intervals, I also considered using Liebetrau’s population standard deviation estimate.

To recap: Cramér’s φ (Cramér 1946) is a probabilistic intercorrelation for contingency tables based on the χ² statistic. An unsigned φ score is defined by

\[ \text{Cramér’s } \phi = \sqrt{\frac{\chi^2}{N(k - 1)}} \tag{1} \]

where χ² is the r × c test for homogeneity (independence), N is the total frequency in the table, and k the minimum number of values of variables X and Y, i.e. k = min(r, c). For 2 × 2 tables, k – 1 = 1, so φ = √(χ²/N) is often quoted.
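Equation (1) can be sketched in a few lines of Python. The function name `cramers_phi` and the list-of-rows table representation are illustrative assumptions, and the sketch assumes that no row or column total is zero.

```python
from math import sqrt


def cramers_phi(table):
    """Unsigned Cramér's phi for an r x c table of frequencies (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    N = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / N  # under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return sqrt(chi2 / (N * (k - 1)))


if __name__ == "__main__":
    print(cramers_phi([[12, 8], [5, 15]]))
```

A table with all its frequencies on one diagonal scores 1, and a table whose rows have identical distributions scores 0.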

An alternative formula for 2 × 2 tables obtains a signed result, where a negative sign implies that the table tends towards the opposite diagonal.

\[ \text{signed } 2 \times 2 \ \phi \equiv \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}, \tag{2} \]

where a, b, c and d are cell frequencies. However, Equation (2) cannot be applied to larger tables.
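A minimal sketch of Equation (2) follows, with the cells read as a and b on the top row and c and d on the bottom row. The function name `signed_phi` is my own, and the sketch assumes all four marginal totals are non-zero.

```python
from math import sqrt


def signed_phi(a, b, c, d):
    """Signed phi for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    print(signed_phi(12, 8, 5, 15))  # positive: weight on the a-d diagonal
    print(signed_phi(8, 12, 15, 5))  # columns swapped: same size, negative sign
```

Swapping the two columns (or the two rows) flips the sign but leaves the magnitude unchanged, and that magnitude matches the unsigned 2 × 2 score √(χ²/N).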

The method I discuss here is potentially extensible to other effect sizes and other published estimates of standard deviations.

We employ Liebetrau’s best estimate of the population standard deviation of φ for r × c tables:

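\[
s(\phi) \approx \frac{1}{N} \Bigg\{ 4 \sum_{i,j} \frac{p_{i,j}^3}{p_{i+}^2 \, p_{+j}^2}
- 3 \sum_{i} \frac{1}{p_{i+}} \Big( \sum_{j} \frac{p_{i,j}^2}{p_{i+} \, p_{+j}} \Big)^2
- 3 \sum_{j} \frac{1}{p_{+j}} \Big( \sum_{i} \frac{p_{i,j}^2}{p_{i+} \, p_{+j}} \Big)^2
+ 2 \sum_{i,j} \Big[ \frac{p_{i,j}}{p_{i+} \, p_{+j}} \Big( \sum_{k} \frac{p_{k,j}^2}{p_{k+} \, p_{+j}} \Big) \Big( \sum_{l} \frac{p_{i,l}^2}{p_{i+} \, p_{+l}} \Big) \Big] \Bigg\}, \quad \text{for } \phi \neq 0, \tag{3}
\]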

where pi,j = fi,j / N and pi+, p+j, etc. represent row and column (prior) probabilities (Bishop, Fienberg and Holland 1975: 386). If φ = 0 we adjust the table by a small delta.
