Correcting for continuity

Introduction

Many conventional statistical methods employ the Normal approximation to the Binomial distribution (see Binomial → Normal → Wilson), either explicitly or buried in formulae.

The well-known Gaussian population interval (1) is

Gaussian interval (E⁻, E⁺) ≡ P ± z√(P(1 – P)/n), (1)

where n represents the size of the sample, and z the two-tailed critical value for the Normal distribution at an error level α, more properly written zα/2. For reasons of space, we will shorten zα/2 to z, but note that this term is a function of α.

The standard deviation of the population proportion P is S = √(P(1 – P)/n), so we could abbreviate the above to (E⁻, E⁺) ≡ P ± z·S.

When these methods require us to calculate a confidence interval about an observed proportion, p, we must invert the Normal formula; the result of this inversion is the Wilson score interval (Equation (2)).

Wilson score interval (w⁻, w⁺) ≡ (p + z²/2n ± z√(p(1 – p)/n + z²/4n²)) / (1 + z²/n). (2)
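
For readers who prefer to check the algebra computationally, here is a minimal Python sketch of Equations (1) and (2). The function names, and the use of the standard library's NormalDist to obtain zα/2, are illustrative choices of mine rather than any published implementation.

```python
from math import sqrt
from statistics import NormalDist

def z_value(alpha=0.05):
    """Two-tailed critical value z(alpha/2) of the standard Normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def gaussian_interval(P, n, alpha=0.05):
    """Equation (1): Gaussian interval (E-, E+) about a population proportion P."""
    z = z_value(alpha)
    S = sqrt(P * (1 - P) / n)      # standard deviation of the population proportion
    return P - z * S, P + z * S

def wilson_interval(p, n, alpha=0.05):
    """Equation (2): Wilson score interval (w-, w+) about an observed proportion p."""
    z = z_value(alpha)
    centre = p + z ** 2 / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    denominator = 1 + z ** 2 / n
    return (centre - spread) / denominator, (centre + spread) / denominator
```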

In a 2013 paper for JQL (Wallis 2013a), I referred to this inversion process as the ‘interval equality principle’. This means that if (1) is calculated for p = E⁻ (the Gaussian lower bound of P), then the upper bound that results, w⁺, will equal P. Similarly, for p = E⁺, the lower bound that results, w⁻, will equal P.

We might write this relationship as

p ≡ GaussianLower(WilsonUpper(p, n, α/2), n, α/2), or, alternatively
P ≡ WilsonLower(GaussianUpper(P, n, α/2), n, α/2), etc. (3)

and use the equivalences E⁻ ≡ GaussianLower(P, n, α/2), w⁺ ≡ WilsonUpper(p, n, α/2), etc.

Note. The parameters n and α become useful later on. At this stage the inversion concerns only the first parameter, p or P.

Nonetheless the general principle is that if you want to calculate an interval about an observed proportion p, you can derive it by inverting the function for the interval about the expected population proportion P, and swapping the bounds (so ‘Lower’ becomes ‘Upper’ and vice versa).
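
As a quick numerical check of the principle, reusing the sketch functions above (invented names, not a library): the Gaussian lower bound computed about the Wilson upper bound should recover the original p, and symmetrically for the other pair of bounds.

```python
p, n, alpha = 0.3, 10, 0.05

w_minus, w_plus = wilson_interval(p, n, alpha)

# The Gaussian interval about w+ should have p as its lower bound, and the
# Gaussian interval about w- should have p as its upper bound.
E_minus, _ = gaussian_interval(w_plus, n, alpha)
_, E_plus = gaussian_interval(w_minus, n, alpha)

print(p, E_minus, E_plus)   # all three agree up to floating-point error
```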

In that paper, using this approach, I performed a series of computational evaluations of the performance of different interval calculations, following in the footsteps of more notable predecessors. Comparison with the analogous interval calculated directly from the Binomial distribution showed that a continuity-corrected version of the Wilson score interval performed accurately.

Continuity corrections

Continuity corrections are used because the original source Binomial distribution (that we are approximating to) is ‘chunky’. See Figure 1 below.

Figure 1. Yates’s correction applied to a Normal (Gaussian) approximation to the Binomial distribution for P = 0.3, n = 10, α = 0.05. A correction term, 1/2n, is added to either side of the Gaussian curve (dotted lines), sufficient to encompass almost the entire discrete Binomial.

All observed proportions must be whole fractions of n, p ∈ {0/n, 1/n, 2/n,… n/n}, and yet the interval calculation we use is based on the Normal interval (1), which is continuous. So, using a method due to Frank Yates, we add an extra ‘half 1/n’ to intervals on either side of P.

Yates’s correction for continuity

The most famous example of a continuity correction is employed with a standard chi-square formula

Yates’s χ² = Σ (|oᵢ,ⱼ – eᵢ,ⱼ| – 0.5)² / eᵢ,ⱼ, (4)

for all cells at index positions i, j in a contingency table. This formula is expressed in frequency units (counts out of n) rather than proportions, so the correction is simply 0.5 rather than 1/2n.

Strictly speaking, Yates’s formula has a flaw. It should guarantee that if the difference between observed and expected cells, d = oᵢ,ⱼ – eᵢ,ⱼ, is within ±0.5, the entire term goes to zero. This makes little difference for 2 × 2 tables, but for tables with more than one degree of freedom the following is recommended.

Yates’s χ² = Σ (DiffCorrect(oᵢ,ⱼ – eᵢ,ⱼ, 0.5))² / eᵢ,ⱼ, (4′)

where DiffCorrect(d, c) = d – c if d > c, d + c if d < –c, and 0 otherwise.
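
Equation (4′) and the DiffCorrect rule can be written in a few lines of Python. This is a sketch for illustration: the function names and the example 2 × 2 table are my own inventions, and the observed/expected tables are assumed to be supplied as equal-shaped lists of lists.

```python
def diff_correct(d, c=0.5):
    """Shrink the difference d towards zero by c; return zero if |d| <= c."""
    if d > c:
        return d - c
    if d < -c:
        return d + c
    return 0.0

def yates_chi_square(observed, expected, c=0.5):
    """Equation (4'): continuity-corrected chi-square summed over all cells i, j."""
    total = 0.0
    for o_row, e_row in zip(observed, expected):
        for o, e in zip(o_row, e_row):
            total += diff_correct(o - e, c) ** 2 / e
    return total

# A 2 x 2 example; with one degree of freedom this is identical to Yates's (4).
observed = [[12, 8], [5, 15]]
expected = [[8.5, 11.5], [8.5, 11.5]]
print(yates_chi_square(observed, expected))
```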

χ² is based on the Normal distribution z (Wallis 2013b). The Gaussian population interval about a known or predicted population value P (Equation (1)) may be corrected for continuity in the same way, giving Yates’s population interval.

Yates’s interval (E⁻cc, E⁺cc) ≡ P ± (z√(P(1 – P)/n) + 1/2n). (5)

It is easy to see the relationship between Equations (5) and (1). Moreover it is straightforward to apply other adjustments to the standard deviation or variance (the variance is simply the square of the standard deviation, so this amounts to the same thing).
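A sketch of Equation (5), building on the z_value helper from the earlier illustration (an assumed helper, not a library function): the Gaussian interval is simply widened by the fixed term 1/2n on each side.

```python
from math import sqrt

def yates_interval(P, n, alpha=0.05):
    """Equation (5): Gaussian interval about P widened by the continuity correction."""
    z = z_value(alpha)              # from the first sketch above
    S = sqrt(P * (1 - P) / n)
    c = 1 / (2 * n)                 # half of the smallest observable step, 1/n
    return P - (z * S + c), P + (z * S + c)
```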

Correcting Wilson

The continuity-corrected Wilson score interval formula is not often presented, and when it does appear, it takes slightly different forms in the literature. However, on the basis of Robert Newcombe’s (1998) paper, I have tended to present it as Equation (6).

In fact, even this is a simplification, as it is also necessary to employ ‘min’ and ‘max’ constraints to ensure that w⁻cc ∈ [0, p] and w⁺cc ∈ [p, 1]. See below.

w⁻cc ≡ (2np + z² – (z√(z² – 1/n + 4np(1 – p) + (4p – 2)) + 1)) / (2(n + z²)), and
w⁺cc ≡ (2np + z² + (z√(z² – 1/n + 4np(1 – p) – (4p – 2)) + 1)) / (2(n + z²)). (6)

As with the uncorrected Wilson interval, simplifications are possible, e.g.

e = 2np + z², f = z² – 1/n + 4np(1 – p), g = (4p – 2), h = 2(n + z²),

which may be substituted into (6) as

w⁻cc ≡ (e – (z√(f + g) + 1)) / h, and w⁺cc ≡ (e + (z√(f – g) + 1)) / h. (6′)
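
For later comparison, here is a direct transcription of (6′) into Python, with the min/max constraints mentioned above. It is a sketch only; it builds on the assumed z_value helper from the first illustration and is not intended as a reference implementation.

```python
from math import sqrt

def wilson_cc_direct(p, n, alpha=0.05):
    """Continuity-corrected Wilson interval via the direct formula (6)/(6')."""
    z = z_value(alpha)              # from the first sketch above
    e = 2 * n * p + z ** 2
    f = z ** 2 - 1 / n + 4 * n * p * (1 - p)
    g = 4 * p - 2
    h = 2 * (n + z ** 2)
    lower = (e - (z * sqrt(f + g) + 1)) / h
    upper = (e + (z * sqrt(f - g) + 1)) / h
    # constraints: the bounds must straddle p and stay within [0, 1]
    return max(0.0, min(lower, p)), min(1.0, max(upper, p))
```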

In any case, for the last ten years or so I have been working with this formula. It exists in spreadsheets I give our students. But it has two obvious problems.

  1. It is not at all intuitive. How is Equation (6) related to Equation (2)? What is the difference between them? How was Equation (6) even derived?
  2. It is not decomposable. Which terms represent the continuity correction, and which the interval?

As we shall see, there are circumstances when we might wish to modify the variance, and thus the width of the interval, but not to adjust the correction for continuity.

The finite population correction

Consider the finite population correction or ‘f.p.c.’. This is typically presented as an adjustment to the standard deviation. See this post.

Finite population correction ν = √((N – n)/(N – 1)). (7)

As the name implies, the finite population correction is applied to an interval or test when a sample is not drawn from an infinite population as the standard model assumes, but when it is drawn from one of a fixed size, N. In particular, it is relevant if the sample is a sizeable proportion of the population, say, 5% or more. Clearly if N >> n, then the finite population correction factor ν tends to 1, and has no effect.
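
A small illustration of Equation (7). The numbers are arbitrary, chosen only to show that ν is negligible when N >> n but not when the sample is a sizeable fraction of the population.

```python
from math import sqrt

def fpc(n, N):
    """Equation (7): finite population correction for a sample of n drawn from N."""
    return sqrt((N - n) / (N - 1))

print(fpc(50, 1_000_000))   # N >> n: v is very close to 1, no practical effect
print(fpc(200, 1_000))      # sample is 20% of the population: v ~ 0.895
```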

To apply this adjustment to Equations (1) and (5), we can multiply the standard deviation term by ν.

Gaussian interval (E⁻, E⁺) ≡ P ± zν√(P(1 – P)/n), (1′)

and

Yates’s interval (E⁻cc, E⁺cc) ≡ P ± (zν√(P(1 – P)/n) + 1/2n). (5′)

This adjustment may also be applied to Equation (2). By inspecting (1′) we can see that, rather than multiply the standard deviation by ν, we could equivalently adjust the sample size, n′ = n/ν², and substitute n′ for n in each equation. This allows us to apply the finite population correction to the uncorrected Wilson score interval. See below.

But we cannot use the same method with Equation (6), the continuity-corrected Wilson interval. To see why, first consider Equation (5). We need to adjust the standard deviation S, but not the continuity correction term, c = 1/2n.

Why do we not rescale c? Answer: because the entire point of a continuity correction is to overcome the ‘chunkiness’ of the original source Binomial distribution. See above. So we should not modify n in the formula for c. The original distribution is no less chunky! The interval is narrower because a finite population means we can be more certain.

To apply this correction to a χ² test, we can calculate the test in the normal way and divide the result by ν². This method works for the standard test or Yates’s version (Equation (4)).
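
Continuing the sketches above (and assuming their variables, including the example table and the invented fpc function), this amounts to a single extra division. The population size N = 200 is arbitrary.

```python
chi2 = yates_chi_square(observed, expected)   # the 2 x 2 example above, n = 40
v = fpc(40, 200)                              # suppose the population has N = 200
print(chi2, chi2 / v ** 2)                    # dividing by v^2 increases chi-square
```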

Our task is to find a formula for (6) that separates out the scale of the standard deviation from the continuity correction term.

The solution

It turns out that the solution is extremely simple and intuitive. Indeed, it is so simple and intuitive once you see it that it is rather surprising that papers do not simply give it in this form! (I suspect that this says more about a tendency for mathematical brevity on the one hand, and the tendency of researchers to copy formulae rather than analyse and explain them from first principles on the other.)

Aside: The route to a Eureka moment is not always very edifying. In my case, I could have kicked myself! After three days of struggling with algebraic reductions of Equation (6), I read back through Newcombe (1998) and his sources. Blyth and Still (1983) were also not very clear, but at least their paper reformulates Equations (2) and (6) differently. Then I remembered something. I had plotted Equation (6) when plotting the Wilson distribution. The corrected intervals began at p ± 1/2n. See Figure 2 below.

Figure 2. Continuity-corrected Wilson (red) alongside ordinary Wilson intervals (blue), reproduced from Wallis (2021). If we move p sideways by 1/2n in either direction and recompute the ordinary Wilson curve, we obtain the equivalent continuity-corrected interval. The Clopper-Pearson interval (inverted Binomial) is shown for comparison.

Here it is (drum roll please):

Let us use functions to define the interval bounds for the uncorrected interval (Equation (2)),

w⁻ = WilsonLower(p, n′, α/2),
w⁺ = WilsonUpper(p, n′, α/2),

where n′ = n/ν². Then we have

w⁻cc = WilsonLower(p – 1/2n, n′, α/2),
w⁺cc = WilsonUpper(p + 1/2n, n′, α/2). (8)

That was not hard, was it?

Equation (8) solves our problem. The continuity correction term is added to the origin of the interval, p, first. Just as with Yates’s formula (4), we modify the variance in Equation (2) by rescaling n (hence we use n′ = n/ν²). But we retain c = 1/2n without rescaling n.

Note that when we apply a continuity correction to the population proportion P, we calculate the interval on the basis of P first and then add 1/2n second. But when we apply a continuity correction to the observed proportion p, we add it to p first, and then calculate the interval. This is logical, because the interval equality principle also applies to the continuity-corrected interval.

Finally, we should constrain the proportion parameter to the range [0, 1], but this is now very easy! This is the formula given in Wallis (2021: 145).

w⁻cc = WilsonLower(max(p – 1/2n, 0), n′, α/2),
w⁺cc = WilsonUpper(min(p + 1/2n, 1), n′, α/2). (9)
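
Putting the pieces together, Equation (9) can be sketched as follows, assuming the illustrative wilson_interval, fpc and wilson_cc_direct functions defined earlier. The optional parameter N (for the finite population correction via n′ = n/ν²) is my own addition for convenience, not part of the published formula.

```python
def wilson_cc(p, n, alpha=0.05, N=None):
    """Equation (9): continuity-corrected Wilson interval, with an optional
    finite population correction for a population of size N."""
    c = 1 / (2 * n)                            # continuity correction: NOT rescaled
    n_prime = n if N is None else n / fpc(n, N) ** 2
    lower, _ = wilson_interval(max(p - c, 0.0), n_prime, alpha)
    _, upper = wilson_interval(min(p + c, 1.0), n_prime, alpha)
    return lower, upper

# Without the f.p.c., the shifted-origin method reproduces the direct formula (6'):
print(wilson_cc(0.3, 10))          # ~ (0.081, 0.646)
print(wilson_cc_direct(0.3, 10))   # the same values, up to floating-point error
```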

Conclusions

Sometimes we statisticians make life unnecessarily difficult for ourselves. The solution above is hinted at by Blyth, Still and Newcombe, but it is certainly not presented in the simple form given above.

Secondly, it is rare to see a statistical discussion of correcting for continuity and for finite populations at the same time. Corrections for continuity tend to be forgotten as soon as formulae become more complex or tables gain more dimensions. However, the reasons for correcting for continuity have not suddenly disappeared! The source distribution is still ‘chunky’! As a general point, continuity corrections may be omitted from effect size estimates, but should be taken into account in significance testing or interval calculations.

Yet with care and consideration – and some first-principles mathematics – it is possible to apply corrections for continuity and finite population to the same formulae. Other corrections, such as cluster sampling corrections (in corpora, this is usually random text sampling), can also now be applied just as easily.

Given the proven improvements in reducing Type I errors that this adjustment involves, especially for small samples, I am increasingly of the view that we should apply continuity corrections whenever we carry out a significance test. Equation (2) may still be used for plotting purposes, but for comparing proportions we should employ Yates’s 2 × 2 test or the Newcombe-Wilson test with continuity correction (see Wallis 2013a, b).

This method is productive. In recent work, I have shown how increasing the weighting of the continuity correction term in Equation (9) permits us to reduce additional Type I errors introduced in calculating intervals for effect sizes and other properties calculated from algebraic formulae of independent proportions.

References

Blyth, C.R. & H.A. Still. 1983. Binomial Confidence Intervals. Journal of the American Statistical Association 78, 108-116.

Newcombe, R.G. 1998. Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in Medicine 17, 857-872.

Wallis, S.A. 2013a. Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods. Journal of Quantitative Linguistics 20:3, 178-208. » Post

Wallis, S.A. 2013b. z-squared: the origin and application of χ². Journal of Quantitative Linguistics 20:4, 350-378. » Post

Wallis, S.A. 2021. Statistics in Corpus Linguistics Research. New York: Routledge. » Announcement
