# Plotting the distributions of confidence intervals on algebraic operators on proportions

### Introduction

Elsewhere on this blog we have discussed the distribution of values predicted by confidence intervals, referred to more formally as probability density functions (pdfs).

When we plot confidence intervals, we determine the nearest point(s) to our observed value where we would expect the true population value to be and still be considered significantly different from it (at a given error level).

But we can also plot the distribution of the interval function by varying the error level α from 1 (every difference is significant) to almost zero (asymptotically: nothing is significant). We can then see which expected values are more likely than others given one further piece of information:

• The upper and lower bounds are computed independently, so the plot effectively assumes that there is equal chance of an observed property p being greater than the expected property P, as vice versa, except at the boundary, where one distribution becomes empty.

We can then assess the overall shape, observing how these values tail off, a process that is much more instructive than arguing about ‘p-values’.

We can also see something else.

Most traditional discussions of confidence intervals assume that intervals are approximately Normal, an assumption Wallis (2021: 297) calls the ‘Normal fallacy’. This conceptual error has dogged discussion of confidence intervals in the statistics literature, and deeply affects how people rationalise about intervals.

In my book, I point out that even the simplest interval about the single proportion p cannot be Normal. Instead we discover that it is profoundly shaped by the boundaries of the probabilistic range P = [0, 1]. Elsewhere on this blog, I have developed the implications of this argument and plotted more distributions.

I show the shape of the distribution for the Wilson score interval based on p (Wilson 1927) and other related distributions. Only when n is large and p central does this distribution approximate to the Normal.

In this blog post we will refer to the interval p ∈ (w, w+) in terms of a functional notation. Wallis (2021: 111) proposes two Wilson functions with three parameters.

lower bound w = WilsonLower(p, n, α/2) = p′ – e,
upper bound w+ = WilsonUpper(p, n, α/2) = p′ + e, (1)

where

p′ = p + zα/2²/2n
1 + zα/2²/n
,
and e′ = zα/2 p(1 – p)/n + zα/2    ²/4n²
1 + zα/2²/n
.

where n is the sample size, p = f/n is the observed proportion and α/2 is the error level for each tail. Each bound should be treated separately, although they converge at p where α = 1.

Wilson’s interval may be corrected for continuity to adjust for the fact that we are using a continuous function to approximate a discrete set of options (p ∈ {0, 1/n, 2/n… 1}). We can compute corrected intervals with

wcc= WilsonLower(max(0, p – 12n), n, α/2),
wcc+ = WilsonUpper(min(1, p + 12n), n, α/2). (2)

This simply moves p outwards by a continuity correction term c = 12n, and then computes the Wilson interval for that adjusted proportion.

Other formulae, such as the ‘exact’ Clopper-Pearson (CP) interval may be employed instead. When we plot the distribution of continuity-corrected or CP intervals we can see that the ‘Wilson distribution’ is two distributions: one for each bound (Wallis 2021: 308-310). This is why it is important to know whether p > P or vice versa.

Note: In what follows, readers may substitute an alternative formula, such as Jeffreys, CP, or bootstrap formulae for the confidence interval for p. We use the Wilson because it is superior to most, and its defects may be simply addressed through corrections for continuity, population size or random-text sampling.

In a new paper (Wallis forthcoming), I also show distributions for intervals on selected monotonic functions of p. These are the square, reciprocal, logarithm and logit projections on the Wilson interval. Apart from when p is 0 or 1, the logit Wilson distribution, exceptionally, is approximately Normal (see also Wallis 2021: 308).

We also consider non-monotonic functions. An example plot for such a function is found in Confidence intervals on goodness of fit ϕ scores.

The method for plotting these distributions can be summarised as follows.

• We take an observed proportion p, which optionally we may project with some function (fn(p)) onto a different scale (e.g. reciprocal 1/p ∈ [0, ∞]), choose an interval formula (Wilson, continuity-corrected Wilson, Clopper-Pearson, etc.) and compute the position of bounds for every error level α.
• This results in a cumulative distribution of scores which we convert into a probability distribution by a method termed delta approximation (Wallis 2021: 300, see also Plotting the Wilson distribution).

We are now ready to begin considering properties obtained by combining independent proportions.

#### Intervals on algebraic combinations of two or more proportions

The Newcombe-Wilson interval (Newcombe 1998) is a confidence interval on the difference between two observed proportions, d = p2p1, derived from the Wilson interval. Although it may be cited as an interval about zero (wd, wd+), such that d is tested against it, its most easily generalised form is simply as a range of differences, d ∈ (d, d+). This formula is well behaved and does not exceed [-1, 1]. I discussed the pdf distribution of this interval in Plotting the Newcombe-Wilson distribution.

In An algebra of intervals and Wallis (forthcoming) we generalise this method using Zou and Donner’s (2008) method to obtain intervals on sums of proportions such as s = p1 + p2, ratios r = p1 / p2 and products p = p1 × p2. In this blog post we discuss the distribution of these intervals, allowing us to see how they perform.

Zou and Donner’s method can be summarised by the following general schema. Suppose we have two independent parameters, which we call θˆ1 and θˆ2, with confidence intervals (l1, u1), (l2, u2) respectively, then the interval on the difference (L, U) may be computed by

(L, U) ≡ ˆ1 – θˆ2 – √ˆ1  – l1 )² + (u2  – θˆ2  )², θˆ1 – θˆ2 + √(u1  – θˆ1 )² + (θˆ2   – l2  )²). (3)

The rationale for this formula is that each of the squared terms, such as (θˆ1l1)2, is a scaled variance for the bound of the single interval, zα/22 × s2(l1). This formula then applies the sum of independence variance rule (also known as the Bienyamé formula) to the two parameters.

Armed with this formula, Zou and Donner argue that θˆ1 and θˆ2 need not be proportions (which is the case with the Newcombe-Wilson formula). They can be any independent parameter for which a high quality coverage confidence interval exists. Given that we can simply transform any interval on p to an interval on a monotonic function of p, this permits us to obtain formulae for sums, ratios and products.

### Difference between proportions d = p2 – p1

It is conventional to subtract p2 (the second observed proportion or rate in a pair) from p1 (the first). So with indexes reversed, we simply substitute the following into Equation (3).

• θˆ1 = p2, (l1, u1) = (w2, w2+), and
• θˆ2 = p1, (l2, u2) = (w1, w1+).

This obtains the Newcombe-Wilson interval about d, which is based on four variables: proportions p1 and p2, and sample sizes n1 and n2. We can then proceed to calculate the probability density function.

• Tip: Stefan Gries’ collocation measure ΔP is simply –d = p1p2. Armed with this information, computing an interval for ΔP is trivial.

In order to see the resulting function’s relationship to the Wilson distributions for p1 and p2 on the same scale, we reposition them on the difference scale, i.e. d1 = p2 – Wilson(p1) and d2 = Wilson(p2) – p1. (For simplicity, we can drop sample size and error level parameters.) These curves model a situation where all of the uncertainty occurs on a single term, e.g. for d1, we assume that we know the value of p2 but p1 is uncertain. See Figure 1.

In Plotting the Newcombe-Wilson distribution we also discuss a number of permutations of these curves, including where p1 or p2 = 0 or 1, small sample sizes, and the application of continuity correction. For reasons of space we will not repeat this here, but we will draw out comparisons.

### Sum of proportions s = p1 + p2

We can make the following substitutions into Equation (3) to obtain an interval for the sum. Negation is monotonic, but causes us to swap bounds.

• θˆ1 = p1, (l1, u1) = (w1, w1+), and
• θˆ2 = –p2, (l2, u2) = (–w2+, –w2).

The interval and distribution in Figure 2 is obtained with raw frequencies p1 = 0.2, p2 = 0.4 for the same sample sizes as Figure 1.

The resulting sum interval has a close relationship with the difference interval. Indeed, in this case it is a mirror image. Why? Well, the interval for θˆ2 = –p2 is the mirrored interval for p2. Another way of thinking about this is that the ‘shape’ of the θˆ2 interval and distribution is identical to that for q2 = 1 – p2 = 0.6, which is what we saw in Figure 1. So now the lower bound of the sum interval has the same width as the upper bound of the difference interval, and vice versa.

To position the Wilson intervals on the same scale, we simply substitute the Wilson term for each element (e.g. s1 = Wilson(p1) + p2). Figure 2. Distribution for the derived confidence interval for the sum of proportions s = p1 + p2, plus Wilson distributions centred at the same location.

We can compare this interval with an equivalent Gaussian ‘Normal approximation’ interval, just as we did with the difference interval. This interval is centred at s, but with a confidence interval based on the weighted mean prior probability estimate pˆ (a ‘standard error’ approach). This performs poorly and can overshoot the range. (The least said about this method the better: we have included it in the spreadsheet for anyone wishing to see how poorly it performs!)

#### Correcting for continuity

As we noted, an important consequence of adopting Wilson-based methods is that they can be readily corrected for continuity, small population size and random-text sampling.

Let us illustrate this by applying a continuity correction to Wilson score intervals using Equation (2). This adjusts for the fact that our sample is small, so observed frequencies can only be whole numbers.

Where α = 1, uncorrected Wilson bounds meet p, creating the impression of a single distribution rather than two distinct distributions, one for the lower and one for the upper bound. But following adjustment, the Wilson score intervals start at a distance ci = 12ni from p. See Plotting the Wilson distribution.

For our new summed Wilson distribution, the resulting continuity-corrected sum interval is separated from the observed sum s by the square root of the sum of these terms, c1 ² + c2 ², except where one proportion is at a boundary, where the sum reduces accordingly. The same logic applies here as it does to difference intervals, which we discussed previously. Figure 3. Distribution for the continuity-corrected summation interval for the sum of proportions s = p1 + p2, alongside the uncorrected distribution and the two corrected Wilson score distributions for reference.

We can perform two further steps in our review. We examine what happens to these pdf distribution curves when p1 or p2 approaches zero or 1, and the effect of keeping s constant while changing proportions. We also consider the effect of small samples.

#### Moving s and skewed p

For the same reasons that we discussed in the case of the difference interval, it is important to note that the ‘interval for the sum’ is not merely based on the sum s. It is the result of four variables: proportions p1 and p2, and corresponding sample sizes n1 and n2. First, let us use an animation to observe what happens if we maintain a constant difference between p1 and p2, say 0.4, and permute p1 from 0 to 0.6. Animation 1. Varying proportions p1 and p2 with a constant difference d between them.

We can see that at the extreme proportions, p1 = 0 and p2 = 1, the outer Wilson interval disappears. This is by design, as no proportion can possibly exceed the range [0, 1]. The outer interval width must become zero. Where a squared difference term in Equation (3) becomes zero, the resulting interval simply becomes equal to the remaining interval.

#### Small n

As you would expect, if we draw data from small samples we increase uncertainty. The effect of reducing one of the samples to, say, n2 = 2, is essentially the same as we find with the Newcombe-Wilson difference interval  distribution. See Figure 5 in Plotting the Newcombe-Wilson distribution, and feel free to experiment with the spreadsheet.

Since we are performing a very similar algebraic operation with our intervals, this should not be surprising.

### Ratio of proportions r = p1 / p2

Zou and Donner (2008) illustrate their theorem (Equation (3)) by showing how it can be used to compute an interval for the ratio of two proportions, which they also call the ‘risk ratio’. This formula has other applications, for example to efficiently obtain an accurate interval for percentage difference, or – by applying functions, to compute the odds ratio, log odds ratio, etc. So it is important to understand its performance!

Zou and Donner apply Equation (3) to the difference of the logarithm of proportions, ln(p1) and ln(p2). Since a difference in logs is the log of the ratio, we can substitute as follows:

• θˆ1 = ln(p1), (l1, u1) = (ln(w1), ln(w1+)), and
• θˆ2 = ln(p2), (l2, u2) = (ln(w2), ln(w2+)).

This allows us to compute an interval for θˆ1 – θˆ2 = ln(p1) – ln(p2) = ln(p1/p2) = ln(r), in other words, the confidence interval for the ratio expressed on a log scale. The final stage is to convert the same interval to the ratio r scale by applying the inverse logarithm (‘exp’) function. See An algebra of intervals.

Let’s see how this works in practice. First, we will plot the distribution curves on the log(ratio) scale. The ratio curve is most similar to the numerator p1 curve. Subtracting a narrow denominator ln(r2) confidence interval has a limited impact on the overall variation.

We can compute two additional distributions on the same scale. As before, these distributions assume all of the observed variation is found with a single parameter, and the other is constant. We define a numerator distribution, r1, which assumes all of the variation is in the observed proportion p1 (we might say n2 → ∞). The denominator distribution, r2, does the same for an assumption that p1 is a constant.

numerator r1 = Wilson(p1)/p2,
denominator r2 = p1/Wilson(p2). (4)

In Figure 4, we plot r1, r2 and r on the same (natural logarithm) scale. Note that thanks to the reciprocal function, the polarity of the r2 interval is reversed, i.e. (r2, r2+) = (p1/w2+, p1/w2).

In the spreadsheet, we perform these calculations via subtraction of logs, so that (for example) ln(r1) = ln(Wilson(p1)) – ln(p2). Since we must calculate the logarithms of these terms to obtain the interval for r, this ‘long way round’ is not as arduous as it might seem! Figure 4. Distribution for the log ratio interval computed by subtraction on the log scale, ln(p1) – ln(p2), and repositioned log Wilson intervals on the same scale.

Next, we can plot these distributions on the ratio scale by delta approximation, taking the exponent of r, r1 and r2 in Figure 4 for varying α.

Note. We don’t merely apply the ‘exp’ transform to the scale, which would make areas under the curve unequal. We calculate interval (x) positions on this scale by the ‘exp’ transform, and then calculate the y position at each point, sufficient to include the area under the curve.

Alongside the distribution for ratio r, we compute the transformed Wilson distributions r1 and r2. This gives us Figure 5. Figure 5. Ratio interval distributions obtained by Zou and Donner’s method, with delta approximation on the ratio scale.

Most of the distributions obtained by this method have a long upper tail, because a ratio of two proportions r ∈ [0, ∞]. We can see that the resulting distribution is well behaved. By substituting n1 or n2 = 10,000 (for example) we can confirm that the r curve converges on r1 and r2 respectively.

#### Skewed p1 and p2

Where p1 and p2 are 0 or 1, the situation is more complex with division than the other operators due to the fundamentally unequal nature of ratio terms.

Numerator:

• If p1 = 0, then its Wilson score interval has only an upper bound.

Note. We use a tiny delta to make tiny proportions computable: for plotting we find that Excel™ obtains smoother curves with an adjustment of p1 = δp = 10-6 (this delta is not to be confused with that used for delta approximation). This adjustment is not required if a continuity correction is employed, as p1 would then equal 12n1.

However this approximation does not work when both p1 and p2 are zero, and the interval is spread over the entire range [0, ∞].

• If p1 = 1, then its interval has only a lower bound. The upper bound curve r+ = r2+.

We can illustrate this by taking sequential values for p1 from 0 to 1 and animating the result. Animation 2. Permuting the distribution for Zou and Donner’s risk ratio interval from p1 = 0 to 1, with p2 = 1.

Denominator:

• If p2 = 0, then the ratio r → ∞, so r+ is also infinite. With the substitution of (e.g.) p2 = δp or 12n2 we can estimate a lower bound distribution r, but the distribution is stretched, and so is not easily plotted! Nonetheless, for p1 > 0, rr2 (p1/w2+), which can be computed for w2+ > 0, and is a good substitute. See Figure 6.
• If p2 = 1, its transformed Wilson interval has no lower bound (the interval for p2 having no upper bound). See Animation 2 above. This also means that the lower bound of r = r1. Figure 6. Lower bound of the ratio interval for r– and r2– as r → ∞ and p2 → 0. Since p2 = 0 obtains an uncomputable interval, we substitute a small delta – which explains the visible difference.

The final animation in this section explores what happens if the denominator proportion p2 is varied. Changing the denominator has a dramatic effect on the shape of the distribution, as we might expect. At p2 ≅ 0, the interval is infinite at the upper bound, and the lower bound distribution is stretched, as in Figure 6. As p2 increases, the fraction shrinks and the distributions become increasingly bunched around it. Animation 3. Permuting the value of the denominator, p2, from 0 to 1, risk ratio interval distributions with the same numerator p1 = 0.5.

#### Employing a continuity correction

As previously, we apply the continuity corrected formula (Equation (2)) as input to Equation (3), in this case with the log substitutions for the ratio.

Figure 7 plots component intervals for the same observations as Figure 5, with the uncorrected interval for reference. We can see that the distributions for transformed Wilson intervals (r1, r2) and combined interval (r) are more conservative, with bounds pushed further out.

Employing a continuity correction avoids the problem of extreme proportions. Where  p = 0 or 1, Equation (2) moves the Wilson bound 12n away from p, towards the centre of the distribution. Figure 7. The effect of applying Yates’ correction for continuity to the distributions in Figure 5. The overall distribution and bounds are expanded outwards, the result of combining continuity-corrected Wilson intervals for each proportion.

### Product of proportions m = p1 × p2

The method for computing intervals for the product of independent proportions is very similar to that for the ratio. See also the sum formula above. The product m ∈ [0, 1]. (We will use m, for ‘multiple’, to refer to the product, to avoid confusion with proportions.)

We can substitute as follows into Equation (3).

• θˆ1 = ln(p1), (l1, u1) = (ln(w1), ln(w1+)),
• θˆ2 = –ln(p2), (l2, u2) = (–ln(w2+), –ln(w2)).

In these graphs we will also plot scaled Wilson intervals m1 and m2.

m1 = Wilson(p1) × p2,
m2 = p1 × Wilson(p2) (4)

First let us consider the log product distributions. Figure 8. An example series of log product distributions obtained by applying Zou and Donner’s theorem to two proportions on a logarithmic scale.

As an observation, the product interval tends to be closer to the interval for the smaller of the two proportions since the product of two proportions is smaller than both. In a simple product, m1 and m2 are interchangeable (like addition, the product is a symmetric relation).

As before, once combined we remove the logarithm with the ‘exp’ function and compute the resulting interval distributions on the product scale. The same kind of relationship with the source Wilson distributions can be seen. Figure 9. Plotting distributions of products, m = p1 × p2, by delta approximation.

#### Skewed p1 and p2

If p1 = 0, then m → 0, the upper bound m+m1+. Likewise, if p2 = 0, m → 0 and m+m2+. However if both p1 and p2 = 0, then this substitution is not possible and we must approximate it with p1 = p2 = δp

If p1 = 1, then its interval has only a lower bound, so m+ = m2+. Similarly if p2 = 1, m+ = m1+.

The animation below illustrates this principle. Animation 4. Distributions of intervals for the product of proportions, varying one proportion (p1) and holding p2 at a constant 0.5.

#### Employing a continuity correction

We can substitute the continuity-corrected Wilson intervals (Equation (2)), perform log summation by Zou and Donner’s method (Equation (3)) and then project the result. We illustrate this in Figure 10.

As with the ratio interval, performing this adjustment also has the benefit of addressing issues arising in the case of skewed proportions.

### Conclusions

In the past we have computed distributions of intervals for independent proportions robustly using the delta approximation method, and applied this method to a variety of interval functions, notably but not exclusively, the Wilson score interval, with and without continuity correction. In previous posts we plotted the Wilson distribution and the Newcombe-Wilson distribution – the latter being the pdf of the difference interval calculated using Newcombe’s method. In this post we extend the method for the difference to three further algebraic operators – addition, division and multiplication.

We discover that the distribution for the sum of two proportions is as robust as that for the difference in proportions. Difference d ∈ [-1, 1] obtains a symmetric distribution centred at 0 if p1 = p2. On the other hand sum s ∈ [0, 2] has a condition for symmetry where p1 + p2 = 1. In other cases we can obtain ‘mirror distributions’ – distributions for the sum which precisely mirror an equivalent distribution for the difference – because the Wilson interval for an alternate q = 1 – p is the reflection of that for p.

The situation is a little more complex when it comes to ratios and products. It also raises two issues. First, the method requires us to first transform terms to the logarithmic scale. Second, zero proportions present a particular problem unless a continuity correction is applied. Zou and Donner’s (2008) theorem (Equation (3)) relies on independent properties (with independent variances) and intervals with good coverage properties.

Yates’ continuity correction is frequently ignored in more complex statistical methods, but there is rarely a good reason to do so. For example, corrections tend to be left out of meta-tests (Wallis 2021: 238), but as I point out, the source sample has not become less ‘chunky’!

Exploring cases where proportions are 0 or 1 we observe some useful simplifications for ratio and product formulae.

For either product term or the ratio numerator p1, the result will be multiplied by that proportion, obtaining a zero result. Importantly however, the interval for a zero proportion is not zero-width (certain).

Thus for the product, the upper bound for the multiple m = p1 × p2 tends to the bound of the other term if one proportion is zero. So if p1 = 0, m+ = m2+. The same principle applies, for p1 only, to the ratio r. If the denominator p2 = 0 then the ratio and its upper bound is of course infinite, but the lower bound r tends to r1 = p1/w2+.

The product and ratio methods require that the properties concerned are positive and greater than zero. Indeed the logarithm of zero is infinite, which causes us to substitute a small delta term δp = 10-6 in cases for zero proportions. This method allows us to render distributions for uncorrected intervals computable. Zou and Donner (2008: 1697) suggest substituting the continuity correction term, but only in extreme cases, which is an inconsistent approach.

Finally, as I have previously commented in discussing these types of plot, very few instances of these distributions look remotely Normal (Gaussian). As sample sizes increase (and extreme proportions are avoided) these plots may tend towards something more bell-curved in shape, but an expectation that such distributions will be ‘approximately Normal’ is liable to be misleading.

### References

Newcombe, R.G. (1998). Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine17, 873-890.

Wallis, S.A. (2021). Statistics in Corpus Linguistics Research. New York: Routledge. » Announcement

Wallis, S.A. (2021, forthcoming). Accurate confidence intervals on Binomial proportions, functions of proportions and other related scores. » Post

Wilson, E.B. 1927. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212.

Zou, G.Y. & A. Donner (2008). Construction of confidence limits about effect measures: A general approach. Statistics in Medicine27:10, 1693-1702.