In a previous post I showed how to derive intervals for the power and log operators for two uncertain parameters, e.g. g = p1^p2, ‘p1 to the power of p2’, and l = log_p2(p1), ‘log of p1 to base p2’.
With two uncertain (observed) proportions, or functions of them, we may use Zou and Donner’s (2008) method for the difference of two parameters. If one parameter is known then the interval is simple to obtain, e.g. p1^k is a monotonic function of p1, and falls within probabilistic bounds.
Zou and Donner’s theorem can be written as
(L, U) ≡ (θˆ1 – θˆ2 – √[(θˆ1 – l1)² + (u2 – θˆ2)²], θˆ1 – θˆ2 + √[(u1 – θˆ1)² + (θˆ2 – l2)²]), (1)

where (l1, u1) and (l2, u2) are confidence intervals for θˆ1 and θˆ2 respectively.
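In code, Equation (1) amounts to a few lines. A minimal sketch in Python (the function name is mine, not from the post):

```python
import math

def zou_donner_difference(theta1, l1, u1, theta2, l2, u2):
    """Zou & Donner (2008) interval for the difference theta1 - theta2.

    Each estimate theta_i is supplied with its own confidence bounds
    (l_i, u_i); the inner distances from estimate to bound are combined.
    """
    d = theta1 - theta2
    lower = d - math.sqrt((theta1 - l1) ** 2 + (u2 - theta2) ** 2)
    upper = d + math.sqrt((u1 - theta1) ** 2 + (theta2 - l2) ** 2)
    return lower, upper
```

Note that the lower bound uses the lower distance of θˆ1 and the upper distance of θˆ2, and vice versa for the upper bound.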
We can employ this theorem to obtain intervals for mathematical operators by identifying substitutions for θˆ1 and θˆ2, and transforming the result back to the original scale.
Elsewhere I identified error rates for the inner interval generated by subtraction, division and log operators by evaluation against the Fisher test. This is a strict ‘practitioner’ evaluation, but it is objective, and draws out the differences obtained by generalisation over three different scales. In brief, these methods are viable and generally quite accurate, but introduce small errors due to the particular number scale on which the interval calculation is performed.
In this blog post I plot the distributions of power and log intervals for two Binomial proportions. The method is the same as the one we used for difference (Newcombe-Wilson) intervals, and then applied to addition, multiplication and division.
From intervals to distributions
I’ll briefly discuss the method here for the benefit of a casual reader, but for more serious study I would suggest you read the earlier posts.
A confidence interval for a variable expresses its likely range of true values, based on observed data. As you would expect, this range typically includes an area on either side of your observation, unless the observation is at (or near) a limit of the scale, such as zero on the probability scale: since a proportion cannot be less than zero, neither can its true value.
To calculate a confidence interval we need to specify an error level, α. Often in experimental statistics we set α at 0.05 or 0.01 but some like to cite smaller error levels. This involves a conceptual error, but that is another story.
Enter the distribution. A probability density function (pdf) plots the relative likelihood that any given value of the variable is the true one. Perhaps the best-known pdf is the familiar bell-shaped Gaussian (Normal) distribution. But this is not the only possible shape, as we shall see!
With a delta approximation method we can convert an interval function, which may be rendered as a cumulative distribution function (‘cdf’), into an interval distribution (‘pdf’). See Wallis (2021: 300), Plotting the Wilson distribution and this post. We should consider the upper and lower bounds separately because (strictly speaking) these distributions presume that we know if the true value is less than or greater than the observed one.
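A rough sketch of this delta approximation for the lower bound of a Wilson interval, assuming my own function names and α grid (an illustration, not the post's spreadsheet code):

```python
import math
from statistics import NormalDist

def wilson_lower(p, n, alpha):
    """Lower bound of the Wilson score interval at error level alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom

def lower_bound_pdf(p, n, alphas):
    """Delta approximation: treat the bound position w(alpha) as an
    inverse cdf, so the density at w is d(alpha/2)/dw, estimated here
    by finite differences over a descending grid of alpha values."""
    points = []
    for a1, a2 in zip(alphas, alphas[1:]):
        w1, w2 = wilson_lower(p, n, a1), wilson_lower(p, n, a2)
        points.append(((w1 + w2) / 2, abs(a2 - a1) / 2 / abs(w2 - w1)))
    return points
```

Because Wilson intervals are nested in α, the bound position falls monotonically as α shrinks, so the density is well defined at every grid point.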
Power g = p1^p2
To compute an interval for the power function, we substitute terms into Equation (1) from the following.
- θˆ1 = ln(–ln(p1)), (l1, u1) = (ln(–ln(w1+)), ln(–ln(w1–))), and (2)
- θˆ2 = –ln(p2), (l2, u2) = (–ln(w2+), –ln(w2–)).
where ln is the natural logarithm and (wi–, wi+) are the Wilson score interval bounds for each proportion pi. Note how both intervals are reversed (the lower bound is based on wi+, the upper on wi–) due to the minus sign in each case.
Equation (3) transforms the result back to the original scale.
g = exp[–exp[θˆ1 – θˆ2]]. (3)
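Putting Equations (1), (2) and (3) together, a sketch in Python (function names and the z constant are my own; an illustration, not the post's spreadsheet code):

```python
import math

Z = 1.959964  # two-tailed z for alpha = 0.05

def wilson(p, n, z=Z):
    """Wilson score interval (w-, w+) for proportion p, sample size n."""
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom, (centre + spread) / denom

def power_interval(p1, n1, p2, n2):
    """Interval for g = p1 ** p2 via Equations (1)-(3)."""
    w1m, w1p = wilson(p1, n1)
    w2m, w2p = wilson(p2, n2)
    # Equation (2): both component intervals are reversed by the minus signs.
    t1 = math.log(-math.log(p1))
    l1, u1 = math.log(-math.log(w1p)), math.log(-math.log(w1m))
    t2 = -math.log(p2)
    l2, u2 = -math.log(w2p), -math.log(w2m)
    # Equation (1):
    d = t1 - t2
    L = d - math.sqrt((t1 - l1) ** 2 + (u2 - t2) ** 2)
    U = d + math.sqrt((u1 - t1) ** 2 + (t2 - l2) ** 2)
    # Equation (3) is negatively monotonic, so the bounds swap position:
    return math.exp(-math.exp(U)), math.exp(-math.exp(L))

lo, hi = power_interval(0.1, 10, 0.5, 10)
```

With p1 = 0.1, p2 = 0.5 and n1 = n2 = 10, this reproduces the interval quoted in the worked example below, g ∈ (0.0983, 0.7059).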
We will plot distributions as a two-stage process, just as we did for the product. Indeed Equation (2) is very similar to that for the product, except that the first parameter, θˆ1, is the log of a negated log score (this makes it computable). The difference is negated, so we insert a negative sign in Equation (3).
The function exp(-exp(x)) is negatively monotonic: it consistently decreases as its parameter, x, increases. If we apply such a function to a confidence interval for x, the bounds swap position – what was low becomes high, and vice versa. So having obtained L and U from Equation (1), after transformation U maps to the lower bound and L to the upper, and we quote the interval this way around: (U, L).
On the natural log-log power scale this formula might be written in shorthand as
ln(–ln(g)) = ln(–ln(Wilson(p1))) + ln(Wilson(p2)),
where Wilson(p) represents a Wilson score interval on either side of the Binomial proportion p.
As with the product and ratio intervals, to see how the two components are combined we may also plot the distributions of intervals g1 and g2, each of which assume that only one of the variables is uncertain. We may write
mantissa g1 = Wilson(p1)^p2, and
exponent g2 = p1^Wilson(p2).
In the case of g2 the interval bounds are reversed because the function p1^x is negatively monotonic in x when p1 < 1. On the log-log scale these expressions become, simply
ln(–ln(g1)) = ln(–ln(Wilson(p1))) + ln(p2), and
ln(–ln(g2)) = ln(–ln(p1)) + ln(Wilson(p2)).
We will start by plotting with n1 = n2 = 10 and selected values of p1 and p2. As you would expect, the interval for g is wider (and the distribution flatter) than either interval for g1 or g2. The (negative) log Wilson(p1) and Wilson(p2) may appear to adopt a similar shape on this scale.
As a final step we translate the intervals to the Real scale. We compute exp(–exp(x)) for every x position in Figure 1, and then recalculate the pdf on this new scale. The upper and lower bounds are reversed because the translation is negatively monotonic. The power interval for g = 0.1^0.5, observed with sample sizes of 10 for both parameters, is

g = p1^p2 = 0.3162 ∈ (0.0983, 0.7059).
If we know one of the parameters is true we obtain, variously
g1 = p1^0.5 = 0.3162 ∈ (0.1337, 0.6357), and
g2 = 0.1^p2 = 0.3162 ∈ (0.1724, 0.5800).
The latter is negatively monotonic, so the bounds are swapped. Translated distributions are shown in Figure 2.
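The component intervals g1 and g2 need only one Wilson interval each, plus a monotonic transformation. A sketch, reusing a plain Wilson score function (my own naming):

```python
import math

def wilson(p, n, z=1.959964):
    """Wilson score interval (w-, w+) for proportion p, sample size n."""
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom, (centre + spread) / denom

w1m, w1p = wilson(0.1, 10)
w2m, w2p = wilson(0.5, 10)

g1 = (w1m ** 0.5, w1p ** 0.5)  # Wilson(p1) ** p2: positively monotonic
g2 = (0.1 ** w2p, 0.1 ** w2m)  # p1 ** Wilson(p2): bounds reversed
```

Note how the negatively monotonic case (g2) swaps the Wilson bounds, exactly as described above.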
Correcting for continuity
One of the strengths of this method is that it is straightforward to incorporate continuity and sampling corrections to the intervals using the same procedure.
For example, we can employ continuity-corrected Wilson score intervals for p1 and p2, and perform the computation again. The resulting intervals are wider. The dashed lines where the distributions commence are scores for intervals where α = 1; the region in the middle, close to g, is assumed to be inseparable from the observation because data is drawn in discrete fractions of n = 10. See Wallis (2021: 308).
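A sketch of a continuity-corrected Wilson interval, following Newcombe's (1998) closed-form expression (the function name is mine; treat this as an illustration rather than the spreadsheet's exact formula):

```python
import math

def wilson_cc(p, n, z=1.959964):
    """Continuity-corrected Wilson score interval (Newcombe 1998)."""
    if p == 0:
        lower = 0.0
    else:
        lower = (2 * n * p + z * z - 1
                 - z * math.sqrt(z * z - 2 - 1 / n
                                 + 4 * p * (n * (1 - p) + 1))
                 ) / (2 * (n + z * z))
    if p == 1:
        upper = 1.0
    else:
        upper = (2 * n * p + z * z + 1
                 + z * math.sqrt(z * z + 2 - 1 / n
                                 + 4 * p * (n * (1 - p) - 1))
                 ) / (2 * (n + z * z))
    return max(0.0, lower), min(1.0, upper)
```

Substituting wilson_cc for the plain Wilson function in the earlier computations widens every derived interval, as expected.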
Continuity-corrected interval distributions reinforce the comment that these distributions are two separate intervals on either side of g, and despite appearances to the contrary, are not a single continuous distribution.
Plotting this distribution allows us to explore what happens where observed proportions tend to either 0 or 1. Since log(0) is –∞, whenever either proportion goes to zero, we must substitute a small delta, which we denote as ‘δp’ to avoid confusion with the delta employed in the computation of pdf distributions.
- If the mantissa p1 → 0 or 1, we obtain intervals with no outer bound, which tend towards the middle of the interval range (see Animation 1 above).
- Any number to the power of 0 is 1. If the exponent p2 → 0, we obtain an interval with a lower bound and the upper bound at 1. Where p1 and p2 are both zero we obtain a very broad interval based at 1 (almost over the entire range).
- Where the exponent p2 → 1, the contribution of g2 is one-sided.
These intervals are also stable with small samples. Employing a continuity correction broadens the intervals further still.
Logarithm l = log_p2(p1)
Secondly, we consider the logarithm function with a probabilistic base, the log of p1 to the base p2,
l = log_p2(p1).
This is equal to the ratio of two logs, l = ln(p1)/ln(p2). In an earlier post we observed that in order to compute the interval for l using the ratio formula, it was first necessary to negate both parameters:
l = –ln(p1)/–ln(p2), (4)
The interval for the ratio is the difference between two logs. To employ Equation (1) to calculate a ratio, we take the log (the outer ‘ln’) of each term below.
- θˆ1 = ln(–ln(p1)), (l1, u1) = (ln(–ln(w1+)), ln(–ln(w1–))), and (5)
- θˆ2 = ln(–ln(p2)), (l2, u2) = (ln(–ln(w2+)), ln(–ln(w2–))).
The function ln(–ln(x)) is negatively monotonic due to the minus sign, so bounds are swapped for both parameters. Equation (5) is substituted into Equation (1) to obtain a difference on a log scale, which we correct simply:
l = exp[θˆ1 – θˆ2]. (6)
We can now proceed to compute distributions of this interval on the log and Real scale. As previously, for comparison purposes we will also plot distributions of interval functions where a single parameter is uncertain. These distributions may be thought of as components of the resulting distribution.
argument l1 = –ln(Wilson(p1)) / –ln(p2),
base l2 = –ln(p1) / –ln(Wilson(p2)).
On the log scale these distributions may be computed by
ln(l1) = ln(–ln(Wilson(p1))) – ln(–ln(p2)) and
ln(l2) = ln(–ln(p1)) – ln(–ln(Wilson(p2))),
the latter with reversed bounds.
With n1 = n2 = 10 and non-extreme proportions p1 and p2, we obtain a set of distributions similar to those shown in Figure 4.
The distribution of the interval for ln(l) appears approximately Normal, except where p1 or p2 is at an extreme. It is very nearly symmetric, being slightly wider on the outer side, away from the zero origin on the log scale. We return to this issue in the conclusions below.
We can now plot the same series of distributions on the Real scale by applying the exp(x) function to interval positions x and performing delta approximation again. See Figure 5.
With α = 0.05 we obtain
l = log_p2(p1) = 3.3219 ∈ (1.0149, 9.9382).
On the other hand, if one of the observations is given, we have
l1 = log_0.5(p1) = 3.3219 ∈ (1.3070, 5.8058), and
l2 = log_p2(0.1) = 3.3219 ∈ (1.5975, 8.5292),
the uncertain base interval (l2) having reversed bounds.
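The logarithm figures can be reproduced in the same way, substituting Equation (5) into Equation (1) and back-transforming with Equation (6). A sketch (my own function names, not the post's spreadsheet code):

```python
import math

Z = 1.959964  # two-tailed z for alpha = 0.05

def wilson(p, n, z=Z):
    """Wilson score interval (w-, w+) for proportion p, sample size n."""
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom, (centre + spread) / denom

def log_interval(p1, n1, p2, n2):
    """Interval for l = log base p2 of p1, via Equations (5), (1), (6)."""
    w1m, w1p = wilson(p1, n1)
    w2m, w2p = wilson(p2, n2)

    def loglog(x):
        return math.log(-math.log(x))

    # Equation (5): ln(-ln(x)) is negatively monotonic, so both
    # component intervals have their Wilson bounds swapped.
    t1, l1, u1 = loglog(p1), loglog(w1p), loglog(w1m)
    t2, l2, u2 = loglog(p2), loglog(w2p), loglog(w2m)
    # Equation (1), then Equation (6):
    d = t1 - t2
    L = d - math.sqrt((t1 - l1) ** 2 + (u2 - t2) ** 2)
    U = d + math.sqrt((u1 - t1) ** 2 + (t2 - l2) ** 2)
    return math.exp(L), math.exp(U)

lo, hi = log_interval(0.1, 10, 0.5, 10)
```

With p1 = 0.1, p2 = 0.5 and n1 = n2 = 10, this reproduces (within rounding) the α = 0.05 interval quoted above, l ∈ (1.0149, 9.9382), about l = ln(0.1)/ln(0.5) = 3.3219.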
We can also plot distributions for continuity-corrected intervals on the same scale.
Extreme p and small n
The overall pattern is similar for other values of p1 and p2, provided that they are not at 0 or 1. However, if we consider extreme values for p1 and p2, we find some apparently eccentric behaviour.
- Where either p1 or p2 tend to zero we obtain what appears to be a bimodal distribution for the corresponding independent interval (log Wilson), which yields a similar overall outcome. More correctly, these distribution pairs have independent modes. See Figure 6.
- If the numerator, p1 → 1, the result is a narrow skewed distribution based on (and near) log(p1), i.e. ~0.
- Where the denominator p2 → 1, the result tends to a broadly-spread interval centred on 1/log(p2).
Although these distributions are unusual, they do not cause the method to fail. Again, there are no notable issues with small samples that continuity corrections cannot address.
Conclusions

Previously, we plotted confidence intervals on sum, product and ratio intervals of two independent observed proportions, adding to our work on Newcombe-Wilson difference intervals. In this post we extended our repertoire to power and log intervals of proportions.
We have updated our spreadsheet for this purpose so you can experiment. You may need to modify the axis limits to see some curves, however.
Nonetheless we can show that these methods are robust where p1, p2 ∈ (0, 1). We employ a slight ‘δp’ adjustment with extreme values, but we can also use a correction for continuity in either case. This observation is consistent with what we learned about the product and ratio.
Except at extremes, the ratio-based logarithm interval appears almost Normal on a log scale. Figure 4 echoes the apparent symmetry found in the figure reproduced below. So you might think we could obtain an interval of comparable performance by a simpler Gaussian method.
But appearances can be misleading. In fact, the upper and lower interval widths are not quite equal.
Across the diagonal (above) the outer interval is slightly wider than the inner (for this data, at most 0.85%, but it can rise up to 3% or so). Although these differences may appear small, the actual performance error is in proportion to the area under the curve.
When I evaluated the logarithm against the Fisher test, I first forgot to account for the fact that –ln(x) is negatively monotonic. I neglected to swap the interval bounds in Equation (5), and so obtained a logarithm interval about ln(l) with its widths reversed. This generated both Type I and Type II errors. So we can be sure that a Normal approximation to this interval would have a performance cost.
Zou, G.Y. & A. Donner (2008). Construction of confidence limits about effect measures: A general approach. Statistics in Medicine, 27:10, 1693-1702.