In this blog post we identify efficient methods for computing confidence intervals for a range of common properties.
When we observe any measure from sampled data, we do so in order to estimate the most likely value in the population of data – ‘the real world’, as it were – from which our data was sampled. This is subject to a small number of assumptions (the sample is randomly drawn without bias, for example). But this observed value is merely the best estimate we have, on the information available. Were we to repeat our experiment, sample new data and remeasure the property, we would probably obtain a different result.
A confidence interval is the range of values within which the true population value is likely to lie, based on our observed best estimate and other properties of the sample, subject to a certain acceptable level of error, say, 5% or 1%.
A confidence interval is like a blur in a photograph. We know where a feature of an object is, but it may be blurry. With more data, better lenses, a greater focus and longer exposure times, the blur reduces.
In order to make the reader’s task a little easier, I have summarised the main methods for calculating confidence intervals here. If the property you are interested in is not explicitly listed here, it may be found in other linked posts.
1. Binomial proportion p
The following methods for obtaining the confidence interval for a Binomial proportion perform well, with coverage close to the nominal error level.
- The Clopper-Pearson interval
- The Wilson score interval
- The Wilson score interval with continuity correction
A Binomial proportion, p ∈ [0, 1], represents the proportion of instances of a particular type of linguistic event, which we might call A, in a random sample of interchangeable events of either A or B. In corpus linguistics this means we need to be confident (as far as it is possible) that all instances of an event in our sample can genuinely alternate (all cases of A may be B and vice versa).
These confidence intervals express the range of values where a possible population value, P, is not significantly different from the observed value p at a given error level α. This means that they are a visual manifestation of a simple significance test, where all points beyond the interval are considered significantly different from the observed value p. The difference between the intervals is due to the significance test they are derived from (respectively: Binomial test, Normal z test, z test with continuity correction).
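To make this concrete, here is a minimal sketch of the Wilson score interval in Python, obtained by inverting the Normal z test as described above. It uses only the standard library (`NormalDist` supplies the two-tailed critical value); the function name and defaults are my own.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval for an observed proportion p from n trials.

    Inverts the Normal z test: returns the range of population values P
    that are not significantly different from p at error level alpha.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value (~1.96 at alpha=0.05)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom   # interval centre, shrunk towards 0.5
    width = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - width, centre + width

lo, hi = wilson_interval(0.3, 20)
```

Note that the interval is asymmetric about p (except at p = 0.5): its centre is pulled towards 0.5, which is what keeps the bounds inside [0, 1] even for small samples or extreme proportions.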
As well as my book, Wallis (2021), a good place to start reading is Wallis (2013), Binomial confidence intervals and contingency tests.
The ‘exact’ Clopper-Pearson interval is obtained by a search procedure from the Binomial distribution. As a result, it is not easily generalised to larger sample sizes. Usually a better option is to employ the Wilson score interval (Wilson 1927), which inverts the Normal approximation to the Binomial and can be calculated by a formula. This interval may also accept a continuity correction and other adjustments for properties of the sample.
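The search procedure behind the Clopper-Pearson interval can be sketched directly: each bound is the population value P at which the relevant Binomial tail probability equals α/2, found here by bisection. This is an illustrative stdlib-only implementation (function names and the bisection tolerance are my own), not optimised for large n.

```python
from math import comb

def binom_tail_ge(n, k, P):
    """P(X >= k) for X ~ Binomial(n, P)."""
    return sum(comb(n, i) * P**i * (1 - P)**(n - i) for i in range(k, n + 1))

def binom_tail_le(n, k, P):
    """P(X <= k) for X ~ Binomial(n, P)."""
    return sum(comb(n, i) * P**i * (1 - P)**(n - i) for i in range(0, k + 1))

def clopper_pearson(k, n, alpha=0.05, tol=1e-9):
    """'Exact' Clopper-Pearson interval for k successes out of n,
    found by bisection on the Binomial tail probabilities."""
    # lower bound: P where P(X >= k) = alpha/2 (tail grows with P)
    if k == 0:
        lo = 0.0
    else:
        a, b = 0.0, 1.0
        while b - a > tol:
            m = (a + b) / 2
            if binom_tail_ge(n, k, m) < alpha / 2:
                a = m
            else:
                b = m
        lo = a
    # upper bound: P where P(X <= k) = alpha/2 (tail shrinks as P grows)
    if k == n:
        hi = 1.0
    else:
        a, b = 0.0, 1.0
        while b - a > tol:
            m = (a + b) / 2
            if binom_tail_le(n, k, m) > alpha / 2:
                a = m
            else:
                b = m
        hi = b
    return lo, hi
```

The summations over the Binomial distribution are what make this method expensive as n grows, which is the practical reason to prefer the Wilson formula for larger samples.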