Reciprocating the Wilson interval


How can we calculate confidence intervals on a property like sentence length (as measured by the number of words per sentence)?

You might want to do this to find out whether or not, say, spoken utterances consist of shorter or longer sentences than those found in writing.

The problem is that the average number of words per sentence is not a probability. If you think about it, this ratio will (obviously) equal or exceed 1. So methods for calculating intervals on probabilities won’t work without recalibration.

Aside: You are most likely to hit this type of problem if you want to plot a graph of some non-probabilistic property, or you wish to cite a property with an upper and lower bound for some reason. Sometimes expressing something as a probability does not seem natural. However, it is a good discipline to think in terms of probabilities, and to convert your hypotheses into hypotheses about probabilities as far as possible. As we shall see, this is exactly what you have to do to apply the Wilson score interval.

Note also that before calculating confidence intervals on a property, you have to consider whether the property is freely varying when expressed as a probability.

The Wilson score interval (w⁻, w⁺) is a robust method for computing confidence intervals about probabilistic observations p.

Elsewhere we saw that the Wilson score interval closely approximates the ‘exact’ Binomial interval about an observed probability p, which has to be obtained by search. It is also well-constrained: neither the upper nor the lower bound can fall outside the probabilistic range [0, 1].

But the Wilson interval is based on a probability. In this post we discuss how this method can be used for other quantities.

Reciprocating the Wilson interval

Let us return to our initial question. How might we calculate confidence intervals on a property like the number of words per sentence?

Let’s call the length l. In this case the ‘trick’ is to take the reciprocal of the property (p = 1/l), which is a probability p. We are able to calculate Wilson intervals on “the number of sentences per word”, or, perhaps more meaningfully, the proportion of all words which are initial words in sentences.

If this sounds a bit odd, consider the following.

Suppose there are l = 10 words in a sentence.

The probability of selecting the first word in the sentence at random (assuming everything else is equal) is p = 1/l = 1/10.

We can calculate the Wilson score interval for p as (w⁻, w⁺).

The confidence interval for l = 1/p is simply (1/w⁺, 1/w⁻).

The inverse function of the reciprocal is also the reciprocal, i.e. if p = 1/l, then l = 1/p.

This method works because of an important property of the reciprocal function (1/p). It is monotonic, which means that it either always increases as p increases, or always decreases as p increases. (Since the reciprocal function actually gets smaller with increasing p, we swap the interval bounds around so the smaller number is stated first.)

We return to what this means in more detail below.
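The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names wilson_interval and reciprocal_interval are our own, and the sample (1,000 words in 100 sentences) is hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread

def reciprocal_interval(l, n, alpha=0.05):
    """Interval for l = 1/p, where n is the number of words sampled."""
    w_lo, w_hi = wilson_interval(1 / l, n, alpha)
    return 1 / w_hi, 1 / w_lo  # bounds swapped because 1/p falls as p rises

# Hypothetical sample: 1,000 words in 100 sentences, so l = 10 and p = 0.1
lower, upper = reciprocal_interval(10, 1000)
print(f"l = 10, interval = ({lower:.4f}, {upper:.4f})")
```

Note that the resulting interval is not symmetric about l, because the reciprocal stretches the lower end of the probability scale more than the upper end.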

Some example data

The following data was taken from ICE-GB using ICECUP. We have three data columns: number of parse units (parsed ‘sentences’) per subcorpus, number of clauses and number of words. We also have two ratio columns: the number of words per parse unit and the number of words per clause.

             parse units  clauses      words  l = words/PU  words/CL
dialogue          43,894   57,161    374,516        8.5323    6.5519
mixed              2,443    5,648     43,632       17.8600    7.7252
monologue         13,133   27,613    225,184       17.1464    8.1550
spoken            59,470   90,422    643,332       10.8178    7.1148
non-printed        6,836   14,007    114,362       16.7294    8.1646
printed           17,099   40,750    359,634       21.0325    8.8254
written           23,935   54,757    473,996       19.8035    8.6564
TOTAL             83,405  145,179  1,117,328       13.3964    7.6962

Raw frequencies and ratios for the number of words per sentence and clause in ICE-GB subcorpora. Raw data and calculations are in this Excel spreadsheet.

Let us now compute confidence intervals on l, the words/PU column, with an error level α = 0.05. To do this, take the reciprocal, i.e. p = 1/l = PUs/word, and n = number of words.

                  p          n    z²/n      p’    z.s’      w⁻      w⁺
dialogue     0.1172    374,516  0.0000  0.1172  0.0010  0.1162  0.1182
mixed        0.0560     43,632  0.0001  0.0560  0.0022  0.0539  0.0582
monologue    0.0583    225,184  0.0000  0.0583  0.0010  0.0574  0.0593
spoken       0.0924    643,332  0.0000  0.0924  0.0007  0.0917  0.0932
non-printed  0.0598    114,362  0.0000  0.0598  0.0014  0.0584  0.0612
printed      0.0475    359,634  0.0000  0.0476  0.0007  0.0469  0.0482
written      0.0505    473,996  0.0000  0.0505  0.0006  0.0499  0.0511
TOTAL        0.0746  1,117,328  0.0000  0.0746  0.0005  0.0742  0.0751

Calculation of Wilson score intervals for 1/(words/PU) = parse units per word.
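To make the columns concrete, here is a sketch of the calculation for the dialogue row. We assume that p’ is the adjusted centre of the Wilson interval and z.s’ the adjusted half-width, which is how we read the column headings; the variable names are our own.

```python
import math
from statistics import NormalDist

parse_units, words = 43_894, 374_516    # dialogue row of the first table
z = NormalDist().inv_cdf(1 - 0.05 / 2)  # critical value for alpha = 0.05

p = parse_units / words                 # p = 1/l, parse units per word
n = words
z2_n = z * z / n                        # z²/n
p_adj = (p + z2_n / 2) / (1 + z2_n)     # p’: centre of the Wilson interval
zs_adj = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z2_n)  # z.s’
w_minus, w_plus = p_adj - zs_adj, p_adj + zs_adj

print(f"{p:.4f} {n:,} {z2_n:.4f} {p_adj:.4f} {zs_adj:.4f} {w_minus:.4f} {w_plus:.4f}")
```

With samples this large, z²/n is tiny, so p’ is almost identical to p and the interval is nearly symmetric.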

The interval for p, the number of parse units per word, is not what we wanted, but it is a necessary intermediate step.

We can now take the reciprocal of this interval, l = 1/p, to get back to where we started, and plot the graph.

             words/PU     1/w⁻     1/w⁺
dialogue       8.5323   8.6077   8.4577
mixed         17.8600  18.5623  17.1858
monologue     17.1464  17.4335  16.8644
spoken        10.8178  10.9009  10.7353
non-printed   16.7294  17.1186  16.3495
printed       21.0325  21.3425  20.7271
written       19.8035  20.0495  19.5606
TOTAL         13.3964  13.4842  13.3093

Computing the reciprocal of the Wilson interval.

Note that because l = 1/p declines with increasing p, 1/w⁺ is less than 1/w⁻. The inverted interval is (1/w⁺, 1/w⁻).
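The whole procedure can be sketched as a loop over the subcorpora, using the parse unit and word counts from the first table. As before, wilson_interval is our own helper rather than a library function, and the output should agree with the tables above up to rounding.

```python
import math
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread

# (parse units, words) per subcorpus, copied from the first table
data = {
    "dialogue": (43_894, 374_516),
    "mixed": (2_443, 43_632),
    "monologue": (13_133, 225_184),
    "spoken": (59_470, 643_332),
    "non-printed": (6_836, 114_362),
    "printed": (17_099, 359_634),
    "written": (23_935, 473_996),
    "TOTAL": (83_405, 1_117_328),
}

intervals = {}
for name, (pus, words) in data.items():
    w_lo, w_hi = wilson_interval(pus / words, words)
    # Store (lower bound, observed l, upper bound), smaller number first
    intervals[name] = (1 / w_hi, words / pus, 1 / w_lo)
    lo, l, hi = intervals[name]
    print(f"{name:12s} {l:8.4f}  ({lo:.4f}, {hi:.4f})")
```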

This is what the graph looks like with intervals added.

Ratio of number of words to parse unit (‘sentence’) in ICE-GB subcorpora, with inverse Wilson score intervals. (Note that we have cropped the y axis so it does not start at zero.)

The interpretation of overlapping intervals on this graph is exactly the same as for standard Wilson score interval graphs:

  • non-overlapping intervals = significant difference,
  • one interval overlapping the other central point = non-significant difference, and
  • for everything else, carry out a Newcombe-Wilson test on p = 1/l.

We can check whether the ratios of words to parse units in the “mixed” and “monologue” subcorpora are significantly different using a Newcombe-Wilson test. In this case the difference is non-significant at the α = 0.05 level.
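A sketch of this check follows. We assume the usual formulation of the Newcombe-Wilson interval, in which the inner widths of the two Wilson intervals are squared and added to give an interval on the difference p₁ – p₂; the function name newcombe_wilson is our own. The test is carried out on p = 1/l, not on l itself.

```python
import math
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread

def newcombe_wilson(p1, n1, p2, n2, alpha=0.05):
    """Newcombe-Wilson interval for the difference d = p1 - p2."""
    l1, u1 = wilson_interval(p1, n1, alpha)
    l2, u2 = wilson_interval(p2, n2, alpha)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper

# Parse units per word for "mixed" and "monologue", from the first table
p_mixed, n_mixed = 2_443 / 43_632, 43_632
p_mono, n_mono = 13_133 / 225_184, 225_184

lower, upper = newcombe_wilson(p_mixed, n_mixed, p_mono, n_mono)
significant = not (lower <= 0 <= upper)  # significant only if zero is excluded
print(f"difference interval = ({lower:.6f}, {upper:.6f}), significant = {significant}")
```

The interval on the difference only just includes zero, so this is a borderline case: the result is non-significant, but not by a wide margin.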

In the first table above we also included the ratio of words per clause (CL). To avoid repetition, we have not presented the corresponding calculation and graph here, but it is included in an Excel spreadsheet containing the raw data.


We can prove a useful general theorem which allows us to use the Wilson interval for properties other than probabilities.

For any function of p, f(p), that is monotonic over the range of p ∈ [0, 1], the Wilson interval for f(p) is

Wilson (f(p)) ≡ (f(w⁻), f(w⁺)) if f increases with p or
Wilson (f(p)) ≡ (f(w⁺), f(w⁻)) otherwise.
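As a rough sketch, the theorem translates into a small helper that applies f to both Wilson bounds and reports the smaller result first. The name wilson_of and the sample values (p = 0.25, n = 200) are hypothetical, and the helper assumes that the caller has already established that f is monotonic on [0, 1].

```python
import math
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread

def wilson_of(f, p, n, alpha=0.05):
    """Interval for f(p), assuming f is monotonic over [0, 1]."""
    a, b = (f(w) for w in wilson_interval(p, n, alpha))
    # For an increasing f, a <= b already; for a decreasing f the bounds swap.
    return (a, b) if a <= b else (b, a)

p, n = 0.25, 200  # hypothetical observation
results = {
    "q = 1 - p": wilson_of(lambda x: 1 - x, p, n),
    "p squared": wilson_of(lambda x: x ** 2, p, n),
    "1/p": wilson_of(lambda x: 1 / x, p, n),
}
for label, (lo, hi) in results.items():
    print(f"{label:10s} ({lo:.4f}, {hi:.4f})")
```

Sorting the two transformed bounds is equivalent to the two cases of the theorem only because f is monotonic; for other functions the result is not a valid interval.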

Note: The term monotonic means that the function always increases with its parameter (the gradient d(f(x))/dx > 0) or always decreases with its parameter (the gradient < 0).

The slope of a sloping roof is monotonic. The top of a roof (or a flat roof) is not! The gradient (slope) of a monotonic function may change, but it may not become horizontal or change direction from positive to negative (or vice-versa).

Other example monotonic functions include any constant multiple of p, e.g. 5p (a score on a scale from 0 to 5), the alternate probability q = 1 – p, any power of p such as p², and so on. (For more example transformations see this PDF ‘cheat sheet’.)

The function must always behave in this monotonic way over the probability range (it doesn’t matter what it does for p < 0 or p > 1). For example, across all values of x, x² is non-monotonic. However, as long as p ∈ [0, 1], p² is monotonic.

Some monotonic functions, including p² and 1/p. Note that a function with a negative gradient, such as 1/p, will flip the upper and lower bounds.

Importantly, the logistic function (that defines an ‘S’ curve) is monotonic. See this short paper for more on the relationship between the Wilson score interval and the logistic function.

Every strictly monotonic function has a unique inverse, and this inverse is also monotonic. As a result, we can compute an interval on a monotonic function of p by computing the Wilson interval on p and then applying the function to each bound. (If we start from an observed value of f(p), as with sentence length, we first apply the inverse function to recover p.)

Note that even though 1/p is infinite when p = 0, it is still possible to apply the Wilson score interval to p and report the reciprocal of the bounds.
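A short sketch of this boundary case, with a hypothetical sample of n = 100 cases in which the event is never observed (p = 0). The lower Wilson bound is then exactly zero in theory, so we pin it there rather than trust floating-point rounding, and we treat the reciprocal of zero as infinity. Both choices are our own conventions for this illustration.

```python
import math
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+), with the bounds pinned at p = 0 and p = 1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # At the extremes the bound is exactly 0 or 1 in theory; avoid rounding residue.
    lower = 0.0 if p == 0 else centre - spread
    upper = 1.0 if p == 1 else centre + spread
    return lower, upper

def reciprocal(x):
    """1/x, treating the reciprocal of zero as infinity."""
    return math.inf if x == 0 else 1 / x

n = 100  # hypothetical sample size
w_lo, w_hi = wilson_interval(0.0, n)
l_interval = (reciprocal(w_hi), reciprocal(w_lo))
print(f"w = ({w_lo:.4f}, {w_hi:.4f}), 1/p interval = {l_interval}")
```

The lower bound of the reciprocal interval is finite, while the upper bound is unbounded.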

By way of comparison, here are two examples of non-monotonic functions.

Two non-monotonic functions. In the lower curve, f(p) = (p – 0.5)², different values of p obtain the same value of f(p). The upper stepped function includes a plateau where all values in the range 0.25 < p < 0.75 obtain the same value of f(p).
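To see why monotonicity matters, consider what happens if we naively apply the non-monotonic function f(p) = (p – 0.5)² to the bounds of a Wilson interval. The following sketch uses a hypothetical observation of p = 0.5 with n = 100.

```python
import math
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread

def f(p):
    """A non-monotonic function on [0, 1]: it falls to zero at 0.5, then rises."""
    return (p - 0.5) ** 2

p, n = 0.5, 100  # hypothetical observation
w_lo, w_hi = wilson_interval(p, n)
naive = (f(w_lo), f(w_hi))  # NOT a valid interval for f(p)
print(f"f(p) = {f(p):.4f}, naive 'interval' = ({naive[0]:.4f}, {naive[1]:.4f})")
```

Because the Wilson interval is symmetric about p = 0.5, both bounds map to the same value, and the observed f(p) = 0 lies outside the resulting ‘interval’. Transforming the bounds only works when the function is monotonic.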
