So: you’ve got some data, you’ve read up on confidence intervals and you’re convinced. Your data is a small sample from a large/infinite population (all of contemporary US English, say), and therefore you need to estimate the error in every observation. You’d like to plot a pretty graph like the one below, but you don’t know where to start.

Of course this graph is not just pretty.

It depicts a pattern whereby two synchronically distinct uses become four over time. Note that for any pair of points across a diachronic contrast we can immediately identify the following:

- non-overlapping intervals *must* be statistically distinct (a significant difference);
- if any point falls within the interval of another, it *cannot be* significantly distinct (a non-significant difference);
- in all other cases we need to carry out a 2 × 2 test (either Yates’ χ² test or the Newcombe-Wilson continuity-corrected test) to check.
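For the third case, Yates’ 2 × 2 χ² test is simple enough to compute directly. The following is a minimal Python sketch; the frequencies in the example call are invented for illustration, not data from the graph.

```python
def yates_chi2(table):
    """Yates' continuity-corrected chi-square for a 2 x 2 table [[a, b], [c, d]].

    chi2 = sum over cells of (|O - E| - 0.5)^2 / E, where E is the expected
    frequency under independence (row total x column total / grand total)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, obs in enumerate((a, b, c, d)):
        e = rows[i // 2] * cols[i % 2] / n   # expected frequency for this cell
        chi2 += (abs(obs - e) - 0.5) ** 2 / e
    return chi2

# Hypothetical example: quotative vs. other uses in two time periods
chi2 = yates_chi2([[4, 123], [21, 110]])
significant = chi2 > 3.841  # 5% critical value of chi-square with 1 d.f.
```

The result is compared against the 5% critical value of the χ² distribution with one degree of freedom (3.841) to decide significance.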

Probabilities drawn from the **same sample** sum to 1. To compare points in competition synchronically you should use a single-sample *z* test instead of a 2 × 2 test.

In this case, the 1960s data does not significantly distinguish the rates for quotative and interpretive uses of *think*, because the observed probability *p* for the quotative use falls within the interval for the interpretive use. A quick check with the 2 × 2 spreadsheet finds that

- the initial fall (from 1920s to 1960s) for ‘cogitate’ uses is significant but
- intention does not change significantly over time (note how the curved line can be misleading), and
- quotative uses significantly increase their share from the 1960s to 2000s.

Other changes can be easily identified in the graph. Note that this graph expresses **semasiological** (meaning distribution) change, not **onomasiological** (choice of alternates) change, and results are therefore indicative. We can’t conclude that speakers in later texts increasingly preferred to employ *think* in a quotative way, without considering this question *relative to the opportunity to employ quotative constructions*. See Choice vs. use.

Let us discuss how we arrived at this graph.

### Step by step

We want to plot the observed probability *p* with **Wilson score interval** error bars. We can’t use the Gaussian interval (some values are zero) and anyway, as other posts clarify, it is wrong to do so!

1. **First we gather the raw data.** We need to identify the raw frequencies, *F*, and the relationship between the different data series. Does it make sense to take proportions out of the total frequency (*N*)? What should the baseline be for any change?
2. If we use the total number of cases of *think*, *N*, as a meaningful baseline, we can **obtain a set of semasiological probabilities** for each frequency, *p* = *F*/*N*.
3. Next we **calculate basic Wilson score interval terms**. This is the most complicated step and can be broken down into two components for simple calculation.

   *Wilson centre adjusted probability*

   *p′* = [*p* + *z*²/2*N*] / [1 + *z*²/*N*].

   *Wilson adjusted standard deviation*

   *s′* = √[*p*(1 – *p*)/*N* + *z*²/4*N*²] / [1 + *z*²/*N*].

   Note that we could simplify each expression further and pre-calculate the *Wilson denominator* [1 + *z*²/*N*] for every cell.
4. We can now calculate the **upper and lower bounds** of the interval in absolute terms:

   *Wilson score interval*

   [*w*⁻, *w*⁺] = [*p′* – *z*.*s′*, *p′* + *z*.*s′*].
5. Finally we can work out the upper and lower bounds **relative to the probability** *p*. Excel likes these both to be positive, so we have the following:

   *Wilson relative error bars*

   [*Y*⁻, *Y*⁺] = [*p* – *w*⁻, *w*⁺ – *p*].
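To check the arithmetic, the whole calculation takes only a few lines of code. Here is a minimal Python sketch of steps 2–5 (the function name `wilson` and the example call are mine, not part of the spreadsheet):

```python
import math

def wilson(F, N, z=1.959964):
    """Wilson score interval for F observations out of N (default z for 95%).

    Returns (p, w_minus, w_plus, y_minus, y_plus): the observed probability,
    the absolute interval bounds, and the relative error bars for plotting."""
    p = F / N
    denom = 1 + z**2 / N                       # Wilson denominator
    p_prime = (p + z**2 / (2 * N)) / denom     # Wilson centre adjusted probability
    s_prime = math.sqrt(p * (1 - p) / N + z**2 / (4 * N**2)) / denom  # adjusted s.d.
    w_minus, w_plus = p_prime - z * s_prime, p_prime + z * s_prime
    return p, w_minus, w_plus, p - w_minus, w_plus - p

# Hypothetical example: 4 quotative uses out of 127 cases of *think*
p, lo, hi, y_lo, y_hi = wilson(4, 127)
```

Note that the interval behaves sensibly at the boundary: for *F* = 0 the lower bound is exactly zero, which is precisely why the Gaussian interval had to be rejected above.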

Instead of steps 3 and 4 above you can also use the **continuity-corrected** formula for the Wilson score interval. The formula is equation (7) in Wallis (2013) and is implemented in both the 2 × 2 spreadsheet and the spreadsheet for this example.

- For an efficient calculation, the continuity-corrected Wilson standard deviation can be rewritten as

  *s′* = [*z*√(*a* ± *b*) + 1] / *d*,

  where *a* = *z*² – 1/*N* + 4*Np*(1 – *p*), *b* = 2 – 4*p* and *d* = 2[*N* + *z*²]. The sign of *b* is negative for the lower bound and positive for the upper bound, and the interval is limited to [0, 1].
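As a sketch of this shortcut in code (the function name `wilson_cc` is mine; the convention of returning exactly 0 or 1 when *F* = 0 or *F* = *N* is an assumption following common practice for continuity-corrected intervals):

```python
import math

def wilson_cc(F, N, z=1.959964):
    """Continuity-corrected Wilson score interval via the rewritten
    terms a, b and d. The lower bound takes a - b under the square root,
    the upper bound a + b, and the result is limited to [0, 1]."""
    p = F / N
    a = z**2 - 1 / N + 4 * N * p * (1 - p)
    b = 2 - 4 * p
    d = 2 * (N + z**2)
    centre = (2 * N * p + z**2) / d                 # same Wilson centre p' as before
    w_minus = max(0.0, centre - (z * math.sqrt(a - b) + 1) / d)
    w_plus = min(1.0, centre + (z * math.sqrt(a + b) + 1) / d)
    if F == 0:      # boundary convention (assumed): exact limit at zero
        w_minus = 0.0
    if F == N:      # boundary convention (assumed): exact limit at one
        w_plus = 1.0
    return w_minus, w_plus
```

For the same data, this interval is always at least as wide as the uncorrected one, which is what "slightly more conservative" means in practice.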

The continuity-corrected interval is slightly more conservative and corresponds to Yates’ 2 × 1 χ² test. For most plotting purposes, however, the standard Wilson interval is usually perfectly adequate.

### See also

- Excel spreadsheet
- Change and certainty: plotting confidence intervals (2)
- Reciprocating the Wilson interval
- Binomial confidence intervals and contingency tests
- Binomial → Normal → Wilson

### References

Levin, M. 2013. The progressive in modern American English. In Aarts, B., J. Close, G. Leech and S.A. Wallis (eds). *The Verb Phrase in English: Investigating recent language change with corpora*. Cambridge: CUP. » Table of contents and ordering info

Wallis, S.A. 2013. Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods. *Journal of Quantitative Linguistics* **20**:3, 178-208. » Post