The idea of plotting confidence intervals on data, which is discussed in a number of posts elsewhere on this blog, should be straightforward. Everything we observe is uncertain, but some things are more certain than others! Instead of marking an observation as a point, its better to express it as a ‘cloud’, an interval representing a range of probabilities.
But the standard method for calculating intervals that most people are taught is wrong.
The reasons why are dealt with in detail in (Wallis 2013). In preparing this paper for publication, however, I came up with a new demonstration, using real data, as to why this is the case.
If we find f items under investigation (what we elsewhere refer to as ‘Type A’ cases) out of n potential instances, the statistical model of inference assumes that it must be possible for f to be any number from 0 to n.
Probabilities, p = f / n, are expected to fall in the range [0, 1].
Note: this constraint is a mathematical one. All we are claiming is that the true proportion in the population could conceivably range from 0 to 1. This property is not limited to strict alternation with constant meaning (onomasiological, “envelope of variation” studies). In semasiological studies, where we evaluate alternative meanings of the same word, these tests can also be legitimate.
However, it is common in corpus linguistics to see evaluations carried out against a baseline containing terms that simply cannot plausibly be exchanged with the item under investigation. The most obvious example is statements of the following type: “linguistic Item x increases per million words between category 1 and 2”, with reference to a log-likelihood or χ² significance test to justify this claim.Rarely is this appropriate.
Some terminology: If Type A represents say, the use of modal shall, most words will not alternate with shall. For convenience, we will refer to cases that will alternate with Type A cases as Type B cases (e.g. modal will in certain contexts).
The remainder of cases (other words) are, for the purposes of our study, not evaluated. We will term these invariant cases Type C, because they cannot replace Type A or Type B.
In this post I will explain that not only does introducing such ‘Type C’ cases into an experimental design conflate opportunity and choice, but it also makes the statistical evaluation of variation more conservative. Not only may we mistake a change in opportunity as a change in the preference for the item, but we also weaken the power of statistical tests and tend to reject significant changes (in stats jargon, “Type II errors”).
In a previous post I discussed how to plot confidence intervals on observed probabilities. Using this method we can create graphs like the following. (Data is in the Excel spreadsheet we used previously: for this post I have added a second worksheet.)
The graph depicts both the observed probability of a particular form and the certainty that this observation is accurate. The ‘I’-shaped error bars depict the estimated range of the true value of the observation at a 95% confidence level (see Wallis 2013 for more details).
A note of caution: these probabilities are semasiological proportions (different uses of the same word) rather than onomasiological choices (see Choice vs. use).
In this post I discuss ways in which we can plot intervals on changes (differences) rather than single probabilities.