Genre differences and experimental observations

Spoken categories, modal verbs and change over time

In a recently published paper, Bowie, Wallis and Aarts (2013) demonstrate that observed changes in the frequency of modal verbs over time are highly sensitive to differences in genre (‘register’ or ‘text category’). Our paper, although based on spoken British English, may shed some light on a recent dispute between Leech (2011) and Millar (2009) over how linguists should interpret corpus observations of changes in the modal verb system in written US English.

The following table summarises statistically significant percentage decreases and increases of individual modal verbs as a proportion of the number of tensed verb phrases (VPs that could conceivably take a modal verb), within different spoken genre subcategories of the Diachronic Corpus of Present-day Spoken English (DCPSE). The statistical test used is a Newcombe-Wilson test, which examines differences in observed probabilities between samples.

For our purposes the cited percentages do not matter, but the direction of travel (indicated by coloured cells) does.

                can     may     could   might   shall   will    should  would   must    All
formal f2f      ns      ns      ns      ns      ns      ns      -60%    ns      -75%
informal f2f    27%     -42%    ns      47%     -32%    ns      ns      ns      -53%    ns
telephone       -37%    ns      -44%    ns      -56%    -30%    ns      -44%    ns      -35%
b. discussions  -41%    -59%    ns      ns      -83%    ns      ns      ns      -54%    -20%
b. interviews   ns      -61%    ns      -59%    ns      -41%    -55%    -32%    -57%    -35%
commentary      ns      ns      ns      ns      -93%    58%     ns      ns      -64%    ns
parliament      ns      ns      ns      ns      ns      -39%    ns      -30%    ns      -20%
legal x-exam    304%    ns      ns      ns      ns      ns      1,265%  254%    ns      157%
spontaneous     ns      ns      ns      ns      ns      ns      ns      ns      ns      ns
prepared sp.    ns      -63%    ns      ns      ns      327%    ns      -32%    -48%    ns
All genres      ns      -40%    -11%    ns      -48%    13%     -14%    -7%     -54%    -6%

Significant changes (α<0.05) in the proportion of individual core modals out of tensed verb phrases from the 1960s (LLC) to 1990s (ICE-GB) components in DCPSE, adapted from Bowie et al. 2013.
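As a rough illustration of the kind of comparison summarised in the table, the sketch below computes a percentage change between two observed proportions and applies a Newcombe-Wilson interval to their difference. It is a minimal sketch, not the procedure used in the paper: the function names (`wilson`, `newcombe_wilson`, `percent_change`) and the counts in the usage lines are hypothetical, and no continuity correction is applied.

```python
import math

Z_95 = 1.959964  # two-tailed critical value of the Normal distribution for alpha = 0.05


def wilson(p, n, z=Z_95):
    """Wilson score interval (lower, upper) for a proportion p observed in n cases."""
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom


def newcombe_wilson(f1, n1, f2, n2, z=Z_95):
    """Interval for the difference d = p2 - p1 between two independent samples.

    Returns (d, lower, upper, significant); the difference is significant
    if the interval (lower, upper) excludes zero.
    """
    p1, p2 = f1 / n1, f2 / n2
    l1, u1 = wilson(p1, n1, z)
    l2, u2 = wilson(p2, n2, z)
    d = p2 - p1
    lower = d - math.sqrt((p2 - l2) ** 2 + (u1 - p1) ** 2)
    upper = d + math.sqrt((u2 - p2) ** 2 + (p1 - l1) ** 2)
    return d, lower, upper, (lower > 0 or upper < 0)


def percent_change(f1, n1, f2, n2):
    """Percentage change of the proportion from sample 1 to sample 2."""
    p1, p2 = f1 / n1, f2 / n2
    return 100 * (p2 - p1) / p1


# Hypothetical counts: modal tokens out of tensed VPs in an earlier and a later sample.
earlier = (120, 4000)
later = (80, 4200)
d, lower, upper, significant = newcombe_wilson(*earlier, *later)
print(f"change = {percent_change(*earlier, *later):.0f}%, "
      f"d = {d:.4f}, interval = ({lower:.4f}, {upper:.4f}), significant = {significant}")
```

A cell in the table would then show the percentage change where the test is significant, and "ns" otherwise.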

This study concerns modal verbs within text categories. Against a general baseline (words, verb phrases or tensed verb phrases), the total number of modals decreases over the period covered by the data (with the caveat that this applies to spoken English data sampled comparably). Above, we employ tensed verb phrases as the most meaningful baseline of the three. See ‘That vexed problem of choice’.

  • Note that if we take all genres together (bottom row in the table), except for will, every significant change is a decline in use, but in the (large) category of informal face-to-face conversation (second row from top), can and might are both significantly increasing.
  • Legal cross-examination is a predictable outlier, but broadcast interviews and discussions appear to generate very different results.

Comparing frequencies within a discrete distribution

This page explains how to compare observed frequencies f₁ and f₂ from the same distribution F = {f₁, f₂,…}.
To compare observed frequencies f₁ and f₂ from different distributions, i.e. where F₁ = {f₁,…} and F₂ = {f₂,…}, you need to use a chi-square or Newcombe-Wilson test.


In a recent study, my colleague Jill Bowie obtained a discrete frequency distribution by manually classifying cases in a small sample drawn from a large corpus.

Jill converted this distribution into a row of probabilities and calculated Wilson score intervals on each observation, to express the uncertainty associated with a small sample. She had one question, however:

How do we know whether the proportion of one quantity is significantly greater than another?

We might use a Newcombe-Wilson test (see Wallis 2013a), but this test assumes that we want to compare samples from independent sources. Jill’s data are drawn from the same sample, and all probabilities must sum to 1. Instead, the optimum test is a dependent-sample test.


A discrete distribution looks something like this: F = {108, 65, 6, 2}. This is the frequency data for the middle column (circled) in the following chart.

This may be converted into a probability distribution P, representing the proportion of examples in each category, by simply dividing by the total: P = {0.60, 0.36, 0.03, 0.01}, which sums to 1.

We can plot these probabilities, with Wilson score intervals, as shown below.


An example graph plot showing the changing proportions of meanings of the verb think over time in the US TIME Magazine Corpus, with Wilson score intervals, after Levin (2013). In this post we discuss the 1960s data (circled). The sum of each column probability is 1. Many thanks to Magnus for the data!
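The conversion from frequencies to probabilities, and the Wilson score interval on each probability, can be sketched as follows. This is a minimal sketch using the 1960s distribution F quoted above; the helper name `wilson` and the choice of a 95% interval without continuity correction are assumptions made for illustration.

```python
import math

Z_95 = 1.959964  # two-tailed critical value of the Normal distribution for alpha = 0.05


def wilson(p, n, z=Z_95):
    """Wilson score interval (lower, upper) for a proportion p observed in n cases."""
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom


F = [108, 65, 6, 2]           # observed frequencies in each category
n = sum(F)                    # total number of classified cases
P = [f / n for f in F]        # proportions, which sum to 1
intervals = [wilson(p, n) for p in P]

for f, p, (lower, upper) in zip(F, P, intervals):
    print(f"f = {f:3d}  p = {p:.2f}  interval = ({lower:.3f}, {upper:.3f})")
```

Each interval is calculated from the same total n, which is one way of seeing that the observations are not independent of one another.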

So how do we know if one proportion is significantly greater than another?

  • When comparing values diachronically (horizontally), data is drawn from independent samples. We may use the Newcombe-Wilson test, and employ the handy visual rule that if intervals do not overlap they must be significantly different.
  • However, probabilities drawn from the same sample (vertically) sum to 1 — which is not the case for independent samples! There are k−1 degrees of freedom, where k is the number of classes. It turns out that the relevant significance test we need to use is an extremely basic test, but it is rarely discussed in the literature.
