Genre differences and experimental observations

Spoken categories, modal verbs and change over time

In a recently published paper, Bowie, Wallis and Aarts (2013) demonstrate that observed changes in the frequency of modal verbs over time are highly sensitive to differences in genre (‘register’ or ‘text category’). Our paper, although based on spoken British English, may shed some light on a recent dispute between Leech (2011) and Millar (2009) over how linguists should interpret corpus observations of change in the modal verb system in written US English.

The following table summarises statistically significant percentage decreases and increases of individual modal verbs as a proportion of the number of tensed verb phrases (VPs that could conceivably take a modal verb), within different spoken genre subcategories of the Diachronic Corpus of Present-day Spoken English (DCPSE). The statistical test used examines differences in observed probabilities between samples, i.e. a Newcombe-Wilson test.

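For our purposes the cited percentages do not matter, but the direction of travel does: negative percentages are significant decreases, positive percentages are significant increases, and ‘ns’ marks a non-significant change.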

Genre           can     may     could   might   shall   will    should  would   must    All
formal f2f      ns      ns      ns      ns      ns      ns      -60%    ns      -75%
informal f2f    27%     -42%    ns      47%     -32%    ns      ns      ns      -53%    ns
telephone       -37%    ns      -44%    ns      -56%    -30%    ns      -44%    ns      -35%
b. discussions  -41%    -59%    ns      ns      -83%    ns      ns      ns      -54%    -20%
b. interviews   ns      -61%    ns      -59%    ns      -41%    -55%    -32%    -57%    -35%
commentary      ns      ns      ns      ns      -93%    58%     ns      ns      -64%    ns
parliament      ns      ns      ns      ns      ns      -39%    ns      -30%    ns      -20%
legal x-exam    304%    ns      ns      ns      ns      ns      1,265%  254%    ns      157%
spontaneous     ns      ns      ns      ns      ns      ns      ns      ns      ns      ns
prepared sp.    ns      -63%    ns      ns      ns      327%    ns      -32%    -48%    ns
All genres      ns      -40%    -11%    ns      -48%    13%     -14%    -7%     -54%    -6%

Significant changes (α<0.05) in the proportion of individual core modals out of tensed verb phrases from the 1960s (LLC) to 1990s (ICE-GB) components in DCPSE, adapted from Bowie et al. 2013.
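As a rough illustration of the kind of test behind each cell, the following sketch computes a Newcombe-Wilson interval for the difference between two observed proportions, and reports either ‘ns’ or the percentage change. The function names and the example frequencies are assumptions made for illustration; they are not taken from the paper or from DCPSE.

```python
from math import sqrt

Z_95 = 1.959964  # two-tailed critical value of the Normal distribution, alpha = 0.05


def wilson_interval(p, n, z=Z_95):
    """Wilson score interval (lower, upper) for a proportion p observed in n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def newcombe_wilson(f1, n1, f2, n2, z=Z_95):
    """Return (d, lower, upper) for the difference d = p2 - p1.

    f1/n1 is the earlier sample and f2/n2 the later one. The change is
    significant at the chosen level if the interval excludes zero.
    """
    p1, p2 = f1 / n1, f2 / n2
    l1, u1 = wilson_interval(p1, n1, z)
    l2, u2 = wilson_interval(p2, n2, z)
    d = p2 - p1
    lower = d - sqrt((p2 - l2) ** 2 + (u1 - p1) ** 2)
    upper = d + sqrt((u2 - p2) ** 2 + (p1 - l1) ** 2)
    return d, lower, upper


def table_cell(f1, n1, f2, n2, z=Z_95):
    """Format one cell: 'ns' if not significant, else the percentage change in p."""
    d, lower, upper = newcombe_wilson(f1, n1, f2, n2, z)
    if lower <= 0 <= upper:
        return "ns"
    if f1 == 0:
        return "n/a"  # percentage change is undefined when the earlier rate is zero
    return f"{100 * d / (f1 / n1):.0f}%"
```

Here `f1` and `f2` would be the number of tensed VPs containing a given modal in the earlier and later samples, and `n1` and `n2` the total number of tensed VPs in each.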

This study concerns modal verbs within text categories. Against a general baseline (words, verb phrases or tensed verb phrases), the total number of modals decreases over the period covered by the data (with the caveat that this holds, at least, for comparably sampled spoken English data). Above, we employ tensed verb phrases as the most meaningful baseline of the three. See That vexed problem of choice.

  • Note that if we take all genres together (bottom row in the table), every significant change except for ‘will’ is a decline in use. But in the (large) category of informal face-to-face conversation (second row from top), ‘can’ and ‘might’ are both significantly increasing.
  • Legal cross-examination is a predictable outlier, but broadcast interviews and discussions appear to generate very different results.

Clustering genres by individual modal diachronic change

The paper goes on to evaluate individual modal changes expressed as a proportion of the set of core modals (i.e. given that the speaker employs a modal verb, which do they use?). This is still a survey method, rather than a strict alternation study, as speakers do not choose between all modal verbs freely in all situations (consider: positive ‘may’ does not express obligation and so does not alternate with ‘must’, but ‘may not’ might conceivably alternate with ‘must not’). Importantly for the next step, employing the set of modals as a baseline factors out the overall decline in modals.
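A small arithmetic sketch may help to show what changing the baseline does. The counts below are invented for illustration only (they are not DCPSE figures), and the set of modals is cut down to three for brevity.

```python
# Hypothetical counts, invented for illustration only (not DCPSE figures).
EARLY = {"tensed_vps": 10_000, "modals": {"may": 100, "must": 100, "will": 300}}
LATE = {"tensed_vps": 10_000, "modals": {"may": 40, "must": 40, "will": 270}}


def rate_per_tensed_vp(period, modal):
    """Frequency of one modal against the tensed VP baseline."""
    return period["modals"][modal] / period["tensed_vps"]


def share_of_modals(period, modal):
    """Frequency of one modal against the baseline of all modals in the set."""
    return period["modals"][modal] / sum(period["modals"].values())


def percent_change(before, after):
    """Percentage change from the earlier value to the later one."""
    return 100 * (after - before) / before


will_vs_tvps = percent_change(
    rate_per_tensed_vp(EARLY, "will"), rate_per_tensed_vp(LATE, "will")
)
will_vs_modals = percent_change(
    share_of_modals(EARLY, "will"), share_of_modals(LATE, "will")
)
```

With these invented counts, ‘will’ falls by 10% per tensed VP but rises by about 29% as a share of the modal set, because the other modals decline faster. The second baseline removes the overall decline and leaves only the redistribution among modals.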

This approach allows us to use an algorithm to cluster text categories together into a tree (or ‘dendrogram’) according to how similar their patterns of change are. The clustering method, repertory grid analysis (RGA, Kelly 1955), is fundamentally a descriptive statistical method (see Inferential statistics and other animals for more on this distinction).

In this case we perform the following steps:

  1. For each cell, convert observed significant increases/decreases over time into a simple five-point ordinal scale:
    • 1 = significant decrease (α<0.05)
    • 2 = quasi-‘significant’ decrease at α<0.1
    • 3 = non-significant
    • 4 = quasi-‘significant’ increase at α<0.1
    • 5 = significant increase (α<0.05)
  2. Compute the difference between two rows by summing the difference in scores for each pair of cells in the row.
  3. Employ this computation in a clustering algorithm that fits the closest pair (that with the smallest difference score) together first.
  4. The score for a merged row is computed from merged cells.
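The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the row difference is taken to be the sum of absolute differences in scores, and a merged row is taken to be the size-weighted mean of the rows it combines (the text says only that it is ‘computed from merged cells’). The function names are hypothetical.

```python
from itertools import combinations


def ordinal_score(d, sig_05, sig_10):
    """Step 1: map a change d and its significance at two levels to a 1-5 score."""
    if sig_05:
        return 1 if d < 0 else 5
    if sig_10:
        return 2 if d < 0 else 4
    return 3


def row_distance(a, b):
    """Step 2: sum of cell-by-cell differences (ASSUMPTION: absolute differences)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster(rows):
    """Steps 3 and 4: repeatedly merge the closest pair of rows.

    rows maps a genre label to its list of ordinal scores.
    Returns the merges in order, as (label_a, label_b, distance) tuples.
    """
    clusters = {name: (list(scores), 1) for name, scores in rows.items()}
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: row_distance(clusters[pair[0]][0], clusters[pair[1]][0]),
        )
        distance = row_distance(clusters[a][0], clusters[b][0])
        (scores_a, size_a), (scores_b, size_b) = clusters.pop(a), clusters.pop(b)
        # ASSUMPTION: a merged row is the size-weighted mean of its members.
        merged = [
            (x * size_a + y * size_b) / (size_a + size_b)
            for x, y in zip(scores_a, scores_b)
        ]
        clusters[f"({a}+{b})"] = (merged, size_a + size_b)
        merges.append((a, b, distance))
    return merges
```

The order of merges, together with the distance at each merge, is what a dendrogram draws. Other choices for the merged row (for example the minimum or maximum of the merged cells) would give different linkage behaviour.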

[This is not the only way we might score differences between rows. An alternative method replacing steps (1) and (2) above might compare each cell pair for statistical separability (testing whether two observed differences significantly differ), and then score each comparison by counting up the number of pairs that were significantly different.]
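A hedged sketch of this alternative scoring follows. It assumes each cell holds a difference together with its confidence interval, and it tests whether two such differences are separable by squaring and adding the relevant interval widths, the same construction the Newcombe-Wilson interval uses. The paper does not specify which separability test would be used, so the choice here is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def differences_separable(d1, l1, u1, d2, l2, u2):
    """True if differences d1 and d2, with intervals (l1, u1) and (l2, u2), differ.

    ASSUMPTION: the interval for d1 - d2 is built by squaring and adding the
    inner widths of the two intervals.
    """
    diff = d1 - d2
    lower = diff - sqrt((d1 - l1) ** 2 + (u2 - d2) ** 2)
    upper = diff + sqrt((u1 - d1) ** 2 + (d2 - l2) ** 2)
    return not (lower <= 0 <= upper)


def row_difference(row_a, row_b):
    """Score two rows by counting the cell pairs that are significantly different.

    Each row is a list of (d, lower, upper) tuples, one per modal.
    """
    return sum(
        differences_separable(*cell_a, *cell_b)
        for cell_a, cell_b in zip(row_a, row_b)
    )
```

The resulting count could replace the summed ordinal differences as the distance between two rows in the clustering step.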

The results of the algorithm above are shown in the figure below.

Repertory grid analysis (RGA) dendrogram of text genres, clustered by similarity of within-modal diachronic change in DCPSE.

The dendrogram shows that broadcast discussions appear to be closest to the category of prepared speech in their behaviour over time, whereas spontaneous commentary and formal face-to-face conversations are also similar. Informal face-to-face conversations appear as an outlier in this analysis.

The paper also carries out the corresponding step of clustering modal verbs by genre, finding, for instance, that the set {shall, must, may, can} is distinct from the remainder of the core modals in its patterns of change. (Note that this tells us how similar they are in patterns of changing use, not how similar they are semantically!)

Implications of genre sensitivity

The issue of genre sensitivity of observed change has broader lessons for linguists, even if we accept that not all phenomena may be as sensitive to genre variation as modal verbs. However, the default assumption must be that genre effects are present in observations, and this null hypothesis should be considered.

The observation that changes may be sensitive to genre is not itself particularly new. Projects such as the International Corpus of English (ICE, Greenbaum 1996) required compilers to follow a sampling framework consisting of standardised genre categories. This led to practical difficulties, including the problem of obtaining telephone calls in countries where a tiny proportion of the population had access to the telephone, or where recording telephone calls was simply illegal. 

In diachronic corpus linguistics, as in the example illustrated above, comparisons need to be made using typologically balanced corpora, i.e. corpora where the same genre categories (including sampling principles and quantity of data) are found in each time period. DCPSE is typologically balanced between ‘early’ and ‘late’ periods, although the dates on which the early material was sampled are spread over many years. The same principle applies to synchronic comparisons.

Q: What should researchers do if genre categories vary in size over the independent variable (in this case time)?
A: Either carry out genre-specific comparisons, or downsample each oversized category.
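One way the downsampling option might be carried out is sketched below, for a single genre at a time. It assumes each text is recorded with a word count, and it trims every period to the word total of the smallest period by selecting whole texts at random. The data structure and function name are hypothetical.

```python
import random


def downsample_genre(periods, seed=0):
    """Trim each period of ONE genre to (at most) the word total of the smallest.

    periods maps a period label to a list of (text_id, n_words) tuples.
    Whole texts are kept or dropped, so totals may fall slightly below the target.
    """
    rng = random.Random(seed)  # fixed seed so that the subsample is reproducible
    target = min(sum(n for _, n in texts) for texts in periods.values())
    result = {}
    for period, texts in periods.items():
        shuffled = list(texts)
        rng.shuffle(shuffled)
        kept, total = [], 0
        for text_id, n_words in shuffled:
            if total + n_words <= target:
                kept.append((text_id, n_words))
                total += n_words
        result[period] = kept
    return result
```

Because the selection is random, results obtained from a downsampled category will vary with the seed, so it may be worth repeating the comparison over several subsamples.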

In summary, genre sensitivity has implications in comparing results obtained from different corpora, particularly if they are not sampled according to the same genre framework.

References

Bowie, J., Wallis, S.A., and Aarts, B. 2013. Contemporary change in modal usage in spoken British English: mapping the impact of “genre”. In Marín-Arrese, J.I., Carretero, M., Arús H.J. and van der Auwera, J. (eds.) English Modality, Berlin: De Gruyter, 57–94.

Kelly, G. 1955. The psychology of personal constructs. New York: Norton.

Leech, G. 2011. The modals ARE declining: reply to Neil Millar’s ‘Modal verbs in TIME: frequency changes 1923–2006’. International Journal of Corpus Linguistics 16:4, 547–64.

Millar, N. 2009. Modal verbs in TIME: frequency changes 1923–2006. International Journal of Corpus Linguistics 14:2, 191–220.
