Detecting direction in interaction evidence

1. Introduction

I have previously argued (Wallis 2014) that interaction evidence is the most fruitful type of corpus linguistics evidence for grammatical research (and doubtless for many other areas of linguistics).

Frequency evidence, which we can write as p(x), the probability of x occurring, concerns itself simply with the overall distribution of linguistic phenomenon x – such as whether informal written English has a higher proportion of interrogative clauses than formal written English. In order to calculate frequency evidence we must define x, i.e. decide how to identify interrogative clauses. We must also pick an appropriate baseline n for this evaluation, i.e. we need to decide whether to use words, clauses, or any other structure to identify locations where an interrogative clause may occur.

Interaction evidence is different. It is a statistical correlation between a decision that a writer or speaker makes at one part of a text, which we will label point A, and a decision at another part, point B. The idea is shown schematically in Figure 1. A and B are separate ‘decision points’ in a given relationship (e.g. lexical adjacency), which can be also considered as ‘variables’.

Figure 1: Associative inference from lexico-grammatical choice variable A to variable B (sketch).


This class of evidence is used in a wide range of computational algorithms. These include collocation methods, part-of-speech taggers, and probabilistic parsers. Despite the promise of interaction evidence, the majority of corpus studies tend to consist of discussions of frequency differences and distributions.

In this paper I want to look at applications of interaction evidence where the two decisions are made more or less at the same time by the same speaker/writer. In such circumstances we cannot be sure that, just because B follows A in the text, the decision relating to B was made after the decision at A.

For example, in studying the premodification of noun phrases by attributive adjectives in English – which adjective is applied first in assembling an NP like the old tall green ship, for instance – we cannot be sure that adjectives are selected by the speaker in sentence order. It is also perfectly plausible that adjectives were chosen in an alternative or parallel order in the mind of the speaker, and then assembled in the final order during the language production process.

Of course, in cases where points A and B are separated substantively in time (as in many instances of structural self-priming) or where B is spoken in response to A by another speaker (structural priming of another’s language), there is unlikely to be any ambiguity about decision order. Moreover, if A licenses B, then the order is unambiguous.

However, in circumstances where A and B are proximal, and where the order of decisions made by the speaker/writer cannot be presumed, we wish to consider whether there are mathematical or statistical methods for predicting the most likely order in which the decisions were made.

Such a method would have considerable value in experimental design in cognitive corpus linguistics. For example, since Heads of NPs, VPs, etc. are conceived of as determining their complements, it may not be too much of a stretch to argue that, if this method works, we may have found a way of empirically evaluating this grammatical concept.


2.2 Directional statistics

Figure 3: Directional inference from lexico-grammatical choice variable A to variable B (sketch).


However, the goodness of fit χ² statistic (Wallis 2013a, b) may be used in a directional way. It can be used to evaluate an increase in probability from a superset to a subset. In this case we compare the probability of the event occurring in the superset (here, the rate across the entire corpus for askance) with the same probability in a subset (in this example, the word following LOOK).

Comparing the first column in this table with the ‘total’ evaluates χ²(askance | LOOK), whether the presence of LOOK affects the chance of askance appearing. (In the spreadsheet this 2×1 test is computed in the second row, below the 2×2 tests.)

We test whether p(askance | LOOK) differs from a given probability p(askance), where all the uncertainty is in p(askance | LOOK). This obtains χ² = 18,848.65 and a probabilistically-weighted goodness of fit φp (Wallis 2012) of 0.00.

observed             LOOK        ¬LOOK        total
p(askance)     0.00029281                0.00000048

Table 2b: supplementary column probabilities for the contingency table for LOOK and askance.

This comparison is illustrated visually in Figure 4, which uses a 95% Wilson score confidence interval (Wallis 2013a, b). Although the probability increases 610 times from the near-zero starting point, the intervals are relatively wide and the end probability is still low.


Figure 4: Increase in probability of p(askance) when the previous word is LOOK, obtained from the 2×2 chi-square spreadsheet.
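The interval calculation behind this kind of comparison can be sketched in a few lines. The following is a minimal illustration, assuming the standard Wilson score formula without continuity correction and the counts given in Table 2c (31 cases of askance out of 105,871 cases of LOOK). The function name wilson_interval is introduced here for illustration and is not taken from the spreadsheet, which may compute its intervals differently.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval (no continuity correction) for a proportion.

    z = 1.959964 is the two-tailed critical value for a 95% interval.
    """
    p = successes / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    denom = 1 + z2 / n
    return (centre - spread) / denom, (centre + spread) / denom

# Counts from Table 2c: askance after LOOK, and askance in the whole corpus.
p_obs = 31 / 105_871          # p(askance | LOOK)
p_base = 48 / 100_000_000     # p(askance)

lower, upper = wilson_interval(31, 105_871)
print(f"p(askance | LOOK) = {p_obs:.8f}, 95% interval ({lower:.8f}, {upper:.8f})")
print("baseline outside interval:", not (lower <= p_base <= upper))
```

With these counts the baseline rate p(askance) lies below the lower bound of the interval, which is consistent with the significant goodness of fit result reported above.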

To obtain the reverse statistic (the goodness of fit χ²(LOOK | askance)), we simply swap rows and columns in the table. This obtains a χ² of 18,868.62 and a φp of 0.64.

data               independent variable
observed      askance     ¬askance        total
LOOK               31      105,840      105,871
¬LOOK              17   99,894,112   99,894,129
total              48   99,999,952  100,000,000
p(LOOK)    0.64583333                0.00105871

Table 2c: Transposed contingency table: now the ‘independent variable’ is askance and the ‘dependent variable’ LOOK.
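Both goodness of fit statistics can be reproduced from the counts in Table 2c. The sketch below assumes the ordinary two-cell goodness of fit χ², in which the observed subset (the column or row of interest) is compared with expected frequencies scaled from the superset probability. The function and variable names are illustrative and are not drawn from the spreadsheet.

```python
def goodness_of_fit_chi2(observed, expected_probs):
    """Goodness of fit chi-square: observed counts against expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, expected_probs))

# Counts from Table 2c.
look_askance = 31
look_total = 105_871
askance_total = 48
corpus_total = 100_000_000

p_askance = askance_total / corpus_total   # superset rate for askance
p_look = look_total / corpus_total         # superset rate for LOOK

# chi2(askance | LOOK): the LOOK subset tested against the corpus rate of askance.
chi2_askance_given_look = goodness_of_fit_chi2(
    [look_askance, look_total - look_askance],
    [p_askance, 1 - p_askance])

# chi2(LOOK | askance): the askance subset tested against the corpus rate of LOOK.
chi2_look_given_askance = goodness_of_fit_chi2(
    [look_askance, askance_total - look_askance],
    [p_look, 1 - p_look])

print(f"chi2(askance | LOOK) = {chi2_askance_given_look:,.2f}")
print(f"chi2(LOOK | askance) = {chi2_look_given_askance:,.2f}")
```

The first term of each sum is the same in both directions, because the expected frequency of the LOOK askance cell is identical either way. The two scores differ only in the second term, which is negligible when the subset is large (105,871 cases of LOOK) and noticeable when it is small (48 cases of askance).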

In this case, the χ² score seems only slightly higher than the score obtained in the other direction (18,868.62 compared with 18,848.65). However, as discussed elsewhere (see also Wallis 2013a), χ² scores combine size of effect and weight of evidence, so they are not good measures for comparative purposes. The key information is found in the effect size measure, φp.
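The contrast between the two effect sizes can be illustrated with a short calculation. This sketch rests on an assumption: it takes φp to be the root mean square difference between observed and expected proportions, halved under the root, which for a two-cell comparison reduces to the absolute difference between the two probabilities. The exact definition should be checked against Wallis (2012); the function name phi_p is hypothetical.

```python
import math

def phi_p(observed_probs, expected_probs):
    """Assumed form of phi_p: sqrt of half the summed squared differences
    between observed and expected proportions (see Wallis 2012 to confirm)."""
    return math.sqrt(sum((o - e) ** 2
                         for o, e in zip(observed_probs, expected_probs)) / 2)

# Probabilities derived from the counts in Table 2c.
p_askance_given_look = 31 / 105_871
p_askance = 48 / 100_000_000
p_look_given_askance = 31 / 48
p_look = 105_871 / 100_000_000

phi_forward = phi_p([p_askance_given_look, 1 - p_askance_given_look],
                    [p_askance, 1 - p_askance])
phi_reverse = phi_p([p_look_given_askance, 1 - p_look_given_askance],
                    [p_look, 1 - p_look])

print(f"phi_p(askance | LOOK) = {phi_forward:.2f}")
print(f"phi_p(LOOK | askance) = {phi_reverse:.2f}")
```

Under this assumption the two values round to the figures reported above, 0.00 and 0.64, even though the corresponding χ² scores are almost equal.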


Figure 5: Increase in probability of p(LOOK) when the following word is askance, also obtained from the 2×2 chi-square spreadsheet.


  1. Introduction
  2. A collocation example
    2.1 Employing chi-square and phi
    2.2 Directional statistics
    2.3 Significantly directional?
  3. A grammatical example
    3.1 Testing for difference under alternation
    3.2 Comparing Newcombe-Wilson intervals for direction
  4. Mapping significance of association and direction
  5. Concluding remarks
  6. References


Wallis, S.A. 2017. Detecting direction in interaction evidence. London: Survey of English Usage.

See also


Wallis, S.A. 2011. Comparing χ² tests for separability. London: Survey of English Usage, UCL.

Wallis, S.A. 2012. Goodness of fit measures for discrete categorical data. London: Survey of English Usage, UCL.

Wallis, S.A. 2013a. z-squared: the origin and application of χ². Journal of Quantitative Linguistics 20:4, 350-378.

Wallis, S.A. 2013b. Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods. Journal of Quantitative Linguistics 20:3, 178-208.

Wallis, S.A. 2014. What might a corpus of parsed spoken data tell us about language? In L. Veselovská and M. Janebová (eds.) Complex Visibles Out There. Proceedings of the Olomouc Linguistics Colloquium 2014: Language Use and Linguistic Structure. Olomouc: Palacký University. pp. 641-662.

Wallis, S.A. forthcoming. That vexed problem of choice. London: Survey of English Usage, UCL.

