I have previously argued (Wallis 2014) that interaction evidence is the most fruitful type of corpus linguistics evidence for grammatical research (and doubtless for many other areas of linguistics).
Frequency evidence, which we can write as p(x), the probability of x occurring, concerns itself simply with the overall distribution of linguistic phenomenon x – such as whether informal written English has a higher proportion of interrogative clauses than formal written English. In order to calculate frequency evidence we must define x, i.e. decide how to identify interrogative clauses. We must also pick an appropriate baseline n for this evaluation, i.e. we need to decide whether to use words, clauses, or any other structure to identify locations where an interrogative clause may occur.
Interaction evidence is different. It is a statistical correlation between a decision that a writer or speaker makes at one part of a text, which we will label point A, and a decision at another part, point B. The idea is shown schematically in Figure 1. A and B are separate ‘decision points’ in a given relationship (e.g. lexical adjacency), which can also be considered as ‘variables’.
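The contrast between the two types of evidence can be sketched in a few lines of Python. This is a minimal illustration only: the list of cases is invented, each case simply records whether a particular choice was made at A and at B, and the function names are my own. Frequency evidence is the plain proportion p(B), whereas interaction evidence asks whether p(B | A) departs from it.

```python
# Hypothetical cases (invented for illustration): each pair records
# (choice made at point A, choice made at point B).
cases = (
    [(True, True)] * 30
    + [(True, False)] * 20
    + [(False, True)] * 10
    + [(False, False)] * 40
)


def proportion(hits, baseline):
    """Frequency evidence: p(x) = f(x) / n for a chosen baseline n."""
    return hits / baseline


def p_b(cases):
    """p(B): the rate of B over all cases, ignoring A."""
    return proportion(sum(1 for _, b in cases if b), len(cases))


def p_b_given_a(cases):
    """p(B | A): the rate of B over only those cases where A was chosen."""
    b_where_a = [b for a, b in cases if a]
    return proportion(sum(b_where_a), len(b_where_a))


print(f"p(B)     = {p_b(cases):.2f}")          # 40 / 100 = 0.40
print(f"p(B | A) = {p_b_given_a(cases):.2f}")  # 30 / 50  = 0.60
```

In this invented data the choice at A is associated with a higher rate of B. Whether such a difference is significant, and in which direction the dependency is stronger, are separate questions.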
This class of evidence is used in a wide range of computational algorithms. These include collocation methods, part-of-speech taggers, and probabilistic parsers. Despite the promise of interaction evidence, the majority of corpus studies tend to consist of discussions of frequency differences and distributions.
In this paper I want to look at applications of interaction evidence where the two decisions are made at more or less the same time by the same speaker/writer. In such circumstances we cannot be sure that, just because B follows A in the text, the decision relating to B was made after the decision at A.
For example, in studying the premodification of noun phrases by attributive adjectives in English – which adjective is applied first in assembling an NP like ‘the old tall green ship’, for instance – we cannot be sure that adjectives are selected by the speaker in sentence order. It is also perfectly plausible that adjectives are chosen in an alternative or parallel order in the mind of the speaker, and then assembled in the final order during the language production process.
Of course, in cases where points A and B are separated substantively in time (as in many instances of structural self-priming) or where B is spoken in response to A by another speaker (structural priming of another’s language), there is unlikely to be any ambiguity about decision order. Moreover, if A licenses B, then the order is unambiguous.
However, in circumstances where A and B are proximal, and where the order of decisions made by the speaker/writer cannot be presumed, we wish to consider whether there are mathematical or statistical methods for predicting the most likely order in which the decisions were made.
Such a method would have considerable value in experimental design in cognitive corpus linguistics. For example, since Heads of NPs, VPs, etc. are conceived of as determining their complements, it may not be too much of a stretch to argue that, if this method works, we may have found a way of empirically evaluating this grammatical concept.
2.2 Directional statistics
However, the goodness of fit χ² statistic (Wallis 2013a, b) may be used in a directional way. It can be used to evaluate an increase in probability from a superset to a subset. In this case we compare the probability of the event occurring in the superset (here, the rate across the entire corpus for askance) with the same probability in a subset (in this example, the word following LOOK).
Comparing the first column in this table with the ‘total’ evaluates χ²(askance | LOOK), whether the presence of LOOK affects the chance of askance appearing. (In the spreadsheet this 2×1 test is computed in the second row, below the 2×2 tests.)
We test whether p(askance | LOOK) differs from a given probability p(askance), where all the uncertainty is in p(askance | LOOK). This obtains χ² = 18,848.65 and a probabilistically-weighted goodness of fit φp (Wallis 2012) of 0.00.
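A 2×1 goodness of fit test of this kind can be sketched as follows. The counts below are hypothetical and are not the figures from the table discussed here, so the resulting score will not match the one reported above. The function name and its parameters are my own. The expected distribution is fixed by the superset probability, and the observed distribution comes from the subset.

```python
def goodness_of_fit_chi2(hits, n, expected_p):
    """2 x 1 goodness of fit chi-square.

    hits       -- observed frequency of the event in the subset
    n          -- size of the subset (the baseline)
    expected_p -- probability of the event in the superset, treated as given
    """
    observed = (hits, n - hits)
    expected = (n * expected_p, n * (1 - expected_p))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, NOT the data from the paper.
corpus_words = 1_000_000   # baseline for p(askance)
askance_total = 10         # askance anywhere in the corpus
after_look = 1_000         # word slots immediately following LOOK
askance_after_look = 6     # askance found in those slots

p_superset = askance_total / corpus_words
chi2 = goodness_of_fit_chi2(askance_after_look, after_look, p_superset)
print(f"chi-square = {chi2:.2f} (critical value at 0.05 with 1 df: 3.84)")
```

Because the expected probability is treated as a constant, all of the uncertainty is located in the subset observation, as described above.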
This comparison is illustrated visually in Figure 4, which uses a 95% Wilson score confidence interval (Wallis 2013a, b). Although the probability increases 610 times from the near-zero starting point, the intervals are relatively wide and the end probability is still low.
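For readers who wish to reproduce intervals of this kind, the Wilson score interval can be computed directly from an observed proportion p and baseline n. The sketch below uses the standard formula without continuity correction. The function name is my own and the example figures are hypothetical, not those plotted in Figure 4.

```python
import math


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n cases.

    z is the two-tailed critical value of the standard Normal distribution
    (1.959964 gives a 95% interval). No continuity correction is applied.
    """
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point rounding at the extremes.
    return max(0.0, centre - spread), min(1.0, centre + spread)


# Hypothetical observation: 6 hits in 1,000 cases.
lower, upper = wilson_interval(6 / 1000, 1000)
print(f"p = 0.0060, 95% Wilson interval = ({lower:.4f}, {upper:.4f})")
```

Note that for a small p the interval is asymmetric about p: the upper bound lies further from p than the lower bound does, and the lower bound cannot fall below zero.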
To obtain the reverse statistic (the goodness of fit χ²(LOOK | askance)), we simply swap rows and columns in the table. This obtains a χ² of 18,868.62 and a φp of 0.64.
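Swapping rows and columns is straightforward to express in code. The sketch below assumes a 2×2 table laid out with LOOK / not LOOK as columns and askance / not askance as rows. The cell counts are hypothetical (not the paper's data) and the function names are my own. The same first-column-against-total test is applied to the table and to its transpose.

```python
def first_column_vs_total(table):
    """Goodness of fit chi-square of the first column against the row totals.

    table is a list of rows. The row totals supply the expected distribution,
    scaled to the size of the first column.
    """
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)
    column_total = sum(row[0] for row in table)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        expected = column_total * row_total / grand_total
        chi2 += (row[0] - expected) ** 2 / expected
    return chi2


def transpose(table):
    """Swap rows and columns."""
    return [list(column) for column in zip(*table)]


# Hypothetical 2 x 2 table, NOT the data from the paper.
# Columns: LOOK, not LOOK. Rows: askance, not askance.
table = [
    [6, 4],
    [994, 998_996],
]

forward = first_column_vs_total(table)             # chi-square(askance | LOOK)
reverse = first_column_vs_total(transpose(table))  # chi-square(LOOK | askance)
print(f"forward = {forward:.2f}, reverse = {reverse:.2f}")
```

The two scores differ because each test places the uncertainty in a different subset: the words following LOOK in one case, and the instances of askance in the other.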
In this case, the χ² score for the reverse statistic seems only slightly elevated. However, as discussed elsewhere (see also Wallis 2013a), χ² scores combine size of effect and weight of evidence, so they are not good measures for comparative purposes. The key information is found in the effect size measure, φp.
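The point that χ² mixes effect size with weight of evidence can be demonstrated numerically. In the sketch below I use the ordinary 2×2 χ² test and Cramér's φ = √(χ²/N) as a stand-in effect size. This is not the probabilistically-weighted φp of Wallis (2012), which is defined differently, and the table is hypothetical. Multiplying every cell by ten leaves the proportions unchanged, yet χ² grows tenfold while φ stays the same.

```python
import math


def chi2_2x2(table):
    """Standard 2 x 2 chi-square (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def cramers_phi(table):
    """Cramer's phi for a 2 x 2 table: sqrt(chi-square / N).

    Used here only as an illustrative effect size; it is NOT phi_p.
    """
    n = sum(sum(row) for row in table)
    return math.sqrt(chi2_2x2(table) / n)


# Hypothetical table, and the same table with ten times the data.
small = [[30, 20], [10, 40]]
large = [[10 * cell for cell in row] for row in small]

for name, t in (("small", small), ("large", large)):
    print(f"{name}: chi-square = {chi2_2x2(t):.2f}, phi = {cramers_phi(t):.3f}")
```

Comparing two χ² scores therefore tells us little unless the amount of data behind each is the same, which is why an effect size measure is preferred for comparison.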
- A collocation example
2.1 Employing chi-square and phi
2.2 Directional statistics
2.3 Significantly directional?
- A grammatical example
3.1 Testing for difference under alternation
3.2 Comparing Newcombe-Wilson intervals for direction
- Mapping significance of association and direction
- Concluding remarks
Wallis, S.A. 2017. Detecting direction in interaction evidence. London: Survey of English Usage.
Wallis, S.A. 2011. Comparing χ² tests for separability. London: Survey of English Usage, UCL.
Wallis, S.A. 2012. Goodness of fit measures for discrete categorical data. London: Survey of English Usage, UCL.
Wallis, S.A. 2013a. z-squared: the origin and application of χ². Journal of Quantitative Linguistics 20:4, 350-378.
Wallis, S.A. 2013b. Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods. Journal of Quantitative Linguistics 20:3, 178-208.
Wallis, S.A. 2014. What might a corpus of parsed spoken data tell us about language? In L. Veselovská and M. Janebová (eds.) Complex Visibles Out There. Proceedings of the Olomouc Linguistics Colloquium 2014: Language Use and Linguistic Structure. Olomouc: Palacký University, 2014. pp 641-662.
Wallis, S.A. forthcoming. That vexed problem of choice. London: Survey of English Usage, UCL.