I have previously argued that interaction evidence is the most fruitful type of corpus linguistics evidence for grammatical research (and doubtless many other areas of linguistics).
Frequency evidence, which we can write as p(x), the probability of x occurring, concerns itself simply with the overall distribution of linguistic phenomenon x – such as whether informal written English has a higher proportion of interrogative clauses than formal written English. In order to calculate frequency evidence we must define x, i.e. decide how to identify interrogative clauses. We must also pick an appropriate baseline n for this evaluation, i.e. we need to decide whether to use words, clauses, or any other structure to identify locations where an interrogative clause may occur.
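To make the role of the baseline concrete, here is a minimal sketch of frequency evidence computed as a simple proportion, p(x) = f(x) / n. The function name `frequency_evidence` and all of the counts are hypothetical and chosen only for illustration. They are not drawn from any corpus.

```python
def frequency_evidence(hits: int, baseline: int) -> float:
    """Return p(x) = hits / baseline, the observed proportion of x."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive count")
    if not 0 <= hits <= baseline:
        raise ValueError("hits must lie between 0 and baseline")
    return hits / baseline


# Hypothetical counts for one subcorpus (illustrative assumptions only).
interrogatives = 120   # f(x): clauses identified as interrogative
clauses = 4_000        # baseline n if every clause is a possible location
words = 40_000         # baseline n if every word is a possible location

p_per_clause = frequency_evidence(interrogatives, clauses)
p_per_word = frequency_evidence(interrogatives, words)

print(f"p(x) per clause: {p_per_clause:.4f}")
print(f"p(x) per word:   {p_per_word:.4f}")
```

The same 120 hits yield very different values of p(x) depending on whether n counts clauses or words, which is why the choice of baseline has to be made explicitly.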
Interaction evidence is different. It is evidence of whether a decision that a writer or speaker makes at one part of a text, which we will label point a, interacts with a decision at another part, point b. This class of evidence is used in a wide range of computational algorithms, including collocation methods, part-of-speech taggers, and probabilistic parsers. Despite the promise of interaction evidence, most corpus studies still consist largely of discussions of frequency differences and distributions.
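One simple way to picture interaction evidence, offered here as an assumption rather than as the method this post goes on to develop, is to compare the probability of a choice at b when a particular choice was made at a with the probability of the same choice at b when it was not. The sketch below does this for a 2 × 2 table of counts. The function name `conditional_probabilities` and the counts are hypothetical.

```python
def conditional_probabilities(table):
    """Return (p(b | a), p(b | not a)) from a 2 x 2 table of counts.

    table = ((a_and_b, a_and_not_b),
             (not_a_and_b, not_a_and_not_b))
    """
    (a_b, a_not_b), (not_a_b, not_a_not_b) = table
    n_a = a_b + a_not_b              # cases where the choice at a was made
    n_not_a = not_a_b + not_a_not_b  # cases where it was not
    if n_a == 0 or n_not_a == 0:
        raise ValueError("each row of the table needs at least one case")
    return a_b / n_a, not_a_b / n_not_a


# Hypothetical counts (illustrative assumptions, not corpus data).
counts = ((30, 70),   # choice made at a:     b chosen 30 times, not chosen 70
          (10, 90))   # choice not made at a: b chosen 10 times, not chosen 90

p_b_given_a, p_b_given_not_a = conditional_probabilities(counts)
difference = p_b_given_a - p_b_given_not_a

print(f"p(b | a)     = {p_b_given_a:.2f}")
print(f"p(b | not a) = {p_b_given_not_a:.2f}")
print(f"difference   = {difference:.2f}")
```

If the two decisions were independent we would expect the two conditional probabilities to be about equal. An observed difference like this would still need a significance test before being treated as evidence of interaction, and it says nothing about which decision was made first.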
In this blog post I want to look at applications of interaction evidence to decisions that are made at more or less the same time by the same speaker or writer. In such circumstances we cannot be sure that, just because b follows a in the text, the decision relating to b was made after the decision at a.