Abstract
Numerous competing grammatical frameworks exist on paper, as algorithms and embodied in parsed corpora. However, not only is there little agreement about grammars among linguists, but there is no agreed methodology for demonstrating the benefits of one grammar over another. Consequently the status of parsed corpora or ‘treebanks’ is suspect.
The most common approach to empirically comparing frameworks is based on the reliable retrieval of individual linguistic events from an annotated corpus. However this method risks circularity, permits redundant terms to be added as a ‘solution’ and fails to reflect the broader structural decisions embodied in the grammar. In this paper we introduce a new methodology based on the ability of a grammar to reliably capture patterns of linguistic interaction along grammatical axes. Retrieving such patterns of interaction does not rely on atomic retrieval alone, does not risk redundancy and is no more circular than a conventional scientific reliance on auxiliary assumptions. It is also a valid experimental perspective in its own right.
We demonstrate our approach with a series of natural experiments. We find an interaction captured by a phrase structure analysis between attributive adjective phrases under a noun phrase with a noun head, such that the probability of adding successive adjective phrases falls. We note that a similar interaction (between adjectives preceding a noun) can also be found with a simple part-of-speech analysis alone. On the other hand, preverbal adverb phrases do not exhibit this interaction, a result anticipated in the literature, confirming our method.
Turning to cases of embedded postmodifying clauses, we find a similar fall in the additive probability of both successive clauses modifying the same NP and embedding clauses where the NP head is the most recent one. Sequential postmodification of the same head reveals a fall and then a rise in this additive probability. Reviewing cases, we argue that this result can only be explained as a natural phenomenon acting on language production which is expressed by the distribution of cases on an embedding axis, and that this is in fact empirical evidence for a grammatical structure embodying a series of speaker choices.
We conclude with a discussion of the implications of this methodology for a series of applications, including optimising and evaluating grammars, modelling case interaction, contrasting the grammar of multiple languages and language periods, and investigating the impact of psycholinguistic constraints on language production.
Parsed corpora of English, where every sentence is fully grammatically analysed in the form of a tree, have been available to linguists for nearly two decades, from the publication of the University of Pennsylvania Treebank (Marcus et al. 1993) onwards. Such corpora have a number of applications including training automatic parsers, acting as a test set for text mining, or as a source for exemplification and teaching purposes.
A range of grammatical frameworks have been exhaustively applied to corpora. Penn Treebank notation (Marcus et al. 1993) is a skeleton phrase structure grammar that has been applied to numerous corpora, including the University of Pennsylvania Treebank and the Spanish Syntactically Annotated Corpus (Moreno et al. 2003). Other phrase structure grammars include the Quirk-based TOSCA/ICE, used for the British Component of the International Corpus of English (ICE-GB, Nelson, Wallis and Aarts 2002) and the Diachronic Corpus of Present-day Spoken English. Dependency grammars include the Helsinki Constraint Grammar (Karlsson et al. 1995), which has been applied to (among others) English, German and numerous Scandinavian language corpora. Other dependency corpora include the Prague Dependency Treebank (Böhmová et al. 2003) and the Turkish Treebank (Oflazer et al. 2003).
Naturally this brief list understates the range of frameworks that have been applied to corpora, and concentrates on those applied to the largest amount of data.
1.1 The problem of grammatical epistemology
The status of knowledge embedded in a corpus grammar raises some problematic questions. Given the range of frameworks adopted by linguists, how should annotators choose between them? The choice of grammar risks a circular justification – one can train a parser based on one framework on a corpus analysed by the same framework (Fang 1996), but this does not tell us anything about whether the framework is correct. To put it another way, what general extra-grammatical principles may be identified that might be informed by corpus data? In this paper we argue that parsed corpora can help us find evidence of psycholinguistic processing constraints in language production that might allow us to re-examine this question from a perspective of cognitive plausibility.
Cited motivations for annotators’ choice of scheme range from commensurability with a traditional grammar such as Quirk et al. (1985) (Greenbaum and Ni 1996), through reliability of automatic processing against a minimum framework, to maximising the opportunities for information extraction (Marcus et al. 1994).
A related question concerns commensurability. If we choose one scheme out of many, are results obtained from our corpus commensurable with results from data analysed by a different scheme, or have we been led up the garden path by a particular framework? Indeed a standard criticism of the treebank linguistics community is that since theorists’ knowledge of grammar is contested and imperfect, corpus annotation is likely to be wrong. John Sinclair (1987) argued that linguistic insight should be driven by word patterns rather than subsumed under a given grammar. Many linguists do not use parsed corpora. Part of the reason may be misgivings about the annotations of others.
Nelson et al. (2002) argue that this problem is primarily one of research stance. The research paradigm should consider the limitations of corpus annotation and the potential for results to be an artefact of the chosen framework. They propose a cyclic exploratory methodology where reference back to original sentences, and ‘playing devil’s advocate’, is constantly encouraged.
The gulf between theoretical linguists such as Chomsky and lexical corpus linguists like Sinclair is wide. This does not mean, however, that this gulf cannot be bridged, as Aarts (2001) points out. Corpus linguists need not eschew theory, and theoreticians should consider how their frameworks may be evidenced.
A parsed corpus is a source of three principal types of evidence. First, applying an algorithm to a broad range of text samples provides frequency evidence of known phenomena found in the parser rulebase. The manual correction and completion of the parser output both improves this frequency evidence and supplements it with a second type of evidence: enhanced coverage with previously unknown rules.
Third, a parsed corpus is a rich source of evidence of lexical and grammatical interaction. As speakers form utterances they make a series of conscious and unconscious decisions: to use one word, phrase, etc., rather than an alternative. These decisions are often not independent from each other (i.e., they interact). In this paper we will consider whether evidence of one type of interaction is relevant to the psycholinguistic evaluation of grammatical frameworks.
1.2 Deciding between frameworks
In corpus linguistics the first evaluation (and evolution) of a grammar takes place during annotation. The task of ensuring that every utterance is correctly and consistently described by a grammar presents a series of methodological challenges (Wallis and Nelson 1997; Wallis 2003). The descriptive task typically leads to minor modifications of the grammar. However, such on-the-fly adaptation risks being ad hoc, local, and unprincipled. This paper concerns a second process: the review of completed parsed corpora. In order to do this we must first agree evaluative criteria.
By far the most common criterion for arguing that one representation is ‘better’ than another is decidability, or the retrievability of linguistic events (Wallis 2008). If a concept – the subject of a clause, a particular type of direct object, etc. – can be reliably retrieved from one corpus representation but cannot be as reliably retrieved with another, then the first representation can be said to be ‘better’ than the second. This is another way of saying that the event can be distinguished from other similar events.
For example, the scope of attributive adjectives over co-ordinated noun heads varies. The following are not grammatically distinguished in the ICE-GB corpus (see Section 2) and therefore one cannot be retrieved without the others.
fried aubergines and yoghurt [S1A-063 #19] (only aubergines are fried)
late teens and twenties [S1A-013 #107] (ambiguous)
recent article and correspondence [S1B-060 #42] (both are recent)
Retrievability of events is a useful criterion, but it has three problems. These are circularity (the value of a concept in question, such as attribute scope, must be assumed), redundancy (a representation can ‘improve’ by simply adding distinctions like those above to the framework), and atomisation (single events within a grammatical structure are evaluated, rather than the structure itself).
1.3 Interaction along grammatical axes of addition
In this paper we propose and explore a complementary ‘structural’ approach to empirically evaluating grammar by examining patterns of interaction between concepts along grammatical axes. We believe that our approach has cognitive plausibility, and that the parsed corpus may reveal some novel non-obvious consequences of language production. The method builds on the ‘linguistic event retrieval’ principle above by exploring the impact of one linguistic event on another event of the same type.
We will study patterns of repeated decisions of the following form:
base → +term₁ → +term₂ … +termₙ,
where arrows indicate separate decisions to add a further term and plus signs indicate the application of an operator that adds terms in a specific way (i.e., governed by a particular structural relationship) along a particular grammatical axis. Grammatical axes must be defined within the framework of a given grammar and operators must, in principle at least, be repeatable.
In summary, our proposal is to analyse a fully parsed corpus of natural speech and writing to evaluate evidence that related series of speaker (or author) choices in the production of language are partially mutually constrained rather than independent, and to investigate how these effects may be evidenced along multiple grammatical axes.
Our position is consistent with a view that grammar partly encapsulates the ‘trace’ of speaker choices. It is not necessary to make claims about a particular mechanism by which speakers make these choices or, indeed, as we shall see below, the order in which they do so.
Our proposed methodology has psycholinguistic implications. Anderson (1983) refers to the actual results of a psychological process as the ‘signature’ of the phenomenon, and points out that computer simulations may replicate that signature without exposing the underlying process of cognition. A computer system for generating ‘natural language’ does not necessarily provide understanding regarding how humans produce language, nor parsers, how we interpret sentences. At best they may help identify parameters of the human analogue. Our proposition is that natural experiments on the results of parsing may help identify parameters of corpus contributors’ processes of language production.
2. A worked example: Grammatical interaction between prenominal AJPs
The definition of ‘grammatical axes’ provided above is rather abstract. Let us consider a simple example. English noun phrases can (in principle) take any number of adjectives in an attributive position before the noun: the old ship, the old blue ship, etc. (Huddleston and Pullum 2002: 57). By way of exemplifying our method, we will investigate the general proposition that the introduction of one adjective constrains the addition of another.
We wish to refute a null hypothesis of the independence of each decision to add an attributive adjective, i.e., that the chance of preposing a second adjective is the same as the chance of preposing the first, and so on. The assumption that adjectives are not independent, but instead semantically restrict a head noun (and thus each other), is fundamental to any investigation of, e.g., constraints on adjective word order.
Note that identifying that an interaction is taking place between the decisions to insert two adjectives A and B does not in itself demonstrate the order of decision making by a speaker. The decision to insert A could be made prior to the decision to insert B, vice versa, the decisions may co-occur, or (in the case of writing) lead to later decisions to revise previously-inserted adjectives.
The methodology we describe is, as we shall see, a valid investigatory approach in its own right, and could be a precursor to further individual choice experiments concentrating on, e.g., whether particular classes of adjective in particular positions limit the introduction of other adjectives. However in this paper we will concentrate on what the methodology can tell us about the grammar in the corpus per se.
2.1 AJPs with noun heads
Our first task is to collect data. In a part of speech (POS)-tagged corpus we can obtain frequencies of cases of single, double, etc., adjectives followed by a noun, which we do below. In a parsed corpus we can be more precise, limiting our query by the noun phrase (NP) and by counting attributive adjective phrases rather than adjectives alone. This permits us to count cases such as the old [pale blue] ship correctly (cf. the retrievability criterion outlined above).
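The POS-based count described above can be sketched roughly as follows. This is a minimal illustration on invented data, not the query actually run against ICE-GB: the tag names `ADJ` and `N`, the toy sentence and both function names are assumptions made for the example.

```python
from collections import Counter

def adjective_run_lengths(tagged, adj_tag="ADJ", noun_tag="N"):
    """Yield, for every noun, the number of adjectives immediately before it."""
    run = 0
    for _word, tag in tagged:
        if tag == adj_tag:
            run += 1
        else:
            if tag == noun_tag:
                yield run
            run = 0  # any non-adjective token ends the current run

def at_least_frequencies(lengths):
    """Convert 'exactly n adjectives' counts into cumulative 'at least x' counts."""
    exact = Counter(lengths)
    max_len = max(exact, default=0)
    return [sum(c for n, c in exact.items() if n >= x) for x in range(max_len + 1)]

# Hypothetical tagged tokens (tag labels are illustrative, not the ICE tagset).
tagged = [
    ("the", "ART"), ("old", "ADJ"), ("ship", "N"),
    ("the", "ART"), ("old", "ADJ"), ("blue", "ADJ"), ("ship", "N"),
    ("a", "ART"), ("ship", "N"), ("sank", "V"),
]

F_pos = at_least_frequencies(adjective_run_lengths(tagged))
print(F_pos)
```

A count of this kind treats every adjective as a separate term, so a phrase such as the old pale blue ship would be scored as three adjectives rather than two adjective phrases. That is the imprecision the parsed-corpus query is intended to avoid.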
The British Component of the International Corpus of English (ICE-GB, Nelson, Wallis and Aarts 2002) is a fully-parsed million-word corpus of 1990s British English, 40% of which is written and 60% spoken. In this paper our results come from ICE-GB Release 2. ICE-GB is supplied with an exploration tool, ICECUP, which has a grammatical query system that uses idealised grammatical patterns termed Fuzzy Tree Fragments (FTFs, Wallis and Nelson 2000) to search the trees.
We construct a series of FTFs of the form above, i.e., a noun phrase containing a noun head and x adjective phrases before the head. This FTF will match cases where at least x AJPs precede the noun (the FTF does not exclude cases with terms prior to the first AJP). This obtains the raw frequency row F in the table below. Using this information alone we can compute the additive probability of adding the x-th adjective phrase to the NP, p(x), as
p(x) ≡ F(x) / F(x-1).
So if there are in total 193,135 NPs in the corpus with a noun head, and 37,305 have at least one attributive adjective phrase, then the probability of adding the first adjective phrase is 37,305/193,135 = 0.1932. We can then compute p(x) for all values of x > 0.
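As a minimal sketch of this calculation, the following takes the ‘at least x’ frequencies reported for ICE-GB in the table and divides each by its predecessor. The function name `additive_probabilities` is an illustrative choice, not part of ICECUP or the paper.

```python
# 'At least x' frequencies F(0)..F(4) for attributive AJPs in NPs with a noun head.
F = [193135, 37305, 2944, 155, 7]

def additive_probabilities(F):
    """Return p(x) = F(x) / F(x-1) for x = 1 .. len(F)-1."""
    return [F[x] / F[x - 1] for x in range(1, len(F))]

p = additive_probabilities(F)
for x, value in enumerate(p, start=1):
    print(f"p({x}) = {value:.4f}")
```

Each p(x) is conditional on the previous term having been added, which is why the denominator is F(x-1) rather than the total number of NPs.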
| x adjective phrases | 0 | 1 | 2 | 3 | 4 |
| ‘at least x’ F | 193,135 | 37,305 | 2,944 | 155 | 7 |
| probability p(x) | | 0.1932 | 0.0789 | 0.0526 | 0.0452 |
| upper bound w⁺ | | | 0.0817 | 0.0613 | 0.0903 |
| lower bound w⁻ | | | 0.0762 | 0.0451 | 0.0220 |
The table reveals that the additive probability, p, falls numerically as x increases. We plot probability p with Wilson score confidence intervals in the figure. Appendix 1 in the paper explains the Wilson interval and the overall approach to analysing sequential decisions step-by-step.
You can ‘read’ this figure from left-to-right: if a point is outside the upper or lower interval of the next point, the probability has significantly changed with increasing x at that subsequent point.
Adding a second adjective phrase to an NP occurs in roughly 1 in 13 (0.0789) cases where a first adjective phrase has been introduced. This contrasts with the introduction of an initial attributive AJP, which occurs in approximately 1 in 5 cases of NPs (0.1932). The difference is comfortably statistically significant: the upper bound for the second AJP lies well below the probability of adding the first (0.0817 < 0.1932).
Adding a third adjective phrase to an NP occurs in 155 out of 2,944 cases where two AJPs had been introduced, i.e., 1 in 19 (0.0526). This fall is also statistically significant (0.0613 < 0.0789). In the final case, adding a fourth AJP to an NP occurs 7 times out of 155 (1 in 22). This has an upper interval of 0.0903, which is greater than 0.0526, and therefore does not represent a significant fall.
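The comparisons above can be reproduced with a short sketch. It uses the standard Wilson score interval formula at a 95% level (z ≈ 1.96) and the reading rule given earlier: a change is significant if the previous probability lies outside the interval of the current one. The function names are illustrative, and the paper's Appendix 1 remains the authoritative description of the procedure.

```python
import math

Z_95 = 1.959964  # two-tailed 95% critical value of the standard normal distribution

def wilson_interval(p, n, z=Z_95):
    """Return (lower, upper) Wilson score bounds for a proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

F = [193135, 37305, 2944, 155, 7]  # 'at least x' frequencies from the table

def successive_changes(F):
    """For x >= 2, test whether p(x-1) falls outside the Wilson interval of p(x)."""
    rows = []
    for x in range(2, len(F)):
        p_prev = F[x - 1] / F[x - 2]
        p_curr = F[x] / F[x - 1]
        lower, upper = wilson_interval(p_curr, F[x - 1])
        significant = not (lower <= p_prev <= upper)
        rows.append((x, p_curr, lower, upper, significant))
    return rows

for x, p_curr, lower, upper, significant in successive_changes(F):
    print(f"x={x}: p={p_curr:.4f}  w-={lower:.4f}  w+={upper:.4f}  significant={significant}")
```

The interval for x = 4 is wide because it rests on only 155 cases, which is why the fall from 0.0526 cannot be declared significant.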
Note that the results might also allow the conclusion that the fall in probability over multiple steps is significant, e.g., that the probability of adding a fourth AJP is lower than that of adding the first (0.0903 < 0.1932). However, in this paper we will restrict ourselves to conclusions concerning ‘strong’ (i.e., successive and unbroken) trends.
The data demonstrates that decisions to add successive attributive adjective phrases in noun phrases in ICE-GB are not independent from previous decisions to add AJPs. Indeed, our results here indicate a negative feedback loop, such that the presence of each AJP dissuades the speaker from adding another. We reject the null hypothesis that the relative probability is constant for the addition of the second and third successive attributive AJPs.
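To see what the null hypothesis would predict, one can hold the additive probability constant at its first value and project the ‘at least x’ frequencies forward. The sketch below does this for the ICE-GB figures; the function name and the choice of p(1) as the constant are assumptions made for illustration only.

```python
F = [193135, 37305, 2944, 155, 7]  # observed 'at least x' frequencies

def expected_under_independence(F):
    """Project F(x) = F(0) * p**x, where p = F(1) / F(0) is held constant."""
    p = F[1] / F[0]
    return [F[0] * p ** x for x in range(len(F))]

expected = expected_under_independence(F)
for x, (obs, exp) in enumerate(zip(F, expected)):
    print(f"x={x}: observed={obs:>7}  expected if independent={exp:>10.1f}")
```

If each decision were independent, the frequencies would decay geometrically. The observed counts for two or more AJPs fall well short of this projection, which is the pattern the negative feedback interpretation describes.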
There are a number of potential explanations for these results, including
- adjective ordering rules: adjectives of size before colour, etc.
- logical semantic coherence: it is possible to say the tall green ship but the tall short ship is highly implausible, i.e. adjectives logically exclude antonyms.
- communicative economy: once a speaker has said the tall green ship they tend to say the ship or it in referring to the same object.
- psycholinguistics: cognitive constraints, such as processing and memory.
Without examining the dataset more closely it would be difficult to distinguish between these potential sources, and indeed multiple sources of interaction may apply at once. Nonetheless, as we shall show, the interaction between adjective phrases modifying the same noun head is a substantial effect which can be identified without recourse to parsing.
In the case of adjective ordering rules, comparative studies with other languages where no strong ordering rules apply might allow us to eliminate this as an explanation. Interestingly we find evidence of a different feedback effect in the case of multiple postmodification of the same head (see Section 4).
It is possible to ‘fit’ the curve to a power function of the form f = m·x^k. Allowing for increasing variance as F falls, the data obtains the function f = 0.1931·x^(−1.2793) with a correlation coefficient R² of 0.9996. This model suggests that probability falls (k is negative) according to a power law. We discuss the implications of modelling with a power law in the conclusion.
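A fit of this kind can be approximated by weighted least squares on log-transformed data. The sketch below is an assumption-laden illustration: the source does not state how the increasing variance was allowed for, so weighting each point by its frequency F(x) is a guess, and the result should not be expected to match the reported coefficients exactly.

```python
import math

def fit_power_law(xs, ps, weights=None):
    """Fit f = m * x**k by weighted linear regression of ln(p) on ln(x)."""
    if weights is None:
        weights = [1.0] * len(xs)
    lx = [math.log(x) for x in xs]
    lp = [math.log(p) for p in ps]
    w_sum = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, lx)) / w_sum
    mean_p = sum(w * b for w, b in zip(weights, lp)) / w_sum
    sxx = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, lx))
    sxy = sum(w * (a - mean_x) * (b - mean_p) for w, a, b in zip(weights, lx, lp))
    k = sxy / sxx                         # exponent (slope in log-log space)
    m = math.exp(mean_p - k * mean_x)     # multiplier (intercept in log-log space)
    return m, k

F = [193135, 37305, 2944, 155, 7]
xs = [1, 2, 3, 4]
ps = [F[x] / F[x - 1] for x in xs]

# Assumed weighting: each point weighted by its frequency F(x).
m, k = fit_power_law(xs, ps, weights=F[1:])
print(f"f = {m:.4f} * x^{k:.4f}")
```

Calling `fit_power_law(xs, ps)` without weights gives every point equal influence, including the sparsely attested x = 4, and produces a noticeably shallower exponent.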
This result upholds general linguistic expectations that the use of successive multiple adjectives is avoided due to linguistic or contextual constraints. It does not inform us what these constraints might be, although we may hypothesise about these. Nor does the evidence inform us as to the order of insertion. These questions are not our principal interest in this paper. Rather, we have demonstrated evidence of a general trend along the axis of the grammatical analysis of NP constituents.
1.1 The problem of grammatical epistemology
1.2 Deciding between frameworks
1.3 Interaction along grammatical axes of addition
2. A worked example: Grammatical interaction between prenominal AJPs
2.1 AJPs with noun heads
2.2 AJPs with proper and common noun heads
2.3 Grammatical interaction between prenominal adjectives
3. Grammatical interaction between preverbal adverb phrases
4. Grammatical interaction between postmodifying clauses
4.1 Embedded vs. sequential postmodification
4.2 Data gathering
4.3 Results and discussion
4.4 What might these results imply?
5.1 Implications for corpus linguistics
5.2 Towards the evaluation of grammar
5.3 Have we arrived at a non-circular ‘proof of grammar’?
5.4 Further applications
Wallis, S.A. 2012. Capturing patterns of linguistic interaction in a parsed corpus: an insight into the empirical evaluation of grammar? London: Survey of English Usage www.ucl.ac.uk/english-usage/statspapers/analysing-grammatical-interaction.pdf
Aarts, B. 2001. Corpus linguistics, Chomsky and Fuzzy Tree Fragments. In: Mair, C. and Hundt, M. (eds.) 2001. Corpus linguistics and linguistic theory. Amsterdam: Rodopi. 5-13.
Abeillé, A. (ed.) 2003. Treebanks: Building and Using Parsed Corpora. Dordrecht: Kluwer.
Anderson, J.R. 1983. The Architecture of Cognition. Cambridge, MA: Harvard University Press.
Böhmová, A., Hajič, J., Hajičová, E., and Hladká, B. 2003. The Prague Dependency Treebank: A Three-Level Annotation Scenario, in Abeillé, A. (ed.) 2003. 103-127.
Fang, A. 1996. The Survey Parser, Design and Development. In Greenbaum, S. (ed.) 1996. 142-160.
Greenbaum, S., and Ni, Y. 1996. About the ICE Tagset. In Greenbaum, S. (ed.) 1996. 92-109.
Greenbaum, S. (ed.) 1996. Comparing English Worldwide. Oxford: Clarendon.
Huddleston, R. and Pullum, G.K. (eds.) 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.
Karlsson, F., Voutilainen, A., Heikkilä, J., and Anttila, A. (eds.) 1995. Constraint Grammar: a language-independent system for parsing unrestricted text. Berlin: Mouton de Gruyter.
Marcus, M., Marcinkiewicz, M.A. and Santorini, B. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19:2, 313-330.
Moreno, A., López, S., Sánchez, F., and Grishman, R. 2003. Developing a Spanish Treebank, in Abeillé, A. (ed.) 2003. 149-163.
Nelson, G., Wallis, S.A. and Aarts, B. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam: John Benjamins.
Oflazer, K., Say, B., Hakkani-Tür, D.Z. and Tür, G. 2003. Building a Turkish Treebank, in Abeillé, A. (ed.) 2003. 261-277.
Quirk, R., Greenbaum, S., Leech, G., and Svartvik J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Sinclair, J.M. 1987. Grammar in the Dictionary. In Sinclair, J.M. (ed.) 1987. Looking Up: an account of the COBUILD Project in lexical computing. London: Collins.
Wallis, S.A. 2003. Completing parsed corpora: from correction to evolution. In Abeillé, A. (ed.) 2003. 61-71.
Wallis, S.A. 2008. Searching treebanks and other structured corpora. Chapter 34 in Lüdeling, A. and Kytö, M. (eds.) 2008. Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter. 738-759.
Wallis, S.A. and Nelson, G. 1997. Syntactic parsing as a knowledge acquisition problem. Proceedings of 10th European Knowledge Acquisition Workshop. Berlin: Springer Verlag. 285-300.