Are embedding decisions independent?

Evidence from preposition(al) phrases

Abstract

One of the more difficult challenges in linguistics research concerns detecting how constraints might apply to the process of constructing phrases and clauses in natural language production. In previous work (Wallis 2019) we considered a number of operations modifying noun phrases, including sequential and embedded modification with postmodifying clauses. Notably, we found a pattern of a declining additive probability for each decision to embed postmodifying clauses, albeit a pattern that differed in speech and writing.

In this paper we use the same research paradigm to investigate the embedding of an altogether simpler structure: postmodifying nouns with prepositional phrases. These are approximately twice as frequent as postmodifying clauses, and the structures exhibit as many as five levels of embedding in ICE-GB (two more than are found for clauses). Finally, the embedding model is simplified because only one noun phrase can be found within each prepositional phrase. We discover different initial rates and patterns for common and proper nouns, and for certain subsets of pronouns and numerals. Common nouns (80% of nouns in the corpus) do appear to generate a secular decline in the additive probability of embedded prepositional phrases, whereas the equivalent rate for proper nouns rises from a low initial probability, a pattern that appears to be strongly affected by the presence of titles.
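As a rough illustration of what an additive probability measures, the sketch below assumes it is computed as the ratio of the number of structures with at least x levels of embedding to the number with at least x - 1 levels. The function name and the counts are hypothetical and are not taken from ICE-GB.

```python
def additive_probabilities(at_least_counts):
    """Return p(x) = F(x) / F(x - 1) for each level x = 1, 2, ...

    at_least_counts[x] is assumed to hold F(x), the number of noun
    phrases with at least x levels of embedded prepositional phrases,
    so at_least_counts[0] is the total number of candidate noun phrases.
    """
    probabilities = []
    for previous, current in zip(at_least_counts, at_least_counts[1:]):
        if previous == 0:
            # No cases remain at the previous level, so the rate is undefined.
            break
        probabilities.append(current / previous)
    return probabilities


# Hypothetical counts, invented purely for illustration.
counts = [10000, 2000, 300, 30]

for level, p in enumerate(additive_probabilities(counts), start=1):
    print(f"p({level}) = {p:.3f}")
```

With these invented counts each successive probability is lower than the last, which is the kind of declining pattern described for common nouns. A rising sequence would correspond to the pattern reported for proper nouns.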

It may be generally assumed that, like clauses, prepositional phrases are essentially independent units. However, we find evidence from a number of sources indicating that some double-layered constructions may be added as single units. In addition to titles, these constructions include schematic or idiomatic expressions whose head is an ‘indefinite’ pronoun or numeral.

Coping with imperfect data

Introduction

One of the challenges for corpus linguists is that many of the distinctions that we wish to make are either not annotated in a corpus at all or, if they are represented in the annotation, unreliably annotated. This issue frequently arises in corpora to which an algorithm has been applied, but where the results have not been checked by linguists, a situation which is unavoidable with mega-corpora. However, this is a general problem. We would always recommend that cases be reviewed for accuracy of annotation.

A version of this issue also arises when checking for the possibility of alternation, that is, to ensure that items of Type A can be replaced by Type B items, and vice versa. An example might be epistemic modal shall vs. will. Most corpora, including richly-annotated corpora such as ICE-GB and DCPSE, do not include modal semantics in their annotation scheme. In such cases the issue is not that the annotation is “imperfect”. Rather, our experiment relies on a presumption that the speaker has the choice of either type at any observed point (see Aarts et al. 2013), yet that choice is conditioned by the semantic content of the utterance.


Is language really “a set of alternations?”

The perspective that the study of linguistic data should be driven by studies of individual speaker choices has been the subject of attack from a number of linguists.

The first set of objections has come from researchers who have traditionally focused on linguistic variation expressed in terms of rates per word, or per million words.
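To make the two kinds of measure concrete, the following sketch contrasts a rate per million words with a proportion calculated out of the opportunities to make a choice, using the shall vs. will alternation as an example. The function names and all of the counts are hypothetical, invented for illustration only.

```python
def per_million_words(hits, corpus_words):
    """Frequency of an item expressed per million words of text."""
    return hits * 1_000_000 / corpus_words


def choice_rate(hits_a, hits_b):
    """Proportion of Type A out of all cases where A or B was chosen."""
    total = hits_a + hits_b
    if total == 0:
        raise ValueError("no opportunities to choose between the two types")
    return hits_a / total


# Hypothetical samples of equal size (counts are invented).
samples = {
    "sample 1": {"shall": 50, "will": 450, "words": 500_000},
    "sample 2": {"shall": 50, "will": 150, "words": 500_000},
}

for name, s in samples.items():
    pmw = per_million_words(s["shall"], s["words"])
    rate = choice_rate(s["shall"], s["will"])
    print(f"{name}: shall = {pmw:.1f} pmw, p(shall | shall or will) = {rate:.2f}")
```

In this invented case both samples have the same per-million-word rate for shall, but the proportion of shall out of the shall/will choices differs, because the number of opportunities to use either form is not the same.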

No such thing as free variation?

As Smith and Leech (2013) put it: “it is commonplace in linguistics that there is no such thing as free variation” and that indeed multiple differing constraints apply to each term. On the basis of this observation they propose an ‘ecological’ approach, although in their paper this approach is not clearly defined.
