EDS Resources

This post contains the resources for students taking the UCL English Linguistics MA, all in one place.

Session 15: Introduction to statistics

Sessions 18 and 19: Statistics Workshops

Suggested further reading

Verb Phrase book published

Why this book?

book coverThe grammar of English is often thought to be stable over time. However a new book, edited by Bas Aarts, Joanne Close, Geoffrey Leech and Sean Wallis, The Verb Phrase in English: investigating recent language change with corpora (Cambridge University Press, 2013) presents a body of research from linguists that shows that using natural language corpora one can find changes within a core element of grammar, the Verb Phrase, over a span of decades rather than centuries.

The book draws from papers first presented at a symposium on the verb phrase organised for the Survey of English Usage’s 50th anniversary and on research from the Changing English Verb Phrase project.

Continue reading

Capturing patterns of linguistic interaction

Abstract Full Paper (PDF)

Numerous competing grammatical frameworks exist on paper, as algorithms and embodied in parsed corpora. However, not only is there little agreement about grammars among linguists, but there is no agreed methodology for demonstrating the benefits of one grammar over another. Consequently the status of parsed corpora or ‘treebanks’ is suspect.

The most common approach to empirically comparing frameworks is based on the reliable retrieval of individual linguistic events from an annotated corpus. However this method risks circularity, permits redundant terms to be added as a ‘solution’ and fails to reflect the broader structural decisions embodied in the grammar. In this paper we introduce a new methodology based on the ability of a grammar to reliably capture patterns of linguistic interaction along grammatical axes. Retrieving such patterns of interaction does not rely on atomic retrieval alone, does not risk redundancy and is no more circular than a conventional scientific reliance on auxiliary assumptions. It is also a valid experimental perspective in its own right.

We demonstrate our approach with a series of natural experiments. We find an interaction captured by a phrase structure analysis between attributive adjective phrases under a noun phrase with a noun head, such that the probability of adding successive adjective phrases falls. We note that a similar interaction (between adjectives preceding a noun) can also be found with a simple part-of-speech analysis alone. On the other hand, preverbal adverb phrases do not exhibit this interaction, a result anticipated in the literature, confirming our method.

Turning to cases of embedded postmodifying clauses, we find a similar fall in the additive probability of both successive clauses modifying the same NP and embedding clauses where the NP head is the most recent one. Sequential postmodification of the same head reveals a fall and then a rise in this additive probability. Reviewing cases, we argue that this result can only be explained as a natural phenomenon acting on language production which is expressed by the distribution of cases on an embedding axis, and that this is in fact empirical evidence for a grammatical structure embodying a series of speaker choices.

We conclude with a discussion of the implications of this methodology for a series of applications, including optimising and evaluating grammars, modelling case interaction, contrasting the grammar of multiple languages and language periods, and investigating the impact of psycholinguistic constraints on language production.

Continue reading

Reciprocating the Wilson interval


How can we calculate confidence intervals on a property like sentence length (as measured by the number of words per sentence)?

You might want to do this to find out whether or not, say, spoken utterances consist of shorter or longer sentences than those found in writing.

The problem is that the average number of words per sentence is not a probability. If you think about it, this ratio will (obviously) equal or exceed 1. So methods for calculating intervals on probabilities won’t work without recalibration.

Aside: You are most likely to hit this type of problem if you want to plot a graph of some non-probabilistic property, or you wish to cite a property with an upper and lower bound for some reason. Sometimes expressing something as a probability does not seem natural. However, it is a good discipline to think in terms of probabilities, and to convert your hypotheses into hypotheses about probabilities as far as possible. As we shall see, this is exactly what you have to do to apply the Wilson score interval.

Note also that just because you want to calculate confidence intervals on a property, you also have to consider whether the property is freely varying when expressed as a probability.

The Wilson score interval (w⁻, w⁺), is a robust method for computing confidence intervals about probabilistic observations p.

Elsewhere we saw that the Wilson score interval obtained an accurate approximation to the ‘exact’ Binomial interval based on an observed probability p, obtained by search. It is also well-constrained, so that neither upper nor lower bound can exceed the probabilistic range [0, 1].

But the Wilson interval is based on a probability. In this post we discuss how this method can be used for other quantities.

Continue reading

Why ‘Wald’ is Wrong: once more on confidence intervals


The idea of plotting confidence intervals on data, which is discussed in a number of posts elsewhere on this blog, should be straightforward. Everything we observe is uncertain, but some things are more certain than others! Instead of marking an observation as a point, its better to express it as a ‘cloud’, an interval representing a range of probabilities.

But the standard method for calculating intervals that most people are taught is wrong.

The reasons why are dealt with in detail in (Wallis 2013). In preparing this paper for publication, however, I came up with a new demonstration, using real data, as to why this is the case.

Continue reading

Change and certainty: plotting confidence intervals (2)


In a previous post I discussed how to plot confidence intervals on observed probabilities. Using this method we can create graphs like the following. (Data is in the Excel spreadsheet we used previously: for this post I have added a second worksheet.)

The graph depicts both the observed probability of a particular form and the certainty that this observation is accurate. The ‘I’-shaped error bars depict the estimated range of the true value of the observation at a 95% confidence level (see Wallis 2013 for more details).

A note of caution: these probabilities are semasiological proportions (different uses of the same word) rather than onomasiological choices (see Choice vs. use).

An example graph plot showing the changing proportions of meanings of the verb think over time in the US TIME Magazine Corpus, with Wilson score intervals, after Levin (2013). Many thanks to Magnus for the data!

In this post I discuss ways in which we can plot intervals on changes (differences) rather than single probabilities.

The clearer our visualisations, the better we can understand our own data, focus our explanations on significant results and communicate our results to others. Continue reading

Binomial confidence intervals and contingency tests

Abstract Paper (PDF)

Many statistical methods rely on an underlying mathematical model of probability which is based on a simple approximation, one that is simultaneously well-known and yet frequently poorly understood.

This approximation is the Normal approximation to the Binomial distribution, and it underpins a range of statistical tests and methods, including the calculation of accurate confidence intervals, performing goodness of fit and contingency tests, line-and model-fitting, and computational methods based upon these. What these methods have in common is the assumption that the likely distribution of error about an observation is Normally distributed.

The assumption allows us to construct simpler methods than would otherwise be possible. However this assumption is fundamentally flawed.

This paper is divided into two parts: fundamentals and evaluation. First, we examine the estimation of error using three approaches: the ‘Wald’ (Normal) interval, the Wilson score interval and the ‘exact’ Clopper-Pearson Binomial interval. Whereas the first two can be calculated directly from formulae, the Binomial interval must be approximated towards by computational search, and is computationally expensive. However this interval provides the most precise significance test, and therefore will form the baseline for our later evaluations.

We consider two further refinements: employing log-likelihood in computing intervals (also requiring search) and the effect of adding a correction for the transformation from a discrete distribution to a continuous one.

In the second part of the paper we consider a thorough evaluation of this range of approaches to three distinct test paradigms. These paradigms are the single interval or 2 × 1 goodness of fit test, and two variations on the common 2 × 2 contingency test. We evaluate the performance of each approach by a ‘practitioner strategy’. Since standard advice is to fall back to ‘exact’ Binomial tests in conditions when approximations are expected to fail, we simply count the number of instances where one test obtains a significant result when the equivalent exact test does not, across an exhaustive set of possible values.

We demonstrate that optimal methods are based on continuity-corrected versions of the Wilson interval or Yates’ test, and that commonly-held assumptions about weaknesses of χ² tests are misleading.

Log-likelihood, often proposed as an improvement on χ², performs disappointingly. At this level of precision we note that we may distinguish the two types of 2 × 2 test according to whether the independent variable partitions the data into independent populations, and we make practical recommendations for their use.


Estimating the error in an observation is the first, crucial step in inferential statistics. It allows us to make predictions about what would happen were we to repeat our experiment multiple times, and, because each observation represents a sample of the population, predict the true value in the population (Wallis 2013).

Consider an observation that a proportion p of a sample of size n is of a particular type.

For example

  • the proportion p of coin tosses in a set of n throws that are heads,
  • the proportion of light bulbs p in a production run of n bulbs that fail within a year,
  • the proportion of patients p who have a second heart attack within six months after a drug trial has started (n being the number of patients in the trial),
  • the proportion p of interrogative clauses n in a spoken corpus that are finite.

We have one observation of p, as the result of carrying out a single experiment. We now wish to infer about the future. We would like to know how reliable our observation of p is without further sampling. Obviously, we don’t want to repeat a drug trial on cardiac patients if the drug may be adversely affecting their survival.

Continue reading