## Logistic regression with Wilson intervals

### Introduction

Back in 2010 I wrote a short article on the logistic (‘S’) curve in which I described its theoretical justification, mathematical properties and relationship to the Wilson score interval. That article made two key observations.

• We can map any set of independent probabilities p ∈ [0, 1] to a flat Cartesian space using the inverse logistic (‘logit’) function, defined as
  logit(p) ≡ log(p / (1 – p)) = log(p) – log(1 – p),
  where ‘log’ is the natural logarithm and logit(p) ∈ [-∞, ∞].
• By performing this transformation
  • the logistic curve in probability space becomes a straight line in logit space, and
  • Wilson score intervals for p ∈ (0, 1) are symmetrical in logit space, i.e. logit(p) – logit(w⁻) = logit(w⁺) – logit(p).

[Figure: Logistic curve (k = 1) with Wilson score intervals for n = 10, 100.]
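Both observations can be checked numerically. The following is a minimal sketch rather than code from the original article: the function names (`logit`, `logistic`, `wilson_interval`) are illustrative, the Wilson bounds use the standard score-interval formula, and the default critical value z ≈ 1.96 (a 95% interval) is an assumption, since the excerpt does not state an error level.

```python
import math


def logit(p):
    """Inverse logistic function: log(p / (1 - p)), natural logarithm."""
    return math.log(p / (1 - p))


def logistic(x, k=1.0):
    """Logistic ('S') curve with gradient parameter k."""
    return 1.0 / (1.0 + math.exp(-k * x))


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (w-, w+) for an observed proportion p.

    n is the sample size; z is the critical value of the standard
    normal distribution (assumed here: 95% two-tailed).
    """
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom


# 1. The logistic curve is a straight line in logit space:
#    logit(logistic(x)) recovers k * x.
for x in (-3.0, -1.0, 0.5, 2.0):
    print(f"x = {x:5.2f}  logit(logistic(x)) = {logit(logistic(x)):8.5f}")

# 2. Wilson intervals are symmetrical about logit(p) in logit space.
for n in (10, 100):
    for p in (0.1, 0.3, 0.5, 0.9):
        w_lo, w_hi = wilson_interval(p, n)
        lower_gap = logit(p) - logit(w_lo)
        upper_gap = logit(w_hi) - logit(p)
        print(f"n = {n:3d}  p = {p:.1f}  "
              f"lower gap = {lower_gap:.6f}  upper gap = {upper_gap:.6f}")
```

The two gaps agree up to floating-point error for every p strictly between 0 and 1. At p = 0 or p = 1 the logit is infinite, so those endpoints are excluded from the check.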

## EDS Resources

This post contains the resources for students taking the UCL English Linguistics MA, all in one place.

## A methodological progression

### Introduction

One of the most controversial arguments in corpus linguistics concerns the relationship between a ‘variationist’ paradigm comparable with lab experiments, and a traditional corpus linguistics paradigm focusing on normalised word frequencies.

Rather than see these two approaches as diametrically opposed, we propose that it is more helpful to view them as representing different points on a methodological progression, and to recognise that we are often forced to compromise our ideal experimental practice according to the data and tools at our disposal.

Viewing these approaches as points along a progression allows us to step back from any single perspective and ask how different results can be reconciled and how research may be improved. It also allows us to consider the potential value of performing more computer-aided manual annotation (always an arduous task) and where such annotation effort would be most usefully focused.

The idea is sketched in the figure below.