Back in 2010 I wrote a short article on the logistic (‘S’) curve in which I described its theoretical justification, mathematical properties and relationship to the Wilson score interval. This observed two key points.
We can map any set of independent probabilities p ∈ [0, 1] to a flat Cartesian space using the inverse logistic (‘logit’) function, defined as
logit(p) ≡ log(p / 1 – p) = log(p) – log(1 – p),
where ‘log’ is the natural logarithm and logit(p) ∈ [-∞, ∞].
By performing this transformation
the logistic curve in probability space becomes a straight line in logit space, and
Wilson score intervals for p ∈ (0, 1) are symmetrical in logit space, i.e. logit(p) – logit(w⁻) = logit(w⁺) – logit(p).
One of the most controversial arguments in corpus linguistics concerns the relationship between a ‘variationist’ paradigm comparable with lab experiments, and a traditional corpus linguistics paradigm focusing on normalised word frequencies.
Rather than see these two approaches as diametrically opposed, we propose that it is more helpful to view them as representing different points on a methodological progression, and to recognise that we are often forced to compromise our ideal experimental practice according to the data and tools at our disposal.
Viewing these approaches as being represented along a progression allows us to step back from any single perspective and ask ourselves how different results can be reconciled and research may be improved upon. It allows us to consider the potential value in performing more computer-aided manual annotation — always an arduous task — and where such annotation effort would be usefully focused.