Logistic regression with Wilson intervals

Introduction

Back in 2010 I wrote a short article on the logistic (‘S’) curve in which I described its theoretical justification, mathematical properties and relationship to the Wilson score interval. That article made two key observations.

  • We can map any set of independent probabilities p ∈ [0, 1] to a flat Cartesian space using the inverse logistic (‘logit’) function, defined as
    • logit(p) ≡ log(p / (1 – p)) = log(p) – log(1 – p),
    • where ‘log’ is the natural logarithm and logit(p) ∈ [-∞, ∞].
  • By performing this transformation
    • the logistic curve in probability space becomes a straight line in logit space, and
    • Wilson score intervals for p ∈ (0, 1) are symmetrical in logit space, i.e. logit(p) – logit(w⁻) = logit(w⁺) – logit(p).

Logistic curve (k = 1) with Wilson score intervals for n = 10, 100.

Continue reading

Binomial algorithm snippets

Introduction

Elsewhere on this blog I summarise an analysis of the performance of a broad range of different confidence interval calculations and 2 × 2 contingency tests against equivalent ‘exact’ Binomial tests calculated from first principles.

For transparency, it is necessary to show how I went about computing these results.

Many of these algorithms are summarised in mathematical terms in this paper. However, for those who wish to recreate the computation, here is the code in the programming language C.

Warning: colleagues have pointed out that this post is not for the faint-hearted!

Continue reading