### Introduction

The fundamental assumption of logistic regression is that a probability representing a true fraction, or share, of a quantity undergoing a continuous process of change will by default follow a logistic or ‘S-curve’ pattern. This is a reasonable assumption in certain limited circumstances, because the S-curve plays a role for continuous proportional change analogous to the role a straight line plays for continuous motion (cf. Newton’s first law of motion).
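As a concrete illustration (not from the post itself), the standard logistic function can be written as p(x) = 1 / (1 + e^(−k(x − x₀))), where the slope k and midpoint x₀ are illustrative parameter names:

```python
import math

def logistic(x, k=1.0, x0=0.0):
    """Standard logistic ('S-curve'): p rises from 0 toward 1 as x grows.
    k (steepness) and x0 (midpoint) are illustrative parameter names."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

# At the midpoint the curve is exactly halfway between its bounds of 0 and 1.
print(logistic(0.0))  # 0.5
```

Note that the curve approaches, but never quite reaches, 0 and 1, which is consistent with treating it as a probability.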

Regression is a set of computational methods that attempts to find the closest match between an observed set of data and a function, such as a straight line, a polynomial, a power curve or, in this case, an S-curve. We say that the logistic curve is the underlying model against which we expect the data to be matched (regressed). In another post, I comment on the feasibility of employing Wilson score intervals in an efficient logistic regression algorithm.
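The idea of regression as "finding the closest match" can be sketched in a few lines. The following is a deliberately crude least-squares fit (a grid search, not the maximum-likelihood procedure real logistic regression software uses), with parameter names k and x0 as assumptions of the sketch:

```python
import math

def logistic(x, k, x0):
    """S-curve with steepness k and midpoint x0."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def fit_logistic(xs, ps, k_grid, x0_grid):
    """Crude regression: pick the (k, x0) pair on a grid that minimises
    the squared distance between observed proportions ps and the curve."""
    return min(((k, x0) for k in k_grid for x0 in x0_grid),
               key=lambda kx: sum((p - logistic(x, *kx)) ** 2
                                  for x, p in zip(xs, ps)))

# Noise-free data generated from k=1.5, x0=2.0 should be recovered exactly.
xs = [0, 1, 2, 3, 4]
ps = [logistic(x, 1.5, 2.0) for x in xs]
grid = [i / 10 for i in range(5, 31)]  # candidate values 0.5 .. 3.0
k, x0 = fit_logistic(xs, ps, grid, grid)
print(k, x0)  # 1.5 2.0
```

In practice, logistic regression is fitted by maximum likelihood rather than least squares, but the underlying notion of minimising the mismatch between data and model is the same.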

We have already noted that change is assumed to be continuous, which implies that the input variable (*x*) is **real and linear**, such as time (and not, e.g., probabilistic). In this post we discuss different types of outcome variable. What are the ‘limited circumstances’ in which logistic regression is mathematically coherent?

- We assume probabilities are free to vary from 0 to 1.
- The envelope of variation must be constant, i.e. it must always be possible for an observed probability to reach 1.

Taken together, these conditions also mean that probabilities are Binomial, not multinomial. Let us discuss what this means.
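One way to see why a two-way (Binomial) split is special: if a probability *p* follows a logistic curve, its complement 1 − *p* is also a logistic curve, reflected about the midpoint, so both outcome shares obey the same model. With three or more competing shares this symmetry breaks down. A quick numerical check of the two-way identity (illustrative, not from the post):

```python
import math

def logistic(x, k=1.0, x0=0.0):
    """S-curve with steepness k and midpoint x0."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

# The complement of a logistic curve is itself a logistic curve,
# reflected about the midpoint: 1 - logistic(x) == logistic(-x).
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    assert abs((1 - logistic(x)) - logistic(-x)) < 1e-12
```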