Recently, a number of linguists have begun to question the wisdom of assuming that linguistic change tends to follow an ‘S-curve’ or more properly, logistic, pattern. For example, Nevalianen (2015) offers a series of empirical observations that show that whereas data sometimes follows a continuous ‘S’, frequently this does not happen. In this short article I try to explain why this result should not be surprising.
The fundamental assumption of logistic regression is that a probability representing a true fraction, or share, of a quantity undergoing a continuous process of change by default follows a logistic pattern. This is a reasonable assumption in certain limited circumstances because an ‘S-curve’ is mathematically analogous to a straight line (cf. Newton’s first law of motion).
Regression is a set of computational methods that attempts to find the closest match between an observed set of data and a function, such as a straight line, a polynomial, a power curve or, in this case, an S-curve. We say that the logistic curve is the underlying model we expect data to be matched against (regressed to). In another post, I comment on the feasibility of employing Wilson score intervals in an efficient logistic regression algorithm.
We have already noted that change is assumed to be continuous, which implies that the input variable (x) is real and linear, such as time (and not e.g. probabilistic). In this post we discuss different outcome variable types. What are the ‘limited circumstances’ in which logistic regression is mathematically coherent?
- We assume probabilities are free to vary from 0 to 1.
- The envelope of variation must be constant, i.e. it must always be possible for an observed probability to reach 1.
Taken together this also means that probabilities are Binomial, not multinomial. Let us discuss what this implies. Continue reading