Impossible logistic multinomials

Introduction

Recently, a number of linguists have begun to question the wisdom of assuming that linguistic change tends to follow an ‘S-curve’ or, more properly, logistic pattern. For example, Nevalainen (2015) offers a series of empirical observations showing that whereas data sometimes follows a continuous ‘S’, frequently it does not. In this short article I try to explain why this result should not be surprising.

The fundamental assumption of logistic regression is that a probability representing a true fraction, or share, of a quantity undergoing a continuous process of change by default follows a logistic pattern. This is a reasonable assumption in certain limited circumstances because an ‘S-curve’ is mathematically analogous to a straight line (cf. Newton’s first law of motion).

Regression is a set of computational methods that attempts to find the closest match between an observed set of data and a function, such as a straight line, a polynomial, a power curve or, in this case, an S-curve. We say that the logistic curve is the underlying model we expect data to be matched against (regressed to). In another post, I comment on the feasibility of employing Wilson score intervals in an efficient logistic regression algorithm.

We have already noted that change is assumed to be continuous, which implies that the input variable (x) is real and linear, such as time (and not e.g. probabilistic). In this post we discuss different outcome variable types. What are the ‘limited circumstances’ in which logistic regression is mathematically coherent?

  • We assume probabilities are free to vary from 0 to 1.
  • The envelope of variation must be constant, i.e. it must always be possible for an observed probability to reach 1.

Taken together this also means that probabilities are Binomial, not multinomial. Let us discuss what this implies.

Binomials

The logistic curve can be expressed as the function

P = logistic(x, m, k) ≡ 1 / (1 + e^{–m(x – k)}).

In a simple Binomial alternation, we have two probabilities, P and Q, where Q = 1 – P (as this is an expected model, rather than observed data, I am following the convention of capitalisation used elsewhere in this blog).

Q = 1 – logistic(x, m, k) = 1 – {1 / (1 + e^{–m(x – k)})} = 1 / (1 + e^{+m(x – k)}) = logistic(x, –m, k).

Another way of saying this is that

logistic(x, m, k) + logistic(x, –m, k) ≡ 1.

One curve goes up, the other goes down.

The logistic model assumes one degree of freedom.
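
The Binomial identity is easy to check numerically. The following is a minimal sketch in Python; the function name simply mirrors the formula above, and the values m = 0.5 and k = 10 are arbitrary choices for illustration.

```python
import math

def logistic(x, m, k):
    """Logistic curve with slope m and turning point k (P = 0.5 at x = k)."""
    return 1.0 / (1.0 + math.exp(-m * (x - k)))

# P rises and Q = 1 - P falls; Q is itself logistic with the slope negated.
m, k = 0.5, 10.0  # illustrative values only
for x in range(0, 21, 5):
    p = logistic(x, m, k)
    q = logistic(x, -m, k)
    print(f"x={x:2d}  P={p:.4f}  Q={q:.4f}  P+Q={p + q:.4f}")
```

Whatever values of m and k are chosen, the final column should print as 1.0000.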

Impossible multinomials

However, if the number of alternating types increases above 2, not all forms can follow the logistic curve. The following equivalence can hold for every value of x only where the number of types t = 2 (the case identified above).

Σ_{i=1..t} logistic(x, m_i, k_i) = 1.

What this means is that if you have an outcome variable with three or more types, plotted over time (say), you cannot expect these outcome probabilities to follow an S-curve, because this would be mathematically impossible!
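
One way to see why is to look at the two ends of the x axis. The following is a sketch of that limit argument, offered as an illustration rather than taken from the cited papers. It assumes every slope m_i is non-zero, since a flat line at 1/2 is not a change at all.

```latex
\lim_{x \to +\infty} \mathrm{logistic}(x, m_i, k_i) =
  \begin{cases} 1 & m_i > 0 \\ 0 & m_i < 0 \end{cases}
\qquad
\lim_{x \to -\infty} \mathrm{logistic}(x, m_i, k_i) =
  \begin{cases} 0 & m_i > 0 \\ 1 & m_i < 0 \end{cases}

\sum_{i=1}^{t} \mathrm{logistic}(x, m_i, k_i) = 1 \ \text{for all } x
\;\Longrightarrow\;
\#\{i : m_i > 0\} = 1 \ \text{and} \ \#\{i : m_i < 0\} = 1
\;\Longrightarrow\; t = 2.
```

For the sum to equal 1 at the far right, exactly one curve may rise to 1. For it to equal 1 at the far left, exactly one curve may fall from 1. That leaves room for two types only.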

The following graph is from a paper on the to-infinitive perfect, i.e. verb patterns ‘V to have V(ed)’, e.g. claims to have achieved. Subdividing the preceding verb into semantic classes where alternation would be possible (replacing CLAIM with SAY, for instance), we obtained graphs like the following.

[Figure: A multinomial pattern of change is not expected to follow a logistic pattern. Probabilities of selecting lemmas of cognition and saying preceding a to-infinitive perfect, COHA (1820-2000), data from Bowie and Wallis (2016).]

We might be able to argue that SAY or KNOW approximates to a logistic curve, but what can we say about REPORT, which rises and falls over the time period? (This rise and fall is statistically significant, as the confidence intervals indicate.)

In fact, as Nevalainen also points out, results like this should not be surprising, and we should not feel obliged to ‘explain’ them as a defect in the data.

Note: In the paper, to allow us to contrast and subdivide patterns of change, we consider probabilities against a global baseline of all potential to-infinitive perfect forms, below. (‘Total’ below is equivalent to p = 1 above.) This presentation tends to downplay the variation within the set of forms but the point remains: REPORT is not behaving logistically!

[Figure: A multinomial pattern of change is not expected to follow a logistic pattern. Probabilities of selecting lemmas of cognition and saying preceding a to-infinitive perfect, COHA (1820-2000), from Bowie and Wallis (2016).]

Possible multinomials

Given that we cannot expect three-way (and above) alternation patterns to adopt a logistic curve, this raises a question. What kinds of patterns might we expect?

One possibility is that we witness a hierarchical alternation. Consider an alternation {a, {b, c}}, where a alternates with the pair b+c, and b independently alternates with c. Type a may be the modal MUST, whereas b and c might correspond to HAVE to and HAVE got to respectively.

  • Since a alternates with b+c, we can assume p(a | {a, b, c}) follows a logistic curve over time (say).
  • Since b alternates with c, p(b | {b, c}) also adopts a logistic curve over the same axis. But note the different baseline: b is alternating within the envelope of variation defined by the remainder 1 – p(a | {a, b, c}).

The following graph is plotted for x = 0 to 20, with constant k = 10 in both cases, so the point at which the probability is 0.5 coincides for both a and b. Selecting m_a = -0.5 and m_b = -0.25 (a gentler slope) yields the following hill-shaped curve for b against the global baseline.

If you want to understand more closely how this works and wish to experiment with settings, download this spreadsheet. For simplicity, the turning point, or 0.5-intercept, k, is the same in both cases, although this is not necessary.
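
For readers who prefer code to a spreadsheet, the same calculation can be sketched in a few lines of Python. This is an illustrative reconstruction from the description above, not the spreadsheet itself; the helper name `shares` is hypothetical, and the parameter values are those quoted in the text.

```python
import math

def logistic(x, m, k):
    return 1.0 / (1.0 + math.exp(-m * (x - k)))

K, M_A, M_B = 10.0, -0.5, -0.25   # settings quoted in the text

def shares(x):
    """Return (p_a, p_b, p_c) against the global baseline {a, b, c}."""
    p_a = logistic(x, M_A, K)            # a versus the pair b+c
    p_b_given_bc = logistic(x, M_B, K)   # b versus c, within the remainder
    remainder = 1.0 - p_a                # envelope available to b and c
    p_b = remainder * p_b_given_bc
    p_c = remainder * (1.0 - p_b_given_bc)
    return p_a, p_b, p_c

for x in range(0, 21, 2):
    p_a, p_b, p_c = shares(x)
    print(f"x={x:2d}  a={p_a:.3f}  b={p_b:.3f}  c={p_c:.3f}")
```

The three columns sum to 1 at every x. Column b starts near zero, climbs while the remainder grows, then falls again as c takes over within the pair.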

[Figure: Strict hierarchical alternation, synchronised turning constant (k = 10), m_a = -0.5, m_b = -0.25, probability scale.]

This ‘hill’ is obviously not a logistic curve, although it has a well-defined relationship with the logistic function. It is a logistic curve within the envelope of variation defined by the remainder — the area above the blue dotted line. (The red dotted line is not plotted against the same baseline, so in practice we would not observe this directly unless we reformulated the experiment.)

We only see that the hill-shaped curve is logistic when we change the baseline to the pair {b, c}. Altering the experimental design allows us to witness the alternation as a simple substitution of one form for another uninfluenced by other factors.
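
Changing the baseline is a one-line calculation. The sketch below assumes we only have the three global shares to hand, as we would with observed data, and recovers p(b | {b, c}) from them. The function names are hypothetical, and the default parameters are those of the worked example.

```python
import math

def logistic(x, m, k):
    return 1.0 / (1.0 + math.exp(-m * (x - k)))

def global_shares(x, m_a=-0.5, m_b=-0.25, k=10.0):
    """Shares of a, b and c against the global baseline {a, b, c}."""
    p_a = logistic(x, m_a, k)
    p_b = (1.0 - p_a) * logistic(x, m_b, k)
    return p_a, p_b, 1.0 - p_a - p_b

def rebaseline(p_b, p_c):
    """p(b | {b, c}): the share of b once a is excluded from the baseline."""
    return p_b / (p_b + p_c)

# Second column: share of b within {b, c}; third column: the logistic model.
for x in (0, 5, 10, 15, 20):
    _, p_b, p_c = global_shares(x)
    print(x, round(rebaseline(p_b, p_c), 4), round(logistic(x, -0.25, 10.0), 4))
```

The second and third columns should agree: within the pair {b, c}, the hill is an ordinary logistic curve again.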

Hint: This observation means that when performing logistic regression, it is worth experimenting with hierarchically-structuring variables into type pairs that may plausibly alternate independently from other forms. But it is also possible that alternation may be more than two-way and data does not fit a hierarchical ‘binary tree’ model.

On a logit scale, the same curves look like this. Straight lines on this scale match logistic curves. The green curve does not!

[Figure: The same strict hierarchical alternation on a logit scale.]
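
The logit-scale comparison can also be made without a plot. As a rough sketch, the code below samples each curve at evenly spaced x and takes second differences of the logit values: these vanish for a straight line and do not for a bent one. The helper `second_differences` is an illustrative device, not a standard test of fit.

```python
import math

def logistic(x, m, k):
    return 1.0 / (1.0 + math.exp(-m * (x - k)))

def logit(p):
    return math.log(p / (1.0 - p))

def second_differences(values):
    """Zero (up to rounding) for evenly spaced samples of a straight line."""
    return [values[i - 1] - 2 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]

xs = range(0, 21)
p_a = [logistic(x, -0.5, 10.0) for x in xs]
# b against the global baseline: logistic within the remainder 1 - p_a
p_b = [(1.0 - a) * logistic(x, -0.25, 10.0) for x, a in zip(xs, p_a)]

bend_a = max(abs(d) for d in second_differences([logit(p) for p in p_a]))
bend_b = max(abs(d) for d in second_differences([logit(p) for p in p_b]))
print(f"largest bend on the logit scale: a = {bend_a:.2e}, b = {bend_b:.2e}")
```

For a the bend is at the level of floating-point rounding; for b it is not.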

The three-body problem

As it is not possible for all three types to adopt a logistic curve simultaneously, at least one of the patterns an observer examines will be non-logistic. The hierarchical model above applies only if the three types do in fact alternate hierarchically. With three or more types in direct competition, there is no reason why any particular type should follow a logistic curve.

Note how we snuck in the word ‘independent’ earlier — we assumed that the transition in preferred use from b to c (HAVE to and HAVE got to) would occur completely in isolation from the other substitution (for a, MUST).

In practice this pattern of neat independent alternation is unlikely. Zooming in on part of a curve can make it appear to be a straight line. For exactly the same reason, a function that matches a logistic curve over a small range need not be truly logistic overall.
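
The hill-shaped curve from the hierarchical example illustrates the point. The sketch below fits a least-squares straight line to its logit values, first over a narrow early window and then over the whole range. The window x = 0 to 4 is an arbitrary choice for illustration, and `worst_residual` is a hypothetical helper.

```python
import math

def logistic(x, m, k):
    return 1.0 / (1.0 + math.exp(-m * (x - k)))

def logit(p):
    return math.log(p / (1.0 - p))

def hill(x):
    """p(b | {a, b, c}) from the hierarchical example (m_a=-0.5, m_b=-0.25, k=10)."""
    return (1.0 - logistic(x, -0.5, 10.0)) * logistic(x, -0.25, 10.0)

def worst_residual(xs):
    """Largest gap between logit(hill) and its least-squares straight line."""
    ys = [logit(hill(x)) for x in xs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return max(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

narrow = worst_residual([0, 1, 2, 3, 4])   # early part of the rise only
wide = worst_residual(list(range(0, 21)))  # the whole period
print(f"narrow window: {narrow:.3f}   full range: {wide:.3f}")
```

Over the narrow window a straight line on the logit scale fits closely, so the data would pass for logistic. Over the full range the same curve departs from its best-fitting line by a wide margin.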

The so-called ‘three-body problem’ is a well-known fundamental problem in studying physical dynamic systems, and frequently results in bounded chaotic outcomes (Gleick 1977). This blog is not the place to discuss chaos theory, except to note that non-linear behaviour (‘chaos’) arises when three bodies are continuously influencing each other, such as two moons orbiting a planet, or one moon orbiting two planets.

[Figure: A simple illustration of the three-body problem obtaining irregular non-linear behaviour (‘chaos’). The lower system is sometimes termed ‘the strange attractor’. This also has the property that infinitesimal differences in initial conditions lead to different outcomes.]

In our case, the equivalent situation is where three or more alternating types exist.

We should not therefore be surprised if results do not converge to a neat logistic curve but some other pattern. Moreover, this is also true for a study that only examines the alternation of two simple types. In other words, even if the possibility of using MUST is ignored in a study of HAVE [got] to, we cannot guarantee that if the proportion of cases of MUST changes substantially over a given period, this will not influence the HAVE [got] to alternation so as to alter the shape of the curve.

This final conclusion is a reasonable ‘ecological’ objection to an over-reliance on logistic regression. It also means we should be careful about arguments for or against a particular baseline of study simply on the basis of r² scores (measures of fit to a logistic or other line).

This issue is a problem of fitting to a particular line. It does not undermine the proper use of confidence intervals or testing for significant differences between observations. In addition to a caution against input axes that are non-linear, we might therefore extend the ‘limited circumstances’ identified above:

  • We assume probabilities are free to vary from 0 to 1.
  • The envelope of variation must be constant, i.e. it must always be possible for an observed probability to reach 1.
  • The alternation is not influenced by other alternating forms, nor otherwise subject to systematic ecological pressures.

References

Bowie, J. and Wallis, S.A. 2016. The to-infinitival perfect: A study of decline. In Werner, V., Seoane, E., and Suárez-Gómez, C. (eds.) Re-assessing the Present Perfect, Topics in English Linguistics (TiEL) 91. Berlin: De Gruyter, 43-94.

Gleick, J. 1977. Chaos: Making a New Science. London: Heinemann.

Nevalainen, T. 2015. Descriptive adequacy of the S-curve model in diachronic studies of language change. In Sanchez-Stockhammer, C. (ed.) Can we predict language change? Helsinki: Varieng, UoH. ePublished.
