An unnatural probability?

Not everything that looks like a probability is one.

Just because a variable or function ranges from 0 to 1 does not mean that it behaves like a probability over that range.

Natural probabilities

What we might term a natural probability is a proper fraction of two frequencies, which we might write as p = f / n.

  • Provided that f can be any value from 0 to n, p can range from 0 to 1.
  • In this formula, f and n must also be natural frequencies: n stands for the size of the set of all cases, and f for the size of a subset of those cases. The term ‘natural’ here is used in the mathematical sense of the natural numbers, i.e. non-negative integers.

Aside: In certain models, these frequencies could be obtained from the sum of a set of probability estimates, each representing the probability that the observation was genuinely independent from others in the sample. This might permit a ‘frequency’ to be observed that was not a natural number. But the principle is the same.

This natural probability is expected to be a Binomial variable, and the formulae for z tests, χ² tests, Wilson intervals, etc., as well as logistic regression and similar methods, may be legitimately applied to such variables. The Binomial distribution is the expected distribution of such a variable if each observation is drawn independently at random from the population (an assumption that is not strictly true with corpus data).

Another way of putting this is that a Binomial variable expresses the number of individual events of Type A in a situation where an outcome of either A or B is possible. If we observe, say, that 8 out of 10 cases are of Type A, then we can say we have an observed probability of A being chosen, p(A | {A, B}), of 0.8. In this case, f is the frequency of A (8), and n the frequency of A and B combined (10). See Wallis (2013a).
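By way of illustration only (not part of the original argument), the following Python sketch computes the observed proportion and a 95% Wilson score interval for this 8-out-of-10 example; the critical value z ≈ 1.96 is assumed.

    import math

    def wilson_interval(f, n, z=1.959964):
        """Wilson score interval for an observed proportion p = f / n."""
        p = f / n
        denom = 1 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        spread = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return centre - spread, centre + spread

    # 8 out of 10 cases are Type A: observed p(A | {A, B}) = 0.8
    print(wilson_interval(8, 10))   # roughly (0.49, 0.94)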

Freedom to vary and significance tests

Introduction

Statistical tests based on the Binomial distribution (z, χ², log-likelihood and Newcombe-Wilson tests) assume that the item in question is free to vary at each point. This simply means that

  • If we find f items under investigation (what we elsewhere refer to as ‘Type A’ cases) out of n potential instances of the item, the statistical model of inference assumes that it must be possible for f to be any number from 0 to n, which we can write as f ∈ [0, n].
  • Both the observed proportion p = f / n and the population proportion P are therefore expected to fall in the probability range [0, 1].

Note that this constraint is a mathematical one. All we are asserting is that the true proportion in the population P (and thus the probability of randomly selecting the item) could conceivably range from 0 to 1.
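To make this concrete, here is a minimal Python sketch (the population proportion P = 0.3 and the sample size n = 10 are arbitrary, invented values): under the Binomial model every frequency f from 0 to n has a non-zero probability, so the observed proportion p = f / n can in principle take any value from 0 to 1.

    import math

    n, P = 10, 0.3              # illustrative sample size and population proportion
    for f in range(n + 1):      # f is free to take any value from 0 to n
        prob = math.comb(n, f) * P ** f * (1 - P) ** (n - f)   # Binomial probability of observing f
        print(f, f / n, prob)   # observed proportion p = f / n spans 0 to 1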

This property is not limited to onomasiological studies which require strict alternation of types with constant meaning. In semasiological studies, where we evaluate alternative meanings of the same word, these tests may also be used. This is because it is conceivable that the proportion of a meaning in a sample can be any value from 0 to 1.

However, it is also common in corpus linguistics to see evaluations carried out against a baseline that contains terms that simply cannot plausibly be exchanged with the item under investigation. The most obvious example is a statement of the following type: “linguistic item x increases per million words between categories 1 and 2”, with reference to a log-likelihood or χ² significance test to justify this claim. Rarely is this appropriate.

Some terminology: We will use Types A and B to represent validly alternative items. So if Type A is the use of modal shall, most words will not alternate with shall. Type B cases would be those that can alternate with shall (e.g. modal will in certain contexts).

The remainder of cases (other words) are, for the purposes of our study, not evaluated. We will term these invariant cases Type C, because they cannot replace Type A or Type B.

In That vexed problem of choice we explained that introducing such ‘Type C’ cases into an experimental design conflates opportunity and choice, causing the potential range of proportions to be less than 1, and worse still, often to vary between subsamples.

In this post we make a further observation. This methodological error also makes the statistical evaluation of variation more conservative. Not only may we mistake a change in opportunity for a change in the preference for the item, but we also weaken the power of statistical tests and tend to miss genuinely significant changes (in statistics jargon, we commit “Type II errors”).
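A hypothetical illustration (all frequencies below are invented): suppose Type A occurs 40 times out of 100 choice contexts in one subcorpus and 55 times out of 100 in another. Evaluated against the choice baseline {A, B}, a 2 × 2 χ² test detects the change; evaluated against a word baseline of 10,000 words per subcorpus, so that the denominator is swollen by invariant Type C cases, the same change falls below the 0.05 critical value of 3.84. A minimal Python sketch:

    def chi_squared_2x2(a, b, c, d):
        """Plain (uncorrected) chi-squared statistic for the 2 x 2 table [[a, b], [c, d]]."""
        n = a + b + c + d
        return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Choice baseline {A, B}: 100 contexts per subcorpus where A or B could occur.
    print(chi_squared_2x2(40, 60, 55, 45))       # ~4.51: significant at the 0.05 level

    # Word baseline: the same Type A frequencies out of 10,000 words per subcorpus,
    # i.e. the remainder now includes invariant Type C cases.
    print(chi_squared_2x2(40, 9960, 55, 9945))   # ~2.38: no longer significant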

This problem of experimental design far outweighs differences between methods for computing statistical tests.

Testing tests

Note:
This article describes work undertaken in the preparation of Wallis (2013). Additional work in this direction, performed in preparation for the publication of Wallis (2021), is described in Further evaluation of Binomial confidence intervals.

Introduction

Over the last few months I have been looking at computationally evaluating confidence intervals and significance tests. This process has helped me sharpen up the recommendations I can give to researchers. I have updated some online papers and blog posts as a result.

This analysis has exposed a rarely-discussed difference in the optimum contingency (“χ²-type”) test, depending on whether the samples of the independent variable are drawn from the same population or from independent populations embodying a known difference D.

Confidence intervals and significance tests are closely related, for reasons discussed here. So if we can evaluate a formula for a confidence interval in some way, then we can also potentially evaluate the test.
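To give a flavour of what such an evaluation can involve (this is my own illustration, not the procedure used in Wallis 2013 or 2021), the following Python sketch estimates the empirical coverage of the 95% Wilson score interval by repeated Binomial sampling; the population proportion P, the sample size n and the number of trials are arbitrary.

    import math
    import random

    def wilson_interval(f, n, z=1.959964):
        """95% Wilson score interval for an observed proportion p = f / n."""
        p = f / n
        denom = 1 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        spread = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return centre - spread, centre + spread

    def coverage(P=0.3, n=20, trials=100_000):
        """Proportion of simulated samples whose interval contains the true value P."""
        hits = 0
        for _ in range(trials):
            f = sum(random.random() < P for _ in range(n))   # draw a Binomial frequency
            lower, upper = wilson_interval(f, n)
            hits += lower <= P <= upper
        return hits / trials

    print(coverage())   # should be close to the nominal 0.95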