An unnatural probability?

Not everything that looks like a probability is.

The fact that a variable or function ranges from 0 to 1 does not mean that it behaves like a unitary probability over that range.

Natural probabilities

What we might term a natural probability is a proper fraction of two frequencies, which we might write as p = f / n.

  • Provided that f can be any value from 0 to n, p can range from 0 to 1.
  • In this formula, f and n must also be natural frequencies; that is, n stands for the size of the set of all cases, and f for the size of a true subset of these cases.
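The two conditions above can be expressed as a small check before computing p = f / n. This is a minimal sketch, assuming f and n arrive as integer counts; the function name natural_probability is hypothetical and not from the original post. It returns an exact Fraction so that the ratio of two frequencies is kept visible.

```python
from fractions import Fraction


def natural_probability(f: int, n: int) -> Fraction:
    """Return p = f / n, checking that f and n are natural frequencies.

    Hypothetical helper for illustration:
    f is the number of cases in the subset of interest,
    n is the number of cases in the whole set (must be positive).
    """
    if not (isinstance(f, int) and isinstance(n, int)):
        raise TypeError("f and n must be integer counts")
    if n <= 0:
        raise ValueError("n must be a positive count")
    # f may take any value from 0 to n, so p ranges from 0 to 1
    if not 0 <= f <= n:
        raise ValueError("f must lie between 0 and n inclusive")
    return Fraction(f, n)


print(natural_probability(8, 10))         # 4/5
print(float(natural_probability(8, 10)))  # 0.8
```

A ratio that fails these checks (for example, a non-integer "frequency" or an f that can exceed n) may still fall between 0 and 1, but it is not a natural probability in the sense defined here.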

This natural probability is expected to be a Binomial variable, and the formulae for z tests, χ² tests, Wilson intervals, etc., as well as logistic regression and similar methods, may be legitimately applied to such variables. The Binomial distribution is the expected distribution of such a variable if each observation is drawn independently at random from the population (an assumption that is not strictly true with corpus data).
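As an illustration of one of the methods listed above, the sketch below computes the standard Wilson score interval for an observed proportion p = f / n. It assumes a two-tailed 95% interval (z ≈ 1.959964) as the default; the function name wilson_interval is hypothetical, and this is the textbook formula rather than any particular implementation used by the author.

```python
import math


def wilson_interval(f: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for the Binomial proportion p = f / n.

    Hypothetical helper; z = 1.959964 is an assumed default giving a
    two-tailed 95% interval.
    """
    if n <= 0 or not 0 <= f <= n:
        raise ValueError("require n > 0 and 0 <= f <= n")
    p = f / n
    z2 = z * z
    denom = 1 + z2 / n
    # The interval is centred on an adjusted proportion, not on p itself
    centre = (p + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return centre - half_width, centre + half_width


lower, upper = wilson_interval(8, 10)
print(f"p = 0.8, 95% Wilson interval: ({lower:.4f}, {upper:.4f})")
```

Because the formula relies on p being Binomial (f independent draws out of n), applying it to a quantity that merely lies between 0 and 1 gives an interval with no clear meaning.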

Another way of putting this is that a Binomial variable expresses the number of individual events of Type A in a situation where an outcome of either A or B is possible. If we observe, say, that 8 out of 10 cases are of Type A, then we can say we have an observed probability of A being chosen, p(A | {A, B}), of 0.8. In this case, f is the frequency of A (8), and n the combined frequency of A and B (10). See Wallis (2013a).
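The worked example can be reproduced by counting choices directly. This is a minimal sketch, assuming observations are recorded as a list of outcome labels; the function name observed_probability and the sample data are hypothetical, built only to match the 8-out-of-10 example. Only cases where the choice was A or B contribute to n.

```python
from collections import Counter


def observed_probability(outcomes, target="A", alternatives=("A", "B")):
    """Return (f, n, p) for p(target | alternatives) from observed choices.

    Hypothetical helper: cases that are not one of the alternatives are
    ignored, so n counts only situations where A or B was the outcome.
    """
    counts = Counter(o for o in outcomes if o in alternatives)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no cases of any alternative were observed")
    f = counts[target]
    return f, n, f / n


# Hypothetical data matching the example: 8 cases of A and 2 of B.
observations = ["A"] * 8 + ["B"] * 2
f, n, p = observed_probability(observations)
print(f, n, p)  # 8 10 0.8
```

Here f and n are both genuine frequencies from the same set of cases, which is exactly what makes p a natural probability.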