Comparing frequencies within a discrete distribution

This page explains how to compare observed frequencies f₁ and f₂ from the same distribution F = {f₁, f₂, …}. To compare observed frequencies f₁ and f₂ from different distributions, i.e. where F₁ = {f₁, …} and F₂ = {f₂, …}, you need to use a chi-square or Newcombe-Wilson test.


In a recent study, my colleague Jill Bowie obtained a discrete frequency distribution by manually classifying cases in a small sample drawn from a large corpus.

Jill converted this distribution into a row of probabilities and calculated Wilson score intervals on each observation, to express the uncertainty associated with a small sample. She had one question, however:

How do we know whether the proportion of one quantity is significantly greater than another?

We might use a Newcombe-Wilson test (see Wallis 2013a), but this test assumes that we want to compare samples from independent sources. Jill’s data are drawn from the same sample, so the probabilities must sum to 1 and cannot vary independently of one another. What we need instead is a dependent-sample test.


A discrete distribution looks something like this: F = {108, 65, 6, 2}. This is the frequency data for the middle column (circled) in the following chart.

This may be converted into a probability distribution P, representing the proportion of examples in each category, by simply dividing each frequency by the total (n = 181): P = {0.60, 0.36, 0.03, 0.01} to two decimal places, which sums to 1.
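As a quick check of the arithmetic, the conversion can be sketched in a few lines of Python. The function name `to_probabilities` is my own label for illustration, not something from the study.

```python
import math

# Observed frequencies for the 1960s column, as given in the text.
F = [108, 65, 6, 2]

def to_probabilities(freqs):
    """Divide each frequency by the total to obtain proportions.

    Hypothetical helper, named here for illustration only.
    """
    n = sum(freqs)
    return [f / n for f in freqs]

P = to_probabilities(F)

# Proportions rounded to two decimal places, as quoted in the text.
print([round(p, 2) for p in P])

# The unrounded proportions sum to 1 (within floating-point tolerance).
print(math.isclose(sum(P), 1.0))
```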

We can plot these probabilities, with Wilson score intervals, as shown below.
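For readers who want to reproduce the intervals, here is a minimal sketch of the standard Wilson score interval formula applied to each category. It assumes a 95% confidence level (z ≈ 1.96), which is my assumption rather than something stated above, and the function name `wilson_interval` is purely illustrative.

```python
import math

# Two-tailed critical value of the Normal distribution.
# The 95% level is an assumption; change z for other error levels.
Z_95 = 1.959964

def wilson_interval(f, n, z=Z_95):
    """Return (lower, upper) Wilson score bounds for the proportion f / n."""
    p = f / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Observed frequencies for the 1960s column, as given in the text.
F = [108, 65, 6, 2]
n = sum(F)

intervals = [wilson_interval(f, n) for f in F]
for f, (lower, upper) in zip(F, intervals):
    print(f"p = {f / n:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

Note that the interval is asymmetric about p, which matters most for the two small categories: their lower bounds stay above zero rather than crossing it.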


An example graph plot showing the changing proportions of meanings of the verb think over time in the US TIME Magazine Corpus, with Wilson score intervals, after Levin (2013). In this post we discuss the 1960s data (circled). The sum of each column probability is 1. Many thanks to Magnus for the data!

So how do we know if one proportion is significantly greater than another?

  • When comparing values diachronically (horizontally), data are drawn from independent samples. We may use the Newcombe-Wilson test, and employ the handy visual rule that if intervals do not overlap they must be significantly different.
  • However, probabilities drawn from the same sample (vertically) sum to 1, which is not the case for independent samples! There are k−1 degrees of freedom, where k is the number of classes. It turns out that the significance test we need is an extremely basic one, but it is rarely discussed in the literature.
