**Note:**

This page explains how to compare observed frequencies *f*₁ and *f*₂ from the same distribution, **F** = {*f*₁, *f*₂,…}. To compare observed frequencies *f*₁ and *f*₂ from different distributions, i.e. where **F₁** = {*f*₁,…} and **F₂** = {*f*₂,…}, you need to use a chi-square or Newcombe-Wilson test.

### Introduction

In a recent study, my colleague Jill Bowie obtained a discrete frequency distribution by manually classifying cases in a small sample drawn from a large corpus.

Jill converted this distribution into a row of probabilities and calculated Wilson score intervals on each observation, to express the uncertainty associated with a small sample. She had one question, however:

**How do we know whether the proportion of one quantity is significantly greater than another?**

We might use a Newcombe-Wilson test (see Wallis 2013a), but this test assumes that we want to compare samples from independent sources. Jill’s data are drawn from the same sample, and all probabilities must sum to 1. Instead, the optimum test is a **dependent-sample** test.

### Example

A discrete distribution looks something like this: **F** = {108, 65, 6, 2}. This is the frequency data for the middle column (circled) in the following chart.

This may be converted into a probability distribution **P**, representing the proportion of examples in each category, by simply dividing by the total: **P** = {0.60, 0.36, 0.03, 0.01}, which sums to 1.
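As a quick check on the arithmetic, the conversion can be sketched in a few lines of Python. The variable names `F`, `n` and `P` simply mirror the notation above and are my own choice for illustration.

```python
# Observed frequencies for the example distribution F.
F = [108, 65, 6, 2]

# Total number of cases in the sample.
n = sum(F)

# Divide each frequency by the total to obtain proportions.
P = [f / n for f in F]

print(n)                          # 181
print([round(p, 2) for p in P])   # [0.6, 0.36, 0.03, 0.01]
```

Because every proportion shares the same denominator *n*, the values in **P** are constrained to sum to 1.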

We can plot these probabilities, with Wilson score intervals, as shown below.
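For readers who want to reproduce the intervals, here is a minimal sketch of the standard Wilson score interval formula (without continuity correction) applied to each proportion. The function name `wilson_interval` and the choice of a 95% level (*z* ≈ 1.96) are assumptions made for this illustration, not details taken from the original study.

```python
import math

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (lower, upper) for a proportion p observed in n cases.

    z is the two-tailed critical value of the Normal distribution;
    the default corresponds to an assumed 95% level.
    """
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Example distribution from the text.
F = [108, 65, 6, 2]
n = sum(F)

# One (proportion, lower bound, upper bound) triple per category.
intervals = [(f / n, *wilson_interval(f / n, n)) for f in F]

for p, lower, upper in intervals:
    print(f"p = {p:.4f}  interval = ({lower:.4f}, {upper:.4f})")
```

Note that the intervals are asymmetric about *p* and never stray outside the range [0, 1], which matters for the small proportions at the end of the distribution.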

**So how do we know if one proportion is significantly greater than another?**

- When comparing values diachronically (horizontally), data is drawn from **independent samples**. We may use the Newcombe-Wilson test, and employ the handy visual rule that if intervals do not overlap they must be significantly different.
- However, probabilities drawn from the **same sample** (vertically) sum to 1, which is not the case for independent samples! There are *k* − 1 degrees of freedom, where *k* is the number of classes. It turns out that the relevant significance test we need to use is an extremely basic test, but it is rarely discussed in the literature.