### Introduction

Elsewhere in this blog we introduce the concept of statistical significance by considering the reliability of a single sampled observation of a Binomial proportion: an estimate of the probability of selecting an item in the future. This allows us to develop an understanding of the likely distribution of what the true value of that probability in the population might be. In short, were we to make future observations of that item, we could expect that each sampled probability would be found within a particular range – a confidence interval – a fixed proportion of times, such as 1 in 20 or 1 in 100. This ‘fixed proportion’ is termed the error level because we predict that the true value will be outside the range 1 in 20 or 1 in 100 times.

This process of inferring about future observations is termed ‘inferential statistics’. Our approach is to build our understanding in a series of stages based on confidence intervals about the single proportion. Here we will approach the same question by deconstructing the chi-square test.

A core idea of statistical inference is this: randomness is a fact of life. If you sample the same phenomenon multiple times, drawing on different data each time, it is unlikely that the observation will be identical, or – to put it in terms of an observed sample – it is unlikely that the mean value of the observation will be the same. But you are more likely than not to find the new mean near the original mean, and the larger the size of your sample, the more reliable your estimate will be. This, in essence, is the Central Limit Theorem.

This principle applies to the central tendency of data, usually the arithmetic mean, but occasionally a median. It does not concern outliers: extreme but rare events (which, by the way, you should include, and not delete, from your data).

We are mainly concerned with Binomial or Multinomial proportions, i.e. the fraction of cases sampled which have a particular property. A Binomial proportion is a statement about the sample, a simple fraction *p* = *f* / *n*. But it is also the sample mean probability of selecting a value. Suppose we selected a random case from the sample. In the absence of any other knowledge about that case, the average chance that *X* = *x*₁ is also *p*.

The same principle applies to the mean of Real or Integer values, for which one might use Welch’s or Student’s *t* test, and the median rank of Ordinal data, for which a Mann-Whitney *U* test may be appropriate.

With this in mind, we can form an understanding of significance, or to be precise, significant difference. The ‘difference’ referred to here is the difference between an uncertain observed value and a predicted or known population value, *d* = *p* – *P*, or the difference between two uncertain observed values, *d* = *p*₂ – *p*₁. The first of these differences is found in a single-sample *z* test, the second in a two-sample *z* test. See Wallis (2013b).

A significance test is created by comparing an observed difference with a second element, a *critical threshold* extrapolated from the underlying statistical model of variation. Continue reading