Inferential statistics – and other animals

Introduction

Inferential statistics is a methodology of extrapolation from data. It rests on a mathematical model which allows us to predict values in the population based on observations in a sample drawn from that population.

Central to this methodology is the idea of reporting not just the observation itself but also the certainty of that observation. In some cases we can observe the population directly and make statements about it.

  • We can cite the 10 most frequent words in Shakespeare’s First Folio with complete certainty (allowing for spelling variations). Such statements would simply be facts.
  • Similarly, we could take a corpus like ICE-GB and report that in it, there are 14,275 adverbs ending in -ly out of 1,061,263 words.

Provided that we limit the scope of our remarks to the corpus itself, we do not need to worry about degrees of certainty because these statements are simply facts. Statements about the corpus are sometimes called descriptive statistics (the word statistic here being used in its most general sense, i.e. a number). Continue reading