In a previous post I discussed how to plot confidence intervals on observed probabilities. Using this method we can create graphs like the following. (Data is in the Excel spreadsheet we used previously: for this post I have added a second worksheet.)
The graph depicts both the observed probability of a particular form and the certainty that this observation is accurate. The ‘I’-shaped error bars depict the estimated range of the true value of the observation at a 95% confidence level (see Wallis 2013 for more details).
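For readers who want to reproduce such error bars, the standard computation is the Wilson score interval on an observed proportion. Below is a minimal Python sketch; the function name and the 95% critical value are my own choices, not taken from the spreadsheet:

```python
import math

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p
    out of n cases; z is the two-tailed critical value of the
    Normal distribution (1.96 for a 95% interval)."""
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    spread = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - spread, centre + spread
```

The returned pair gives the lower and upper bounds to plot as the ‘I’-shaped error bar around each observed probability.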
A note of caution: these probabilities are semasiological proportions (different uses of the same word) rather than onomasiological choices (see Choice vs. use).
In this post I discuss ways in which we can plot intervals on changes (differences) rather than single probabilities.
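One well-established way to place an interval on the difference between two observed proportions is Newcombe's method, which combines the two Wilson intervals. The sketch below is a generic illustration of that construction, not necessarily the exact procedure used with the spreadsheet; the Wilson helper is repeated so the example is self-contained:

```python
import math

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for proportion p out of n cases."""
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    spread = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - spread, centre + spread

def newcombe_difference(p1, n1, p2, n2, z=1.959964):
    """Newcombe-style interval for the change d = p2 - p1,
    built from the two Wilson intervals."""
    l1, u1 = wilson_interval(p1, n1, z)
    l2, u2 = wilson_interval(p2, n2, z)
    d = p2 - p1
    lower = d - math.sqrt((u1 - p1)**2 + (p2 - l2)**2)
    upper = d + math.sqrt((p1 - l1)**2 + (u2 - p2)**2)
    return lower, upper
```

If the resulting interval excludes zero, the change between the two observations is significant at the chosen level.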
Often when we carry out research we wish to measure the degree to which one variable affects the value of another, setting aside the question of whether this effect is large enough to be considered significant (i.e., significantly different from zero).
The most general term for this type of measure is size of effect. Effect sizes allow us to make descriptive statements about samples. Traditionally, experimentalists have referred to ‘large’, ‘medium’ and ‘small’ effects, which is rather imprecise. Nonetheless, it is possible to employ statistically sound methods for comparing different sizes of effect by inverting a Gaussian interval (Bishop, Fienberg and Holland 1975) or by comparing pairs of contingency tables employing a “difference of differences” calculation (Wallis 2019).
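To illustrate the kind of comparison involved, one generic construction is a z-score for the difference between two differences of proportions, using Gaussian (Wald) standard errors. This is a common textbook form and should be read as an illustration only, not as the exact calculation presented in Wallis (2019):

```python
import math

def diff_of_differences_z(p1a, n1a, p1b, n1b, p2a, n2a, p2b, n2b):
    """z-score comparing two differences of proportions,
    d1 = p1a - p1b and d2 = p2a - p2b, under the assumption
    that the four samples are independent."""
    d1 = p1a - p1b
    d2 = p2a - p2b
    # combined Gaussian (Wald) standard error of d1 - d2
    se = math.sqrt(p1a * (1 - p1a) / n1a + p1b * (1 - p1b) / n1b
                   + p2a * (1 - p2a) / n2a + p2b * (1 - p2b) / n2b)
    return (d1 - d2) / se
```

A |z| exceeding the critical value (1.96 at the 95% level) would indicate that the two effect sizes differ significantly.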
When we carry out experiments and perform statistical tests we have two distinct aims:

1. to form statistically robust conclusions about empirical data;
2. to make logically sound arguments about experimental conclusions.

Robustness is essentially an inductive, mathematical or statistical issue. Soundness is a deductive question of experimental design and reporting.
Robust conclusions are those that are likely to be repeated if another researcher were to come along and perform the same experiment with different data sampled in much the same way. Sound arguments distinguish between what we can legitimately infer from our data, and the hypothesis we may wish to test.