Introduction
Often when we carry out research we wish to measure the degree to which one variable affects the value of another, setting aside the question as to whether this impact is sufficiently large as to be considered significant (i.e., significantly different from zero).
The most general term for this type of measure is size of effect. Effect sizes allow us to make descriptive statements about samples. Traditionally, experimentalists have referred to ‘large’, ‘medium’ and ‘small’ effects, which is rather imprecise. Nonetheless, it is possible to employ statistically sound methods for comparing different sizes of effect by estimating a Gaussian confidence interval (Bishop, Fienberg and Holland 1975) or by comparing pairs of contingency tables employing a “difference of differences” calculation (Wallis 2011).
In this paper we consider effect size measures for contingency tables of any size, generally referred to as “r × c tables”. This effect size is the “measure of association” or “measure of correlation” between the two variables. More measures are available for 2 × 2 tables than for larger tables.
Consider Table 1 below. A and B are dichotomous (two-way or Boolean) variables. We wish to find the best estimate of the degree to which the value of a ∈ A depends on the value of b ∈ B. We will refer to the ideal measure as the dependent probability of A given B, dp(A, B).
Table 1. Example 2 × 2 contingency table.

| A \ B | b_{1} | b_{2} | Total |
|---|---|---|---|
| a_{1} | 45 | 5 | 50 |
| a_{2} | 15 | 35 | 50 |
| Total | 60 | 40 | 100 |
Table 2. Example 2 × 2 contingency table in which A depends completely on B.

| A \ B | b_{1} | b_{2} | Total |
|---|---|---|---|
| a_{1} | 50 | 0 | 50 |
| a_{2} | 0 | 50 | 50 |
| Total | 50 | 50 | 100 |
It follows that if the data were arranged as in Table 2, we could conclude that the value of A completely depended on the value of B, and therefore our ideal dependent probability would be 1. Note that if we took any instance from the sample where B = b_{1}, then A = a_{1} (and so forth).
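The contrast between the two tables can be checked by computing the conditional proportions p(a | b) directly. The following is a minimal sketch, not the paper's own method: the function name `conditional_probabilities` and the list-of-rows layout are assumptions made for illustration, and the code assumes no column total is zero.

```python
def conditional_probabilities(table):
    """Return p(a_i | b_j) for every cell of an r x c table of counts.

    Rows index the values of A and columns index the values of B.
    Hypothetical helper; assumes every column total is non-zero.
    """
    column_totals = [sum(column) for column in zip(*table)]
    return [
        [cell / total for cell, total in zip(row, column_totals)]
        for row in table
    ]


# Cell counts copied from Table 1 and Table 2.
table_1 = [[45, 5], [15, 35]]
table_2 = [[50, 0], [0, 50]]

for name, table in (("Table 1", table_1), ("Table 2", table_2)):
    p = conditional_probabilities(table)
    print(name, "p(a1|b1) =", p[0][0], " p(a1|b2) =", p[0][1])
```

In Table 1, p(a_{1} | b_{1}) = 45/60 = 0.75 and p(a_{1} | b_{2}) = 5/40 = 0.125, so knowing B shifts the chance of a_{1} without determining it. In Table 2 the same proportions are 1 and 0, which is the complete dependence described above.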
This type of table is employed in conventional χ² tests of homogeneity (independence). Indeed, contingency tests may be considered as an assessment combining an observed effect size and the weight of evidence (total number of cases) supporting this observation.
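One way to see χ² as a combination of effect size and weight of evidence is to scale every cell of Table 1 by the same factor. The sketch below assumes the standard Pearson χ² statistic and the textbook 2 × 2 relation φ = √(χ² / N); the function name `chi_square` and the scaling factor of 10 are illustrative assumptions, not taken from the paper.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total N for an r x c table of counts.

    Hypothetical helper; assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


table_1 = [[45, 5], [15, 35]]
# Same proportions, ten times as many cases (illustrative factor).
table_1_scaled = [[10 * cell for cell in row] for row in table_1]

for name, table in (("Table 1", table_1), ("Table 1 x 10", table_1_scaled)):
    chi2, n = chi_square(table)
    phi = math.sqrt(chi2 / n)  # 2 x 2 effect size under the stated assumption
    print(f"{name}: chi2 = {chi2:.2f}, N = {n}, phi = {phi:.3f}")
```

For Table 1, χ² = 37.5 with N = 100. Multiplying every cell by 10 gives χ² = 375 with N = 1000, while φ ≈ 0.612 is unchanged: the test statistic grows with the amount of data, but the effect size does not.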
Similar measures are used in other circumstances. Wallis (2012) discusses goodness of fit measures of association which measure the degree to which one categorical distribution correlates with another. Similarly, Pearson’s r² and Spearman’s R² are standard effect sizes (measures of correlation) for variables expressed as ratio (real numbers) and ordinal (ranked) data respectively.
However, with categorical data a multiplicity of alternative measures is available. These measures have often been developed independently, and their differences are rarely explored.
Several candidates for the size of the correlation (or association) between discrete variables have been suggested in the literature. These include the contingency coefficient C, Yule’s Q and the odds ratio o (Sheskin 1997). We can eliminate the odds ratio and Yule’s Q from consideration because they only apply to 2 × 2 tables. Moreover, the odds ratio is not probabilistic: it is a ratio of two odds, each of which is itself a ratio of two probabilities, so it is not confined to the range [0, 1], although it can be log-transformed to obtain the log odds ratio. In this paper we consider three potential candidates: Cramér’s φ, adjusted C and Bayesian dependency.
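To make these candidates concrete, the sketch below evaluates several of them on Table 1 using common textbook definitions, which are assumptions here and may differ in detail from those adopted later in the paper: Cramér’s φ = √(χ² / (N(k − 1))) and adjusted C = C / √((k − 1)/k), with C = √(χ² / (χ² + N)) and k = min(r, c). All function names are hypothetical. Bayesian dependency is omitted because its definition is not given in this introduction.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total N for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (observed - row_totals[i] * column_totals[j] / n) ** 2
        / (row_totals[i] * column_totals[j] / n)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )
    return chi2, n


def cramers_phi(table):
    """Cramer's phi for an r x c table (assumed textbook form)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


def adjusted_c(table):
    """Contingency coefficient C divided by its maximum for k = min(r, c)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    c = math.sqrt(chi2 / (chi2 + n))
    return c / math.sqrt((k - 1) / k)


def odds_ratio(table):
    """Odds ratio ad / bc; defined for 2 x 2 tables only, fails if bc = 0."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def yules_q(table):
    """Yule's Q = (ad - bc) / (ad + bc); defined for 2 x 2 tables only."""
    (a, b), (c, d) = table
    return (a * d - b * c) / (a * d + b * c)


table_1 = [[45, 5], [15, 35]]
print("Cramer's phi:", round(cramers_phi(table_1), 3))
print("adjusted C:  ", round(adjusted_c(table_1), 3))
print("odds ratio:  ", odds_ratio(table_1))
print("Yule's Q:    ", round(yules_q(table_1), 3))
```

Under these assumed definitions Table 1 gives φ ≈ 0.612, adjusted C ≈ 0.739, Yule’s Q ≈ 0.909 and an odds ratio of 21. The measures disagree on the same data, and the odds ratio falls outside [0, 1]; for Table 2 it is undefined because the product bc is zero.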
Given this range of potential measures for effect size, two questions arise.
Do they measure the same thing? And if not, how may we choose between them?
Contents
- Introduction
- Probabilistic approaches to dependent probability
- A Bayesian approach to dependent probability
- Evaluating measures
- Robustness and confidence intervals on φ
- A worked example
Citation
Wallis, S.A. 2012. Measures of association for contingency tables. London: Survey of English Usage, UCL. http://www.ucl.ac.uk/english-usage/statspapers/phimeasures.pdf
References
Bishop, Y.M.M., Fienberg, S.E. & Holland, P.W. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.
Sheskin, D.J. 1997. Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL: CRC Press.
Wallis, S.A. 2011. Comparing χ² tests for separability. London: Survey of English Usage, UCL.
Wallis, S.A. 2012. Goodness of fit measures for discrete categorical data. London: Survey of English Usage, UCL.