Measures of association for contingency tables

Note:
The original article, published in 2012 (archived here), contains a ‘Wald’-type error in the calculation of confidence intervals on Cramér’s ϕ. The correct approach is explained in this blog post. Chapter 14 in Wallis (2021) contains a shortened form of this article with the correct approach, plus a brief summary of Wallis (2012).

Introduction

Often when we carry out research we wish to measure the degree to which one variable affects the value of another, setting aside the question as to whether this impact is sufficiently large as to be considered significant (i.e., significantly different from zero).

The most general term for this type of measure is size of effect. Effect sizes allow us to make descriptive statements about samples. Traditionally, experimentalists have referred to ‘large’, ‘medium’ and ‘small’ effects, which is rather imprecise. Nonetheless, it is possible to employ statistically sound methods for comparing different sizes of effect by inverting a Gaussian interval (Bishop, Fienberg and Holland 1975) or by comparing pairs of contingency tables employing a “difference of differences” calculation (Wallis 2019).

In this paper we consider effect size measures for contingency tables of any size, generally referred to as “r × c tables”. This effect size is the “measure of association” or “measure of correlation” between the two variables. There are more measures applying to 2 × 2 tables than for larger tables.

Consider Table 1 below. A and B are dichotomous (two-way or Boolean) variables. We wish to find the best estimate that the value of a ∈ A is dependent on the value of b ∈ B. We will refer to the ideal measure as the dependent probability of A given B, dp(A, B).

A \ B    b1    b2    Total
a1       45     5       50
a2       15    35       50
Total    60    40      100

Table 1. An example 2 × 2 contingency table for two dichotomous variables A, B.

A \ B    b1    b2    Total
a1       50     0       50
a2        0    50       50
Total    50    50      100

Table 2. A maximally dependent contingency table.

It follows that if the data were arranged as in Table 2, we could conclude that the value of A completely depended on the value of B, and therefore our ideal dependent probability would be 1. Note that if we took any instance from the sample where B = b1, then A = a1 (and so forth).
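The reading of Tables 1 and 2 can be checked numerically. The following is a minimal Python sketch, not code from the paper: the function name `column_conditionals` is our own, and it assumes that no column total is zero. It computes the conditional probability p(a | b) for each cell, i.e. the cell frequency divided by its column total.

```python
def column_conditionals(table):
    """Return p(a_i | b_j) for every cell: cell frequency / column total.

    `table` is a list of rows (values of A); columns are values of B.
    Assumes every column total is non-zero.
    """
    col_totals = [sum(col) for col in zip(*table)]
    return [[cell / total for cell, total in zip(row, col_totals)]
            for row in table]


table1 = [[45, 5], [15, 35]]   # Table 1
table2 = [[50, 0], [0, 50]]    # Table 2 (maximally dependent)

print(column_conditionals(table1))  # [[0.75, 0.125], [0.25, 0.875]]
print(column_conditionals(table2))  # [[1.0, 0.0], [0.0, 1.0]]
```

In Table 2 every conditional probability is 0 or 1, so knowing B fixes A exactly. In Table 1 the value of B shifts the probability of a1 from 0.125 to 0.75 without determining it.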

This type of table is employed in conventional χ² tests of homogeneity (independence). Indeed, contingency tests may be considered as an assessment combining an observed effect size and the weight of evidence (total number of cases) supporting this observation.
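To illustrate how a χ² score combines effect size with weight of evidence, here is a small Python sketch. It uses the standard textbook definitions of Pearson's χ² and Cramér's ϕ = √(χ² / (N × (k − 1))), where k is the smaller of the number of rows and columns. The function names and the "scaled" table are our own illustrative assumptions, not part of the paper.

```python
import math


def chi_square(table):
    """Pearson's chi-square for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_phi(table):
    """Cramer's phi: sqrt(chi2 / (N * (k - 1))), k = min(rows, columns)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))


table1 = [[45, 5], [15, 35]]                           # Table 1, N = 100
scaled = [[10 * cell for cell in row] for row in table1]  # hypothetical, N = 1000
table2 = [[50, 0], [0, 50]]                            # Table 2

print(chi_square(table1), round(cramers_phi(table1), 4))  # 37.5 0.6124
print(chi_square(scaled), round(cramers_phi(scaled), 4))  # 375.0 0.6124
print(chi_square(table2), round(cramers_phi(table2), 4))  # 100.0 1.0
```

Multiplying every cell by ten leaves ϕ unchanged but multiplies χ² by ten: the effect size is the same, while the weight of evidence behind it has grown.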

Similar measures are used in other circumstances. Wallis (2012) discusses goodness of fit measures of association which measure the degree to which one categorical distribution correlates with another. Similarly, Pearson’s r² and Spearman’s R² are standard effect sizes (measures of correlation) for variables expressed as ratio (real numbers) and ordinal (ranked) data respectively.

However, with categorical data a multiplicity of alternative measures is available. These measures have often been developed independently, and their differences are rarely explored.

Several candidates for the size of the correlation (or association) between discrete variables have been suggested in the literature. These include the contingency coefficient C, Yule’s Q and the odds ratio o (Sheskin 1997). We can eliminate the odds ratio and Yule’s Q from consideration because they only apply to 2 × 2 tables. The odds ratio is also not probabilistic: it is a ratio of two odds, so it is not limited to the range from 0 to 1, although its logarithm (the log odds ratio) can be mapped onto a probability scale by the logistic function. In this paper we consider three potential candidates: Cramér’s ϕ, adjusted C and Bayesian dependency.
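As a rough illustration of the two measures set aside here, the Python sketch below applies the standard textbook formulae for the odds ratio (ad / bc) and Yule's Q ((ad − bc) / (ad + bc)) to Tables 1 and 2. The cell labels a, b, c, d (reading each table row by row) and the handling of zero cells are our own assumptions for the example.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio ad / bc for the 2 x 2 table [[a, b], [c, d]]."""
    if b * c == 0:
        # Assumption: report infinity (or NaN if ad is also zero)
        # rather than raise ZeroDivisionError.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def yules_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc); assumes ad + bc is non-zero."""
    return (a * d - b * c) / (a * d + b * c)


print(odds_ratio(45, 5, 15, 35))         # Table 1: 21.0
print(round(yules_q(45, 5, 15, 35), 4))  # Table 1: 0.9091
print(odds_ratio(50, 0, 0, 50))          # Table 2: inf
print(yules_q(50, 0, 0, 50))             # Table 2: 1.0
```

Both functions take exactly four cells, which is why neither extends to larger r × c tables. The odds ratio for Table 1 is 21 and for Table 2 it is unbounded, so it cannot be read as a probability, whereas Q stays within [−1, 1].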

Given this range of potential measures for effect size, two questions arise.

Do they measure the same thing? And if not, how may we choose between them?

Contents

  1. Introduction
  2. Probabilistic approaches to dependent probability
  3. A Bayesian approach to dependent probability
  4. Evaluating measures
  5. Robustness and confidence intervals on ϕ*
  6. A worked example

* The method cited in this archived article is erroneous. For the correct approach, see my book (Wallis 2021) and ‘ϕ intervals by inverted Gaussian S(ϕ)’.

Citation (original post)

Wallis, S.A. 2012. Measures of association for contingency tables. London: Survey of English Usage, UCL. https://www.ucl.ac.uk/english-usage/statspapers/phimeasures.pdf

Citation (updated book version)

Wallis, S.A. 2021. The Size of an Effect. Chapter 14 in Wallis, S.A. Statistics in Corpus Linguistics Research. New York: Routledge. 221-232.

References

Bishop, Y.M.M., Fienberg, S.E. & Holland, P.W. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.

Sheskin, D.J. 1997. Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL: CRC Press.

Wallis, S.A. 2019. Comparing χ² tables for separability of distribution and effect. Journal of Quantitative Linguistics 26:4, 330-335. DOI: 10.1080/09296174.2018.1496537

Wallis, S.A. 2012. Goodness of fit measures for discrete categorical data. London: Survey of English Usage, UCL.
