Measures of association for contingency tables

Introduction Paper (PDF)

Often when we carry out research we wish to measure the degree to which one variable affects the value of another, setting aside the question as to whether this impact is sufficiently large as to be considered significant (i.e., significantly different from zero).

The most general term for this type of measure is size of effect. Effect sizes allow us to make descriptive statements about samples. Traditionally, experimentalists have referred to ‘large’, ‘medium’ and ‘small’ effects, which is rather imprecise. Nonetheless, it is possible to employ statistically sound methods for comparing different sizes of effect by estimating a Gaussian confidence interval (Bishop, Fienberg and Holland 1975) or by comparing pairs of contingency tables employing a “difference of differences” calculation (Wallis 2011).

In this paper we consider effect size measures for contingency tables of any size, generally referred to as “r × c tables”. This effect size is the “measure of association” or “measure of correlation” between the two variables. There are more measures applying to 2 × 2 tables than for larger tables. Continue reading

Advertisements

Goodness of fit measures for discrete categorical data

Introduction Paper (PDF)

A goodness of fit χ² test evaluates the degree to which an observed discrete distribution over one dimension differs from another. A typical application of this test is to consider whether a specialisation of a set, i.e. a subset, differs in its distribution from a starting point (Wallis 2013). Like the chi-square test for homogeneity (2 × 2 or generalised row r × column c test), the null hypothesis is that the observed distribution matches the expected distribution. The expected distribution is proportional to a given prior distribution we will term D, and the observed O distribution is typically a subset of D.

A measure of association, or correlation, between two distributions is a score which measures the degree of difference between the two distributions. Significance tests might compare this size of effect with a confidence interval to determine that the result was unlikely to occur by chance.

Common measures of the size of effect for two-celled goodness of fit χ² tests include simple difference (swing) and proportional difference (‘percentage swing’). Simple swing can be defined as the difference in proportions:

d = O₁/D₁ – O₀/D₀.

For 2 × 1 tests, simple swings can be compared to test for significant change between test results. Provided that O is a subset of D then these are real fractions and d is constrained d ∈ [-1, 1]. However, for r × 1 tests, where r > 2, we need to obtain an aggregate score to estimate the size of effect. Moreover, simple swing cannot be used meaningfully where O is not a subset of D.

In this paper we consider a wide range of different potential methods to address this problem.

Correlation scores are a sample statistic. The fact that one is numerically larger than the other does not mean that the result is significantly greater. To determine this we need to either

  1. estimate confidence intervals around each measure and employ a z test for two proportions from independent populations to compare these intervals, or
  2. perform an r × 1 separability test for two independent populations (Wallis 2011) to compare the distributions of differences of differences.

In cases where both tests have one degree of freedom, these procedures obtain the same result. With r > 2 however, there will be more than one way to obtain the same score. The distributions can have a significantly different pattern even when scores are identical.

We apply these methods to a practical research problem, how to decide if present perfect verb phrases more closely correlate with present- and past-marked verb phrases. We consider if present perfect VPs are more likely to be found in present-oriented texts or past-oriented ones.

Continue reading