A goodness of fit χ² test evaluates the degree to which an observed discrete distribution over one dimension differs from an expected distribution. A typical application of this test is to consider whether a specialisation of a set, i.e. a subset, differs in its distribution from a starting point (Wallis 2013). As with the chi-square test for homogeneity (2 × 2 or generalised row r × column c test), the null hypothesis is that the observed distribution matches the expected distribution. The expected distribution is proportional to a given prior distribution, which we will term D, and the observed distribution O is typically a subset of D.
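As a minimal sketch of the calculation described above, the following Python computes a goodness of fit χ² statistic with expected frequencies scaled from D to the size of O. The function name and the example frequencies are illustrative assumptions, not taken from the paper, and every cell of D is assumed to be greater than zero.

```python
def goodness_of_fit_chi_square(observed, prior):
    """Return (chi_square, degrees_of_freedom) for an r x 1 goodness of fit test.

    Expected frequencies are the prior distribution D rescaled so that
    they sum to the same total as the observed distribution O.
    Assumes every cell of the prior is greater than zero.
    """
    if len(observed) != len(prior):
        raise ValueError("observed and prior must have the same number of cells")
    scale = sum(observed) / sum(prior)
    expected = [d * scale for d in prior]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, len(observed) - 1


# Hypothetical frequencies: O is a subset of D, cell by cell.
D = [100, 200]
O = [20, 10]
chi_square, df = goodness_of_fit_chi_square(O, D)
print(chi_square, df)
```

With one degree of freedom, a statistic above 3.841 is significant at the 0.05 level.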
A measure of association, or correlation, between two distributions is a score which quantifies the degree of difference between them. Significance tests might compare this size of effect with a confidence interval to determine whether the result was unlikely to occur by chance.
Common measures of the size of effect for two-celled goodness of fit χ² tests include simple difference (swing) and proportional difference (‘percentage swing’). Simple swing can be defined as the difference in proportions:
d = O₁/D₁ – O₀/D₀.
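The definition above can be sketched in Python as follows. The function names and example frequencies are hypothetical. The form used here for percentage swing (simple swing divided by the starting proportion O₀/D₀) is an assumption, since the text names the measure without giving a formula.

```python
def simple_swing(observed, prior):
    """Simple swing d = O1/D1 - O0/D0 for a two-celled (2 x 1) test."""
    p0 = observed[0] / prior[0]
    p1 = observed[1] / prior[1]
    return p1 - p0


def percentage_swing(observed, prior):
    """Swing relative to the starting proportion O0/D0.

    Assumed definition: d / (O0/D0). Undefined when O0 is zero.
    """
    p0 = observed[0] / prior[0]
    return simple_swing(observed, prior) / p0


# Hypothetical frequencies, with O a subset of D.
D = [100, 200]
O = [20, 10]
print(simple_swing(O, D))      # difference between 10/200 and 20/100
print(percentage_swing(O, D))  # the same difference as a fraction of 20/100
```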
For 2 × 1 tests, simple swings can be compared to test for significant change between test results. Provided that O is a subset of D, each ratio is a true proportion in the range [0, 1], and d is therefore constrained to d ∈ [-1, 1]. However, for r × 1 tests, where r > 2, we need to obtain an aggregate score to estimate the size of effect. Moreover, simple swing cannot be used meaningfully where O is not a subset of D.
In this paper we consider a range of potential methods to address this problem.
Correlation scores are sample statistics. The fact that one score is numerically larger than another does not mean that it is significantly greater. To determine this we need to either:
- estimate confidence intervals around each measure and employ a z test for two proportions from independent populations to compare these intervals, or
- perform an r × 1 separability test for two independent populations (Wallis 2011) to compare the distributions of differences of differences.
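As a building block for the first option, here is a minimal sketch of the textbook z test for two proportions from independent populations, using a pooled estimate of the variance. It is not the full procedure for comparing intervals around effect sizes, and the function name and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two_tailed_p) for H0: the two population proportions are equal.

    x1, x2 are success counts and n1, n2 are sample sizes drawn from
    two independent populations. The standard error uses the pooled
    proportion, which is the usual choice under the null hypothesis.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 20 out of 100 versus 10 out of 200.
z, p_value = two_proportion_z_test(20, 100, 10, 200)
print(z, p_value)
```

A two-tailed result is significant at the 0.05 level when |z| exceeds approximately 1.96.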
In cases where both tests have one degree of freedom, these procedures obtain the same result. With r > 2, however, there will be more than one way to obtain the same score. The distributions can have a significantly different pattern even when scores are identical.
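To illustrate the last point, the sketch below uses the χ² statistic as a stand-in for an aggregate score (an assumption, since no particular score has been defined at this point) and hypothetical frequencies for a 3 × 1 test. Two observed distributions that mirror each other obtain the same score against a flat prior, although their patterns are clearly different.

```python
def chi_square_score(observed, prior):
    """Goodness of fit chi-square with expected values scaled from the prior."""
    scale = sum(observed) / sum(prior)
    expected = [d * scale for d in prior]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 3 x 1 data: a flat prior and two mirrored observed distributions.
D = [100, 100, 100]
O_a = [40, 30, 20]  # frequencies fall across the cells
O_b = [20, 30, 40]  # frequencies rise across the cells

score_a = chi_square_score(O_a, D)
score_b = chi_square_score(O_b, D)
print(score_a, score_b)  # equal scores from different patterns
```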
We apply these methods to a practical research problem: how to decide whether present perfect verb phrases (VPs) correlate more closely with present-marked or past-marked verb phrases. We consider whether present perfect VPs are more likely to be found in present-oriented texts or past-oriented ones.