Change and certainty: plotting confidence intervals (2)

Introduction

In a previous post I discussed how to plot confidence intervals on observed probabilities. Using this method we can create graphs like the following. (Data is in the Excel spreadsheet we used previously: for this post I have added a second worksheet.)

The graph depicts both the observed probability of a particular form and the certainty that this observation is accurate. The ‘I’-shaped error bars depict the estimated range of the true value of the observation at a 95% confidence level (see Wallis 2013 for more details).

A note of caution: these probabilities are semasiological proportions (different uses of the same word) rather than onomasiological choices (see Choice vs. use).

An example graph plot showing the changing proportions of meanings of the verb think over time in the US TIME Magazine Corpus, with Wilson score intervals, after Levin (2013). Many thanks to Magnus for the data!

In this post I discuss ways in which we can plot intervals on changes (differences) rather than single probabilities.

The clearer our visualisations, the better we can understand our own data, focus our explanations on significant results and communicate our results to others. Continue reading

Advertisements

Goodness of fit measures for discrete categorical data

Introduction Paper (PDF)

A goodness of fit χ² test evaluates the degree to which an observed discrete distribution over one dimension differs from another. A typical application of this test is to consider whether a specialisation of a set, i.e. a subset, differs in its distribution from a starting point (Wallis 2013). Like the chi-square test for homogeneity (2 × 2 or generalised row r × column c test), the null hypothesis is that the observed distribution matches the expected distribution. The expected distribution is proportional to a given prior distribution we will term D, and the observed O distribution is typically a subset of D.

A measure of association, or correlation, between two distributions is a score which measures the degree of difference between the two distributions. Significance tests might compare this size of effect with a confidence interval to determine that the result was unlikely to occur by chance.

Common measures of the size of effect for two-celled goodness of fit χ² tests include simple difference (swing) and proportional difference (‘percentage swing’). Simple swing can be defined as the difference in proportions:

d = O₁/D₁ – O₀/D₀.

For 2 × 1 tests, simple swings can be compared to test for significant change between test results. Provided that O is a subset of D then these are real fractions and d is constrained d ∈ [-1, 1]. However, for r × 1 tests, where r > 2, we need to obtain an aggregate score to estimate the size of effect. Moreover, simple swing cannot be used meaningfully where O is not a subset of D.

In this paper we consider a wide range of different potential methods to address this problem.

Correlation scores are a sample statistic. The fact that one is numerically larger than the other does not mean that the result is significantly greater. To determine this we need to either

  1. estimate confidence intervals around each measure and employ a z test for two proportions from independent populations to compare these intervals, or
  2. perform an r × 1 separability test for two independent populations (Wallis 2011) to compare the distributions of differences of differences.

In cases where both tests have one degree of freedom, these procedures obtain the same result. With r > 2 however, there will be more than one way to obtain the same score. The distributions can have a significantly different pattern even when scores are identical.

We apply these methods to a practical research problem, how to decide if present perfect verb phrases more closely correlate with present- and past-marked verb phrases. We consider if present perfect VPs are more likely to be found in present-oriented texts or past-oriented ones.

Continue reading