The variance of chi-square

Recently I have been reviewing some work I carried out on developing confidence intervals for Cramér’s φ, building on Bishop, Fienberg and Holland (1975). While finalising the edit for my forthcoming book (Wallis, 2021), I realised that Yvonne Bishop and colleagues had provided a formula for the variance of χ² without saying so explicitly!

The authors show how this formula is a building block for other methods, including estimating the standard deviation of ϕ (labelled ‘V’ in their notation). They also make an unfortunate but common error in deriving confidence intervals, but that is another story.

Anyway, they give the formula for the variance of Φ² = χ²/N, but it is trivial to present it as the variance of χ².

S²(ϕ) ≈ (1 / N(k − 1)) { 4 Σi,j pi,j³ / (pi+² p+j²) − 3 Σi (1 / pi+) (Σj pi,j² / (pi+ p+j))² − 3 Σj (1 / p+j) (Σi pi,j² / (pi+ p+j))² + 2 Σi,j [ (pi,j / (pi+ p+j)) (Σl pl,j² / (pl+ p+j)) (Σm pi,m² / (pi+ p+m)) ] },  for ϕ ≠ 0,  (1)

where pi,j = fi,j / N and pi+, p+j, etc. represent row and column (prior) probabilities in a χ² test for homogeneity (Bishop et al. 1975: 386).
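For readers who prefer code to algebra, Equation (1) can be transcribed more or less directly. The sketch below (the function name is mine, and it assumes a NumPy array of observed frequencies with non-zero row and column totals) computes S²(ϕ) from a contingency table:

```python
import numpy as np

def variance_of_phi(freq):
    """Estimate S^2(phi) for an r x c contingency table of observed
    frequencies f[i, j], transcribing Equation (1) above
    (Bishop, Fienberg & Holland 1975: 386)."""
    f = np.asarray(freq, dtype=float)
    N = f.sum()
    p = f / N                      # cell proportions p[i,j]
    pr = p.sum(axis=1)             # row marginals p[i+]
    pc = p.sum(axis=0)             # column marginals p[+j]
    k = min(f.shape)               # k = min(rows, columns)
    marg = np.outer(pr, pc)        # p[i+] * p[+j]

    # ratio r[i,j] = p[i,j]^2 / (p[i+] p[+j]), reused in several terms
    r = p**2 / marg
    row_sums = r.sum(axis=1)       # Sum over m of p[i,m]^2 / (p[i+] p[+m])
    col_sums = r.sum(axis=0)       # Sum over l of p[l,j]^2 / (p[l+] p[+j])

    term1 = 4.0 * np.sum(p**3 / marg**2)
    term2 = 3.0 * np.sum(row_sums**2 / pr)
    term3 = 3.0 * np.sum(col_sums**2 / pc)
    term4 = 2.0 * np.sum((p / marg) * np.outer(row_sums, col_sums))
    return (term1 - term2 - term3 + term4) / (N * (k - 1))
```

Note that Equation (1) is symmetric in rows and columns, so the function should return the same value for a table and its transpose.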

I used this formula to derive a confidence interval for Cramér’s φ. Bishop et al. give the standard deviation for ϕ as (in my notation):

S(ϕ) = (1 / 2ϕ√(k − 1)) S(Φ²),  (2)

where Φ² = χ²/N. The rest is simple algebra once we recognise that the total number of cases in the table, N, is a scale factor for variance. The missing link is simply

S(Φ²) = √(S²(χ²)) / N = S(χ²) / N.  (3)

I generally cite the formula for the variance S²(ϕ) rather than the standard deviation to avoid a large square root symbol around Equation (1)! But as long as you remember that the standard deviation is the square root of the variance, you will be fine.
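Putting Equations (1) to (3) together, the conversion from the variance of ϕ to the standard deviation of χ² can be sketched as follows (the function name and argument order are mine):

```python
import math

def sd_chisq_from_var_phi(var_phi, N, k, phi):
    """Convert the variance of phi (Equation 1) into the standard
    deviation of chi-square, for phi != 0.
    From Equation (2): S(Phi^2) = 2 * phi * sqrt(k - 1) * S(phi).
    From Equation (3): S(chi^2) = N * S(Phi^2)."""
    s_phi = math.sqrt(var_phi)                    # S(phi) from S^2(phi)
    s_Phi2 = 2.0 * phi * math.sqrt(k - 1) * s_phi # invert Equation (2)
    return N * s_Phi2                             # rearrange Equation (3)
```

For example, with S²(ϕ) = 0.01, N = 100, k = 2 and ϕ = 0.3, this gives S(χ²) = 100 × 2 × 0.3 × 0.1 = 6.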

The method of inverting S(ϕ) described above may be of interest to anyone wishing to compute intervals for χ² or other effect size measures based on it.

Elsewhere on this blog I explain a method for computing a confidence interval for χ² for a given error level α, which first computes the confidence interval for Cramér’s ϕ and then translates it to the χ² scale.
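The translation step itself is straightforward, because χ² = Nϕ²(k − 1) for Cramér’s ϕ, and this transform is monotonic for ϕ ≥ 0, so interval endpoints map directly. A minimal sketch (function name mine):

```python
def chisq_interval_from_phi_interval(phi_lo, phi_hi, N, k):
    """Translate a confidence interval (phi_lo, phi_hi) for Cramer's phi
    into an interval on the chi-square scale, using chi^2 = N * phi^2 * (k - 1),
    where k = min(rows, columns) of the table."""
    return N * phi_lo**2 * (k - 1), N * phi_hi**2 * (k - 1)
```

So, for instance, an interval (0.3, 0.5) for ϕ with N = 100 and k = 2 becomes (9, 25) on the χ² scale.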

References

Bishop, Y.M.M., S.E. Fienberg & P.W. Holland (1975). Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.

Wallis, S.A. (2021). Statistics in Corpus Linguistics Research. New York: Routledge.

See also

The variance of Binomial distributions

Introduction

Recently I’ve been working on a problem that besets researchers in corpus linguistics who work with samples which are not drawn randomly from the population but rather are taken from a series of sub-samples. These sub-samples (in our case, texts) may be randomly drawn, but we cannot say the same for any two cases drawn from the same sub-sample. It stands to reason that two cases taken from the same sub-sample are more likely to share a characteristic under study than two cases drawn entirely at random. I introduce the paper elsewhere on my blog.

In this post I want to focus on an interesting and non-trivial result I needed to address along the way. This concerns the concept of variance as it applies to a Binomial distribution.

Most students are familiar with the concept of variance as it applies to a Gaussian (Normal) distribution. A Normal distribution is a continuous symmetric ‘bell-curve’ distribution defined by two variables, the mean and the standard deviation (the square root of the variance). The mean specifies the position of the centre of the distribution and the standard deviation specifies the width of the distribution.

Common statistical methods applied to Binomial variables, from χ² tests to line fitting, employ a further step: they approximate the Binomial distribution by the Normal distribution. In effect they say: although we know this variable is Binomially distributed, let us assume the distribution is approximately Normal. The variance of the Binomial distribution then becomes the variance of the equivalent Normal distribution.
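This approximation step can be made concrete. A Binomial variable with n trials and success probability p has mean np and variance np(1 − p), and the approximating Normal distribution simply inherits these two parameters (a minimal sketch):

```python
import math

def binomial_normal_approx(n, p):
    """Parameters of the Normal distribution conventionally used to
    approximate a Binomial(n, p) count: mean n*p, variance n*p*(1-p).
    Returns (mean, standard deviation)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, math.sqrt(variance)
```

For n = 100 and p = 0.5 this gives a Normal distribution with mean 50 and standard deviation 5.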

In this methodological tradition, the variance of the Binomial distribution loses its meaning with respect to the Binomial distribution itself. It seems to be only valuable insofar as it allows us to parameterise the equivalent Normal distribution.

What I want to argue is that, in fact, the concept of the variance of a Binomial distribution is important in its own right, and we need to understand it with respect to the Binomial distribution, not the Normal distribution. Sometimes it is not necessary to approximate the Binomial to the Normal, and if we can avoid this approximation our results are likely to be stronger.


Adapting variance for random-text sampling

Introduction

Conventional stochastic methods based on the Binomial distribution rely on a standard model of random sampling whereby freely-varying instances of a phenomenon under study can be said to be drawn randomly and independently from an infinite population of instances.

These methods include confidence intervals and contingency tests (including multinomial tests), whether computed by Fisher’s exact method or variants of log-likelihood, χ², or the Wilson score interval (Wallis 2013). These methods are also at the core of others. The Normal approximation to the Binomial allows us to compute a notion of the variance of the distribution, and is to be found in line fitting and other generalisations.
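As an illustration of one of these methods, the standard Wilson (1927) score interval for an observed Binomial proportion p̂ = f/n can be sketched as follows, where z is the two-tailed critical value of the Normal distribution (≈ 1.96 for α = 0.05):

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score confidence interval for an observed Binomial
    proportion p_hat over n independent trials."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    spread = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n
                                     + z**2 / (4 * n**2))
    return centre - spread, centre + spread
```

For 50 successes out of 100 trials this gives an interval of roughly (0.404, 0.596). Note that the interval is centred on an adjusted proportion, not on p̂ itself, which is what gives it good coverage near 0 and 1.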

In many empirical disciplines, samples are rarely drawn “randomly” from the population in a literal sense. Medical research tends to sample available volunteers rather than names compulsorily called up from electoral or medical records. However, provided that researchers are aware that their random sample is limited by the sampling method, and draw conclusions accordingly, such limitations are generally considered acceptable. Obtaining consent occasionally introduces experimental bias; actually recruiting relevant individuals is a more common problem.

However, in a number of disciplines, including corpus linguistics, samples are not drawn randomly from a population of independent instances, but instead consist of randomly-obtained contiguous subsamples. In corpus linguistics, these subsamples are drawn from coherent passages or transcribed recordings, generically termed ‘texts’. In this sampling regime, whereas any pair of instances in independent subsamples satisfies the independent-sampling requirement, pairs of instances in the same subsample are likely to be co-dependent to some degree.

To take a corpus linguistics example, a pair of grammatical clauses in the same text passage are more likely to share characteristics than a pair of clauses in two entirely independent passages. Similarly, epidemiological research often involves “cluster-based sampling”, whereby each subsample cluster is drawn from a particular location, family nexus, etc. Again, it is more likely that neighbours or family members share a characteristic under study than random individuals.

If the random-sampling assumption is undermined, a number of questions arise.

  • Are statistical methods employing this random-sample assumption simply invalid on data of this type, or do they gracefully degrade?
  • Do we have to employ very different tests, as some researchers have suggested, or can existing tests be modified in some way?
  • Can we measure the degree to which instances drawn from the same subsample are interdependent? This would help us both determine the scale of the problem and arrive at a potential solution that takes this interdependence into account.
  • Would revised methods only affect the degree of certainty of an observed score (variance, confidence intervals, etc.), or might they also affect the best estimate of the observation itself (proportions or probability scores)?
