The variance of chi-square

Recently, I have been reviewing some work I conducted on developing confidence intervals for Cramér’s ϕ, building on Bishop, Fienberg and Holland (1975). Finalising the edit for my forthcoming book (Wallis, 2020), I realised that Yvonne Bishop and colleagues had provided a formula for the variance of χ² without saying so explicitly!

The authors show how this formula is a building block for other methods, including estimating the standard deviation of ϕ (labelled ‘V’ in their notation). They also make an unfortunate but common error in deriving confidence intervals, but that is another story.

Anyway, they give the formula for the variance of Φ² = χ²/N, but since χ² = NΦ², it is trivial to present it as the variance of χ²: S²(χ²) = N²S²(Φ²).

S²(χ²) ≈ N{4Σi,j pi,j³ / (pi+² p+j²) – 3Σi (Σj pi,j² / (pi+ p+j))² / pi+ – 3Σj (Σi pi,j² / (pi+ p+j))² / p+j
    + 2Σi,j [(pi,j / (pi+ p+j)) (Σl pl,j² / (pl+ p+j)) (Σm pi,m² / (pi+ p+m))]}, for χ² ≠ 0, (1)

where pi,j = fi,j / N and pi+, p+j, etc. represent row and column (prior) probabilities in a χ² test for homogeneity (Bishop et al. 1975: 386).
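To make the computation concrete, here is a sketch in Python of the delta-method variance following my reading of Equation (1). The arrangement of terms and the function name are my own, and all row and column marginals are assumed to be non-zero; treat it as illustrative rather than canonical.

```python
# Illustrative sketch (my arrangement, my function name): a delta-method
# estimate of S²(χ²) from an r × c table of observed frequencies.
# All row and column marginal probabilities are assumed to be non-zero.
from typing import List

def variance_chisq(f: List[List[int]]) -> float:
    """Delta-method estimate of the variance of chi-square, S²(χ²)."""
    N = sum(sum(rw) for rw in f)
    r, c = len(f), len(f[0])
    p = [[f[i][j] / N for j in range(c)] for i in range(r)]
    row = [sum(p[i]) for i in range(r)]                       # p_i+
    col = [sum(p[i][j] for i in range(r)) for j in range(c)]  # p_+j

    # Inner sums: R_i = Σ_m p_im² / (p_i+ p_+m) and C_j = Σ_l p_lj² / (p_l+ p_+j)
    R = [sum(p[i][m] ** 2 / (row[i] * col[m]) for m in range(c)) for i in range(r)]
    C = [sum(p[l][j] ** 2 / (row[l] * col[j]) for l in range(r)) for j in range(c)]

    term1 = 4 * sum(p[i][j] ** 3 / (row[i] ** 2 * col[j] ** 2)
                    for i in range(r) for j in range(c))
    term2 = 3 * sum(R[i] ** 2 / row[i] for i in range(r))
    term3 = 3 * sum(C[j] ** 2 / col[j] for j in range(c))
    term4 = 2 * sum(p[i][j] * R[i] * C[j] / (row[i] * col[j])
                    for i in range(r) for j in range(c))
    return N * (term1 - term2 - term3 + term4)
```

Dividing the result by N² gives S²(Φ²), since Φ² = χ²/N.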

I used this formula to derive a confidence interval for Cramér’s ϕ. Bishop et al. give the standard deviation for ϕ as (in my notation):

S(ϕ) = S(Φ²) / (2ϕ(k – 1)), (2)

where Φ² = χ²/N and k is the smaller of the number of rows and columns in the table. The rest is simple algebra once we recognise that the total number of cases in the table, N, is a scale factor for variance. The missing link is simply

S(Φ²) = √S²(χ²) / N. (3)

I generally cite the formula for the variance S²(ϕ) rather than the standard deviation to avoid a large square root symbol around Equation (1)! But as long as you remember that the standard deviation is the square root of the variance, you will be fine.

The method of inverting S(ϕ) described here may be of interest to anyone wishing to compute intervals for χ² or other effect size measures based on it. That method obtains a confidence interval for χ² for a given error level α by first computing the confidence interval for Cramér’s ϕ and then translating it to the χ² scale.
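The translation step is just a monotonic rescaling. A minimal sketch, assuming Cramér’s ϕ = √(χ² / (N(k – 1))) with k the smaller table dimension (the function name is illustrative, not code from the book):

```python
# Sketch of the rescaling step only, not the full interval method.
# Assumes Cramér's ϕ = √(χ² / (N(k − 1))), so χ² = N(k − 1)ϕ²: monotonic
# in ϕ, hence an interval for ϕ maps directly onto the χ² scale.
def phi_interval_to_chisq(phi_lo: float, phi_hi: float, N: int, k: int) -> tuple:
    """Translate a confidence interval for Cramér's ϕ to the χ² scale."""
    scale = N * (k - 1)
    return (scale * phi_lo ** 2, scale * phi_hi ** 2)
```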

See also


Bishop, Y.M.M., S.E. Fienberg & P.W. Holland (1975). Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.

Wallis, S.A. (2020). Statistics in Corpus Linguistics Research. New York: Routledge.

Plotting the Clopper-Pearson distribution


In Plotting the Wilson distribution (Wallis 2018), I showed how it is possible to plot the distribution of the Wilson interval for all values of α. This exercise is revealing in a number of ways.

First, it shows the relationship between

  1. the Normal distribution of probable Binomial observations about the population, ideal or given proportion P, and
  2. the corresponding distribution of probable values of P about an observed Binomial proportion p (referred to as the Wilson distribution, as it is based on the Wilson score interval).
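For reference, the Wilson score interval itself is straightforward to compute. A standard implementation (the function name is mine; z is the two-tailed critical value of the Normal distribution, e.g. 1.96 for α = 0.05):

```python
import math

def wilson(p: float, n: int, z: float = 1.959964) -> tuple:
    """Wilson score interval for an observed proportion p out of n trials."""
    denom = 1 + z ** 2 / n
    centre = p + z ** 2 / (2 * n)                  # interval midpoint is shifted towards 0.5
    spread = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return ((centre - spread) / denom, (centre + spread) / denom)
```

For p = 0.1, n = 50 this gives roughly (0.04, 0.21): asymmetric about p, unlike a Wald interval.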

Over the last few years I have become convinced that approaching statistical understanding from the perspective of the tangible observation p is more instructive and straightforward to conceptualise than approaching it (as is traditional) from the imaginary ‘true value’ in the population, P. In particular, whenever you conduct an experiment you want to know how reliable your results are (or to put it another way, what range of values you might reasonably expect were you to repeat your experiment) — not just whether they are statistically significantly different from some arbitrary number, P!

Second, and as a result, just as it is possible to see the closeness of fit between the Binomial and the Normal distribution, through this exercise we can visualise the inverse relationship between Normal and Wilson distributions. We can see immediately that it is a fallacy to assume that the distribution of probable values about p is Normal, although numerous statistics books still quote ‘Wald’-type intervals and many methods operate on this assumption. (I am intermittently amused by plots of otherwise sophisticated modelling algorithms with impossibly symmetric intervals in probability space.)

Third, I showed in the paper that ‘the Wilson distribution’ is properly understood as two distributions: the distribution of probable values of P below and above p. If we employ a continuity-correction, the two distributions become clearly distinct.

This issue sometimes throws people. Compare:

  1. the most probable location of P,
  2. the most probable location of P if we know that P < p (lower interval),
  3. the most probable location of P if we know that P > p (upper interval).

Wilson distributions correspond to (2) and (3) above, obtained by finding the roots of the Normal approximation. See Wallis (2013). The sum, or mean, of these is not (1), as becomes clearer when we plot other related distributions.
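The point about roots can be checked numerically: writing the Normal approximation as (p – P)² = z²P(1 – P)/n and solving the quadratic in P yields two roots, which are exactly the Wilson score bounds. A sketch (the function name is mine):

```python
import math

def roots_of_normal_approx(p: float, n: int, z: float) -> tuple:
    """Solve (p − P)² = z²·P(1 − P)/n for P: the two Wilson score bounds."""
    a = 1 + z ** 2 / n
    b = -(2 * p + z ** 2 / n)
    c = p ** 2
    disc = math.sqrt(b * b - 4 * a * c)   # discriminant is non-negative
    return ((-b - disc) / (2 * a), (-b + disc) / (2 * a))
```

Substituting either root back into (p – P)² – z²P(1 – P)/n returns zero, up to rounding error.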

There are a number of other interesting and important conclusions from this work, including that the logit Wilson interval is in fact almost Normal, except for p = 0 or 1.

In this post I want to briefly comment on some recent computational work I conducted in preparation for my forthcoming book (Wallis, in press). This involves plotting the Clopper-Pearson distribution.
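One way to compute Clopper-Pearson bounds before plotting them is to invert the Binomial tail probabilities that define them, by bisection. This is my own standard-library sketch (the usual shortcut is the Beta quantile function, e.g. scipy.stats.beta.ppf); function names are mine.

```python
import math

def binom_cdf(x: int, n: int, P: float) -> float:
    """P(X ≤ x) for X ~ Binomial(n, P)."""
    return sum(math.comb(n, k) * P ** k * (1 - P) ** (n - k) for k in range(x + 1))

def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """'Exact' interval for x successes out of n, inverting Binomial tails."""
    def bisect(g):  # root of a monotone function g on (0, 1)
        lo, hi = 0.0, 1.0
        increasing = g(1.0) > g(0.0)
        for _ in range(80):
            mid = (lo + hi) / 2
            if (g(mid) < 0) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower bound solves P(X ≥ x) = α/2; upper bound solves P(X ≤ x) = α/2
    lower = 0.0 if x == 0 else bisect(lambda P: (1 - binom_cdf(x - 1, n, P)) - alpha / 2)
    upper = 1.0 if x == n else bisect(lambda P: alpha / 2 - binom_cdf(x, n, P))
    return lower, upper
```

For example, clopper_pearson(5, 10) gives approximately (0.187, 0.813), noticeably wider than the corresponding Wilson interval.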

Further evaluation of Binomial confidence intervals

Abstract | Paper (PDF)

Wallis (2013) provides an account of an empirical evaluation of Binomial confidence intervals and contingency test formulae. The main take-home message of that article was that it is possible to evaluate statistical methods objectively and provide advice to researchers that is based on an objective computational assessment.

In this article we develop the evaluation of that article further by re-weighting estimates of error using Binomial and Fisher weighting, which is equivalent to an ‘exhaustive Monte-Carlo simulation’. We also develop an argument concerning key attributes of difference intervals: that we are not merely concerned with when differences are zero (conventionally equivalent to a significance test) but also accurate estimation when difference may be non-zero (necessary for plotting data and comparing differences).

1. Introduction

All statistical procedures may be evaluated in terms of the rates of two distinct types of error.

  • Type I errors (false positives): this is evidence of so-called ‘radical’ or ‘anti-conservative’ behaviour, i.e. rejecting null hypotheses which should not have been rejected, and
  • Type II errors (false negatives): this is evidence of ‘conservative’ behaviour, i.e. retaining or failing to reject null hypotheses unnecessarily.

It is customary to treat these errors separately because the consequences of rejecting and retaining a null hypothesis are qualitatively distinct.
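As an illustration of what ‘radical’ behaviour looks like in practice, the following sketch estimates the Type I error rate of the Wald test by crude Monte-Carlo sampling. Note this is simple random simulation, not the exhaustive Binomial and Fisher weighting developed in the paper, and the function names are mine.

```python
import math
import random

def wald_rejects(x: int, n: int, P0: float, z: float = 1.959964) -> bool:
    """Wald test of H0: P = P0. A zero-width interval at p = 0 or 1 rejects."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return se == 0 or abs(p - P0) > z * se

def type_i_error_rate(n: int, P0: float, trials: int = 20000, seed: int = 1) -> float:
    """Proportion of samples drawn under H0 that the Wald test rejects."""
    rng = random.Random(seed)
    rejections = sum(
        wald_rejects(sum(rng.random() < P0 for _ in range(n)), n, P0)
        for _ in range(trials)
    )
    return rejections / trials
```

For small n the estimated rate typically exceeds the nominal α = 0.05 — the ‘anti-conservative’ pattern described above.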