Further evaluation of Binomial confidence intervals

Wallis (2013) provides an account of an empirical evaluation of Binomial confidence intervals and contingency test formulae. The main take-home message of that article was that it is possible to evaluate statistical methods objectively and provide advice to researchers that is based on an objective computational assessment.

In this article we extend that evaluation by re-weighting estimates of error using Binomial and Fisher weighting, which is equivalent to an ‘exhaustive Monte-Carlo simulation’. We also develop an argument concerning key attributes of difference intervals: we are concerned not merely with whether a difference is zero (conventionally equivalent to a significance test) but also with accurate estimation when the difference may be non-zero (necessary for plotting data and comparing differences).

1. Introduction

All statistical procedures may be evaluated in terms of the rate of two distinct types of error.

  • Type I errors (false positives): this is evidence of so-called ‘radical’ or ‘anti-conservative’ behaviour, i.e. rejecting null hypotheses which should not have been rejected, and
  • Type II errors (false negatives): this is evidence of ‘conservative’ behaviour, i.e. retaining or failing to reject null hypotheses unnecessarily.

It is customary to treat these errors separately because the consequences of rejecting and retaining a null hypothesis are qualitatively distinct.
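One way to make the Type I error rate concrete is to compute it exactly for a given interval formula, weighting every possible outcome by its Binomial probability instead of sampling at random. The sketch below is in that spirit only. It assumes the Wilson score interval as the formula under test and a 95% level (z ≈ 1.96); those choices, and the function names, are illustrative assumptions and are not taken from the article.

```python
from math import comb, sqrt

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n trials."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def type_i_error_rate(P, n, z=1.959964):
    """Exact probability that the interval excludes the true proportion P.

    Every outcome x = 0..n is visited once and weighted by its Binomial
    probability, so no random sampling is involved.
    """
    total = 0.0
    for x in range(n + 1):
        lower, upper = wilson_interval(x / n, n, z)
        if P < lower or P > upper:          # interval misses the true value
            total += comb(n, x) * P ** x * (1 - P) ** (n - x)
    return total

if __name__ == "__main__":
    for P in (0.1, 0.3, 0.5):
        print(P, round(type_i_error_rate(P, 10), 4))
```

A rate above the nominal 0.05 would indicate anti-conservative behaviour at that P and n, and a rate well below it would indicate conservative behaviour.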

ϕ intervals by inverted Gaussian S(ϕ)


Experimenting with deriving accurate 2 × 2 ϕ intervals, I also considered using Bishop et al.’s population standard deviation estimate.

To recap: Cramér’s ϕ (Cramér 1946) is a probabilistic intercorrelation for contingency tables based on the χ² statistic. An unsigned ϕ score is defined by

\[
\text{Cramér’s } \phi = \sqrt{\frac{\chi^2}{N(k - 1)}} \tag{1}
\]

where χ² is the r × c test for homogeneity (independence), N is the total frequency in the table, and k the minimum number of values of variables X and Y, i.e. k = min(r, c). For 2 × 2 tables, k – 1 = 1, so ϕ = √(χ²/N) is often quoted.
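Equation (1) can be sketched in a few lines of Python. The function names and the example table are illustrative assumptions; the table is a list of rows of cell frequencies, and every row and column total is assumed to be non-zero.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square for homogeneity on an r x c table of frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_phi(table):
    """Unsigned Cramér's phi: sqrt(chi-square / (N (k - 1))), k = min(r, c)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))

if __name__ == "__main__":
    print(cramers_phi([[20, 10], [5, 15]]))                    # hypothetical 2 x 2 table
    print(cramers_phi([[10, 0, 0], [0, 10, 0], [0, 0, 10]]))   # perfectly diagonal 3 x 3 table
```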

An alternative formula for 2 × 2 tables obtains a signed result, where a negative sign implies that the table tends towards the opposite diagonal.

\[
\text{signed } 2 \times 2 \ \phi \equiv \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}, \tag{2}
\]

where a, b, c and d are cell frequencies. However, Equation (2) cannot be applied to larger tables.
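A minimal sketch of the signed 2 × 2 formula follows. The function name and the example frequencies are hypothetical, and all four marginal totals are assumed to be non-zero.

```python
from math import sqrt

def signed_phi(a, b, c, d):
    """Signed phi for the 2 x 2 table [[a, b], [c, d]] of cell frequencies."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

if __name__ == "__main__":
    print(signed_phi(20, 10, 5, 15))   # table leans towards the a-d diagonal
    print(signed_phi(5, 15, 20, 10))   # rows swapped: same magnitude, opposite sign
```

Swapping the rows (or the columns) reverses the sign but leaves the magnitude unchanged, which is what distinguishes the signed score from the unsigned one.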

The method I discuss here is potentially extensible to other effect sizes and other published estimates of standard deviations.

The best estimate of the population variance of ϕ for r × c tables is due to Bishop, Fienberg and Holland (1975: 386):

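\[
\begin{aligned}
S^2(\phi) \approx \frac{1}{N(k - 1)} \Bigg\{ & 4 \sum_{i,j} \frac{p_{i,j}^3}{p_{i+}^2 \, p_{+j}^2}
 - 3 \sum_{i} \frac{1}{p_{i+}} \Bigg( \sum_{j} \frac{p_{i,j}^2}{p_{i+} \, p_{+j}} \Bigg)^2
 - 3 \sum_{j} \frac{1}{p_{+j}} \Bigg( \sum_{i} \frac{p_{i,j}^2}{p_{i+} \, p_{+j}} \Bigg)^2 \\
 & + 2 \sum_{i,j} \Bigg[ \frac{p_{i,j}}{p_{i+} \, p_{+j}} \Bigg( \sum_{l} \frac{p_{l,j}^2}{p_{l+} \, p_{+j}} \Bigg) \Bigg( \sum_{m} \frac{p_{i,m}^2}{p_{i+} \, p_{+m}} \Bigg) \Bigg] \Bigg\}, \quad \text{for } \phi \neq 0, 
\end{aligned}
\tag{3}
\]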

where p_{i,j} = f_{i,j} / N and p_{i+}, p_{+j}, etc. represent row and column (prior) probabilities. If ϕ = 0 we adjust the table by a small delta. To calculate the standard deviation, we simply take the square root of Equation (3). See also ‘The variance of chi-square’.
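The variance estimate can be sketched in Python as below. This is a hedged illustration only: it assumes the usual four-term form of the Bishop, Fienberg and Holland estimate (coefficients 4, 3, 3 and 2, with the two middle sums weighted by 1/p_{i+} and 1/p_{+j}), scaled by 1/(N(k – 1)) as in Equation (3). The function name is hypothetical, the small-delta adjustment for ϕ = 0 is not implemented, and every row and column total is assumed to be non-zero.

```python
from math import sqrt

def phi_variance(table):
    """Approximate variance of phi for an r x c table of frequencies.

    Assumed form (see lead-in): four-term estimate scaled by 1 / (N (k - 1)).
    """
    n = sum(sum(row) for row in table)
    r, c = len(table), len(table[0])
    k = min(r, c)
    p = [[f / n for f in row] for row in table]
    p_row = [sum(row) for row in p]                  # p_{i+}
    p_col = [sum(col) for col in zip(*p)]            # p_{+j}

    # q[i][j] = p_ij^2 / (p_i+ p_+j), the summand shared by the inner sums
    q = [[p[i][j] ** 2 / (p_row[i] * p_col[j]) for j in range(c)] for i in range(r)]
    row_q = [sum(q[i]) for i in range(r)]                       # inner sum over columns
    col_q = [sum(q[i][j] for i in range(r)) for j in range(c)]  # inner sum over rows

    cells = [(i, j) for i in range(r) for j in range(c)]
    term1 = 4 * sum(p[i][j] ** 3 / (p_row[i] ** 2 * p_col[j] ** 2) for i, j in cells)
    term2 = 3 * sum(row_q[i] ** 2 / p_row[i] for i in range(r))
    term3 = 3 * sum(col_q[j] ** 2 / p_col[j] for j in range(c))
    term4 = 2 * sum(p[i][j] / (p_row[i] * p_col[j]) * col_q[j] * row_q[i] for i, j in cells)
    return (term1 - term2 - term3 + term4) / (n * (k - 1))

if __name__ == "__main__":
    table = [[20, 10], [5, 15]]          # hypothetical frequencies
    variance = phi_variance(table)
    print(variance, sqrt(variance))      # variance and standard deviation
```

Because the bracketed expression depends only on proportions, doubling every cell frequency halves the estimated variance, which is a quick sanity check on any implementation.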

Confidence intervals on pairwise ϕ statistics

Cramér’s ϕ is an effect size measure used for evaluating correlations in contingency tables. In simple terms, a large ϕ score means that the two variables have a large effect on each other, and a small ϕ score means they have a small effect.

ϕ is closely related to χ², but it factors out the ‘weight of evidence’ and concentrates only on the slope. The simplest definition of ϕ is the unsigned formula

\[
\phi \equiv \sqrt{\frac{\chi^2}{N(k - 1)}}, \tag{1}
\]

where k = min(r, c), the minimum of the number of rows and columns. In a 2 × 2 table, unsigned ϕ is simply ϕ = √(χ² / N).

In Wallis (2012), I made a number of observations about ϕ.

  • It is probabilistic, ϕ ∈ [0, 1].
  • ϕ is the best estimate of the population interdependent probability, p(XY). It measures the linear interpolation from flat to identity matrix.
  • It is non-directional, so ϕ(X, Y) ≡ ϕ(Y, X).
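These observations can be checked numerically. The sketch below builds a k × k probability table that is a mixture of a flat table (weight 1 – t) and a scaled identity matrix (weight t), then confirms that ϕ recovers t and is unchanged by transposing the table. The function names and the mixture construction are illustrative assumptions, and the code uses the identity χ²/N = Σ p_{i,j}² / (p_{i+} p_{+j}) – 1.

```python
from math import sqrt

def phi_from_probabilities(p):
    """Unsigned phi from an r x c table of cell probabilities summing to 1."""
    r, c = len(p), len(p[0])
    k = min(r, c)
    p_row = [sum(row) for row in p]
    p_col = [sum(col) for col in zip(*p)]
    chi2_over_n = sum(
        p[i][j] ** 2 / (p_row[i] * p_col[j]) for i in range(r) for j in range(c)
    ) - 1
    # Guard against tiny negative values caused by floating-point rounding.
    return sqrt(max(chi2_over_n, 0.0) / (k - 1))

def interpolated_table(t, k):
    """Mixture of a flat k x k table (weight 1 - t) and an identity table (weight t)."""
    flat = 1 / k ** 2
    return [
        [(1 - t) * flat + (t / k if i == j else 0.0) for j in range(k)]
        for i in range(k)
    ]

if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 1.0):
        table = interpolated_table(t, 3)
        transposed = [list(col) for col in zip(*table)]
        print(t, phi_from_probabilities(table), phi_from_probabilities(transposed))
```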

Whereas in a larger table, there are multiple degrees of freedom and therefore many ways one might obtain the same ϕ score, 2 × 2 ϕ may usefully be signed, in which case ϕ ∈ [-1, 1]. A signed ϕ obtains a different score for an increase and a decrease in proportion.

\[
\phi \equiv \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}, \tag{2}
\]

where a, b, c and d are cell scores in sequence, i.e. [[a b][c d]]:

  x x
y a b
y c d
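As a worked illustration of the signed formula, take a hypothetical table with a = 20, b = 10, c = 5 and d = 15 (values chosen only to keep the arithmetic simple, so N = 50):

```latex
\[
\phi = \frac{(20)(15) - (10)(5)}{\sqrt{(30)(20)(25)(25)}}
     = \frac{250}{\sqrt{375000}}
     = \frac{1}{\sqrt{6}} \approx 0.408 .
\]
```

Swapping the two rows gives a = 5, b = 15, c = 20 and d = 10, and the same calculation yields ϕ ≈ –0.408. In both cases χ² = Nϕ² = 50/6 ≈ 8.33, so the unsigned formula returns 0.408.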
