Introduction
Cramér’s φ is an effect size measure used for evaluating correlations in contingency tables. In simple terms, a large φ score means that the two variables have a large effect on each other, and a small φ score means they have a small effect.
φ is closely related to χ², but it factors out the ‘weight of evidence’ and concentrates only on the slope. The simplest definition of φ is the unsigned formula
φ ≡ √(χ² / (N(k – 1))), (1)
where N is the total number of observations in the table and k = min(r, c), the smaller of the number of rows r and columns c. In a 2 × 2 table, k = 2, so unsigned φ is simply φ = √(χ² / N).
In Wallis (2012), I made a number of observations about φ.
- It is probabilistic, φ ∈ [0, 1].
- φ is the best estimate of the population interdependent probability, p(X ↔ Y). It measures how far the table lies along a linear interpolation from a flat matrix (φ = 0) to an identity matrix (φ = 1).
- It is non-directional, so φ(X, Y) ≡ φ(Y, X).
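These properties can be checked numerically. The following is a minimal sketch in Python, not code from Wallis (2012): the function names `chi_square`, `cramers_phi` and `interpolate` are illustrative, and it assumes a table given as a list of rows with no zero row or column totals. It computes unsigned φ from Equation (1) and builds k × k tables that mix a flat matrix with an identity matrix in proportion p.

```python
from math import sqrt


def chi_square(table):
    """Pearson's chi-square for an r x c table of observed frequencies.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_phi(table):
    """Unsigned Cramér's phi, Equation (1): sqrt(chi2 / (N (k - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))


def interpolate(p, k=2, n=100.0):
    """Mix a flat k x k table with a diagonal (identity) one in proportion p."""
    flat = n / (k * k)  # cell value when every cell is equal
    diag = n / k        # diagonal cell value when all data lie on the diagonal
    return [
        [(1 - p) * flat + (p * diag if i == j else 0.0) for j in range(k)]
        for i in range(k)
    ]


if __name__ == "__main__":
    for p in (0.0, 0.3, 1.0):
        print(p, round(cramers_phi(interpolate(p, k=3)), 6))
```

Under these assumptions a table interpolated with proportion p should return φ = p (up to floating-point error), and transposing a table, which swaps X and Y, should leave φ unchanged.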
In a larger table there are multiple degrees of freedom, and therefore many ways in which one might obtain the same φ score. A 2 × 2 table has a single degree of freedom, so 2 × 2 φ may usefully be signed, in which case φ ∈ [-1, 1]. The sign of φ then distinguishes an increase in proportion from a decrease. The signed formula is
φ ≡ (ad – bc) / √((a + b)(c + d)(a + c)(b + d)), (2)
where a, b, c and d are cell scores in sequence, i.e. [[a b][c d]]:
   | x₁ | x₂ |
y₁ | a  | b  |
y₂ | c  | d  |
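As an illustration of Equation (2), here is a short Python sketch. The function name `signed_phi` and the example frequencies are assumptions made for this example only, and the function assumes that no row or column total is zero (otherwise the denominator vanishes).

```python
from math import sqrt


def signed_phi(a, b, c, d):
    """Signed 2 x 2 phi, Equation (2), for the table [[a, b], [c, d]].

    Assumes all four marginal totals are non-zero.
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


if __name__ == "__main__":
    # Most data on the leading diagonal: positive score.
    print(signed_phi(30, 10, 10, 30))
    # Same frequencies with the columns swapped: same magnitude, negative sign.
    print(signed_phi(10, 30, 30, 10))
```

Swapping the two columns (or the two rows) reverses the sign of φ but leaves its magnitude unchanged, and that magnitude is the unsigned 2 × 2 score √(χ² / N).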