Are embedding decisions independent?

Evidence from preposition(al) phrases

Abstract · Full Paper (PDF)

One of the more difficult challenges in linguistics research concerns detecting how constraints might apply to the process of constructing phrases and clauses in natural language production. In previous work (Wallis 2019) we considered a number of operations modifying noun phrases, including sequential and embedded modification with postmodifying clauses. Notably, we found a pattern of declining additive probability for each decision to embed postmodifying clauses, albeit a pattern that differed in speech and writing.

In this paper we use the same research paradigm to investigate the embedding of an altogether simpler structure: postmodifying nouns with prepositional phrases. These are approximately twice as frequent, and the structures exhibit as many as five levels of embedding in ICE-GB (two more than are found for clauses). Finally, the embedding model is simplified because only one noun phrase can be found within each prepositional phrase. We discover different initial rates and patterns for common and proper nouns, and for certain subsets of pronouns and numerals. Common nouns (80% of nouns in the corpus) do appear to generate a secular decline in the additive probability of embedded prepositional phrases, whereas the equivalent rate for proper nouns rises from a low initial probability, a fact that appears to be strongly affected by the presence of titles.

It may be generally assumed that, like clauses, prepositional phrases are essentially independent units. However, we find evidence from a number of sources indicating that some double-layered constructions may be added as single units. In addition to titles, these constructions include schematic or idiomatic expressions whose head is an ‘indefinite’ pronoun or numeral. Continue reading “Are embedding decisions independent?”

Confidence intervals on goodness of fit ϕ scores

Introduction

In Wallis (2021), I offered two approaches to computing confidence intervals on the effect size Cramér’s ϕ. I also motivated and summarised approaches to a comparable goodness of fit metric (where a high ϕ score reflects a greater difference and thus a ‘poor fit’).

A goodness of fit evaluation is one where we compare an observed distribution of k cells, say, with an expected distribution of the same number of cells. The test, which is a type of χ² test, has a number of applications. A goodness of fit ϕ score would be expected to range from 0 to 1, with 0 representing identity and 1 representing the opposite, a maximally distinct distribution.
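To make the evaluation concrete, here is a minimal sketch of the underlying χ² computation with invented counts; it stops at the χ² score itself and does not attempt the normalisation to a [0, 1] ϕ scale discussed in this post.

```python
# Goodness of fit chi-square for k = 3 cells: observed counts o are compared
# with an expected distribution given by prior proportions e (invented figures).
o = [45, 35, 20]      # observed frequencies, N = 100
e = [0.5, 0.3, 0.2]   # expected proportions, summing to 1

N = sum(o)
chi2 = sum((obs - N * exp) ** 2 / (N * exp) for obs, exp in zip(o, e))
print(chi2)           # ~1.33, evaluated against chi-square with k - 1 = 2 degrees of freedom
```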

In an earlier paper published on this blog (Wallis 2012), I considered a range of possible measures that had this property. However, one of the questions I had left unresolved was how to compute a confidence interval on such a measure.

Why might we want to do this?

  • To cite or plot measures with confidence intervals, identifying the level of certainty we can ascribe to a particular observed measure.
  • To compare ϕ with an arbitrary level, e.g. to test if ϕ ≠ D where D ≠ 0. (As we shall see, where k > 2 and ϕ unsigned, comparing goodness of fit ϕ with 0 is more difficult due to loss of information, and you should employ a goodness of fit test instead.)
  • To compare two ϕ scores for their significant difference in a given direction, e.g. to establish that, say, ϕ1 > ϕ2.

Summing independent, dependent and constrained variances

The Bienaymé theorem allows us to compute the total variance of a sum of k independent Normally distributed variables by simply summing their variances.

Bienaymé variance s² = s₁² + s₂² + … + sₖ² = ∑sᵢ². (1)

A total standard deviation s is obtained by taking the square root of Equation (1).

To estimate a confidence interval on a sum of k independent proportions, ∑pᵢ, we follow Zou and Donner (2008). A confidence interval on a sum of proportions may be obtained by substituting the interval widths, u⁻ = (p – w⁻) and u⁺ = (w⁺ – p), for each sᵢ term in the equation, where (w⁻, w⁺) is the confidence interval for each proportion p. The confidence interval for the sum is then obtained by taking the square root of the result. The constant zα/2 factors out. See An algebra of intervals.

independent sum ∈ (L, U) = (∑pᵢ – √∑(pᵢ – wᵢ⁻)², ∑pᵢ + √∑(wᵢ⁺ – pᵢ)²). (1′)
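As a code sketch of Equation (1′): here I use Wilson score intervals for each pᵢ, but the formula itself only requires a sound interval (wᵢ⁻, wᵢ⁺) for each proportion, so other intervals could be substituted. The example data are invented.

```python
import math

def wilson(p, n, z=1.959964):
    """Wilson score interval (w-, w+) for a proportion p observed from n cases."""
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom, (centre + spread) / denom

def independent_sum(ps, ns, z=1.959964):
    """Equation (1'): interval for a sum of independent proportions,
    combining interval widths in quadrature (after Zou and Donner 2008)."""
    total = sum(ps)
    lower_sq = sum((p - wilson(p, n, z)[0]) ** 2 for p, n in zip(ps, ns))
    upper_sq = sum((wilson(p, n, z)[1] - p) ** 2 for p, n in zip(ps, ns))
    return total - math.sqrt(lower_sq), total + math.sqrt(upper_sq)

# invented example: three proportions observed in three separate (independent) samples
print(independent_sum([0.2, 0.5, 0.1], [100, 80, 120]))
```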

This assumes that all of these proportions are independent. But what of chi-square-type scenarios, where there are k – 1 degrees of freedom for k proportions summing to 1?

Obviously, we are not interested in the confidence interval for ∑pᵢ, as this must be 1 (or [1, 1] if you prefer). But we are interested in confidence intervals for the sum of functions of pᵢ, ∑fn(pᵢ). Zou and Donner argue that equations of this type should yield a sound interval provided that the original intervals are sound.

Consider the simplest two-valued 2 × 1 goodness of fit χ². As we know, the two proportions are completely dependent. If p₁ increases, p₂ = 1 – p₁ must fall. The table has a single degree of freedom. Consequently, standard deviations and interval positions are simply summed.

total standard deviation s = s₁ + s₂. (2)

dependent sum (L, U) = (∑fn(wᵢ⁻), ∑fn(wᵢ⁺)), (2′)

for an increasing monotonic function, fn, over P = [0, 1]. We will discuss other function types below.
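In code, Equation (2′) amounts to a term-by-term transformation of the interval bounds. A minimal sketch, with invented bounds and fn(p) = p² as an example function:

```python
def dependent_sum(bounds, fn=lambda p: p):
    """Equation (2'): for fully dependent proportions, apply an increasing
    monotonic function fn to each lower and upper bound and sum the results
    (no summing in quadrature, unlike the independent case in Equation (1'))."""
    lower = sum(fn(lo) for lo, hi in bounds)
    upper = sum(fn(hi) for lo, hi in bounds)
    return lower, upper

# invented 2 x 1 example: mirror-image interval bounds for p1 and p2 = 1 - p1,
# transformed by fn(p) = p squared
print(dependent_sum([(0.42, 0.58), (0.42, 0.58)], fn=lambda p: p * p))
```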

Another way of thinking about this is that independent variables are considered to vary at right angles (orthogonally) to each other, whereas strictly dependent variables vary along the same axis. In some circumstances this means variables subtract and even cancel each other out; in others (like χ²) they sum.

Figure 1. Left: standard deviation of the sum of independent variables x, y, z; right: summing standard deviations of two dependent variables on the same axis.

How do we generalise this idea to closed k × 1 goodness of fit χ² tables, where there are k – 1 degrees of freedom? Now there are fewer dimensions than variables. Continue reading “Confidence intervals on goodness of fit ϕ scores”

The variance of chi-square

Recently, I have been reviewing some work I conducted developing confidence intervals for Cramér’s ϕ, building on Bishop, Fienberg and Holland (1975). Finalising the edit for my forthcoming book (Wallis, 2021), I realised that Yvonne Bishop and colleagues had provided a formula for the variance of χ² without saying so explicitly!

The authors show how this formula is a building block for other methods, including estimating the standard deviation of ϕ (labelled ‘V’ in their notation). They also make an unfortunate but common error in deriving confidence intervals, but that is another story.

Anyway, they give the formula for the variance of Φ² = χ²/N, but it is trivial to present it as the variance of χ².

$$S^2(\phi) \approx \frac{1}{N(k-1)} \left\{ 4\sum_{i,j} \frac{p_{i,j}^3}{p_{i+}^2\, p_{+j}^2} \;-\; 3\sum_{i} \frac{1}{p_{i+}} \left( \sum_{j} \frac{p_{i,j}^2}{p_{i+}\, p_{+j}} \right)^{2} \;-\; 3\sum_{j} \frac{1}{p_{+j}} \left( \sum_{i} \frac{p_{i,j}^2}{p_{i+}\, p_{+j}} \right)^{2} \;+\; 2\sum_{i,j} \left[ \frac{p_{i,j}}{p_{i+}\, p_{+j}} \left( \sum_{l} \frac{p_{l,j}^2}{p_{l+}\, p_{+j}} \right) \left( \sum_{m} \frac{p_{i,m}^2}{p_{i+}\, p_{+m}} \right) \right] \right\}, \quad \text{for } \phi \neq 0. \tag{1}$$

where pᵢ,ⱼ = fᵢ,ⱼ / N, and pᵢ₊, p₊ⱼ, etc. represent row and column (prior) probabilities in a χ² test for homogeneity (Bishop et al. 1975: 386).
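For anyone who prefers code to summation signs, here is a minimal numpy sketch of Equation (1) exactly as reproduced above. The 2 × 2 table is an invented example, and I take k to be the smaller of the two table dimensions, as in Cramér’s ϕ.

```python
import numpy as np

def variance_phi(table):
    """Estimate S2(phi) from an r x c table of observed frequencies,
    following Equation (1) above (after Bishop et al. 1975: 386)."""
    f = np.asarray(table, dtype=float)
    N = f.sum()
    p = f / N                      # cell proportions p[i,j]
    pr = p.sum(axis=1)             # row marginals p[i+]
    pc = p.sum(axis=0)             # column marginals p[+j]
    k = min(p.shape)               # assumption: k = smaller table dimension

    e = np.outer(pr, pc)           # products p[i+] * p[+j]
    q = p ** 2 / e                 # p[i,j]^2 / (p[i+] p[+j])

    term1 = 4 * np.sum(p ** 3 / e ** 2)
    term2 = 3 * np.sum(q.sum(axis=1) ** 2 / pr)   # rows: (sum over j)^2 / p[i+]
    term3 = 3 * np.sum(q.sum(axis=0) ** 2 / pc)   # columns: (sum over i)^2 / p[+j]
    term4 = 2 * np.sum((p / e) * np.outer(q.sum(axis=1), q.sum(axis=0)))

    return (term1 - term2 - term3 + term4) / (N * (k - 1))

# invented 2 x 2 example: prints roughly 0.0054; S(phi) is its square root
print(variance_phi([[30, 20], [10, 40]]))
```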

I used this formula to derive a confidence interval for Cramér’s ϕ. Bishop et al. give the standard deviation for ϕ as (in my notation):

$$S(\phi) = \frac{1}{2\phi\sqrt{k-1}}\, S(\Phi^2), \tag{2}$$

where Φ² = χ²/N. The rest is simple algebra once we recognise that the total number of cases in the table, N, is a scale factor: since Φ² = χ²/N, variances scale by 1/N² and standard deviations by 1/N. The missing link is simply

$$S(\Phi^2) = \frac{\sqrt{S^2(\chi^2)}}{N} = \frac{S(\chi^2)}{N}. \tag{3}$$

I generally cite the formula for the variance S²(ϕ) rather than the standard deviation to avoid a large square root symbol around Equation (1)! But as long as you remember that the standard deviation is the square root of the variance, you will be fine.

In a separate post I explain a method for computing a confidence interval for χ² for a given error level α, which first computes the confidence interval for Cramér’s ϕ and then translates it to a χ² scale. The method of inverting S(ϕ) described there may be of interest to anyone wishing to compute intervals for χ² or other effect size measures based on it.
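As a sketch of that translation step, assuming (as in Equation (2)) that k is the smaller table dimension: Cramér’s ϕ = √(χ²/(N(k – 1))), hence

χ² = N(k – 1)ϕ²,

so an interval (ϕ⁻, ϕ⁺) for ϕ maps directly onto (N(k – 1)(ϕ⁻)², N(k – 1)(ϕ⁺)²) on the χ² scale.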

References

Bishop, Y.M.M., S.E. Fienberg & P.W. Holland (1975). Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.

Wallis, S.A. (2021). Statistics in Corpus Linguistics Research. New York: Routledge. » Announcement
