In a recent paper on the distribution of simple NPs (Aarts and Wallis, 2014), we found an interesting correlation between two independent variables across the text genres of a corpus. For the purposes of this study, a “simple NP” was an NP consisting of a single-word head. The strong correlation we found was between
- the probability that an NP consists of a single-word head, p(single head), and
- the probability that a single-word head is a personal pronoun, p(personal pronoun | single head).
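Note that these two variables are independent in the sense that they do not compete: they are not alternative outcomes of the same choice. Contrast this with, say, the probability that a single-word NP consists of a noun and the probability that it consists of a pronoun, where an increase in one must come at the expense of the other. The scattergraph below illustrates the distribution and correlation clearly.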
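To make the two variables concrete, here is a minimal sketch of how they might be computed per genre and then correlated. The genre names and frequency counts are invented purely for illustration and are not taken from the paper, and Pearson's r is an assumption on our part, used here only as one common way of measuring a correlation between two sets of proportions.

```python
from math import sqrt

# Hypothetical counts per genre (NOT data from the paper):
# (total NPs, NPs with a single-word head, single-word heads that are personal pronouns)
counts = {
    "genre_a": (1000, 600, 420),
    "genre_b": (1000, 500, 300),
    "genre_c": (1000, 400, 180),
    "genre_d": (1000, 300, 90),
}


def proportions(total_nps, single_heads, pronoun_heads):
    """Return (p(single head), p(personal pronoun | single head)) for one genre."""
    p_single = single_heads / total_nps        # baseline: all NPs
    p_pronoun = pronoun_heads / single_heads   # baseline: single-word heads only
    return p_single, p_pronoun


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# One (x, y) point per genre, as in a scatter plot of the two probabilities.
points = [proportions(*genre_counts) for genre_counts in counts.values()]
xs = [p[0] for p in points]
ys = [p[1] for p in points]
r = pearson_r(xs, ys)

for genre, (x, y) in zip(counts, points):
    print(f"{genre}: p(single head) = {x:.2f}, p(personal pronoun | single head) = {y:.2f}")
print(f"Pearson r = {r:.3f}")
```

The two proportions have different baselines: the first is out of all NPs, the second out of single-word heads only. This is why the second can rise or fall freely whatever the value of the first, so any correlation between them across genres is an empirical finding rather than an arithmetic necessity.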