Directional evidence revisited

End weight bias and templating in conjoined phrase postmodification

Abstract

The tendency of speakers and writers to place larger constructions at the end of sentences, whether consciously or unconsciously, is well established. This question of ‘end weight’ is usually discussed in relation to grammatical transformations. In this short paper we demonstrate a simple method for investigating a similar phenomenon in coordination patterns where conjoins are either noun phrases, e.g. the X of Y or Z, or prepositional phrases, e.g. the X of Y or of Z. We then investigate whether the coordinated noun phrases (Y, Z) are themselves postmodified, either by another prepositional phrase or by a clause. As postmodifying phrases and clauses are potentially expansive, they are grammatically complex and we operationalise them as signifiers of ‘weight’. We find that both sets of coordination patterns are end-sequence biased by weight.

We also find that patterns where both the first and last conjoins in the sequence are postmodified occur more frequently than would be expected were the two postmodifiers independently selected. Setting aside potential explanations of directional influence, which cannot be decided inductively, we focus instead on the content of these doubly-postmodified constructions and examine them for evidence of templating.

We show that results do not significantly differ if limited to semantically unordered coordination patterns, and contrast the two types of postmodification with premodifying adjective phrases, where scope ambiguity may also be a factor.

1. Introduction

Are phrases at the end of a coordination sequence of conjoined phrases larger, more complex or ‘heavier’ than those at the start?

The principle of ‘end weight’ is often discussed in the context of empirical evidence of information structuring (see e.g. Kaltenböck 2020). Moreover, students of English are taught to position larger constructions at the end of utterances (Cowan 2008). Similarly, studies of the dative alternation with the double object construction – Aden gave the prize to Beth (dative) vs. Aden gave Beth the prize (double object) – have observed that the size of the movable object (the prize) appears to be a factor in its position (Bresnan, Cueni, Nikitina and Baayen 2007).

However a freer structure for study – one that requires no additional transformative device such as extraposition or double-object constructions – is the coordination of like phrases.

If there is a general cognitive or communicative principle engaged in extraposition and other broadly semantically neutral transformations such as the dative alternation, it seems likely that coordination is also end-weighted, i.e. the hypothesis is that the final conjoin would tend to be ‘heavier’ than earlier ones. Cognitively, such a strategy would minimise interruptions to the producer’s attention, and allow them to concentrate on the coordinated phrase sequence itself. Communicatively, end-weight strategies package information to the recipient without large potentially distracting diversions, a principle also termed ‘end focus’. Whereas explicit teaching tends to prioritise conscious communicative purposes, as linguists we are usually more interested in evidence of spontaneous biases.

Since planning is more difficult to employ in spontaneous speech than edited writing, observing differences between speech and writing may help us distinguish explanations.

An important method that adds ‘weight’ to phrases is noun phrase postmodification, typically by clauses and preposition(al) phrases (PPs). This is not the only method for adding weight: alternatives include introduction of premodifying adjective and determinative phrases, adjuncts, ‘floating’ postmodifiers, or the use of compound nouns. However, since a clause or PP may itself be expanded, their introduction opens the door to potentially unlimited constructions.

In a sequence of like conjoins, the same structures could be added to any conjoin, but on the principle of end weight, we hypothesize they tend to be found at the end of a sequence rather than at the start.

Such a pattern could arise in at least two ways. A speaker may plan ahead to place weightier conjoins at the end of a sequence. Alternatively, it is also possible that, having introduced a particularly lengthy construction, a speaker might then decide to stop the coordination sequence.

One potential reason for postmodification end-weighting in conjoins concerns ambiguity of scope. Adjective premodification of nouns is well known to exhibit this phenomenon, cf. the old men and women.

Let us consider a simple example. Example (1) consists of a noun phrase with conjoined postmodifying (NPPO) prepositional phrases (PPs) identified by brackets:

(1) a systematic adoption [of the ideals [of Bildung]] and [of the German middle class way [of life]] [S2B-042 #47]

It would be entirely possible to rewrite this noun phrase as Example (1′).

(1′) …a systematic adoption [of the German middle class way [of life]] and [of the ideals [of Bildung]]

However (1′) is slightly ambiguous. Are the ‘ideals’ systematically adopted, or are they part of ‘the German middle class way’? Arguably, the original example (1) is ambiguous for the same reason! In speech, intonation may help. In summary, the positioning of constructions can aid in resolving ambiguity, provided that the speaker plans ahead.

However a more substantive issue concerns ordering. Some coordination patterns are semantically sequenced by the conjunctions. Consider (2) and (3) below.

(2) …having a degree in say English Literature or <,> uh Greek and Latin whatever …only says something about your ability [in that area] and not [in the wider areas [of life]]… [S1B-029 #153]

(3) …the consequences of these proposals for the movement of traffic [outside the areas immediately affected], and particularly [in the direction [of the A3]].

Example (2) is exclusionary, (3) is specificatory. Reversing the conjoins is quite difficult.

(2′) …having a degree in say English Literature or <,> uh Greek and Latin whatever …only says something about your ability not [in the wider areas [of life]] but [in that area]…

(3′) …the consequences of these proposals for the movement of traffic particularly [in the direction [of the A3]], and also [outside the areas immediately affected].

Rewritten examples seem quite strained, especially the specificatory ones. It seems more straightforward in English to start with a broader concept and then narrow it, than to present a narrow concept and widen it.

This might affect a result otherwise attributed to ‘end weight’. In these ordered examples there may be logical-semantic reasons why the second conjoin, because it represents a subset of the first (whether excluded or specified), might tend to be more complex and grammatically ‘heavier’.

This type of reasoning does not apply to (4), which is ordered logically. There is no particular reason why the consequent (the second conjoin) is ‘heavier’ than the antecedent (the first).

(4) In the fixed dunes, [with their much higher organic content,] and therefore [with a greater proportion [of fine particles]]… [W2A-022 #75]

For the purposes of the present study we will first pool ordered and unordered examples alike. In Section 3.3 we review our data by repeating our experiments, requiring ‘and’ or ‘or’ to immediately precede the last conjoin, and thereby obtain a dataset of unordered cases.

2. Experiments

2.1 Conjoined prepositional phrases containing noun phrases postmodified by PPs

We obtain data from the fully-parsed British Component of the International Corpus of English (ICE-GB, Nelson, Wallis and Aarts 2002).

All of the experiments obtain data by the following approach. We construct four Fuzzy Tree Fragments (FTFs) according to a single schema, and extract data using ICECUP. The yellow nodes are optional, so we have four versions of this FTF (neither, initial, final, both).

In our first experiment we will use the schema in Figure 1. We relax the constraint that the PP must immediately follow the noun phrase head (indicated by a white ‘After’ arrow, rather than a black ‘Immediately after’ arrow). Should any other element fall between the head and PP, the FTF will still find it. However, this relaxation has a drawback. The FTF matches cases with multiple postmodifiers more than once, creating duplicate matches, so we should review all our results and subtract any duplicates manually.

Figure 1. FTF schema: optional NP postmodification in conjoined prepositional phrases (ordered or unordered sequences). Four FTFs are constructed, with the right-most NPPO, PP nodes present or removed.[1]

Using this schema, for all ICE-GB data, we obtain values for the highlighted cells and construct a contingency table by subtraction (Table 1). The FTF with both ‘NPPO, PP’ nodes yields 13 cases, the FTF with a postmodifying PP node in the first position matches all of these plus another 8.

Note that this search method pools data from coordination sequences of any number and does not pay attention to intervening conjoins. A pattern that only postmodifies a medial conjoin would therefore register as ‘neither’. However longer conjoin sequences are relatively low in frequency.

We extract the following proportions with 95% Wilson score intervals (see Table 1):

p(first) = 21/186 = 0.1129 ∈ (0.0750, 0.1664),
p(last) = 47/186 = 0.2527 ∈ (0.1957, 0.3197).

CJ, PP +PP    – last    + last    total    p(last)
– first        131        34       165
+ first          8        13        21     0.6190
total          139        47       186     0.2527
p(first)                0.2766   0.1129

Table 1. Contingency table for independent decisions to have a postmodifying PP in first or last place for conjoined PPs (‘+ first’ means the first conjoin is postmodified by a PP), all ICE-GB data. χ² = 16.83 (Yates’s χ² = 14.71).
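The construction of the table by subtraction can be sketched in Python. This is a minimal illustration, not the procedure carried out in ICECUP: the variable names and the helper chi_square_2x2 are assumptions introduced here, and only the four search totals (186, 21, 47 and 13) are taken from the text.

```python
# Totals returned by the four FTF searches, as quoted in the text.
n_total = 186  # all conjoined-PP sequences matched by the schema
n_first = 21   # first conjoin postmodified (13 'both' plus another 8)
n_last = 47    # last conjoin postmodified
n_both = 13    # first and last conjoins both postmodified

# Remaining cells follow by subtraction from the totals.
first_only = n_first - n_both                           # + first, - last
last_only = n_last - n_both                             # - first, + last
neither = n_total - n_both - first_only - last_only     # - first, - last


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] (hypothetical helper)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates's continuity correction reduces |ad - bc| by n/2.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


chi2 = chi_square_2x2(neither, last_only, first_only, n_both)
chi2_yates = chi_square_2x2(neither, last_only, first_only, n_both, yates=True)

print(neither, last_only, first_only, n_both)  # 131 34 8 13
print(f"{chi2:.2f} {chi2_yates:.2f}")          # 16.83 14.71
```

The cell values sum to the row and column totals of Table 1, and the two statistics agree with those given in the caption.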

Comparing the confidence intervals on p(first) and p(last), we find that they do not overlap, so we can say that p(last) is significantly greater than p(first), i.e. it is more likely that a later conjoin is postmodified than an earlier one. In other words, we find a potential end-weight bias.
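The interval comparison can be checked with a short Python sketch of the standard Wilson score interval formula. The function name wilson_interval and the use of z = 1.959964 for a 95% interval are assumptions made for illustration; the counts are those of Table 1.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for the proportion successes / n."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


first_lo, first_hi = wilson_interval(21, 186)  # p(first)
last_lo, last_hi = wilson_interval(47, 186)    # p(last)

print(f"p(first) in ({first_lo:.4f}, {first_hi:.4f})")  # (0.0750, 0.1664)
print(f"p(last)  in ({last_lo:.4f}, {last_hi:.4f})")    # (0.1957, 0.3197)

# The upper bound of p(first) falls below the lower bound of p(last),
# so the two intervals do not overlap.
intervals_overlap = first_hi >= last_lo
print(intervals_overlap)  # False
```

Non-overlapping intervals are a sufficient (if conservative) indication that the two proportions differ significantly.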

2.2 Interaction and patterning

We could stop at this point. However comparing p(first) and p(last) evaluates their independent rates. It does not address their interaction.

Note that the probability of choosing the cell (+first, +last) in Table 1, which we might write as p(both), is 13/186 = 0.0699. This is nearly two and a half times the independent intersection probability, p(first) × p(last) = 0.0285. The ratio has the scaled 95% Wilson score interval for p/P, where P is simply treated as a constant.

p/P = 0.0699/0.0285 = 2.45 ∈ (1.45, 4.06),

where p is the observed proportion, p(both), and P = p(first) × p(last). There are between 1.5 and 4 times (with a best estimate of 2.45) more ‘double postmodification’ cases than would be expected were the two postmodification acts independent.
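As a worked check, the scaled interval can be sketched in Python by computing the Wilson score interval for p(both) and dividing each bound by P. The helper wilson_interval is an illustrative assumption (it is the standard Wilson formula), and P is treated as a fixed constant, as in the text.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for the proportion successes / n."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


n_total, n_first, n_last, n_both = 186, 21, 47, 13

p_both = n_both / n_total                                # observed p
expected = (n_first / n_total) * (n_last / n_total)      # P = p(first) x p(last)

lo, hi = wilson_interval(n_both, n_total)

# Scale the observed proportion and its interval by the constant P.
ratio = p_both / expected
ratio_lo, ratio_hi = lo / expected, hi / expected

print(f"{ratio:.2f} in ({ratio_lo:.2f}, {ratio_hi:.2f})")  # 2.45 in (1.45, 4.06)
```

Because the lower bound exceeds 1, the excess of doubly-postmodified cases over the independence expectation is significant at this error level.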

We can compute Cramér’s 2 × 2 φ = 0.3008 ∈ (0.1385, 0.4646).[2] This tells us that there is a sizable effect, and that we can be 95% confident that its true value lies within this range.
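The point estimate of φ can be reproduced with a brief Python sketch. The function phi_2x2 is a hypothetical helper using the standard 2 × 2 formula; the confidence interval method of Wallis (2021) is not reproduced here. The second calculation replaces the observed cells with the values expected under independence (keeping the marginal totals of Table 1 fixed), to illustrate that the association then vanishes.

```python
import math


def phi_2x2(a, b, c, d):
    """Signed phi coefficient for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Observed cells of Table 1: rows = -first / +first, columns = -last / +last.
phi = phi_2x2(131, 34, 8, 13)
print(f"{phi:.4f}")  # 0.3008

# Cells expected under independence, with the same marginal totals.
n_total, n_first, n_last = 186, 21, 47
e_both = n_first * n_last / n_total                      # about 5.31
e_first_only = n_first - e_both
e_last_only = n_last - e_both
e_neither = n_total - e_both - e_first_only - e_last_only

phi_independent = phi_2x2(e_neither, e_last_only, e_first_only, e_both)
print(abs(phi_independent) < 1e-9)  # True: no association remains
```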

This effect size can be used to compare the degree of association between decisions. However, since φ is associative, it is bidirectional, and does not distinguish between axes (directions).

Using these proportions, we could examine how the rate of postmodification on one conjoin changes if we know the other is postmodified. But as we shall discuss in Section 3.1, making a claim of directionality of influence is doubly misguided.

In the meantime, consider Figure 2, which plots the changing rate of each decision point as separate trends.[3] We compute these second, conditional proportions like this:

p(first | last) = 13/47 = 0.2766 ∈ (0.1694, 0.4176),
p(last | first) = 13/21 = 0.6190 ∈ (0.4080, 0.7925).

We plot spoken and written rates, alongside the pooled ‘all ICE-GB’ rate, both in order to identify whether mode of delivery makes a difference to the outcome, and as a kind of weak replication check (see Wallis 2021: 201). Note that although we might perceive differences between speech and writing in Figure 2, they are not significantly different (note how the intervals overlap points).

Figure 2. Changing rate of postmodifying a noun phrase head with a PP in the last position of a series of conjoined PPs, p(last), vs. the changing rate of p(first), if the other conjoin is postmodified.

The graph draws attention to Church’s gradients, i.e. p(last | first) – p(last), etc. This gradient represents the tendency for the rate of postmodification of the last conjoin to increase if we know that the first is postmodified. Examining the difference between conditional and absolute probabilities is an idea due to Ken Church (2000). We might also compare this gradient with the equivalent gradient for the opposite direction, i.e. p(first | last) – p(first). If there were an influence in a particular direction, one could expect a steeper gradient on the influenced term.

However, such an interpretation is incorrect. We should be careful not to over-interpret the increased gradient for p(last) over p(first). The two gradients are not independent observations, but difference measures extracted from a contingency table with a single degree of freedom.

We already know that p(last) > p(first) (‘absolute’ values, left). And we know that there are additional cases of double-postmodification. The steeper gradient is entirely due to these two facts. In other words, it is a mathematical artifact of Table 1![4]
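This dependence can be made concrete with a small Python sketch using the counts of Table 1 (variable names are assumptions for illustration). Writing both gradients over the same counts shows that they share a numerator, so their ratio is algebraically fixed at p(last) / p(first): whenever p(last) > p(first) and there is an excess of ‘both’ cases, the gradient for p(last) must be the steeper one.

```python
import math

n_total, n_first, n_last, n_both = 186, 21, 47, 13

p_first = n_first / n_total
p_last = n_last / n_total
p_last_given_first = n_both / n_first
p_first_given_last = n_both / n_last

# Church's gradients: conditional minus absolute probability.
gradient_last = p_last_given_first - p_last     # about 0.3664
gradient_first = p_first_given_last - p_first   # about 0.1637

print(f"{gradient_last:.4f} {gradient_first:.4f}")

# Both gradients equal (N * n_both - n_first * n_last) divided by
# N * n_first and N * n_last respectively, so their ratio reduces to
# n_last / n_first, i.e. p(last) / p(first).
print(math.isclose(gradient_last / gradient_first, p_last / p_first))  # True
```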

Indeed, in each set of data we studied in this paper, p(last | first) exceeds 0.5 numerically, or, to put it another way, more than half the cases that are postmodified in the first position have a postmodified final conjoin.

However this does not permit us to assume a directional influence, which we might codify in the form ‘+postmodify(first) → +postmodify(last)’ (choosing to postmodify the first conjoin encourages, or primes, postmodification of the last). We will return to questions of directional influence and templating in Sections 3.1 and 3.2.

With the above in mind, a simpler way to present this data is shown in Figure 3. This representation places the emphasis on particular patterns (‘initial’ = ‘first only’; ‘final’ = ‘last only’) rather than on the probability of an item being found. Thus p(first) is the probability that x exists in the initial position, which could be in either ‘initial’ or ‘both’ patterns.

For ICE-GB and spoken data, the intervals for p(initial) and p(final) do not overlap. In fact, all three are significant at α = 0.05, confirmed by a single-sample z test for the pair (Wallis 2021: 166).

Finally, a meaningful statistic is the end weight risk ratio, for which we can also estimate 95% confidence intervals (Wallis 2022b). This is simply the ratio between the probabilities, p(final)/p(initial). For all ICE-GB data, we observe 4.25 times as many conjoin final cases as initial ones, with a 95% interval of 2.06 to 8.84 times.
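The pattern probabilities and the point estimate of the risk ratio follow directly from the cells of Table 1, as the Python sketch below shows. The dictionary layout is an assumption made for illustration, and the confidence interval on the ratio (computed by the method of Wallis 2022b) is not reproduced here.

```python
n_total = 186

# Mutually exclusive patterns from Table 1.
patterns = {"initial": 8, "final": 34, "both": 13}
patterns["neither"] = n_total - sum(patterns.values())

# Probability of each pattern; these four values sum to 1.
probabilities = {name: count / n_total for name, count in patterns.items()}

# End weight risk ratio: p(final) / p(initial).
risk_ratio = probabilities["final"] / probabilities["initial"]

for name, p in probabilities.items():
    print(f"p({name}) = {p:.4f}")
print(f"risk ratio = {risk_ratio:.2f}")  # 4.25
```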

Figure 3. Probability distribution for the position of ‘heavy’ (postmodified) conjoined prepositional phrases. The final column identifies ‘excess’ double-postmodified patterns.

. . .


3. Discussion

3.1 The (directional) causality trap

Haunting this article is the spectre of directional explanations. Observing a high number of conjoin-final ‘heavy’ phrases, we are tempted to infer directionality of decision making and thus influence. If we plot graphs like Figure 2, this temptation becomes even greater. The gradient for p(last) tends to be steeper than that for the opposite inference. In the case of postmodified noun phrases we can obtain a significantly steeper result.

But what does this mean? We could claim that where a gradient in one direction is found to be significantly greater than another this gradient is likely to be seen in future data. This means that we might say the prediction is reproducible, but it does not mean that the reason this pattern is observed is due to a particular underlying process.

But as we have seen, this result can also be explained as a mathematical artifact of two other facts: that conjoins are end-weighted (p(last) > p(first), and p(final) > p(initial)) and that the intersection (the doubly-postmodified ‘both’ pattern) is greater than expected.

In fact, any observed pattern like this is the aggregate result of multiple patterns and tendencies, idiomatic expressions and schema, as well as genuinely independent decisions which influence one another.

As with any correlation, great care should be taken not to interpret a directional correlation as evidence of causality. We cannot know for certain that a decision regarding adding a postmodifier to the first conjoin is made prior to a decision to add one to the last, however intuitive or seductive this reasoning might be. Human mental processing is highly parallelised, and conjoins might be constructed internally in parallel, and only articulated in a single order.

Finally, although we see an elevated rate for cases where both first and last conjoins are postmodified, this might be due to a specific set of cases, such as idiomatic patterns or templating.

There are some circumstances in corpus linguistics where direction might be deduced, for example where one speaker primes another. But greater care must be applied when dealing with interaction research within an utterance. For a start, direction does not automatically accord with word order. We have previously discovered interactions between decisions that are only credibly explained by planning ahead, such as attributive adjective phrases conditioned by the semantics of the head that follows. Similarly, objective who/whom alternation is shown to interact with a following subject (Wallis 2021: 39). The choice of subject, like the choice of noun phrase head, necessarily concerns the overarching intended meaning of the clause or phrase. When I say the large grey cat, I have a mental picture of the cat I am describing to you, and I am constrained by the noun I might eventually utter – cat, feline, animal, creature, etc.

In some processes, we might advance an argument that some decisions are likely to be made in a particular order because the option to add a second term only arises should the first be made, such as in embedded constructions (Wallis 2019). However, even embedding may involve some degree of look-ahead. Wallis (2022b) finds evidence that proper nouns postmodified by PPs found in titles appear to defy the expectation of a declining additive probability. Although analysed grammatically as multi-level embedding, the rise in probability observed appears to be only explicable by ‘chunking’ (the construction is introduced as a single unit), or the application of a title ‘formula’, such as the X of Y, e.g. the Duke of York.

In our case, a plausible cognitive model could hypothesise that the memory and attention demands of introducing an additional PP militate against it being added in the initial position, but this pattern in the data might equally be explained as a result of some other (possibly as yet unknown) phenomenon.

. . .


  1. Introduction
  2. Experiments
    2.1 Conjoined prepositional phrases containing noun phrases postmodified by PPs
    2.2 Interaction and patterning
    2.3 Conjoined noun phrases, postmodified by PPs
    2.4 Clausal postmodification
  3. Discussion
    3.1 The (directional) causality trap
    3.2 Templating evidence
    3.3 The effect of order
    3.4 Adjective phrase, PP and clausal distributions
  4. Conclusions


Bresnan, J., A. Cueni, T. Nikitina & R.H. Baayen 2007. Predicting the Dative Alternation. In G. Bouma, I. Kraemer, & J. Zwarts (eds.), Cognitive Foundations of Interpretation. Amsterdam: KNAW. 69-94.

Church, K. 2000. Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p², Coling, 173-179.

Cowan, R. 2008. The Teacher’s Grammar of English with Answers. Cambridge: CUP.

Kaltenböck, G. 2020. Chapter 22 in Aarts, B., Popova, G. and Bowie, J. (eds.) The Oxford Handbook of English Grammar. Oxford University Press.

Nelson, G., S.A. Wallis & B. Aarts 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Varieties of English Around the World series. Amsterdam: John Benjamins.

Wallis, S.A. 2019. Investigating the additive probability of repeated language production decisions. International Journal of Corpus Linguistics 24:4, 490-521.

Wallis, S.A. 2021. Statistics in Corpus Linguistics Research: A new approach. New York: Routledge.

Wallis, S.A. 2022a. Accurate confidence intervals on Binomial proportions, functions of proportions, algebraic formulae and effect sizes. London: Survey of English Usage.

Wallis, S.A. 2022b. Are embedding decisions independent? Evidence from preposition(al) phrases. London: Survey of English Usage.


[1] The tree is drawn from left to right for reasons of space rather than top-down. Word order is from the top, down, on the right hand side. Gloss: NPHD = noun phrase head, NPPO = noun phrase postmodifier, PP = prepositional phrase, CJ = conjoin, P = prepositional (function), PREP = preposition, PC = prepositional complement, NP = noun phrase. Black arrows = immediately after, white arrows = (eventually) after.

[2] These are 95% intervals computed using the method outlined in (Wallis 2021: 225).

[3] This is not an additive probability chart. The equivalent additive probability chart would link p(last) with p(first | last) as a chain of additive decisions.

[4] The elevated double-postmodification rate is why χ² was significant. To demonstrate this, note that the expected value is 21×47/186 = 5.31. Set out Table 1 with cells in the ‘known totals’ tab on the 2 × 2 χ² spreadsheet, then substitute 5.31 for 13. χ², φ and φp tend to zero.
