Are embedding decisions independent?

Evidence from preposition(al) phrases

Abstract · Full Paper (PDF)

One of the more difficult challenges in linguistics research concerns detecting how constraints might apply to the process of constructing phrases and clauses in natural language production. In previous work (Wallis 2019) we considered a number of operations modifying noun phrases, including sequential and embedded modification with postmodifying clauses. Notably, we found a pattern of a declining additive probability for each decision to embed postmodifying clauses, albeit a pattern that differed in speech and writing.

In this paper we use the same research paradigm to investigate the embedding of an altogether simpler structure: postmodifying nouns with prepositional phrases. These are approximately twice as frequent as postmodifying clauses, and such structures exhibit as many as five levels of embedding in ICE-GB (two more than are found for clauses). Finally, the embedding model is simplified because only one noun phrase can be found within each prepositional phrase. We discover different initial rates and patterns for common and proper nouns, and for certain subsets of pronouns and numerals. Common nouns (80% of nouns in the corpus) do appear to generate a secular decline in the additive probability of embedded prepositional phrases, whereas the equivalent rate for proper nouns rises from a low initial probability, a fact that appears to be strongly affected by the presence of titles.
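To make the idea of a declining additive probability concrete, here is a minimal sketch of how such rates might be estimated from frequency counts at successive levels of embedding. The counts below are invented for illustration; they are not figures from ICE-GB or from the paper.

```python
def additive_probabilities(freq_at_least):
    """Given counts of noun phrases with *at least* x levels of embedded
    postmodification (x = 0, 1, 2, ...), estimate the probability that a
    phrase with x levels gains a further level: p(x) = F(x+1) / F(x)."""
    return [
        freq_at_least[x + 1] / freq_at_least[x]
        for x in range(len(freq_at_least) - 1)
    ]

# Invented counts for illustration only: 10,000 NPs, of which 2,500 have
# at least one embedded PP, 500 at least two, and 80 at least three.
print(additive_probabilities([10_000, 2_500, 500, 80]))
# [0.25, 0.2, 0.16]  -> a declining additive probability
```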

It may be generally assumed that, like clauses, prepositional phrases are essentially independent units. However, we find evidence from a number of sources indicating that some double-layered constructions may be added as single units. In addition to titles, these constructions include schematic or idiomatic expressions whose head is an ‘indefinite’ pronoun or numeral. Continue reading “Are embedding decisions independent?”

Boundaries in nature

Although we are primarily concerned with Binomial probabilities in this blog, it is occasionally worth a detour to make a point.

A common bias I witness among researchers discussing statistics is the intuition (or presumption) that distributions are Gaussian (Normal) and symmetric. But many naturally occurring distributions are not Normal, and a key reason is the influence of boundary conditions.

Even for ostensibly Real variables, unbounded behaviour is unusual. Nature is full of boundaries.

Consequently, mathematical models that incorporate boundaries can sometimes offer a fresh perspective on old problems. Gould (1996) discusses a prediction in evolutionary biology regarding the expected distribution of biomass for organisms of a range of complexity (or scale), from those composed of a single cell to those made up of trillions of cells, like humans. His argument captures an idea about evolution that places the emphasis not on the most complex or ‘highest stages’ of evolution (as conventionally taught), but rather on the plurality of blindly random evolutionary pathways. Life becomes more complex due to random variation and stable niches (‘local maxima’) rather than some external global tendency, such as a teleological advantage of complexity for survival.

Gould’s argument may be summarised in the following way. Through blind random Darwinian evolution, simple organisms may evolve into more complex ones (‘complexity’ measured as numbers of cells or organism size), but at the same time others may evolve into simpler, but perhaps equally successful ones. ‘Success’ here means reproductive survival – producing new organisms of the same scale or greater that survive to reproduce themselves.

His second premise is also non-controversial. Every organism must have at least one cell and all the first lifeforms were unicellular.

Now, run time’s arrow forwards. Assuming a constant and equal rate of evolution, we can obtain by simulation a range of distributions like those in the Figure below.

Figure: Gould’s curves.
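A minimal sketch of the kind of simulation that produces curves like these might look as follows. It assumes a blind random walk in ‘complexity’ with a lower boundary (a left wall) at one cell; the function name and all parameter values are illustrative, not Gould’s.

```python
import random

def simulate_complexity(n_lineages=2_000, n_steps=200, seed=1):
    """Blind random walk in 'complexity' (e.g. cell count on some scale),
    starting at the minimum of 1 and never allowed to fall below it.
    Parameter names and values are illustrative assumptions."""
    random.seed(seed)
    complexity = [1.0] * n_lineages
    for _ in range(n_steps):
        for i in range(n_lineages):
            # Each lineage drifts up or down at random, but the left wall
            # at 1 means it can never become simpler than a single cell.
            complexity[i] = max(1.0, complexity[i] + random.choice([-1.0, 1.0]))
    return complexity

result = simulate_complexity()
# The mode stays close to the left wall while the mean drifts right,
# giving the skewed, non-Normal shape discussed above.
print("minimum:", min(result), " mean:", round(sum(result) / len(result), 2))
```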

Continue reading “Boundaries in nature”

The other end of the telescope

Introduction

The standard approach to teaching (and thus thinking about) statistics is based on projecting distributions of expected values. The distribution of an expected value is a set of probabilities that predict the relative chance of each possible value, according to a mathematical model of what you predict should happen.

For example, the Binomial model predicts the chance of obtaining f instances of a type, A, drawn from n instances of two types, A and B, assuming each instance in the sample is independent from the next, the population is infinite in size, and any random instance could be either A or B.
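As a concrete (if simplified) illustration, the Binomial probability of obtaining exactly f instances of A can be computed directly. The numbers below are arbitrary examples, not data from any study.

```python
from math import comb

def binomial_probability(f, n, P):
    """Chance of obtaining exactly f instances of type A out of n independent
    instances, when each instance has probability P of being type A."""
    return comb(n, f) * P ** f * (1 - P) ** (n - f)

# Illustrative values: f = 3 As out of n = 10 instances, with P = 0.25.
print(round(binomial_probability(3, 10, 0.25), 4))  # 0.2503
```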

For the experimentalist, this distribution is the imaginary distribution of very many repetitions of the same experiment that you have undertaken. It is the output of a mathematical model.
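One way to make this imaginary distribution tangible is to simulate it: repeat the ‘experiment’ many times in software and tally the outcomes. This is only a sketch under the Binomial assumptions above, with illustrative parameter values.

```python
import random
from collections import Counter

def projected_distribution(n=10, P=0.25, repetitions=100_000, seed=1):
    """Simulate the 'imaginary' distribution of very many repetitions of the
    same experiment: each repetition draws n independent instances, each of
    which is type A with probability P. Values are illustrative only."""
    random.seed(seed)
    counts = Counter(
        sum(random.random() < P for _ in range(n))
        for _ in range(repetitions)
    )
    return {f: counts[f] / repetitions for f in range(n + 1)}

# The simulated relative frequencies approximate the Binomial probabilities.
for f, share in projected_distribution().items():
    print(f, round(share, 4))
```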

  • Note that this idea of a projected distribution is not the same as the ‘expected distribution’. An expected distribution is a series of values you expect your data should match, according to your null hypothesis.
  • Thus in what follows we simply compare a single expected value P with an observed value p. This can be thought of as comparing the expected distribution E = {P, 1 – P} with the observed distribution O = {p, 1 – p}.
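As a rough sketch of such a comparison, a 2 × 1 goodness-of-fit chi-square contrasts the observed split of counts with the split expected under P. The figures below are invented for illustration only.

```python
def goodness_of_fit(f, n, P):
    """2 x 1 goodness-of-fit chi-square comparing the observed split
    {f, n - f} with the split expected under P, i.e. {n*P, n*(1 - P)}."""
    observed = [f, n - f]
    expected = [n * P, n * (1 - P)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example: observed p = 45/100 = 0.45 against expected P = 0.5.
chi_sq = goodness_of_fit(45, 100, 0.5)
print(chi_sq)  # 1.0; below the 3.84 critical value for 1 d.f. at alpha = 0.05
```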

Thinking about this projected distribution represents a colossal feat of imagination: it is a projection of what you think would happen if only you had world enough and time to repeat your experiment, again and again. But often you can’t get more data. Perhaps the effort to collect your data was huge, or the data comes from a finite pool (historical documents, patients with a rare condition, etc.). Actual replication may be impossible for material reasons.

In general, distributions of this kind are extremely hard to imagine, because they are not part of our directly-observed experience. See Why is statistics difficult? for more on this. So we already have an uphill task in getting to grips with this kind of reasoning.

Significant difference (often shortened to ‘significance’) refers to the difference between your observations (the ‘observed distribution’) and what you expect to see (the expected distribution). But to evaluate whether a numerical difference is significant, we have to take into account both the shape and spread of this projected distribution of expected values.

When you select a statistical test you do two things:

  • you choose a mathematical model which projects a distribution of possible values, and
  • you choose a way of calculating significant difference.

The problem is that in many cases it is very difficult to imagine this projected distribution, or — which amounts to the same thing — the implications of the statistical model.

When tests are selected, the main criterion you have to consider concerns the type of data being analysed (an ‘ordinal scale’, a ‘categorical scale’, a ‘ratio scale’, and so on). But the scale of measurement is only one of several parameters that allow us to predict how random selection might affect the sampling of data.

A mathematical model contains what are usually called assumptions, although it might be more accurate to call them ‘preconditions’ or parameters. If these assumptions about your data are incorrect, the test will probably give an inaccurate result. This principle is not either/or, but can be thought of as a scale of ‘degradation’. The less the data conforms to these assumptions, the more likely your test is to give the wrong answer.

This is particularly problematic in some computational applications. The programmer could not imagine the projected distribution, so they tweaked various parameters until the program ‘worked’. In a ‘black-box’ algorithm this might not matter. If it appears to work, who cares if the algorithm is not very principled? Performance might be less than optimal, but it may still produce valuable and interesting results.

But in science there really should be no such excuse.

The question I have been asking myself for the last ten years or so is simply this: can we do better? Is there a better way to teach (and think about) statistics than from the perspective of distributions projected by counter-intuitive mathematical models (taken on trust) and significance tests? Continue reading “The other end of the telescope”