Although we are primarily concerned with Binomial probabilities in this blog, it is occasionally worth a detour to make a point.

A common bias I witness among researchers in discussing statistics is the intuition (presumption) that distributions are Gaussian (Normal) and symmetric. But many naturally-occurring distributions are not Normal, and a key reason is the influence of *boundary conditions*, as in this simple example.

Even for ostensibly Real variables, unbounded behaviour is unusual. Nature is full of boundaries.

Consequently, mathematical models that incorporate boundaries can sometimes offer a fresh perspective on old problems. Gould (1996) discusses a prediction in evolutionary biology regarding the expected distribution of biomass for organisms of a range of complexity (or scale), from those composed of a single cell to those made up of trillions of cells, like humans. His argument captures an idea about evolution that places the emphasis not on the most complex or ‘highest stages’ of evolution (as conventionally taught), but rather on the plurality of blindly random evolutionary pathways. Life becomes more complex due to random variation and stable niches (‘local maxima’) rather than some external global tendency, such as a teleological advantage of complexity for survival.

Gould’s argument may be summarised in the following way. Through blind random Darwinian evolution, simple organisms may evolve into more complex ones (‘complexity’ measured as numbers of cells or organism size), but at the same time others may evolve into simpler, but perhaps equally successful ones. ‘Success’ here means *reproductive survival* – producing new organisms of the same scale or greater that survive to reproduce themselves.

His second premise is also non-controversial. Every organism must have at least one cell and all the first lifeforms were unicellular.

Now run time’s arrow forwards. Assuming a constant and equal rate of evolution in both directions, by simulation we can obtain a range of distributions like those in the Figure below.

The result of this ‘evolution’ is a Poisson distribution of biomass over complexity (Gould 1996: 171). Provided *some* species evolve into more complex forms, over time the concentration of biomass at the bottom end (complexity *c* = 1) reduces and the length of the upper ‘tail’ of the distribution increases.

Anthropocentrism is the ideological predisposition to view ourselves at the centre of history. From this perspective, evolution is often explained to children as a story leading to human ‘perfection’. We see complex organisms like us at *c* = 10 in the Figure above, and attempt to trace back evolutionary paths. Yet we know that present-day animals are equally ‘evolved’, and one can see evolution as a story of increasing diversity rather than perfection. Indeed, from the perspective of the total distribution of biological matter, ‘the Earth is currently populated by unicellular organisms with a long tail’.

This model predicts that biomass is distributed such that the vast majority of living cells are to be found in organisms at the lowest end of complexity. If evolution could only increase complexity, we would see an exponential distribution with a decreasing peak and increasing spread as time *t* increased. But in our model, like Gould’s, we allowed evolution to decrease and increase the proportion of biomass at any level of complexity *c* at the same rate. Eventually *c* = 2 overtakes *c* = 1 because for *c* = 1 all evolution must be in the direction of increasing complexity.
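One way to see why the peak shifts is to write the model as a recurrence. The notation here is my own shorthand rather than Gould’s: *p*ₜ(*c*) is the proportion of biomass at complexity *c* after *t* cycles, *r* is the per-cycle rate of change (the *rate* parameter in the Algorithm section), and *c* = 1 is the unicellular level.

```latex
\begin{aligned}
p_{t+1}(1) &= (1 - r)\,p_t(1) + \tfrac{r}{2}\,p_t(2),\\
p_{t+1}(2) &= (1 - r)\,p_t(2) + r\,p_t(1) + \tfrac{r}{2}\,p_t(3),\\
p_{t+1}(c) &= (1 - r)\,p_t(c) + \tfrac{r}{2}\,\bigl(p_t(c-1) + p_t(c+1)\bigr), \qquad c \ge 3.
\end{aligned}
```

Level 1 passes its entire changed share, *r* *p*ₜ(1), up to level 2, but receives back only half of level 2’s changed share. That asymmetry at the boundary is what lets *c* = 2 eventually overtake *c* = 1.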

Something appeared to be missing from Gould’s prediction. According to his model, there should be many more unicellular organisms on Earth than had previously been estimated. The prediction was eventually supported by the discovery of unicellular organisms in Darwinian niches deep in the Earth’s crust. The missing biomass turned out to be underground, in soil and rocks.

Gould’s model emphasises that systems can generate decidedly non-Normal outcomes where boundaries are involved. In fact, physics places lower and upper limits on other tangible variables, such as height. Yet textbooks on statistics for school students give the distribution of heights of children in a class as an example of data expected to have a Normal distribution.

If you think about it, the height of schoolchildren has a lower limit of (more than) zero! This does not appear to matter to the textbook example because we do not expect a typical class sample to be close to the physical limit, and therefore an approximation to the Normal appears reasonable. There is an error introduced by the impact of the boundary, but, as data points are usually clustered far from the boundary, that error is small.

In other words, boundaries matter most if you are close to them.
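To put a rough number on this, here is a minimal Python sketch. It asks how much probability a Normal model assigns to impossible values below a boundary at zero. The two sets of parameters are hypothetical, chosen only for illustration and not taken from any data: a class of children with mean height 140 cm and standard deviation 7 cm, and a variable whose mean is only one standard deviation from the boundary.

```python
from statistics import NormalDist


def mass_below_boundary(mean, sd, boundary=0.0):
    """Probability that a Normal(mean, sd) model assigns to values below the boundary."""
    return NormalDist(mu=mean, sigma=sd).cdf(boundary)


# Hypothetical class heights in cm: the boundary is 20 standard deviations away.
far = mass_below_boundary(mean=140.0, sd=7.0)

# Hypothetical variable whose mean is one standard deviation from the boundary.
near = mass_below_boundary(mean=1.0, sd=1.0)

print(f"far from the boundary: {far:.3g}")
print(f"near the boundary:     {near:.3g}")
```

In the first case the ‘impossible’ probability mass is vanishingly small (it may even print as 0), so the Normal approximation does little harm. In the second it is about 0.16, so roughly one sixth of the model’s probability lies where no data could ever fall.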

Physics also places upper limits on physical size. For example, bone and muscle strength are in proportion to the cross-section of a leg, whereas mass is in proportion to volume. Alligators grow in proportion, so an alligator that doubles in length will increase in volume by the cube (2³ = 8), but its legs will only increase in cross-section by the square (2² = 4). If there were a glut of food such that these animals grew extremely large, they would eventually hit the limit at which their legs could no longer support their mass. The largest animals on Earth are waterborne, and Godzilla’s legs would snap if she stood up.
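The arithmetic can be sketched in a few lines of Python. It assumes, as above, that the animal grows in proportion and that mass is proportional to volume. The function name `scaling` is just an illustrative label.

```python
def scaling(length_factor):
    """Return the factors by which cross-section, volume and load per unit
    cross-section grow when every length is multiplied by length_factor."""
    area = length_factor ** 2      # leg cross-section grows with the square
    volume = length_factor ** 3    # mass (volume) grows with the cube
    return area, volume, volume / area


for k in (2, 4, 10):
    area, volume, load = scaling(k)
    print(f"length x{k}: cross-section x{area}, mass x{volume}, "
          f"load per unit cross-section x{load:g}")
```

The load each unit of leg cross-section must carry grows in direct proportion to length, so any fixed limit on bone strength is eventually exceeded.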

### Algorithm

This code simulates the effect of random variation on complexity over evolutionary cycles, commencing with unicellular organisms. It is reconstructed from a description by Stephen J. Gould (1996). Bold font (e.g. ‘**distribution**’) refers to arrays.

function Gould(*cycles*, *rate*)

{integer *t*, *i*

float *e*, **distribution**(*cycles* + 1), **changes**(*cycles* + 1) // indices 0 to *cycles*

set **distribution** = {1, 0, 0, … } // 100% are unicellular, in cell 0

set **changes** = {0, 0, 0, …}

for *t* = 0 to *cycles* – 1 // cycles over time *t*

{for *i* = 0 to *t* // first calculate changes on the basis of generation *t*

{*e* = *rate* × **distribution**(*i*)

if (*i* > 0) // if not at boundary

{**changes**(*i* – 1) += *e*/2 // increase on either side of *i*

**changes**(*i* + 1) += *e*/2 // i.e. half evolve up, half down}

else

**changes**(*i* + 1) += *e* // otherwise, all evolve up

**changes**(*i*) –= *e* // reduce at position *i* accordingly}

for *i* = 0 to *t* + 1 // once calculated, apply and clear changes

{**distribution**(*i*) += **changes**(*i*)

**changes**(*i*) = 0}}

plot **distribution** // plot curve over cycles}
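For readers who want to run the simulation, the pseudocode above can be sketched in Python roughly as follows. This is one possible translation, not a definitive implementation. It returns the final distribution as a list rather than plotting it, and it allocates *cycles* + 1 entries, since after *cycles* steps the highest reachable index is *cycles*. Index 0 holds the unicellular level.

```python
def gould(cycles, rate):
    """Simulate the spread of biomass over complexity levels.

    Index 0 is the simplest (unicellular) level. Returns a list of
    cycles + 1 proportions that sum to 1.
    """
    distribution = [0.0] * (cycles + 1)
    distribution[0] = 1.0              # 100% are unicellular at the start
    changes = [0.0] * (cycles + 1)

    for t in range(cycles):            # cycle over time t
        # First calculate all changes on the basis of generation t.
        for i in range(t + 1):
            e = rate * distribution[i]
            if i > 0:
                # Away from the boundary: half evolve down, half evolve up.
                changes[i - 1] += e / 2
                changes[i + 1] += e / 2
            else:
                # At the boundary: all change is towards greater complexity.
                changes[i + 1] += e
            changes[i] -= e            # reduce at position i accordingly
        # Once calculated, apply and clear the changes.
        for i in range(t + 2):
            distribution[i] += changes[i]
            changes[i] = 0.0
    return distribution


result = gould(200, 0.1)
print([round(p, 4) for p in result[:10]])
```

Keeping a separate `changes` array, as the pseudocode does, means that every update within a cycle is calculated from the same generation rather than from partially updated values.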

### References

Gould, S.J. (1996). *Life’s Grandeur*. London: Random House.