Welcome!

corp.ling.stats is a research blog by Sean Wallis focusing on the intersection of corpus linguistics research, mathematical statistics and probability theory.

News

Coming soon… Statistics in Corpus Linguistics Research (Routledge)

I am very pleased to announce that my new book, Statistics in Corpus Linguistics Research, is available to pre-order and will be published in November 2020 by Routledge. Drawing on over ten years’ work, and containing a large quantity of material never published before, the book is written for corpus linguistics researchers of all…

Designing experiments

The replication crisis: what does it mean for corpus linguistics?

Over the last year, the field of psychology has been rocked by a major public dispute about statistics. This concerns the failure of claims in papers, published in top psychological journals, to replicate. Replication is a big deal: if you publish a correlation between variable X and variable Y – that there is an…

What might a corpus of parsed spoken data tell us about language?

This paper summarises a methodological perspective towards corpus linguistics that is both unifying and critical. It emphasises that the processes involved in annotating corpora and carrying out research with corpora are fundamentally cyclic, i.e. involving both bottom-up and top-down processes. Knowledge is necessarily partial and refutable. This perspective unifies ‘corpus-driven’ and ‘theory-driven’…

Genre differences and experimental observations

In a recently published paper, Bowie, Wallis and Aarts (2013) demonstrate that observations regarding changes in the frequency of modal verbs over time are highly sensitive to differences in genre (‘register’ or ‘text category’). Our paper, although based on spoken British English, may shed some light on a…

Confidence intervals

The variance of chi-square

Recently, I have been reviewing work I carried out developing confidence intervals for Cramér’s ϕ, building on Bishop, Fienberg and Holland (1975). While finalising the edit for my forthcoming book (Wallis, 2020), I realised that Yvonne Bishop and colleagues had provided a formula for the variance of χ² without saying so explicitly! The authors show how…
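As a rough cross-check on any analytic formula for the variance of χ², one can estimate it by simulation. The sketch below is not Bishop, Fienberg and Holland's formula: it is a parametric bootstrap that assumes multinomial sampling with cell probabilities equal to the observed cell proportions, resamples tables of the same total size, and takes the sample variance of the resulting χ² scores. It also computes Cramér's ϕ in its standard form, ϕ = √(χ² / (N(k − 1))) with k = min(rows, columns). The function names and the example table are hypothetical.

```python
import math
import random
import statistics


def chi_square(table):
    """Pearson chi-square for an r x c table of counts (no continuity correction)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / n  # expected count under independence
            if e > 0:      # an empty row or column contributes nothing
                total += (table[i][j] - e) ** 2 / e
    return total


def cramers_phi(table):
    """Cramér's phi = sqrt(chi2 / (N (k - 1))), k = min(rows, cols)."""
    n = sum(sum(r) for r in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))


def simulated_chi_square_variance(table, n_sims=2000, seed=1):
    """Parametric-bootstrap estimate of Var(chi2), assuming multinomial
    sampling with cell probabilities set to the observed proportions."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [table[i][j] for i, j in cells]
    n = sum(weights)
    values = []
    for _ in range(n_sims):
        sim = [[0] * n_cols for _ in range(n_rows)]
        # Draw N items, each falling into a cell with probability o_ij / N.
        for i, j in rng.choices(cells, weights=weights, k=n):
            sim[i][j] += 1
        values.append(chi_square(sim))
    return statistics.variance(values)


if __name__ == "__main__":
    t = [[10, 20], [30, 40]]  # made-up counts for illustration only
    print(cramers_phi(t))     # ≈ 0.089
    print(simulated_chi_square_variance(t))
```

A simulated estimate like this is noisy and depends on the seed and number of simulations, so it is best treated as a sanity check on a closed-form variance rather than a replacement for one.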

Further evaluation of Binomial confidence intervals

Wallis (2013) provides an account of an empirical evaluation of Binomial confidence intervals and contingency test formulae. The main take-home message of that article was that statistical methods can be evaluated objectively, and that advice to researchers can rest on an objective computational assessment. In this article we develop…
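To make "objective computational assessment" concrete, here is a minimal sketch that computes the exact coverage probability of an interval formula: for a given true proportion P and sample size n, it sums the Binomial probabilities of every outcome x whose interval contains P. Two textbook intervals are used for illustration, the Wald (Normal approximation) interval and the Wilson score interval; which intervals and criteria the articles actually evaluate is not stated in this excerpt, and the function names here are hypothetical. Requires Python 3.8+ for `math.comb`.

```python
import math

Z95 = 1.959963984540054  # two-tailed critical value of the standard Normal, alpha = 0.05


def wald_interval(p, n, z=Z95):
    """Wald interval: p ± z * sqrt(p(1 - p) / n)."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(p, n, z=Z95):
    """Wilson score interval for an observed proportion p out of n."""
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom, (centre + spread) / denom


def coverage(interval, true_p, n):
    """Exact probability that the interval computed from X ~ Binomial(n, true_p)
    contains true_p, summed over every possible outcome x = 0..n."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x / n, n)
        if lo <= true_p <= hi:
            total += math.comb(n, x) * true_p ** x * (1 - true_p) ** (n - x)
    return total


if __name__ == "__main__":
    for name, f in [("Wald", wald_interval), ("Wilson", wilson_interval)]:
        print(name, coverage(f, 0.05, 20), coverage(f, 0.5, 20))
```

With a nominal 95% interval, coverage well below 0.95 means the formula is overconfident. The Wald interval, for example, collapses to zero width at x = 0, so for small P it misses the true value whenever no cases are observed.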

Deconstructing the chi-square

Elsewhere in this blog we introduce the concept of statistical significance by considering the reliability of a single sampled observation of a Binomial proportion: an estimate of the probability of selecting an item in the future. This allows us to develop an understanding of the likely distribution of what the true value of that…
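One way to look inside a 2 × 2 χ² is to break it into per-cell contributions (O − E)² / E, and to check the well-known identity that, without a continuity correction, Pearson's χ² for a 2 × 2 table equals the square of the pooled two-proportion z score. This links the contingency test back to comparing two Binomial proportions. The sketch assumes each row is an independent sample and the first column counts the outcome of interest; the function names and the example counts are hypothetical, and it is not necessarily how the post itself proceeds.

```python
import math


def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def cell_contributions(table):
    """Per-cell terms (O - E)^2 / E; assumes no row or column total is zero."""
    exp = expected_counts(table)
    return [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
            for orow, erow in zip(table, exp)]


def chi_square(table):
    """Pearson chi-square as the sum of the per-cell contributions."""
    return sum(sum(row) for row in cell_contributions(table))


def two_proportion_z(table):
    """Pooled z score for p1 - p2, where p_i = (first column) / (row total)."""
    (a, b), (c, d) = table
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    p = (a + c) / (n1 + n2)  # pooled proportion under H0: p1 = p2
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


if __name__ == "__main__":
    t = [[10, 20], [30, 40]]        # made-up counts for illustration only
    print(cell_contributions(t))
    print(chi_square(t))            # ≈ 0.794
    print(two_proportion_z(t) ** 2) # same value, up to rounding
```

Seeing which cells contribute most to χ² shows where observed and expected frequencies diverge. The z² identity shows that, for a 2 × 2 table, the χ² test is simply a comparison of two Binomial proportions.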

Contingency tests… latest posts…