Welcome!

corp.ling.stats is a research blog by Sean Wallis focusing on the intersection between corpus linguistics research and mathematical statistics and probability theory.

News

Out now… Statistics in Corpus Linguistics Research (Routledge)

I am very pleased to announce that my new book, Statistics in Corpus Linguistics Research, is now available from Routledge. Drawing on more than ten years of research, and containing a large quantity of material never published before, the book is written for corpus linguistics researchers of all kinds, from students of corpus linguistics wishing to apply statistical analysis for the first time,…

Designing experiments

The replication crisis: what does it mean for corpus linguistics?

Introduction: Over the last year, the field of psychology has been rocked by a major public dispute about statistics. This concerns the failure of claims in papers, published in top psychological journals, to replicate. Replication is a big deal: if you publish a correlation between variable X and variable Y, that there is an…

What might a corpus of parsed spoken data tell us about language?

Abstract: This paper summarises a methodological perspective towards corpus linguistics that is both unifying and critical. It emphasises that the processes involved in annotating corpora and carrying out research with corpora are fundamentally cyclic, i.e. involving both bottom-up and top-down processes. Knowledge is necessarily partial and refutable. This perspective unifies ‘corpus-driven’ and ‘theory-driven’…

Genre differences and experimental observations

Spoken categories, modal verbs and change over time: In a recently-published paper, Bowie, Wallis and Aarts (2013) demonstrate that observations regarding changes in the frequency of modal verbs over time are highly sensitive to differences in genre (‘register’ or ‘text category’). Our paper, although based on spoken British English, may shed some light on a…

Confidence intervals

Confidence intervals on percentage difference – a cautionary tale

Introduction: In ‘An algebra of intervals’ I remarked that identifying a formula for a confidence interval for a metric involves analytical reduction, i.e. formulating the metric as the simplest possible combination of independent parameters. As a general rule, before we carry out any computation of…

Confidence intervals on goodness of fit ϕ scores

Introduction: In Wallis (2021), I offered two approaches to computing confidence intervals on the effect size Cramér’s ϕ. I also motivated and summarised approaches to a comparable goodness of fit metric (where a high ϕ score reflects a greater difference and thus a ‘poor fit’). A goodness of fit evaluation is one where we compare…

Accurate confidence intervals

1. Introduction: There is a growing interest among practising researchers in plotting data with confidence intervals (sometimes termed ‘credible intervals’ or ‘compatibility intervals’). However, many statistics compendia offer extremely limited coverage of confidence interval methods, and cited formulae often involve an elementary but very common mathematical error. This issue is of particular concern…

An algebra of intervals

Introduction: Many researchers wish to compute confidence intervals on measures other than the simple Binomial proportion, p, or difference between two independent proportions, d = p2 – p1. On this blog we have identified the Wilson score interval on p and the repositioned Newcombe-Wilson difference interval on d as suitable for these purposes. For example,…
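
For readers who have not met the two intervals named in this excerpt, here is a minimal Python sketch of how they are commonly computed. It is an illustration under stated assumptions, not the code from the post: the function names `wilson_interval` and `newcombe_wilson_interval` are hypothetical, no continuity correction is applied, and the difference interval uses Newcombe's square-and-add construction positioned around the observed d = p2 – p1.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (lower, upper) for an observed proportion p out of n cases.

    Hypothetical helper for illustration; no continuity correction is applied.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value of the Normal distribution
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at p = 0 or p = 1.
    return max(0.0, centre - spread), min(1.0, centre + spread)


def newcombe_wilson_interval(p1, n1, p2, n2, alpha=0.05):
    """Interval for d = p2 - p1, built from the two Wilson intervals.

    Assumption: the inner Wilson widths are combined by square-and-add
    and the result is positioned around the observed difference d.
    """
    l1, u1 = wilson_interval(p1, n1, alpha)
    l2, u2 = wilson_interval(p2, n2, alpha)
    d = p2 - p1
    lower = d - sqrt((p2 - l2) ** 2 + (u1 - p1) ** 2)
    upper = d + sqrt((u2 - p2) ** 2 + (p1 - l1) ** 2)
    return lower, upper
```

As a rough check, `wilson_interval(0.5, 100)` gives approximately (0.404, 0.596). Unlike the simple Normal approximation, the Wilson interval is asymmetric about p whenever p is not 0.5, and it never extends beyond 0 or 1.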

Contingency tests