Welcome!

corp.ling.stats is a research blog by Sean Wallis focusing on the intersection between corpus linguistics research and mathematical statistics and probability theory.

News

Out now… Statistics in Corpus Linguistics Research (Routledge)

I am very pleased to announce that my new book, Statistics in Corpus Linguistics Research, is now available from Routledge. Drawing on more than ten years of research, and containing a large quantity of material never published before, the book is written for corpus linguistics researchers of all kinds, from students of corpus linguistics wishing to apply statistical analysis for the first time,…

Designing experiments

Directional evidence revisited

End weight bias and templating in conjoined phrase postmodification. Abstract: The tendency of speakers and writers to place larger constructions at the end of sentences, whether consciously or unconsciously, is well established. This question of ‘end weight’ is usually discussed in relation to grammatical transformations. In this short paper we demonstrate…

Are embedding decisions independent?

Evidence from preposition(al) phrases. Abstract: One of the more difficult challenges in linguistics research concerns detecting how constraints might apply to the process of constructing phrases and clauses in natural language production. In previous work (Wallis 2019) we considered a number of operations modifying noun phrases, including sequential and embedded modification with…

The replication crisis: what does it mean for corpus linguistics?

Introduction: Over the last year, the field of psychology has been rocked by a major public dispute about statistics. This concerns the failure of claims in papers, published in top psychological journals, to replicate. Replication is a big deal: if you publish a correlation between variable X and variable Y, that there is an…

What might a corpus of parsed spoken data tell us about language?

Abstract: This paper summarises a methodological perspective towards corpus linguistics that is both unifying and critical. It emphasises that the processes involved in annotating corpora and carrying out research with corpora are fundamentally cyclic, i.e. involving both bottom-up and top-down processes. Knowledge is necessarily partial and refutable. This perspective unifies ‘corpus-driven’ and ‘theory-driven’…

Confidence intervals

Plotting the distributions of intervals for the power and log of proportions

Introduction: In a previous post I showed how to derive intervals for the power and log operators for two uncertain parameters, e.g. $g = p_1^{p_2}$, ‘p1 to the power of p2’, and $l = \log_{p_2}(p_1)$, ‘log of p1 to base p2’. With two uncertain (observed) proportions, or functions of them, we may use Zou and…

Confidence intervals on powers and logs

Introduction: Previously, I gave algebraic solutions for computing accurate confidence intervals on the difference, sum, product and ratio of two (or more) parameters. These parameters may be simple proportions or monotonic functions of proportions. They may even be other algebraic functions of multiple parameters, provided that these parameters do not appear elsewhere in the equation.…

Evaluating the performance of risk ratio and odds ratio tests

Introduction: In ‘An algebra of intervals’ I described how, using an approximation by Zou and Donner (2008), it was possible to develop an algebraic solution for the confidence interval for a number of functions of independent parameters. These confidence intervals included the risk ratio, the ratio of two independent proportions, $r = p_1/p_2$, and the…

Contingency tests… latest posts…