Summer School in English Corpus Linguistics 2024 (online)

  • Have you begun research with corpora but been unsure what to do next?
  • Have you heard about parsed corpora (treebanks) but wondered how you might use them in your research?
  • Do you want a practical primer in statistics?

I’m pleased to announce the eleventh UCL Summer School in English Corpus Linguistics, our masterclass in research with parsed corpora, which is taking place online from 1-3 July. It is timed to run from 9:00 to 13:30 British Summer Time (GMT+1), to make it accessible for students across Europe, Africa, Asia and Australasia.

The Summer School is a short three-day intensive course aimed at PhD-level students and researchers who wish to get to grips with Corpus Linguistics from the perspective of the ‘Survey Methodology’. It is offered at £165 for early bookings made before 14 May, rising to £195 after.

This year we are refreshing our programme, including a new session on World Englishes with Guyanne Wilson.

This is a picture of our face-to-face teaching on the course, which we would love to return to! But with Covid-19 a continuing threat worldwide, and for reasons of accessibility and cost, we have decided to run the course online for another year.

Aims and objectives of the course

Over the course of the three days, participants learn about the following:

  • the scope of Corpus Linguistics, and how we can use it to study the English Language;
  • key issues in Corpus Linguistics methodology;
  • how to use corpora to analyse issues in syntax and semantics;
  • basic elements of statistics;
  • how to navigate large and small corpora, particularly ICE-GB and DCPSE.

Learning outcomes

At the end of the course, participants should have:

  • acquired a basic but solid knowledge of the terminology, concepts and methodologies used in English Corpus Linguistics;
  • had practical experience working with two state-of-the-art corpora and a corpus exploration tool (ICECUP);
  • gained an understanding of the breadth of Corpus Linguistics and its potential application to research projects;
  • learned about the fundamental concepts of inferential statistics and their practical application to Corpus Linguistics.

What it costs

  • Attendance fee: £165 until May 14; £195 afterwards.
  • Corpus and software:
    • temporarily accessible for all participants (no separate purchase necessary)
    • one corpus of their choice free to all students, or £25 for both corpora
    • Special Offer of 25% standard price for all attendees

See also

Are embedding decisions independent?

Evidence from preposition(al) phrases

Abstract Full Paper (PDF)

One of the more difficult challenges in linguistics research concerns detecting how constraints might apply to the process of constructing phrases and clauses in natural language production. In previous work (Wallis 2019) we considered a number of operations modifying noun phrases, including sequential and embedded modification with postmodifying clauses. Notably, we found a pattern of declining additive probability for each decision to embed postmodifying clauses, albeit a pattern that differed between speech and writing.

In this paper we use the same research paradigm to investigate the embedding of an altogether simpler structure: the postmodification of nouns by prepositional phrases. These are approximately twice as frequent, and structures exhibit as many as five levels of embedding in ICE-GB (two more than are found for clauses). Finally, the embedding model is simpler, because only one noun phrase can be found within each prepositional phrase. We discover different initial rates and patterns for common and proper nouns, and for certain subsets of pronouns and numerals. Common nouns (80% of nouns in the corpus) do appear to generate a secular decline in the additive probability of embedded prepositional phrases, whereas the equivalent rate for proper nouns rises from a low initial probability, a pattern that appears to be strongly affected by the presence of titles.
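To make the idea of additive probability concrete, here is a minimal Python sketch. It assumes that the additive probability at level n is p(n) = F(n) / F(n−1), where F(n) is the number of cases with at least n levels of embedding, i.e. the proportion of cases that reached level n−1 and went on to embed one more level. The function names and the counts below are hypothetical and purely illustrative; they are not figures from ICE-GB or from the paper.

```python
from itertools import accumulate
from typing import List


def at_least_counts(exact: List[int]) -> List[int]:
    """F(n): number of cases with n or more embedding levels,
    derived from counts of cases with exactly n levels (n = 0..k)."""
    # Sum from the deepest level upwards, then restore the original order.
    return list(accumulate(reversed(exact)))[::-1]


def additive_probabilities(exact: List[int]) -> List[float]:
    """Assumed definition: p(n) = F(n) / F(n-1) for n = 1..k."""
    f = at_least_counts(exact)
    # If no case reached level n-1, p(n) is undefined (NaN).
    return [f[n] / f[n - 1] if f[n - 1] else float("nan")
            for n in range(1, len(f))]


# Hypothetical counts of noun phrase heads with exactly 0, 1, 2 and 3
# levels of embedded postmodifying prepositional phrases (made-up numbers).
exact = [1000, 200, 40, 8]
print(at_least_counts(exact))                                  # [1248, 248, 48, 8]
print([round(p, 3) for p in additive_probabilities(exact)])    # [0.199, 0.194, 0.167]
```

With these invented counts each successive decision to embed is slightly less likely than the one before, which is the shape of a "declining additive probability"; a rising series would correspond to the pattern described for proper nouns.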

It may generally be assumed that, like clauses, prepositional phrases are essentially independent units. However, we find evidence from a number of sources indicating that some double-layered constructions may be added as single units. In addition to titles, these constructions include schematic or idiomatic expressions whose head is an ‘indefinite’ pronoun or numeral.

Is language really “a set of alternations?”

The perspective that the study of linguistic data should be driven by studies of individual speaker choices has been the subject of attack from a number of linguists.

The first set of objections has come from researchers who have traditionally focused on linguistic variation expressed in terms of rates per word, or per million words.
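The difference between the two perspectives can be illustrated with a small Python sketch. It compares a rate per million words with the proportion of times one variant is chosen out of a pair of alternates, assuming a simple binary alternation between hypothetical variants A and B. The function names and all counts are invented for illustration.

```python
def per_million_words(count: int, total_words: int) -> float:
    """Frequency normalised to a rate per million words."""
    return count * 1_000_000 / total_words


def alternation_proportion(count_a: int, count_b: int) -> float:
    """Proportion of choices where variant A was used, out of A + B."""
    return count_a / (count_a + count_b)


# Two hypothetical subcorpora of equal size (invented numbers).
# In Y there are twice as many opportunities to use A or B.
x = {"words": 500_000, "a": 150, "b": 350}
y = {"words": 500_000, "a": 300, "b": 700}

for name, c in (("X", x), ("Y", y)):
    print(name,
          per_million_words(c["a"], c["words"]),   # X: 300.0, Y: 600.0
          alternation_proportion(c["a"], c["b"]))  # 0.3 in both
```

Here the per-million-word rate of A doubles from X to Y, yet speakers choose A in exactly the same proportion of opportunities. A rate per word conflates how often the opportunity arises with how often the variant is selected, which is the distinction at stake in this debate.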

No such thing as free variation?

As Smith and Leech (2013) put it: “it is commonplace in linguistics that there is no such thing as free variation” and that indeed multiple differing constraints apply to each term. On the basis of this observation they propose an ‘ecological’ approach, although in their paper this approach is not clearly defined.
