Corpus Data and Word Tables: A Linguist's Toolkit (Paper 3: Language Change)

Hello and welcome to one of the most exciting, yet potentially daunting, parts of Paper 3: analysing quantitative data! Don't worry if charts and numbers aren't your favourite thing. When we study Word Tables derived from Corpus Data, we aren't doing maths; we are simply using large amounts of statistical evidence to show how language changes over time.

In this chapter, you will learn how linguists use massive digital databases (corpora) to track the evolution of English vocabulary, meaning, and usage. This skill is essential for success in Paper 3, Section A.

What is a Corpus (and Why Does it Matter)?

Before tackling the tables, you must understand the source of the data: the Corpus.

Definition of Corpus Data

A Corpus (plural: *corpora*) is a large, structured collection of natural texts or speech samples stored electronically. Think of it as a gigantic, meticulously organized digital library of language use.

  • Size: Corpora often contain hundreds of millions, sometimes billions, of words.
  • Purpose: They allow linguists to study language scientifically by observing patterns, frequency, and context, rather than just relying on intuition.

Did you know? The British National Corpus (BNC) is one of the most famous examples, containing 100 million words of British English, divided between written and spoken texts.

The Diachronic Perspective in Paper 3

For Paper 3, we usually look at diachronic corpora—collections of texts spanning different historical periods (e.g., a corpus of 18th-century English compared to a corpus of 21st-century English).

  • Diachronic: Studying language across time (e.g., tracking the word 'awful' from meaning 'full of awe' to 'terrible').
  • Synchronic: Studying language at a single point in time (less relevant for Paper 3, Section A).

Key Takeaway: The corpus provides the raw, statistical evidence needed to support theories about language change.

Decoding the Word Table

A word table is simply a way of presenting the statistical findings derived from a corpus. When you see a word table in the exam, your job is to translate the numbers into meaningful linguistic observations.

1. Analysing Word Frequency

Frequency tells you how often a specific word or phrase appears within the corpus. The raw count is usually normalised to a rate per million words, so that corpora of different sizes can be compared fairly.

  • High Frequency: Indicates a word is commonly used, possibly signaling popularisation or lexicalisation (making a new word standard). Example: If the word "selfie" has a frequency of 50 per million words in the 2010s corpus, but 0 in the 1990s corpus, this strongly evidences its recent adoption.
  • Low Frequency: Might indicate a word is an archaism, specialist jargon, or just not widely adopted.
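
The "per million words" figure is simple arithmetic: multiply the raw count by one million and divide by the size of the corpus. Here is a minimal sketch in Python. The helper name `per_million` is hypothetical, and the counts and corpus sizes are invented for illustration; they are not real corpus figures.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Return occurrences per million words (hypothetical helper)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size

# Invented figures for illustration only.
small_corpus = per_million(raw_count=500, corpus_size=10_000_000)
large_corpus = per_million(raw_count=2_000, corpus_size=100_000_000)

print(small_corpus)  # 50.0
print(large_corpus)  # 20.0
```

Notice that the larger corpus contains four times as many raw hits, yet the word is relatively less frequent there. This is why the normalised figure, not the raw count, is the one to quote.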

Memory Tip (F): Frequency = For How Often is it used?

2. Analysing Dispersion and Range

While frequency tells you *how many times* a word appears, Dispersion or Range tells you *how widely* it is used across different types of texts (genres, registers, authors).

  • High Dispersion: The word appears consistently across different registers (e.g., newspapers, academic papers, fiction, casual speech). This suggests it is a mainstream, well-established part of the language.
  • Low Dispersion: The word is concentrated in one or two types of text. Example: If a technical term like "phoneme" is frequent in the academic sub-corpus but nowhere else, it has low dispersion and remains specialist lexis.
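
One simple way to put a number on this is to count how many registers a word actually turns up in. The sketch below assumes you already have a per-million frequency for each register; the function name `text_range`, the register labels, and all the figures are invented for illustration. Real corpus tools may report more refined dispersion statistics than this plain count.

```python
def text_range(per_million_by_register: dict[str, float], threshold: float = 0.0) -> int:
    """Count the registers in which the word's frequency exceeds the threshold."""
    return sum(1 for freq in per_million_by_register.values() if freq > threshold)

# Invented per-million figures for illustration only.
phoneme = {"academic": 12.0, "newspapers": 0.0, "fiction": 0.0, "speech": 0.0}
problem = {"academic": 310.0, "newspapers": 280.0, "fiction": 150.0, "speech": 240.0}

print(text_range(phoneme), "of", len(phoneme))  # 1 of 4
print(text_range(problem), "of", len(problem))  # 4 of 4
```

A word found in only one of four registers has low dispersion, however frequent it is within that register.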

Memory Tip (D): Dispersion = Do they use it in Different places?

3. Understanding Collocation: The Key to Semantic Change

Collocation is arguably the most valuable piece of data in language change analysis. A word table often includes a separate list of the words that most frequently appear near the focus word.

What are Collocates?

Collocates are words that are habitually or statistically likely to occur together. They are the linguistic "friends" of a word.

  • Think of peanut butter and jelly, or fish and chips. They just belong together.
  • Example: The word 'strong' collocates highly with 'tea', 'opinion', and 'wind'.

Using Collocates to Identify Semantic Shift

If a word's meaning changes (a semantic shift), its collocates will also change over time.

  • Scenario: In a 19th-century corpus, the collocates of 'gay' might include 'party', 'mood', or 'dress' (meaning 'joyful' or 'bright').
  • Change: In a 21st-century corpus, the collocates of 'gay' will overwhelmingly include words like 'rights', 'community', or 'marriage' (reflecting its modern, sexual orientation-based meaning).

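Analysing this shift in collocates provides strong evidence for processes of semantic change such as narrowing (the meaning becomes more specific), pejoration (the meaning takes on negative connotations), or amelioration (the meaning improves). In the 'gay' example, the sense has narrowed from a general 'joyful' or 'bright' to a specific reference to sexual orientation.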
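
To see where a collocate list comes from, it helps to picture the counting process: find every occurrence of the focus word, then tally the words sitting next to it. The sketch below does this for two tiny samples. Everything in it is an assumption made for illustration: the function name `collocates`, the one-word window, the small stopword list, and the toy sentences, which are invented and are not real corpus data. Real corpus software often ranks collocates with association measures rather than raw counts.

```python
from collections import Counter

STOPWORDS = frozenset({"a", "the", "and", "in", "with"})

def collocates(tokens, node, window=1, stopwords=STOPWORDS):
    """Count content words within `window` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(w for w in left + right if w not in stopwords and w != node)
    return counts

# Invented toy samples standing in for two period sub-corpora.
older = "a gay party in a gay mood with a gay dress".split()
newer = "gay rights and the gay community and gay marriage".split()

print(sorted(collocates(older, "gay")))  # ['dress', 'mood', 'party']
print(sorted(collocates(newer, "gay")))  # ['community', 'marriage', 'rights']
```

The two lists share no words at all, which is exactly the kind of contrast a word table invites you to comment on.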

Memory Tip (C): Collocation = Changing Context (and therefore, Changing Connotations).

Quick Review: Interpreting the Data (FCD)

When looking at a word table, always ask about the FCD:

Frequency: Is it rising or falling? (How common is the word?)
Collocation: What are its new neighbours? (How has the meaning changed?)
Dispersion: Is it spreading? (How widely accepted is the word?)

Applying Corpus Data to Language Change (Exam Focus)

In Paper 3, you will use the word table and n-gram graph data to support your analysis of a primary prose text and your knowledge of language change theories.

Step-by-Step Analysis Process

Step 1: Contextualise the Data

First, identify the time periods represented in the table (e.g., 1700-1800 vs. 2000-2020). Language change must be analyzed diachronically.

Step 2: Note Significant Frequency Shifts (Lexical Change)

Look for dramatic increases or decreases in frequency.

  • Increase: A sharp rise in a term like 'digital' between the 1980s and the 2000s indicates the impact of technological change on the lexicon. The word may also be undergoing functional shift (being used in a new word class), which the collocates can help you confirm.
  • Decrease: A decline in an interjection like 'hark' suggests the word is becoming an archaism, reflecting the gradual loss of vocabulary due to changing social conventions (e.g., the Functional Theory of change).

Step 3: Analyze Shifts in Collocation (Semantic Change)

This is where you earn the highest marks. Focus on the accompanying word list (collocates).

  • Example: If the word 'cool' shifts its collocates from 'water' and 'breeze' (literal temperature) to 'style' and 'attitude' (abstract approval), you are seeing amelioration and broadening of the term.
  • Key Terminology: Use precise terms like semantic narrowing, pejoration, amelioration, or metaphorical extension to explain *how* the change happened, supported by the collocate data.

Step 4: Synthesize Findings with Theories (AO4)

Connect the statistical evidence to broader linguistic concepts.

  • If a new, highly frequent word (like 'vlog') appears, you can link this to Cultural Transmission Theory, where technology drives the need for new labels, or The Theory of Lexical Gaps, where new concepts require new linguistic slots.
  • If an old word declines, you might link it to the process of standardisation (leading to archaisms) or Random Fluctuation Theory if the change seems arbitrary.

Common Mistakes to Avoid

  • Mistake 1: Treating raw numbers as absolute. Always remember frequency is relative to the size of the corpus. Use normalized figures (e.g., "per million words").
  • Mistake 2: Stating the obvious. Don't just say "the word got more frequent." You must explain *why* (social context, technology, genre shift) and *what* this means (e.g., "This shows lexical broadening in response to mass media").
  • Mistake 3: Confusing Collocation with Synonymy. Collocates are words that *appear with* the target word; they are not necessarily words that *mean the same* as the target word (synonyms are sometimes listed separately in corpus data). Focus on the partnership, not the substitution.

Key Takeaway: Word tables are your quantitative evidence. They show whether a perceived change (like a word becoming more common) is actually borne out by the data, allowing you to move beyond speculation into confident analysis.