Paper 3 Language Analysis: Understanding N-Gram Graphs
Hey there, future linguist! Welcome to one of the most exciting, data-driven areas of language analysis. In Paper 3, Section A (Language Change), you need to be able not only to analyze historical texts but also to interpret hard data that supports claims about those changes. That's where n-gram graphs come in!
This chapter will equip you with the skills to read these linguistic graphs. Don't worry if 'n-gram' sounds complicated—it's just a fancy way of counting words to see how our language has shifted over hundreds of years. This skill is vital for achieving AO4 (understanding methods) and AO5 (analysing data).
What is Corpus Data and Why Do We Need It?
The Digital Library: Language Corpora
To study how language changes over time (a concept called Diachronic Linguistics), we need massive amounts of reliable text. We can't just rely on a few letters or poems.
- Key Term: Corpus (plural: Corpora). A corpus is a very large, structured, digital collection of texts (written or spoken), designed to be representative of a specific language or language variety (like American English, or 18th-century medical texts).
- Analogy: Think of a corpus as the world's biggest digital library, sorted perfectly by date and genre. Instead of flipping pages, a computer can instantly count every occurrence of a specific word or phrase.
The most famous corpus used for generating these graphs is the Google Books Corpus, which contains billions of words scanned from books published between the 1500s and today.
Breaking Down the N-Gram
The term 'n-gram' simply refers to a sequence of 'N' items (usually words or letters) found in the text corpus.
- N = Number. The number tells you how many words are in the sequence you are counting.
- 1-gram (Unigram): A single word. Example: "bicycle", "internet", "thou"
- 2-gram (Bigram): A sequence of two words. Example: "shall not", "social media", "I am"
- 3-gram (Trigram): A sequence of three words. Example: "at the end", "if you like"
Quick Review: An N-gram is the thing we are counting, and the Corpus is the massive source material we are counting it in.
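If it helps to see the counting idea in action, here is a minimal Python sketch of how a computer might pull 1-grams, 2-grams and 3-grams out of a sentence. The function name `ngrams` and the sample sentence are made up for illustration, and real corpus tools split text into words far more carefully than a simple whitespace split.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample sentence, split on spaces for simplicity.
tokens = "at the end of the day".split()

unigrams = ngrams(tokens, 1)  # six single words
bigrams = ngrams(tokens, 2)   # five overlapping pairs
trigrams = ngrams(tokens, 3)  # four overlapping triples

print(trigrams[0])  # ('at', 'the', 'end')
```

Notice that the n-grams overlap: "the end" and "end of" are both counted as separate bigrams.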
Interpreting N-Gram Graphs: The Visualization of Change
An N-gram graph (or viewer) takes the raw count data from the corpus and plots it onto a chart. This allows us to see how the frequency (how often the word/phrase is used) changes over time.
Understanding the Graph Axes
N-gram graphs are typically structured to show change over time:
- The X-Axis (Horizontal): Time/Years. This axis usually runs from an older date (e.g., 1700) up to the modern day (e.g., 2000). It shows the historical period you are analysing.
- The Y-Axis (Vertical): Frequency or Percentage. This is the most critical axis. It shows the relative frequency of the N-gram, usually expressed as a percentage of all the words (strictly, all the N-grams of the same length) in the corpus for that year. This is important because it normalises the data: it accounts for the fact that far more books were published in 1950 than in 1750.
Key Takeaway: The slope of the line tells the story. A steep rise suggests rapid adoption; a steady fall suggests the word is drifting towards obsolescence.
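The normalisation behind the Y-axis can be sketched in a few lines of Python. Every figure below is invented purely to show the arithmetic (it is not real corpus data), and the function name `relative_frequency` is an assumption for this example.

```python
def relative_frequency(count, total_words):
    """Share of all words printed in a year, as a percentage."""
    return 100 * count / total_words


# Invented figures for illustration only, NOT real corpus data.
year_data = {
    1750: {"count": 50, "total_words": 1_000_000},
    1950: {"count": 500, "total_words": 20_000_000},
}

percentages = {
    year: relative_frequency(d["count"], d["total_words"])
    for year, d in year_data.items()
}

for year, pct in percentages.items():
    print(year, f"{pct:.4f}%")
```

In this made-up case the raw count is ten times higher in 1950, yet the relative frequency has halved (0.0050% down to 0.0025%), because so many more words were printed overall. That is why the graph plots percentages rather than raw counts.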
Skills for Analyzing N-Gram Graphs in Paper 3
When you are given an N-gram graph in the exam, your task is not just to describe the lines, but to analyze them and link them to linguistic concepts (AO5).
1. Identifying Key Trends (The Basics)
- A Rising Line: Indicates increasing usage or popularity of the word/phrase. This often suggests Lexical Acquisition (the word entered the language) or Broadening of meaning. Example: The usage of "technology" rising sharply after 1940.
- A Falling Line: Indicates decreasing usage, leading potentially to Lexical Loss or Archaism (the word becoming outdated). Example: The usage of "whilst" falling as "while" increases.
- A Flat Line: Suggests stable usage, often indicating a fundamental grammatical or common lexical item (like "the" or "water").
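As a rough illustration of the three basic trends, the sketch below labels a series of relative frequencies as rising, falling or flat by comparing its first and last values. This is a deliberately crude rule of thumb: the function name `describe_trend`, the 10% tolerance and the sample numbers are all assumptions made for this example. In the exam you read the trend off the graph by eye.

```python
def describe_trend(values, tolerance=0.10):
    """Label a series by the relative change between its first and last value."""
    first, last = values[0], values[-1]
    if first == 0:
        # A word that starts from zero can only rise or stay absent.
        return "rising" if last > 0 else "flat"
    change = (last - first) / first
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "flat"


# Invented relative frequencies (percent), for illustration only.
print(describe_trend([0.001, 0.002, 0.004]))     # rising
print(describe_trend([0.004, 0.002, 0.001]))     # falling
print(describe_trend([0.050, 0.051, 0.049]))     # flat
```

A start-to-end comparison like this misses peaks in the middle of the period, which is one reason your written analysis should always mention specific dates.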
2. Analyzing Competition and Replacement
Often, graphs compare two or more terms that perform the same function. This analysis reveals Lexical Replacement.
- The Crossover Point: Look closely at where two lines intersect. This point marks the time when one term became more common than the other in the corpus. Example: The line for "radio" might cross the line for "wireless" around the 1930s, showing that the newer term won the lexical race.
- Divergence: If two words start together but then one rises rapidly while the other falls, it shows that the meanings have likely Diverged (split), or that one is replacing the other.
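To make the crossover idea concrete, here is a small Python sketch that scans two lines and reports the first sampled year in which the newer term overtakes the older one. The frequencies are invented to mirror the "radio" vs. "wireless" example above; they are not real corpus values, and `crossover_years` is a hypothetical helper name.

```python
def crossover_years(years, older, newer):
    """Years at which `newer` moves above `older` after being at or below it."""
    crossings = []
    for i in range(1, len(years)):
        was_behind = newer[i - 1] <= older[i - 1]
        is_ahead = newer[i] > older[i]
        if was_behind and is_ahead:
            crossings.append(years[i])
    return crossings


# Invented relative frequencies (percent), for illustration only.
years = [1910, 1920, 1930, 1940, 1950]
wireless = [0.0008, 0.0010, 0.0009, 0.0006, 0.0004]
radio = [0.0000, 0.0004, 0.0011, 0.0015, 0.0020]

print(crossover_years(years, wireless, radio))  # [1930]
```

Because the data here is sampled once per decade, the result tells you the crossover happened at some point between 1920 and 1930. Phrase your own analysis with the same caution, e.g. "around the 1930s".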
3. Contextualizing Anomalies (The "Why")
If you see a sudden, sharp spike or dip, you must link this linguistic data to real-world social, political, or technological context (AO4).
- Sudden Spikes: Often caused by major historical events. Example: A huge spike in the use of "trench warfare" between 1914 and 1918.
- Sudden Acquisition: A new invention or concept forcing the creation of new lexis. Example: The words "internet" or "email" appearing from zero use around 1990 and rapidly rising.
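One simple way to think about what counts as a "spike" is to compare each year with the typical level for the whole period. The sketch below flags years that sit well above the median. The threshold of three times the median, the helper name `spike_years` and the figures (given as hypothetical occurrences per million words) are all assumptions for illustration, not real data.

```python
from statistics import median


def spike_years(series, factor=3.0):
    """Years whose value is more than `factor` times the median of the series."""
    baseline = median(series.values())
    return [year for year, value in sorted(series.items())
            if value > factor * baseline]


# Invented occurrences per million words, for illustration only.
trench_warfare = {
    1910: 1, 1912: 1, 1914: 4, 1916: 9,
    1918: 7, 1920: 2, 1922: 1,
}

print(spike_years(trench_warfare))  # [1916, 1918]
```

Spotting the spike is only the first half of the job: the marks come from linking those years to the historical context that explains them.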
💡 Common Mistakes to Avoid
- Mistake 1: Confusing Counts with Frequency. You must remember the Y-axis shows relative frequency (how common the word is compared to all other words in that year), not the absolute number of times it appeared. This prevents skewed results based on how many books were published.
- Mistake 2: Ignoring the Dates. Always refer to specific time periods in your analysis (e.g., "The word peaked in usage between 1880 and 1910 before beginning a rapid decline").
- Mistake 3: Just describing the line. The examiner wants analysis. Don't just say, "The line goes up." Say, "The significant upward trend in usage of 'astronaut' after 1960 reflects lexical acquisition driven by the global 'Space Race' context."
Real-World Language Change Concepts Illustrated by Graphs
N-gram graphs offer strong visual evidence for many core concepts of language change you study in Paper 3.
A. Semantic Change (Meaning Shifts)
Graphs can show shifts in frequency that relate to how a word's meaning changes.
- Amelioration vs. Pejoration: If the frequency of a word drops sharply, it might be due to Pejoration (it acquired negative connotations), causing speakers to seek a replacement term. Did you know? The term 'gay' dramatically changes its frequency and context post-1960 due to semantic shift.
- Narrowing vs. Broadening: If a word's usage rises slowly but steadily over centuries, it might be due to Broadening (its meaning covers more concepts now). If a word sharply declines, it may have Narrowed into a highly specific, niche context.
B. Grammatical and Orthographical Change
N-grams aren't just for single words; they are excellent for spotting grammatical structure changes or spelling shifts.
- Orthography (Spelling): Comparing "colour" vs. "color" (each a 1-gram; searching "the colour" vs. "the color" instead would make them bigrams) shows how orthography differs geographically and when those variations solidified.
- Grammar/Syntax: Analyzing the usage of "hath" vs. "has" reveals the gradual decay of older inflections and the move towards modern verb forms in the Early Modern English period.
C. Lexical Competition
Graphs clearly demonstrate the competition between synonyms or alternative phrasing, often leading to one term becoming dominant.
Example: A graph comparing the phrases "cannot help but" and "can't help but" can show the increasing informality of written English over time, as the contracted, more conversational version rises in frequency.
Quick Review Checklist for N-Gram Analysis
- What specific N-gram (word/phrase) is being tracked?
- What time period does the graph cover?
- What is the overall trend (rising/falling/flat)?
- Is there a crossover point showing lexical replacement? (What are the two competing terms?)
- Is there a sudden spike or dip (an anomaly)? (What historical context explains it?)
- Which specific concept of language change (e.g., lexical loss, amelioration, lexical acquisition) does this graph illustrate?
If you can answer these six questions, you have successfully interpreted the data and are ready to synthesize it into your Paper 3 essay!