Welcome to Research Methods 2: Advanced Skills!
Hello, future psychologists! You’ve already mastered the basics of experiments, observations, and initial data analysis from Research Methods 1. This chapter, Research Methods 2 (3.3.3), takes those skills to the next level.
We are diving into how researchers ensure their findings are trustworthy (reliability and validity), how they formally present their work, and, most importantly, the exciting world of inferential statistics—the maths that tells us if our results are real or just a fluke!
Don't worry if the statistical section seems intimidating. We will break it down into simple decisions. By the end of this chapter, you’ll be able to confidently read and critique professional psychological research. Let's get started!
Part 1: Advanced Research Methods
1.1 Content Analysis
Content analysis is a method used to analyse qualitative data (like interviews, diaries, or newspaper articles) and turn it into quantitative data (numbers) for statistical testing. Think of it as systematically counting themes or categories.
How to conduct Content Analysis (Step-by-Step):
1. Sampling: Select the material you want to analyse (e.g., 50 newspaper headlines about climate change).
2. Coding: Develop behavioural categories (or coding units). These are the specific themes or words you will be looking for and counting. Example: If studying advertisements, categories might be "Gender Stereotypes," "Use of Humour," or "Price Focus."
3. Data Collection: Work through the material, counting how many times each category appears.
4. Analysis: Use descriptive statistics (like calculating the mean frequency) or inferential statistics on the quantitative data produced.
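The four steps above can be sketched in a few lines of Python. Everything here is invented for illustration: ten imaginary coded adverts, using the example categories from step 2.

```python
from collections import Counter

# Hypothetical coding of 10 advertisements (steps 1-2: sampling and coding).
# Each advert is tagged with the behavioural categories a coder spotted in it.
coded_ads = [
    ["Gender Stereotypes", "Use of Humour"],
    ["Price Focus"],
    ["Use of Humour"],
    ["Gender Stereotypes"],
    ["Price Focus", "Use of Humour"],
    ["Use of Humour"],
    ["Price Focus"],
    ["Gender Stereotypes", "Price Focus"],
    ["Use of Humour"],
    ["Price Focus"],
]

# Step 3 (data collection): count how many times each category appears.
tally = Counter(cat for ad in coded_ads for cat in ad)

# Step 4 (analysis): a simple descriptive statistic --
# the mean number of categories coded per advert.
mean_per_ad = sum(len(ad) for ad in coded_ads) / len(coded_ads)

print(tally.most_common())  # quantitative data, ready for statistical testing
print(mean_per_ad)
```

The qualitative material (adverts) has now become quantitative data (category frequencies), which is exactly what the Key Takeaway below describes.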
Key Takeaway: Content analysis turns words into measurable data. This makes qualitative data easier to compare and statistically analyse.
1.2 Case Studies
A case study is an intensive, in-depth investigation of one person, a small group of people, an institution, or an event. It typically involves gathering data using multiple methods (interviews, observations, medical history, etc.) over a long period.
Classic Example: The study of HM (Henry Molaison), whose severe memory loss following surgery provided crucial insight into the role of the hippocampus in memory formation.
Strengths and Limitations of Case Studies:
Strengths:
1. Rich Detail: They provide deep, meaningful, and qualitative insight that large-scale studies often miss.
2. Longitudinal: They often track changes over time.
3. Unique Phenomena: They are the only way to study rare or exceptional psychological phenomena (like specific brain damage or childhood trauma).
Limitations:
1. Lack of Generalisation: Since the sample size is usually one person, the findings cannot reliably be applied to the wider population.
2. Researcher Bias: Because the researcher often spends a lot of time with the participant, there is a risk of losing objectivity.
Quick Review: Content analysis converts data; Case Studies explore deeply.
Part 2: Ensuring Quality Research: Reliability and Validity
We need to know if our research is good quality. This is checked by two key concepts: Reliability (consistency) and Validity (accuracy).
2.1 Reliability
Reliability refers to how consistent a measure or study is. If we repeat the study or measure the same thing again, would we get the same results?
Memory Aid: If you can Repeat it and get the same result, it has Reliability.
Ways of Assessing Reliability:
1. Test-retest reliability:
This assesses the consistency of a psychological test (like a questionnaire or IQ test). The same participants take the test twice (e.g., two weeks apart). If the scores are similar, the test is reliable.
2. Inter-observer reliability (or Inter-rater reliability):
This assesses the consistency of observations. If two (or more) researchers are observing the same behaviour, their observations and interpretation of the behavioural categories should match. If their results are highly correlated (usually \(+0.80\) or more), the observation is reliable.
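As an illustrative sketch (the observation tallies are made up), the correlation check for inter-observer reliability can be computed from first principles:

```python
# Hypothetical tallies from two observers watching the same ten children,
# each counting "aggressive acts" per child using agreed behavioural categories.
obs_a = [4, 7, 2, 9, 5, 6, 3, 8, 1, 5]
obs_b = [5, 7, 3, 9, 4, 6, 3, 7, 2, 5]

def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

r = pearson(obs_a, obs_b)
print(round(r, 2))  # 0.97

# A correlation of +0.80 or above is conventionally taken as acceptable.
print("reliable" if r >= 0.80 else "not reliable")  # reliable
```

The same correlation calculation underlies test-retest reliability (correlate first and second administrations) and, as section 2.2 shows, concurrent and predictive validity.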
2.2 Validity
Validity refers to whether the study measures what it claims to measure. Are the results genuine?
Analogy: A thermometer that always reads 2 degrees too high is reliable (consistent) but not valid (inaccurate).
Types of Validity:
1. Face Validity:
The most basic measure. Does the test or measure appear, on the surface, to measure what it is supposed to? Example: A test designed to measure anxiety should contain questions clearly related to anxiety symptoms.
2. Concurrent Validity:
This is assessed by comparing a new test or measure with an existing, well-established test that measures the same thing. If scores on the new test correlate positively with scores on the old, validated test, it has good concurrent validity.
3. Predictive Validity:
Does the test accurately predict future behaviour or performance? Example: A high score on a university aptitude test should accurately predict better academic performance later on.
4. Ecological Validity:
Does the study reflect real-life behaviour? A study conducted in a highly artificial lab setting usually has low ecological validity.
The Use of Correlation in Assessment of Validity:
We use correlation to establish both concurrent and predictive validity. We are looking for a significant positive correlation between our new measure and the existing criterion. A strong correlation (close to \(+1.00\)) suggests the new measure is measuring the same construct as the validated criterion.
Common Mistake to Avoid: Don't confuse reliability (consistency) with validity (accuracy). A consistent (reliable) result can still be totally wrong (invalid)!
Part 3: Designing and Reporting Psychological Investigations
Once research is complete, psychologists write a formal scientific report so others can scrutinise and replicate their findings. These reports always follow a standardised structure.
Sections of a Scientific Report:
1. Abstract:
A brief summary (usually 150-250 words) covering the aim, method, key results, and conclusion. It allows readers to quickly decide if the full report is relevant to them.
2. Introduction:
Starts broad and gets specific. It includes the background theory and previous research (literature review), leading logically to the specific aim and hypotheses of the current study.
3. Method:
This section is detailed enough for replication. It describes:
(a) Design: Experimental design (e.g., repeated measures), variables, controls.
(b) Participants: Sample size, sampling technique, demographics (age, gender, location).
(c) Procedure: Step-by-step description of what happened, including debriefing and ethical considerations.
4. Results:
Presents the findings, both descriptive statistics (tables, graphs, measures of central tendency) and the outcome of the inferential statistical test, including the calculated value, the critical value, and the significance level.
5. Discussion:
The meaning of the results. It interprets the findings in relation to the hypotheses and previous research. It discusses limitations of the study and suggests practical applications and future research directions.
6. Referencing:
A list of all sources (books, journals, websites) cited in the report, allowing the reader to locate the information. This prevents plagiarism and shows academic professionalism. (APA format is often used).
Part 4: Data Handling and Analysis – Inferential Testing
This is where we move beyond just describing data (descriptive statistics like the mean) and start asking: "Is this result important, or did it happen purely by chance?" This is the purpose of statistical testing (inferential testing).
4.1 Probability, Significance, and Errors
When we run an experiment, we hope to reject the null hypothesis (that there is no effect) and accept the alternative hypothesis (that there is an effect).
Probability and Significance
In psychology, we calculate the probability (P) that the difference we found occurred randomly. We typically set the significance level at p ≤ 0.05 (or 5%).
What p ≤ 0.05 means: if the null hypothesis were true (i.e., there is no real effect), there would be at most a 5% chance of obtaining results this extreme purely by chance. If the calculated probability is smaller than or equal to 0.05, the result is deemed statistically significant: we reject the null hypothesis and conclude the result is genuinely due to the manipulation of the Independent Variable (IV).
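To see where such a probability comes from, here is a hypothetical worked example using the simplest inferential test in this chapter, the Sign test (section 4.3). Under the null hypothesis, each participant is equally likely to improve ("+") or worsen ("−"), like a fair coin flip, so the exact probability can be computed with the binomial formula:

```python
from math import comb

# Hypothetical repeated-measures result: 9 of 10 participants improved ("+")
# and 1 got worse ("-").  Under the null hypothesis, + and - are equally
# likely (probability 0.5 each), like fair coin flips.
n, s = 10, 1          # n = participants, s = count of the less frequent sign

# Two-tailed p-value: probability of a split at least this extreme by chance.
p = 2 * sum(comb(n, k) for k in range(s + 1)) * 0.5 ** n
print(round(p, 4))    # 0.0215

if p <= 0.05:
    print("significant: reject the null hypothesis")
```

Because 0.0215 ≤ 0.05, a split this lopsided would be rare under pure chance, so the result counts as statistically significant.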
Statistical Tables and Critical Values
When you run a statistical test, you get a calculated value. You compare this value to a critical value found in a statistical table.
The critical value acts as a boundary line. To decide if your result is significant, you must check the critical value based on three things:
1. The significance level (usually 0.05).
2. The number of participants (N) or degrees of freedom (df), depending on the test.
3. Whether the hypothesis is one-tailed (directional) or two-tailed (non-directional).
The Rule: For most tests (e.g., Chi-squared, Spearman, Pearson, and the t-tests), the calculated value must be equal to or greater than the critical value. For a few tests (the Sign test, Wilcoxon, and Mann-Whitney), the calculated value must be equal to or less than the critical value. (Always check the rule for the specific test!)
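The rule can be condensed into one small helper function. This is a study aid, not part of the specification: the function name and the Wilcoxon example values are invented, though 3.84 is the standard chi-squared table value for df = 1 at p ≤ 0.05.

```python
def is_significant(calculated, critical, rule="greater"):
    """Compare a calculated test statistic with the table's critical value.

    rule="greater": significant when calculated >= critical
                    (chi-squared, Spearman, Pearson, t-tests).
    rule="less":    significant when calculated <= critical
                    (Sign test, Wilcoxon, Mann-Whitney).
    """
    if rule == "greater":
        return calculated >= critical
    return calculated <= critical

# A Wilcoxon T of 4 against a critical value of 8 (illustrative numbers):
print(is_significant(4, 8, rule="less"))           # True  -> reject the null
# A chi-squared of 2.71 against the df=1 critical value of 3.84:
print(is_significant(2.71, 3.84, rule="greater"))  # False -> retain the null
```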
Type I and Type II Errors
Because we rely on probability (and never 100% certainty), we risk making mistakes when interpreting our significance level.
1. Type I Error (False Positive):
This occurs if we reject the null hypothesis and accept the alternative, when in reality, the null hypothesis was actually true. We concluded there was an effect when there wasn't.
Memory Aid: Type I = "I thought I found something!" (False Alarm). More likely when the significance level is too lenient (e.g., p ≤ 0.10).
2. Type II Error (False Negative):
This occurs if we retain the null hypothesis when, in reality, the alternative hypothesis was true. We concluded there was no effect, but we missed a genuine one.
Memory Aid: Type II = "Too late, I missed the real effect." More likely when the significance level is too stringent (e.g., p ≤ 0.01).
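A small simulation (pure Python, purely illustrative) makes the link between significance level and Type I errors concrete. We run many experiments in which the null hypothesis really is true, so every "significant" result is a false positive, and we count how often that happens at two different significance levels:

```python
import random
from math import comb

def sign_test_p(n_plus, n_minus):
    """Exact two-tailed p-value for the Sign test (binomial with p = 0.5)."""
    n, s = n_plus + n_minus, min(n_plus, n_minus)
    return min(1.0, 2 * sum(comb(n, k) for k in range(s + 1)) * 0.5 ** n)

random.seed(1)  # fixed seed so the simulation is repeatable
trials, n_participants = 2000, 20
type_1_at_05 = type_1_at_10 = 0

# Simulate experiments where the null hypothesis is TRUE: each participant's
# "improvement" is a fair coin flip, so any significant result is a Type I error.
for _ in range(trials):
    pluses = sum(random.random() < 0.5 for _ in range(n_participants))
    p = sign_test_p(pluses, n_participants - pluses)
    type_1_at_05 += p <= 0.05
    type_1_at_10 += p <= 0.10

# The looser 0.10 level always produces at least as many false positives
# as the stricter 0.05 level.
print(type_1_at_05 / trials, type_1_at_10 / trials)
```

Running this shows the trade-off directly: relaxing the significance level catches more real effects (fewer Type II errors) at the cost of more false alarms (Type I errors), and vice versa.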
4.2 Levels of Measurement
Before choosing a statistical test, you must determine the level of measurement (or type of data) collected.
1. Nominal Data:
Data represented by separate categories or counts (names/labels). There is no order or ranking.
Example: Counting how many people prefer coffee (Category 1) versus tea (Category 2).
2. Ordinal Data:
Data that can be placed in order or ranked, but the intervals between the ranks are unequal or unknown.
Example: Rating satisfaction on a scale of 1 to 10. The difference between 8 and 9 might not be the same as the difference between 2 and 3.
3. Interval Data:
Data measured using units of equal intervals (like time or temperature). This is the most precise form of measurement in psychology, as the numerical units are standardised.
Example: Reaction time measured in seconds (a 1-second difference is the same everywhere on the scale).
Factors Affecting the Choice of Statistical Test:
To choose the correct test, you must consider three things:
1. Research Aim: Is the study looking for a difference between groups, or a relationship (correlation) between variables?
2. Experimental Design: If looking for a difference, is the design related (repeated measures or matched pairs) or unrelated (independent groups)?
3. Level of Measurement: Is the data nominal, ordinal, or interval?
4.3 When to Use Named Statistical Tests
You need to know when to use the following tests based on the criteria above (Aim, Design, Data Level).
Tests for Correlation (Relationships):
1. Spearman's rho (ρ): Used for checking the relationship between two variables when the data is Ordinal.
2. Pearson's r: Used for checking the relationship between two variables when the data is Interval.
Tests for Difference (Experiments):
A. Nominal Data:
3. Chi-squared test (χ²): Used when the data is Nominal and the design is Unrelated (Independent Groups), which is the version most commonly examined. (A related-design variant, McNemar's test, also exists.)
4. Sign test: Used for Nominal data and a Related design (repeated measures), specifically where the data only measures positive or negative differences.
B. Ordinal Data:
5. Wilcoxon test: Used for Ordinal data and a Related design.
6. Mann-Whitney test: Used for Ordinal data and an Unrelated design.
C. Interval Data (Parametric Tests):
7. Related t-test: Used for Interval data and a Related design.
8. Unrelated t-test: Used for Interval data and an Unrelated design.
Quick Selection Checklist
To remember the flow of tests, think about the data levels:
1. Nominal (Categories): Chi-squared / Sign Test
2. Ordinal (Ranks): Spearman / Wilcoxon / Mann-Whitney
3. Interval (Scores): Pearson / T-tests
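The whole decision flow of this section can be condensed into a small Python function. It is a revision aid, not part of the specification, and the argument names are invented; it uses the Sign test for related nominal designs, matching the checklist above.

```python
def choose_test(aim, design, level):
    """Pick the statistical test from the three decision factors.

    aim:    "difference" or "correlation"
    design: "related" or "unrelated" (ignored for correlations)
    level:  "nominal", "ordinal", or "interval"
    """
    if aim == "correlation":
        return {"ordinal": "Spearman's rho", "interval": "Pearson's r"}[level]
    # Tests for difference, by (design, level):
    table = {
        ("related", "nominal"): "Sign test",
        ("unrelated", "nominal"): "Chi-squared",
        ("related", "ordinal"): "Wilcoxon",
        ("unrelated", "ordinal"): "Mann-Whitney",
        ("related", "interval"): "Related t-test",
        ("unrelated", "interval"): "Unrelated t-test",
    }
    return table[(design, level)]

print(choose_test("difference", "related", "ordinal"))  # Wilcoxon
print(choose_test("correlation", None, "interval"))     # Pearson's r
```

Try calling it with each combination from the checklist; if you can predict every answer before running it, you have mastered test selection.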
Did you know? The term 'Chi-squared' comes from the Greek letter chi (\(\chi\)). Don't worry, you only need to know when to use them, not how to calculate them!
Key Takeaway: Inferential testing determines significance (usually p≤0.05). If your calculated value passes the critical value threshold (look up the rule for that specific test!), you reject the null hypothesis. The choice of test is critical and depends entirely on your data type and experimental structure.