📊 Welcome to the World of Histograms!

Hello mathematicians! This chapter is all about a special way to display data called a Histogram. If you’ve already studied bar charts, you might think they look similar, but a histogram has one crucial difference that makes it essential for handling large amounts of continuous data.

Don't worry if this seems tricky at first. The key concept is simple: Area. Once you understand that the area of the bar tells you the frequency, everything clicks into place!

Key Takeaway from the Introduction

You will learn to draw and interpret histograms where the area of the bars, not just the height, represents the frequency. This is vital when dealing with continuous data grouped into classes of unequal widths.


1. Bar Charts vs. Histograms: Why the Difference?

Before diving into histograms, let's quickly remind ourselves about the types of data we use:

  • Discrete Data: Data that can only take specific, fixed values (e.g., number of siblings, shoe size).
    Bar charts are perfect for discrete data.
  • Continuous Data: Data that can take any value within a specific range (e.g., height, time, weight). Continuous data is usually grouped into classes.

What is a Histogram?

A Histogram is a statistical diagram used to represent grouped continuous data.

While a bar chart has spaces between bars (representing distinct categories or discrete values), a histogram has no gaps between the bars, showing that the data is continuous.

The Key Difference: Unequal Class Widths

When we group continuous data, the intervals (classes) may have different sizes. The size of each interval is called its Class Width.

If the class widths are all the same, bar height alone gives a fair comparison, because every bar's area is then proportional to its height. But often, in real life (and in exam questions!), the class widths are unequal. This is where the core rule of the histogram comes in:

The Golden Rule:
In a histogram, the Area of the bar is proportional to the Frequency of that class.

Imagine measuring the time spent studying (continuous data). One group is 0-5 hours (width 5), and another is 5-25 hours (width 20). If the height of the bars represented frequency, the 5-25 hour bar would be four times as wide as the 0-5 hour bar, so it could dominate the diagram even if fewer people were in it! We use area to keep the representation fair.
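To put numbers on the study-time example, here is a minimal Python sketch. The head counts (20 and 10 people) are invented purely for illustration and do not come from this chapter; the function name `bar_areas` is also just a label chosen for the sketch.

```python
# Hypothetical study-time data: (lower bound, upper bound, number of people).
# The counts 20 and 10 are made up for illustration only.
classes = [(0, 5, 20), (5, 25, 10)]

def bar_areas(classes, use_density):
    """Return the area of each bar for the chosen height rule."""
    areas = []
    for lower, upper, frequency in classes:
        width = upper - lower
        # Height is either the raw frequency (misleading) or the
        # frequency per unit of width (the histogram rule).
        height = frequency / width if use_density else frequency
        areas.append(width * height)
    return areas

misleading = bar_areas(classes, use_density=False)  # height = frequency
fair = bar_areas(classes, use_density=True)         # height = frequency / width

print(misleading)  # [100, 200]
print(fair)        # [20.0, 10.0]
```

With height equal to frequency, the smaller group ends up with twice the area of the larger one. Dividing by the class width makes each area match the number of people in the group.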


2. The Calculation: Frequency Density

Since the area must represent the frequency, we need a special calculation for the vertical axis. We cannot simply plot frequency on the y-axis, because that would make wider bars look much too important.

Introducing Frequency Density (FD)

The vertical axis of a histogram shows the Frequency Density (FD).

FD is the measure that ensures the area of each rectangular bar correctly represents the frequency of that class.

The Frequency Density Formula:
The area of a bar (Frequency) is calculated by:
\( \text{Area} = \text{Class Width} \times \text{Height} \)

Therefore, the Height (Frequency Density) is found using:
\( \text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}} \)

Step-by-Step: Calculating FD from a Frequency Table

To draw a histogram, your first task is always to calculate the FD for every class interval.

Example Setup: Students' heights (cm).

| Class Interval (Height, \( h \)) | Frequency (F) | 1. Calculate Class Width (W) | 2. Calculate Frequency Density (FD) |
|---|---|---|---|
| \( 150 < h \le 160 \) | 10 | \( 160 - 150 = 10 \) | \( \text{FD} = 10 / 10 = 1 \) |
| \( 160 < h \le 175 \) | 30 | \( 175 - 160 = 15 \) | \( \text{FD} = 30 / 15 = 2 \) |
| \( 175 < h \le 180 \) | 25 | \( 180 - 175 = 5 \) | \( \text{FD} = 25 / 5 = 5 \) |

💡 Quick Review Box: Why FD matters here

Notice the last interval (\( 175 < h \le 180 \)) has a much smaller width (5) but a higher frequency density (5). This ensures its area ( \( 5 \times 5 = 25 \) ) is correctly represented, compared to the second interval, which is wide (15) but flatter (FD=2), giving an area of \( 15 \times 2 = 30 \).
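The table calculation can be checked with a short Python sketch. The class data is taken from the heights example; the function name `frequency_density` and the tuple layout are choices made for this illustration.

```python
# Height classes from the example: (lower bound, upper bound, frequency).
height_classes = [(150, 160, 10), (160, 175, 30), (175, 180, 25)]

def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    width = upper - lower
    if width <= 0:
        raise ValueError("class width must be positive")
    return frequency / width

densities = [frequency_density(*c) for c in height_classes]
print(densities)  # [1.0, 2.0, 5.0]

# Check: width x density should give back each original frequency.
areas = [(upper - lower) * fd
         for (lower, upper, _), fd in zip(height_classes, densities)]
print(areas)  # [10.0, 30.0, 25.0]
```

Multiplying each width by its density recovers the original frequencies, which is a quick way to check a completed table before drawing anything.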


3. Drawing a Histogram

Drawing the histogram accurately requires setting up your axes correctly based on your continuous data and the calculated Frequency Density.

Step-by-Step Drawing Guide

  1. Determine Class Boundaries (The x-axis):

    For continuous data, ensure there are no gaps between the bars. The class boundaries are crucial for determining the Class Width (W). If a table says 10-19 and 20-29, the boundary gap needs closing (making the classes 9.5 to 19.5, and 19.5 to 29.5).
    Tip: If the classes are already defined mathematically (like \( 150 < h \le 160 \)), the boundaries are simply 150 and 160.

  2. Set up the Axes:
    • Horizontal Axis (x-axis): This must be labeled with the variable (e.g., Height, Time) and marked using the class boundaries (150, 160, 175, 180, etc.). Ensure your scale handles the potentially unequal widths.
    • Vertical Axis (y-axis): This must be labeled Frequency Density. The scale should accommodate the highest FD value you calculated.
  3. Draw the Bars:

    For each class:

    • The width of the bar spans the Class Width on the x-axis.
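    • The height of the bar corresponds exactly to the calculated Frequency Density for that class.

⚠️ Common Mistake Alert: Class Boundaries

If your data is rounded to the nearest whole number (e.g., ages recorded to the nearest year as 10-14, 15-19), remember to find the precise boundaries (the midpoint between the end of one class and the start of the next). The boundary between 14 and 15 is 14.5. The true first class interval is therefore \( 9.5 \le \text{Age} < 14.5 \). If ages are instead recorded in completed years, the class 10-14 covers \( 10 \le \text{Age} < 15 \), so always check how the data was recorded.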
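The boundary correction and the bar dimensions from the drawing steps can be sketched in Python as follows. This assumes the data was rounded to the nearest unit, as in the 10-19 and 20-29 example. The function names and the frequencies 20 and 5 are hypothetical, chosen only for illustration.

```python
def class_boundaries(limits, precision=1):
    """Turn rounded class limits such as (10, 19), (20, 29) into
    boundaries by extending each class half a rounding unit each way.
    Assumes values were rounded to the nearest `precision`."""
    half = precision / 2
    return [(low - half, high + half) for low, high in limits]

def bar_rectangles(boundaries, frequencies):
    """Return (left edge, width, height) for each bar, where the
    height is the frequency density."""
    bars = []
    for (left, right), frequency in zip(boundaries, frequencies):
        width = right - left
        bars.append((left, width, frequency / width))
    return bars

boundaries = class_boundaries([(10, 19), (20, 29)])
print(boundaries)  # [(9.5, 19.5), (19.5, 29.5)]

# The frequencies 20 and 5 are hypothetical, for illustration only.
rectangles = bar_rectangles(boundaries, [20, 5])
print(rectangles)  # [(9.5, 10.0, 2.0), (19.5, 10.0, 0.5)]
```

Each triple gives what you need to draw one bar: where it starts on the x-axis, how wide it is, and how tall it is on the Frequency Density axis. Because neighbouring boundaries coincide, the bars touch with no gaps.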


4. Interpreting a Histogram: Finding Frequencies

The most common exam question requires you to work backwards: using the graph (Area) to find the frequency (the number of items in that group).

Remember the fundamental relationship:

\( \mathbf{\text{Frequency} = \text{Class Width} \times \text{Frequency Density}} \)

Case 1: Finding the Frequency of a Whole Bar

This is straightforward. Identify the bar you need, read its width from the x-axis, read its height (FD) from the y-axis, and multiply them together.

Example: A bar spans from 20 to 35 (Width = 15). The height (FD) is 4.
\( \text{Frequency} = 15 \times 4 = 60 \).

Case 2: Finding the Frequency of a Partial Bar

Sometimes a question asks for the frequency within a range that covers only a section of a drawn bar. Calculate the area of that section only. The answer is an estimate, because it assumes the data is spread evenly across the class.

Step-by-Step: Finding Partial Frequency
  1. Identify the relevant FD: Read the height (FD) of the bar containing the required section.
  2. Calculate the required Partial Width: Determine the width of the specific portion you are interested in.
  3. Calculate Frequency: Multiply the FD by the Partial Width.

Example: A bar runs from 10 to 30 (FD = 2). The question asks for the frequency of data points between 25 and 30.

  • Relevant FD = 2.
  • Partial Width = \( 30 - 25 = 5 \).
  • Frequency = \( 5 \times 2 = 10 \).
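Cases 1 and 2 follow the same rule, so one small Python function can cover both. This is a sketch that assumes the data is spread evenly within each class, which is also what the hand calculation assumes. The function name and the `(left, right, frequency_density)` layout are choices made for this illustration.

```python
def frequency_in_range(bars, start, end):
    """Estimate the frequency between start and end.

    bars is a list of (left, right, frequency_density) tuples.
    Each bar contributes (overlap width) x (frequency density).
    """
    total = 0.0
    for left, right, density in bars:
        # Width of the part of this bar that lies inside [start, end].
        overlap = min(right, end) - max(left, start)
        if overlap > 0:
            total += overlap * density
    return total

# Case 1: the whole bar from 20 to 35 with FD = 4.
print(frequency_in_range([(20, 35, 4)], 20, 35))  # 60.0

# Case 2: only the section from 25 to 30 of the bar from 10 to 30 (FD = 2).
print(frequency_in_range([(10, 30, 2)], 25, 30))  # 10.0
```

The same function also handles a range that stretches across several bars, since each bar simply adds the area of its overlapping section.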

Case 3: Using Frequency to Find an Unknown FD (Scaling)

If the total frequency is given for a set of bars, but the scale on the FD axis is missing, you can use the total known area to find the missing scale factor.

The total area of the histogram must equal the total frequency.

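Measure each bar's height in any convenient unit (e.g., grid squares), multiply by its class width, and add the results to get the total area X on the current scale. If you know the actual total frequency is Y, then the scale factor needed for the FD axis is \( \frac{\text{Actual Total Frequency (Y)}}{\text{Total Calculated Area based on current scale (X)}} \). This factor is the frequency density represented by one unit of height, so multiply each measured height by it to label the y-axis.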
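Here is a minimal Python sketch of the scaling idea. The three bars and the total frequency of 120 are hypothetical values invented for the example, and heights are counted in grid squares because the vertical axis is assumed to have no numbers.

```python
# Hypothetical histogram with no numbers on the vertical axis.
# Each bar is (class width, height counted in grid squares).
bars = [(10, 2), (5, 4), (20, 1)]
total_frequency = 120  # assumed to be given in the question

# Total area on the current scale, in "width x grid squares" units.
area_on_current_scale = sum(width * height for width, height in bars)

# Scale factor: the frequency density represented by one grid square.
scale = total_frequency / area_on_current_scale

densities = [height * scale for _, height in bars]
frequencies = [width * fd for (width, _), fd in zip(bars, densities)]

print(area_on_current_scale)  # 60
print(scale)                  # 2.0
print(densities)              # [4.0, 8.0, 2.0]
print(frequencies)            # [40.0, 40.0, 40.0]
```

As a check, the recovered frequencies add up to the given total of 120.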


Summary and Quick Check

You are now ready to tackle histograms! Remember these key points:

  • Histograms are for grouped continuous data.
  • Area = Frequency. (This is the most important rule!)
  • The vertical axis is Frequency Density.
  • Formula: \( \text{FD} = \frac{F}{W} \) (where F = Frequency, W = Class Width).
  • When reading off a histogram, use \( F = W \times FD \).

Did you know? The term "histogram" is usually credited to Karl Pearson, who used it in print in 1895. One common explanation is that the name comes from the Greek words histos (anything set upright, like a bar) and gramma (drawing or record), although the exact origin is uncertain.