Question
For grouped (frequency distribution) data, how do we calculate mean, median, and mode, and when should we use each measure?
Solution — Step by Step
For the mean, three methods all give the same answer:
Direct method:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
where $x_i$ = class mark (midpoint) and $f_i$ = frequency.
Assumed mean method (faster for large numbers):
$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i}$$
where $a$ = assumed mean and $d_i = x_i - a$.
Step deviation method (fastest):
$$\bar{x} = a + h \cdot \frac{\sum f_i u_i}{\sum f_i}$$
where $h$ = class width and $u_i = \dfrac{x_i - a}{h}$.
Use the step deviation method when class sizes are equal and numbers are large.
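A minimal Python sketch of the three mean formulas, assuming a hypothetical frequency table (classes 0-10 up to 40-50 with frequencies 5, 8, 15, 7, 5). The function names and the choices a = 25 and h = 10 are illustrative only. It shows that the direct, assumed mean, and step deviation methods land on the same value.

```python
# Hypothetical grouped data: (lower, upper) class limits and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 7, 5]


def class_marks(classes):
    """Class mark x_i = midpoint of each interval."""
    return [(lo + hi) / 2 for lo, hi in classes]


def mean_direct(classes, freqs):
    """Direct method: sum(f_i * x_i) / sum(f_i)."""
    xs = class_marks(classes)
    return sum(f * x for f, x in zip(freqs, xs)) / sum(freqs)


def mean_assumed(classes, freqs, a):
    """Assumed mean method: a + sum(f_i * d_i) / sum(f_i), with d_i = x_i - a."""
    xs = class_marks(classes)
    return a + sum(f * (x - a) for f, x in zip(freqs, xs)) / sum(freqs)


def mean_step_deviation(classes, freqs, a, h):
    """Step deviation method: a + h * sum(f_i * u_i) / sum(f_i), with u_i = (x_i - a) / h.
    Assumes every class has the same width h."""
    xs = class_marks(classes)
    sum_fu = sum(f * (x - a) / h for f, x in zip(freqs, xs))
    return a + h * sum_fu / sum(freqs)


print(mean_direct(classes, freqs))                      # 24.75
print(mean_assumed(classes, freqs, a=25))               # 24.75
print(mean_step_deviation(classes, freqs, a=25, h=10))  # 24.75
```

The step deviation version only makes sense when all classes share the width h; with unequal widths, the direct or assumed mean method is the safer choice.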
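For the median, first find the median class: the class interval whose cumulative frequency first exceeds $\frac{N}{2}$, where $N = \sum f_i$. Then
$$\text{Median} = l + \left(\frac{\frac{N}{2} - cf}{f}\right) \times h$$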
where:
- $l$ = lower limit of median class
- $cf$ = cumulative frequency of the class before the median class
- $f$ = frequency of the median class
- $h$ = class width
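The median steps (build the cumulative frequency column, locate the median class, apply the formula) can be sketched in Python as follows. The data table and the name `grouped_median` are hypothetical; the function reads h from the median class itself.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower, upper) class limits and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 7, 5]


def grouped_median(classes, freqs):
    """Median = l + ((N/2 - cf) / f) * h for grouped data."""
    n = sum(freqs)
    cum = list(accumulate(freqs))  # cumulative frequency column
    # Median class: the first class whose cumulative frequency exceeds N/2.
    k = next(i for i, c in enumerate(cum) if c > n / 2)
    l, upper = classes[k]
    h = upper - l                    # width of the median class
    f = freqs[k]                     # frequency of the median class
    cf = cum[k - 1] if k > 0 else 0  # cf of the class BEFORE the median class
    return l + (n / 2 - cf) / f * h


print(round(grouped_median(classes, freqs), 2))  # 24.67
```

Here N/2 = 20, the cumulative frequencies are 5, 13, 28, 35, 40, so the median class is 20-30 with cf = 13 and f = 15.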
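For the mode, first find the modal class: the class with the highest frequency. Then
$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$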
where:
- $l$ = lower limit of modal class
- $f_1$ = frequency of modal class
- $f_0$ = frequency of the class before the modal class
- $f_2$ = frequency of the class after the modal class
- $h$ = class width
| Measure | Best when | Limitation |
|---|---|---|
| Mean | Data is symmetric, no outliers | Affected by extreme values |
| Median | Data is skewed or has outliers | Ignores actual values, only uses position |
| Mode | You need the most frequent value | May not exist, or there may be more than one |
Empirical relationship (approximate, for moderately skewed data):
$$\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$$
flowchart TD
A["Grouped Data: Which measure?"] --> B{"What does the question ask?"}
B -->|"Average value"| C["Mean: sum fi xi / sum fi"]
B -->|"Middle value"| D["Median: find median class, use formula"]
B -->|"Most frequent value"| E["Mode: find modal class, use formula"]
C --> F{"Large numbers?"}
F -->|"Yes"| G["Use step deviation method"]
F -->|"No"| H["Use direct method"]
D --> I["Key: find cf just before N/2"]
E --> J["Key: identify highest frequency class"]
Why This Works
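For grouped data, we do not know the individual values, only the class intervals and their frequencies. The mean formula treats every observation in a class as if it sat at the class mark; the median and mode formulas interpolate within the relevant class to estimate where the middle value or the peak falls.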
The median formula assumes data is uniformly distributed within each class interval. The mode formula uses the frequencies of neighbouring classes to estimate the peak of the distribution within the modal class.
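Assuming values are spread evenly inside each class means the cumulative frequency rises in a straight line across that class. A sketch that reads N/2 off a "less than" ogive drawn with straight segments should therefore reproduce the median formula exactly. The table and the name `median_by_ogive` are hypothetical, and the classes are assumed to be contiguous (each upper limit equals the next lower limit).

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical grouped data: (lower, upper) class limits and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 7, 5]


def median_by_ogive(classes, freqs):
    """Read N/2 off a 'less than' ogive drawn with straight segments."""
    n = sum(freqs)
    # Ogive points: (first lower limit, 0), then (upper limit, cumulative frequency).
    xs = [classes[0][0]] + [hi for _, hi in classes]
    ys = [0] + list(accumulate(freqs))
    target = n / 2
    # Segment j runs from point j-1 to point j, with ys[j-1] <= N/2 < ys[j].
    j = bisect_right(ys, target)
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    # Straight-line interpolation = "values spread evenly inside the class".
    return x0 + (target - y0) / (y1 - y0) * (x1 - x0)


print(round(median_by_ogive(classes, freqs), 2))  # 24.67
```

In the return line, x0 is l, target - y0 is N/2 - cf, y1 - y0 is f, and x1 - x0 is h, so the interpolation is the median formula term by term.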
Alternative Method
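For a quick check, use the empirical relationship $\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$. If your calculated values roughly satisfy it, your answers are likely correct. In an exam, once you have computed two of the three measures, you can estimate the third from this relation and compare it with your formula result. The relation is only approximate and works best for moderately skewed data, so expect a small mismatch rather than exact agreement.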
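A small sketch of this check in Python, using hand-computed values for a hypothetical table (classes 0-10 up to 40-50 with frequencies 5, 8, 15, 7, 5). The function names and the tolerance of half a class width are arbitrary, hypothetical choices, not a rule from the syllabus.

```python
def empirical_mode(mean, median):
    """Empirical relation: Mode is approximately 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


def roughly_consistent(mean, median, mode, h, tol=0.5):
    """True if the formula mode lies within tol * h of the empirical estimate.
    tol is a hypothetical tolerance chosen for illustration."""
    return abs(empirical_mode(mean, median) - mode) <= tol * h


# Hand-computed values for the hypothetical table (N = 40, h = 10).
mean = 990 / 40                               # sum(f_i x_i) / sum(f_i)
median = 20 + (20 - 13) / 15 * 10             # l=20, N/2=20, cf=13, f=15
mode = 20 + (15 - 8) / (2 * 15 - 8 - 7) * 10  # l=20, f1=15, f0=8, f2=7

print(round(empirical_mode(mean, median), 2))        # 24.5
print(round(mode, 2))                                # 24.67
print(roughly_consistent(mean, median, mode, h=10))  # True
```

The estimate (24.5) and the formula mode (24.67) differ by a small fraction of the class width, which is the kind of rough agreement the check looks for.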
Common Mistake
In the median formula, students often use the cumulative frequency of the median class instead of the class before it. The $cf$ in the formula is the cumulative frequency up to (but not including) the median class. Using the wrong $cf$ subtracts an extra $f$ from the numerator, which shifts the answer down by exactly one class width $h$ and places it below the lower limit of the median class. This is one of the most common errors in CBSE Class 10 statistics questions, so check your cumulative frequency column carefully.
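The effect of this mistake can be seen in a short Python sketch on a hypothetical table (classes 0-10 up to 40-50 with frequencies 5, 8, 15, 7, 5). It computes the median once with the correct cf and once with the cf of the median class itself.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower, upper) class limits and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 7, 5]

n = sum(freqs)
cum = list(accumulate(freqs))
k = next(i for i, c in enumerate(cum) if c > n / 2)  # index of the median class
l, upper = classes[k]
h, f = upper - l, freqs[k]
cf_before = cum[k - 1] if k > 0 else 0

correct = l + (n / 2 - cf_before) / f * h  # cf of the class BEFORE the median class
wrong = l + (n / 2 - cum[k]) / f * h       # mistake: cf of the median class itself

print(round(correct, 2), round(wrong, 2))  # 24.67 14.67
print(round(correct - wrong, 2))           # 10.0, exactly one class width h
# Sanity check: a valid median lies inside the median class [l, l + h].
print(l <= correct <= l + h, l <= wrong <= l + h)  # True False
```

The last line suggests a quick self-test: if your median falls outside the median class you identified, recheck which cumulative frequency you used.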