Mean, median, mode for grouped data — when to use each formula

Question

For grouped (frequency distribution) data, how do we calculate mean, median, and mode, and when should we use each measure?

Solution — Step by Step

For the mean, there are three methods, all giving the same answer:

Direct method:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

where $x_i$ = class mark (midpoint) and $f_i$ = frequency.

Assumed mean method (faster for large numbers):

$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i}$$

where $a$ = assumed mean and $d_i = x_i - a$.

Step deviation method (fastest):

$$\bar{x} = a + h \times \frac{\sum f_i u_i}{\sum f_i}$$

where $h$ = class width and $u_i = (x_i - a)/h$.

Use the step deviation method when class sizes are equal and numbers are large.
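As a minimal sketch of the three mean formulas, the Python below applies each one to a hypothetical frequency table. The table values, the assumed mean `a = 25`, and the function names are illustrative assumptions, not data from this article.

```python
# Hypothetical frequency table (illustrative values only).
lower_limits = [0, 10, 20, 30, 40]   # lower limit of each class
frequencies = [5, 8, 20, 12, 5]      # f_i
h = 10                               # common class width

# Class marks x_i are the midpoints of the classes.
class_marks = [low + h / 2 for low in lower_limits]


def mean_direct(x, f):
    """Direct method: sum(f_i * x_i) / sum(f_i)."""
    return sum(fi * xi for fi, xi in zip(f, x)) / sum(f)


def mean_assumed(x, f, a):
    """Assumed mean method: a + sum(f_i * d_i) / sum(f_i), with d_i = x_i - a."""
    return a + sum(fi * (xi - a) for fi, xi in zip(f, x)) / sum(f)


def mean_step_deviation(x, f, a, h):
    """Step deviation method: a + h * sum(f_i * u_i) / sum(f_i), with u_i = (x_i - a) / h."""
    return a + h * sum(fi * (xi - a) / h for fi, xi in zip(f, x)) / sum(f)


a = 25  # assumed mean, chosen here as the class mark of the middle class
print(mean_direct(class_marks, frequencies))
print(mean_assumed(class_marks, frequencies, a))
print(mean_step_deviation(class_marks, frequencies, a, h))
```

For this table all three methods agree at 25.8 (up to floating-point rounding), since the assumed mean and step deviation methods are algebraic rearrangements of the direct formula.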

For the median, first find the median class: the class interval where the cumulative frequency first exceeds $N/2$ (where $N = \sum f_i$).

$$\text{Median} = l + \left(\frac{N/2 - cf}{f}\right) \times h$$

where:

  • $l$ = lower limit of median class
  • $cf$ = cumulative frequency of the class before the median class
  • $f$ = frequency of the median class
  • $h$ = class width
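The median steps above can be sketched in Python as follows. The function name `grouped_median` and the sample table are hypothetical, and the sketch assumes equal class widths and classes listed in ascending order.

```python
from itertools import accumulate


def grouped_median(lower_limits, frequencies, h):
    """Estimate the median of grouped data with equal class width h."""
    N = sum(frequencies)
    cumulative = list(accumulate(frequencies))  # running totals of f_i

    # Median class: first class whose cumulative frequency exceeds N/2.
    k = next(i for i, c in enumerate(cumulative) if c > N / 2)

    l = lower_limits[k]                       # lower limit of median class
    f = frequencies[k]                        # frequency of median class
    cf = cumulative[k - 1] if k > 0 else 0    # cf of the class BEFORE it

    return l + (N / 2 - cf) * h / f


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
print(grouped_median([0, 10, 20, 30, 40], [5, 8, 20, 12, 5], 10))
```

Here $N = 50$, so $N/2 = 25$; the cumulative frequencies are 5, 13, 33, 45, 50, which makes 20-30 the median class with $cf = 13$ and $f = 20$, giving a median of $20 + (12/20) \times 10 = 26$.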

For the mode, the modal class is the class with the highest frequency.

$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$

where:

  • $l$ = lower limit of modal class
  • $f_1$ = frequency of modal class
  • $f_0$ = frequency of the class before the modal class
  • $f_2$ = frequency of the class after the modal class
  • $h$ = class width
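A minimal Python sketch of the mode formula follows. The function name `grouped_mode` and the sample table are hypothetical. Two choices in it are assumptions rather than something stated in this article: a missing neighbouring class is treated as having frequency 0, and if several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_limits, frequencies, h):
    """Estimate the mode of grouped data with equal class width h."""
    # Modal class: index of the highest frequency (first one on a tie).
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])

    l = lower_limits[k]
    f1 = frequencies[k]
    # Assumption: a missing neighbour counts as frequency 0.
    f0 = frequencies[k - 1] if k > 0 else 0
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0

    # Note: the denominator is zero when f1 == f0 == f2; not handled here.
    return l + (f1 - f0) * h / (2 * f1 - f0 - f2)


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
print(grouped_mode([0, 10, 20, 30, 40], [5, 8, 20, 12, 5], 10))
```

For this table the modal class is 20-30 with $f_1 = 20$, $f_0 = 8$, $f_2 = 12$, so the mode is $20 + (12/20) \times 10 = 26$.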

When to use each measure:

| Measure | Best when | Limitation |
|---|---|---|
| Mean | Data is symmetric, no outliers | Affected by extreme values |
| Median | Data is skewed or has outliers | Ignores actual values, only uses position |
| Mode | You need the most frequent value | May not exist or may be multiple |

Empirical relationship (approximate):

$$\text{Mode} \approx 3 \times \text{Median} - 2 \times \text{Mean}$$
flowchart TD
    A["Grouped Data: Which measure?"] --> B{"What does the question ask?"}
    B -->|"Average value"| C["Mean: sum fi xi / sum fi"]
    B -->|"Middle value"| D["Median: find median class, use formula"]
    B -->|"Most frequent value"| E["Mode: find modal class, use formula"]
    C --> F{"Large numbers?"}
    F -->|"Yes"| G["Use step deviation method"]
    F -->|"No"| H["Use direct method"]
    D --> I["Key: use cf of the class before the median class"]
    E --> J["Key: identify highest frequency class"]

Why This Works

For grouped data, we do not know individual values, only class intervals and their frequencies. The mean formula treats every value in a class as if it sat at the class mark, while the median and mode formulas interpolate within the relevant class to estimate where the value falls.

The median formula assumes data is uniformly distributed within each class interval. The mode formula uses the frequencies of neighbouring classes to estimate the peak of the distribution within the modal class.

Alternative Method

For a quick check, use the empirical relationship: Mode ≈ 3(Median) - 2(Mean). It is only approximate and works best for moderately skewed data, so expect rough agreement rather than equality. If your calculated values roughly satisfy it, your answers are likely correct. This is especially useful in exams when you have time to verify only one of the three values.
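As an illustration of this check, the sketch below plugs in values from a hypothetical table (classes 0-10 to 40-50 with frequencies 5, 8, 20, 12, 5), for which the grouped formulas give mean 25.8, median 26 and mode 26. The helper name `empirical_mode` is an assumption made for this example.

```python
def empirical_mode(mean, median):
    """Empirical estimate of the mode: 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# Values computed from the hypothetical table described above.
mean, median, mode = 25.8, 26.0, 26.0

estimate = empirical_mode(mean, median)
gap = abs(estimate - mode)  # how far the estimate is from the formula mode

print(estimate, gap)
```

The estimate comes out near 26.4 against a calculated mode of 26, a gap of about 0.4 on a class width of 10. A gap that small suggests the three values are consistent; a gap of a full class width or more would be a reason to recheck the working.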

Common Mistake

In the median formula, students use the cumulative frequency of the median class instead of the class before it. The variable $cf$ in the formula is the cumulative frequency up to (but not including) the median class. Using the wrong $cf$ shifts the answer by an entire class width. This is the single most common error in CBSE 10th statistics questions, so check your cumulative frequency column carefully.
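The size of this error can be seen in a short sketch that computes the median both ways for a hypothetical table (the values and variable names are illustrative assumptions):

```python
from itertools import accumulate

# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_limits = [0, 10, 20, 30, 40]
frequencies = [5, 8, 20, 12, 5]
h = 10

N = sum(frequencies)                          # 50
cumulative = list(accumulate(frequencies))    # [5, 13, 33, 45, 50]

# Median class: first class whose cumulative frequency exceeds N/2.
k = next(i for i, c in enumerate(cumulative) if c > N / 2)
l, f = lower_limits[k], frequencies[k]

cf_before = cumulative[k - 1] if k > 0 else 0  # correct: class before (13)
cf_wrong = cumulative[k]                       # mistake: median class itself (33)

correct = l + (N / 2 - cf_before) * h / f
wrong = l + (N / 2 - cf_wrong) * h / f

print(correct, wrong, correct - wrong)  # 26.0 16.0 10.0
```

The wrong value is exactly one class width (10) below the correct one, and it falls outside the median class 20-30. A median that lands outside its own median class is a quick sign that the wrong $cf$ was used.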
