What is statistics? – Mean, Median, and Mode for grouped data

Statistical Definitions
True Upper and Lower Range Limits
When the Range Limits are whole numbers, the True Lower and Upper Limits of a Range are calculated as follows:

i) Half a point (0.5) is subtracted from the Lower Range Limit.
ii) Half a point (0.5) is added to the Upper Range Limit.

Therefore, the Age Range of 21-30 has True Range Limits: 20.5-30.5.
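The rule above can be sketched in a few lines of Python. The function name `true_limits` is hypothetical. The sketch assumes the recorded values are whole numbers, so each limit is widened by half a point. The difference between the two True Limits also gives the width of the range.

```python
def true_limits(lower: int, upper: int) -> tuple[float, float]:
    """Return the True Lower and Upper Limits of a whole-number range.

    Assumes integer-valued data, so half a point (0.5) is subtracted from
    the Lower Range Limit and added to the Upper Range Limit.
    """
    return lower - 0.5, upper + 0.5


low, high = true_limits(21, 30)
print(low, high)      # 20.5 30.5
print(high - low)     # 10.0, the width of the range
```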

What is Mode – A Historical view
Karl Pearson was one of the most influential statisticians in history. He founded the world’s first university statistics department, at University College London, and received many honors during his career. Among his other contributions, he introduced the term “mode” (1895).

The Modal Group / Range
The Modal Range is the Range that includes the Mode. In other words, the Range / Group with the highest Simple Frequency is the Modal Group or Modal Range.

What is Mode – Formula and Data Example
The table below shows the Frequency of each Age Group / Range. For example, the frequency of the range “21 to 30” is one, because only one participant reported an age from 21 to 30. The Age Range with the highest frequency is 51-60, with a frequency of 15.

| Age Range | Frequency (f) | Cumulative Frequency | Midpoint of Age Range | f*Midpoint |
|---|---|---|---|---|
| 21-30 | 1 | 1 | 25.5 | 25.5 |
| 31-40 | 10 | 10+1=11 | 35.5 | 355 |
| 41-50 | 5 | 11+5=16 | 45.5 | 227.5 |
| 51-60 | 15 | 16+15=31 | 55.5 | 832.5 |
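As a minimal sketch, the derived columns of the table can be rebuilt in Python from the Age Ranges and Simple Frequencies alone. The variable names are illustrative. `itertools.accumulate` produces the running totals used as Cumulative Frequencies.

```python
from itertools import accumulate

# Age ranges and simple frequencies from the table.
ranges = [(21, 30), (31, 40), (41, 50), (51, 60)]
freqs = [1, 10, 5, 15]

# Cumulative Frequency: running total of the simple frequencies.
cumulative = list(accumulate(freqs))
# Midpoint of each range: (lower + upper) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in ranges]
# Frequency multiplied by the midpoint of its range.
products = [f * m for f, m in zip(freqs, midpoints)]

for (lo, hi), f, cf, m, fm in zip(ranges, freqs, cumulative, midpoints, products):
    print(f"{lo}-{hi}: f={f}, cf={cf}, midpoint={m}, f*midpoint={fm}")
```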

Statistical definition of Mode / Modal value for Grouped data
When the data are grouped into ranges, a different formula is used to estimate the Mode / Modal value:
M=L+\frac{f_m-f_{m-1}}{(f_m-f_{m-1})+(f_m-f_{m+1})}*w

Formula’s Symbols Explanation
M=(Estimated) Mode / Modal value
L=The True Lower Limit of the Modal Range: L=50.5
f_m=Frequency of the Modal Range: f_m=15
f_{m-1}=The frequency of the Range immediately before the Modal Range: f_{m-1}=5
f_{m+1}=The frequency of the Range immediately after the Modal Range: f_{m+1}=0. It is zero because there is no next Age Range (61-70) in the data.
w=The width of the Modal Range: w=10

For ranges of whole numbers, the width is the Upper Range Limit minus the Lower Range Limit plus one: w=60-51+1=10. Equivalently, it is the difference between the True Range Limits: 60.5-50.5=10.

Substituting the values of the example into the formula gives:
M=50.5+\frac{15-5}{(15-5)+(15-0)}*10=50.5+\frac{10}{25}*10=50.5+4=54.5

Therefore, in this example, the Mode / Modal value of the age is 54.5.
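The same estimate can be reproduced with a short Python sketch. The function name `grouped_mode` is hypothetical. It makes three assumptions: the ranges are whole-number ranges listed in order, there is a single Modal Range, and a neighbouring range that does not exist has frequency 0 (as the text does for 61-70).

```python
def grouped_mode(ranges, freqs):
    """Estimate the Mode of grouped whole-number data.

    M = L + (f_m - f_{m-1}) / ((f_m - f_{m-1}) + (f_m - f_{m+1})) * w
    A neighbouring range that does not exist is given frequency 0.
    """
    m = freqs.index(max(freqs))             # index of the Modal Range (first one if tied)
    lower, upper = ranges[m]
    L = lower - 0.5                          # True Lower Limit of the Modal Range
    w = upper - lower + 1                    # width of the Modal Range
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0    # f_{m-1}
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0  # f_{m+1}
    d1 = f_m - f_prev
    d2 = f_m - f_next
    return L + d1 * w / (d1 + d2)


ranges = [(21, 30), (31, 40), (41, 50), (51, 60)]
freqs = [1, 10, 5, 15]
print(grouped_mode(ranges, freqs))  # 54.5
```

Multiplying by w before dividing keeps the arithmetic exact for this example.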


Comparing the Mean, Median, and Mode
To compare the three measures of Central Tendency (Mean, Median, and Mode), we must also calculate the Mean and Median for the grouped data.

The Median value for grouped data
The statistical formula for the calculation of the Median value for grouped data is: M=L+\frac{(n/2-cf_{b})}{f_{m}}*w

To find the Median for grouped data, first calculate the total frequency n. This is the sum of the Simple Frequencies of all Ranges, which is also the Cumulative Frequency of the last Range: n=1+10+5+15=31.

Dividing this total by 2 gives the position of the Median: 31/2=15.5. The Median lies in the first Age Range whose Cumulative Frequency is equal to or greater than this number.

The Cumulative Frequency of the Age Range 31-40 is 11, which is still below 15.5. The Cumulative Frequency of the Age Range 41-50 is 16, which reaches it. Therefore, the Median lies in the Age Range 41-50.
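The search for the Median's Age Range can be sketched in Python as follows. The function name `median_class` is hypothetical. It returns a 0-based index into the list of ranges.

```python
from itertools import accumulate


def median_class(freqs):
    """Return the index of the range that contains the Median.

    The Median lies in the first range whose Cumulative Frequency
    is equal to or greater than n/2.
    """
    half = sum(freqs) / 2
    for i, cf in enumerate(accumulate(freqs)):
        if cf >= half:
            return i
    raise ValueError("freqs must not be empty")


freqs = [1, 10, 5, 15]
print(median_class(freqs))  # 2, i.e. the Age Range 41-50
```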


Formula’s Symbols Explanation
M = (Estimated) Median value
f = Frequency
L = The True Lower Limit of the Age Range that includes the Median: L=40.5
f_m = The Frequency of the Age Range that includes the Median: f_m=5
n = The Total number of observations. This value is found by adding the Simple Frequencies of all Age Ranges together: n=1+10+5+15=31
cf_{b} = The Cumulative Frequency of the Age Range immediately before the one that includes the Median: cf_{b}=1+10=11
w = The width of the Range that includes the Median, calculated as the Upper Range Limit minus the Lower Range Limit plus one: w=50-41+1=10

Substituting the values of this example into the formula gives:
M=40.5+\frac{(31/2)-11}{5}*10=40.5+\frac{4.5}{5}*10=40.5+9=49.5

The value of the Median for these grouped data is 49.5.
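A minimal Python sketch of the grouped Median formula, using the symbol values listed above (L=40.5, n=31, cf_b=11, f_m=5, w=10), is shown below. The function name `grouped_median` is hypothetical.

```python
def grouped_median(L, n, cf_b, f_m, w):
    """Estimate the Median of grouped data: M = L + (n/2 - cf_b) / f_m * w."""
    # Multiplying by w before dividing by f_m keeps this example exact.
    return L + (n / 2 - cf_b) * w / f_m


# Values for the Age Range 41-50, which contains the Median.
print(grouped_median(L=40.5, n=31, cf_b=11, f_m=5, w=10))  # 49.5
```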

The Mean value for Grouped data
To find the Mean, we first need the midpoint of each Age Range. Add the Upper and Lower Range Limits and divide the result by two (2). For each Age Range this gives:

i) (21+30)/2=51/2=25.5
ii) (31+40)/2=71/2=35.5
iii) (41+50)/2=91/2=45.5
iv) (51+60)/2=111/2=55.5

Because the individual ages were not recorded, we assume that every participant in the Age Range 21-30 had an age of 25.5. Likewise, we assume that every participant in the other Age Ranges had an age of 35.5, 45.5, or 55.5, according to the range they were grouped under. This estimation gives us 31 approximate age values to work with.


The next step is to multiply the Frequency f of each Age Range by its midpoint:

i) 1*25.5=25.5
ii) 10*35.5=355
iii) 5*45.5=227.5
iv) 15*55.5=832.5

Adding these products together gives:
25.5+355+227.5+832.5=1440.5

Finally, divide this sum by the total number of observations, i.e. the sample size n=31. This gives the Arithmetic Mean for grouped data: 1440.5/31≈46.47, or about 46.5.
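The whole Mean calculation fits in a short Python sketch. The function name `grouped_mean` is hypothetical. Like the text, it assumes every observation in a range equals that range's midpoint.

```python
def grouped_mean(ranges, freqs):
    """Estimate the Arithmetic Mean of grouped data.

    Every observation in a range is assumed to equal the range midpoint.
    """
    midpoints = [(lo + hi) / 2 for lo, hi in ranges]
    total = sum(f * m for f, m in zip(freqs, midpoints))  # sum of f * midpoint
    return total / sum(freqs)


ranges = [(21, 30), (31, 40), (41, 50), (51, 60)]
freqs = [1, 10, 5, 15]
print(round(grouped_mean(ranges, freqs), 2))  # 46.47
```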

Summarizing our results for the grouped data:
The Arithmetic Mean is 46.5,
The Median is 49.5, and
The Mode / Modal value is 54.5.


These results show that:
i) The mean age of the sample is about 46.5 years.
ii) About half of the sample is estimated to be younger than 49.5 years, and the other half older than 49.5 years.
iii) Had the data not been grouped, the most frequently reported age is estimated to be about 54.5 years.

These results suggest that our data are not normally distributed. Normality is discussed in another chapter. For now, note that if the data were normally distributed, the Arithmetic Mean (Mn), the Median (Me), and the Mode / Modal value (Mo) would be equal: Mn=Me=Mo. Here the three values differ, although they are still fairly close to one another.

Statistical relation of the Mean, Median, and Mode
Karl Pearson first described an empirical relation between these three measures of Central Tendency. It applies to distributions that deviate only moderately from normality, as the current example does. If any two of the three measures are known, the third can be estimated from the following formulas:

i) Mean: Mn=\frac{3}{2}Me-\frac{1}{2}*Mo
ii) Mode: Mo=3*Me-2*Mn
iii) Median: Me=\frac{1}{3}*Mo+\frac{2}{3}*Mn

Substituting the values we found for the three measures of Central Tendency into each formula gives:

i) Mn=\frac{3}{2}Me-\frac{1}{2}*Mo=\frac{3}{2}*49.5-\frac{1}{2}*54.5=74.25-27.25=47
ii) Mo=3*Me-2*Mn=3*49.5-2*46.5=148.5-93=55.5
iii) Me=\frac{1}{3}*Mo+\frac{2}{3}*Mn=\frac{1}{3}*54.5+\frac{2}{3}*46.5\approx18.17+31\approx49.17
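The three rearrangements of Pearson's empirical relation can be evaluated with a small Python sketch. The function names `pearson_mean`, `pearson_mode`, and `pearson_median` are hypothetical, not from any library. The inputs are this example's rounded grouped-data results: mean 46.5, median 49.5, and mode 54.5.

```python
def pearson_mean(median, mode):
    """Mn = 3/2 * Me - 1/2 * Mo"""
    return 1.5 * median - 0.5 * mode


def pearson_mode(median, mean):
    """Mo = 3 * Me - 2 * Mn"""
    return 3 * median - 2 * mean


def pearson_median(mean, mode):
    """Me = 1/3 * Mo + 2/3 * Mn"""
    return mode / 3 + 2 * mean / 3


mean, median, mode = 46.5, 49.5, 54.5  # rounded grouped-data results
print(pearson_mean(median, mode))               # 47.0
print(pearson_mode(median, mean))               # 55.5
print(round(pearson_median(mean, mode), 2))     # 49.17
```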

The original Mean, Mode, and Median were 46.5, 54.5, and 49.5, while the formulas give 47, 55.5, and 49.17. These estimates deviate only slightly from the original values, by at most one year. Therefore, these formulas are useful, but they should be used with caution.

Karl Pearson: SAS blog