Central Tendency - Mean, Median and Mode

Central Tendency and Its Types

A measure of central tendency is a single value that describes a dataset by identifying the central position within it. Such measures are essential in descriptive statistics because they give an overview of the data's general characteristics and make it easier to compare different datasets.

Types of Central Tendency: There are three main measures of central tendency: mean, median, and mode.

Mean

The mean, also known as the average, is a measure of central tendency that summarizes a set of values by indicating the central point within that set. It is one of the most common and widely used statistical measures. The mean provides a single value that is representative of a dataset, making it easier to understand and compare data. It is particularly useful when dealing with quantitative data where the values can be summed.

Formula for Ungrouped Data

The mean is calculated by summing all the values in a dataset and then dividing by the number of values. Mathematically, the mean of a dataset with $n$ values $x_1, x_2, \ldots, x_n$ is given by:
$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$

Where:
$\mu$ = mean
$n$ = number of values in the dataset
$x_i$ = each individual value in the dataset

Example Calculation

Consider the following dataset: 3, 7, 8, 5, 10.
To find the mean, we follow these steps:

1. Sum all the values:
$3 + 7 + 8 + 5 + 10 = 33$

2. Divide by the number of values ( $n = 5$ ):
$\mu = \frac{33}{5} = 6.6$

Therefore, the mean of the dataset is 6.6.
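The two steps above translate directly into a few lines of code. Below is a minimal Python sketch. The function name `arithmetic_mean` is our own choice for illustration, and the standard library's `statistics.mean` is included only as a cross-check.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum all the values, then divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


data = [3, 7, 8, 5, 10]
print(arithmetic_mean(data))  # 6.6
print(mean(data))             # 6.6 (standard-library cross-check)
```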

Formula for Grouped Data

When the data are presented as a frequency distribution (grouped data), the individual values are not known, so the mean is estimated by treating every value in a class as if it were the class midpoint:

$\mu = \frac{\sum f_i x_i}{\sum f_i}$

Where:
$\mu$ = mean
$f_i$ = frequency of the $i$th class interval
$x_i$ = midpoint of the $i$th class interval
$\sum f_i$ = total frequency

Step Deviation Method

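The step deviation method simplifies the arithmetic by measuring each class midpoint from an assumed mean in units of the class width. The formula is:

$\mu = A + \frac{\sum f_i d_i}{\sum f_i} \times c$

Where:
$A$ = assumed mean (often the midpoint of a class near the middle of the distribution)
$d_i = \frac{x_i - A}{c}$ = step deviation of the $i$th class midpoint from the assumed mean
$c$ = class interval width (assumed equal for all classes)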
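To see why this shortcut gives the same answer as the direct formula, write the step deviation as $d_i = \frac{x_i - A}{c}$, so that $x_i = A + c\,d_i$, and substitute into the grouped-data mean:

```latex
\begin{aligned}
\mu &= \frac{\sum f_i x_i}{\sum f_i}
     = \frac{\sum f_i (A + c\,d_i)}{\sum f_i} \\
    &= \frac{A \sum f_i + c \sum f_i d_i}{\sum f_i}
     = A + \frac{\sum f_i d_i}{\sum f_i} \times c
\end{aligned}
```

Because $A$ and $c$ cancel out of the final answer, they can be chosen purely to make the deviations small whole numbers.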

Example Calculation:

Suppose we have the following frequency distribution representing the heights (in inches) of students in a class:

Height Range (inches)	Frequency
50-55	6
55-60	12
60-65	18
65-70	10
70-75	4

Calculate the mean height of the students in the class.

Solution:

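To calculate the mean height, we could apply the formula for the mean of grouped data directly:
$\mu = \frac{\sum f_i x_i}{\sum f_i}$
To keep the arithmetic small, we instead work with deviations from an assumed mean, which gives the same result.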

First, we need to find the midpoint of each class interval:

Height Range (inches)	Midpoint ( $x_i$ )	Frequency ( $f_i$ )
50-55	52.5	6
55-60	57.5	12
60-65	62.5	18
65-70	67.5	10
70-75	72.5	4

Next, we take the assumed mean $A = 62.5$ (the midpoint of the middle class) and the class width $c = 5$, and compute the step deviation $d_i = \frac{x_i - A}{c}$ for each class interval:

Height Range (inches)	Midpoint ( $x_i$ )	Frequency ( $f_i$ )	Step Deviation ( $d_i$ )
50-55	52.5	6	-2
55-60	57.5	12	-1
60-65	62.5	18	0
65-70	67.5	10	1
70-75	72.5	4	2

Now, we calculate $\sum f_i$ and $\sum f_i d_i$ :

$\sum f_i = 6 + 12 + 18 + 10 + 4 = 50$

$\sum f_i d_i = (6 \times -2) + (12 \times -1) + (18 \times 0) + (10 \times 1) + (4 \times 2) = -12 - 12 + 0 + 10 + 8 = -6$

Finally, we calculate the mean ( $\mu$ ):
$\mu = A + \frac{\sum f_i d_i}{\sum f_i} \times c = 62.5 + \frac{-6}{50} \times 5 = 62.5 - 0.6 = 61.9$

Therefore, the mean height of the students in the class is approximately 61.9 inches.
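As a check on the hand calculation, the following Python sketch computes the grouped mean both directly and by the step deviation method. It assumes equal-width, contiguous classes given as (lower limit, upper limit, frequency) tuples; the function names are hypothetical and chosen for illustration.

```python
# Heights (inches) from the example, as (lower limit, upper limit, frequency).
classes = [(50, 55, 6), (55, 60, 12), (60, 65, 18), (65, 70, 10), (70, 75, 4)]


def grouped_mean_direct(classes):
    """mu = sum(f_i * x_i) / sum(f_i), where x_i is the class midpoint."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f


def grouped_mean_step_deviation(classes, assumed_mean, width):
    """mu = A + (sum(f_i * d_i) / sum(f_i)) * c, with d_i = (x_i - A) / c."""
    total_f = sum(f for _, _, f in classes)
    total_fd = sum(f * ((lo + hi) / 2 - assumed_mean) / width
                   for lo, hi, f in classes)
    return assumed_mean + total_fd / total_f * width


print(round(grouped_mean_direct(classes), 4))                   # 61.9
print(round(grouped_mean_step_deviation(classes, 62.5, 5), 4))  # 61.9
```

Any assumed mean gives the same result (for example, `assumed_mean=57.5`); choosing a central midpoint simply keeps the deviations small.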

Properties of the Mean

1. Uniqueness: The mean is unique for a given dataset. It is the balance point of the data: the deviations $x_i - \mu$ sum to zero, and no other value has this property.

2. Sensitivity to Outliers: The mean is sensitive to extreme values (outliers). A very large or very small value can significantly affect the mean, making it less representative of the dataset.

3. Mathematical Simplicity: The mean is mathematically simple and easy to compute, making it a convenient measure for summarizing data.

4. Algebraic Properties: The mean has useful algebraic properties, such as linearity. If every value is transformed as $a x_i + b$, the new mean is $a\mu + b$; and for two paired datasets, the mean of the sums $x_i + y_i$ equals the sum of the two means.

5. Applicability to Quantitative Data: The mean is applicable to quantitative data where arithmetic operations are meaningful. It is not suitable for categorical data.
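The balance-point and linearity properties can be checked numerically. This is an illustrative Python sketch using the standard library's `statistics.mean`; the second dataset `other` is hypothetical, invented only to pair element-by-element with the example data.

```python
from statistics import mean

data = [3, 7, 8, 5, 10]
other = [1, 2, 3, 4, 5]  # hypothetical dataset, paired element-by-element with `data`
mu = mean(data)

# Balance point: the deviations from the mean sum to zero (up to float rounding).
print(sum(x - mu for x in data))

# Linearity under a*x + b: mean(2x + 1) equals 2*mean(x) + 1.
print(mean([2 * x + 1 for x in data]), 2 * mu + 1)

# Linearity for paired sums: mean(x + y) equals mean(x) + mean(y).
print(mean([x + y for x, y in zip(data, other)]), mu + mean(other))
```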

Median

The median is a measure of central tendency that identifies the middle value in a dataset when the values are arranged in ascending or descending order. Unlike the mean, the median is largely unaffected by extreme values (outliers), which often makes it a better measure of central tendency for skewed distributions. The median separates the higher half of the dataset from the lower half.

Formula

The method for finding the median depends on whether the dataset has an odd or even number of values:

For a dataset with $n$ values, where the values are ordered from smallest to largest:

1. If $n$ is odd, the median is the middle value:
$\text{Median} = x_{\frac{n+1}{2}}$

2. If $n$ is even, the median is the average of the two middle values:
$\text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2} + 1}}{2}$

Where:
$x_i$ = the $i$th value in the ordered dataset ($x_1$ is the smallest)
$n$ = number of values in the dataset

Example Calculation (Ungrouped Data)

Consider the following dataset: 3, 7, 8, 5, 10.

1. Order the values: 3, 5, 7, 8, 10

2. Since the number of values ( $n = 5$ ) is odd, the median is the middle value:
$\text{Median} = x_{\frac{5+1}{2}} = x_3 = 7$

Therefore, the median of the dataset is 7.

Consider another dataset: 3, 7, 8, 5, 10, 12.

1. Order the values: 3, 5, 7, 8, 10, 12

2. Since the number of values ( $n = 6$ ) is even, the median is the average of the two middle values:
$\text{Median} = \frac{x_3 + x_4}{2} = \frac{7 + 8}{2} = 7.5$

Therefore, the median of the dataset is 7.5.
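The odd/even rule can be sketched in Python as follows. Python lists are 0-based, so the text's $x_{\frac{n+1}{2}}$ becomes index `n // 2`. The function name `median_of` is hypothetical; `statistics.median` from the standard library applies the same rule and serves as a cross-check.

```python
from statistics import median


def median_of(values):
    """Median via the odd/even rule applied to the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2  # 0-based index of the upper middle value
    if n % 2 == 1:
        return ordered[mid]  # x_{(n+1)/2} in 1-based notation
    return (ordered[mid - 1] + ordered[mid]) / 2  # (x_{n/2} + x_{n/2+1}) / 2


print(median_of([3, 7, 8, 5, 10]))      # 7
print(median_of([3, 7, 8, 5, 10, 12]))  # 7.5
print(median([3, 7, 8, 5, 10, 12]))     # 7.5 (standard-library cross-check)
```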

Example Calculation (Grouped Data)

Given the following frequency distribution for a dataset:

Class Interval	Frequency	Cumulative Frequency
10-20	5	5
20-30	8	13
30-40	12	25
40-50	7	32

1. Identify the median class (the class interval that contains the median value) using the cumulative frequencies:

Total number of observations (N) = 5 + 8 + 12 + 7 = 32

The median class is the first class whose cumulative frequency is greater than or equal to $N/2$ :

$N/2 = 32/2 = 16$

The cumulative frequencies are: 5, 13, 25, 32. The median class is 30-40.

2. Using the formula for median for grouped data:

Median = $L + \frac{\frac{N}{2} - F}{f} \times w$

Where:

$L$ = Lower boundary of the median class = 30

$N$ = Total number of observations = 32

$F$ = Cumulative frequency of the class before the median class = 13

$f$ = Frequency of the median class = 12

$w$ = Width of the median class = 10

3. Calculation:

Median = $30 + \frac{16 - 13}{12} \times 10$

Median = $30 + \frac{3}{12} \times 10$

Median = $30 + \frac{1}{4} \times 10$

Median = $30 + 2.5$

Median = 32.5

Therefore, the median of the grouped data is 32.5.
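The grouped-median steps can be sketched in Python. This assumes contiguous classes given as (lower boundary, upper boundary, frequency) tuples and takes the median class to be the first one whose cumulative frequency reaches $N/2$; the function name `grouped_median` is hypothetical. For the table above it returns 32.5, since $\frac{16 - 13}{12} = \frac{1}{4}$ of the class width 10 is 2.5.

```python
# Frequency distribution from the example, as (lower boundary, upper boundary, frequency).
classes = [(10, 20, 5), (20, 30, 8), (30, 40, 12), (40, 50, 7)]


def grouped_median(classes):
    """Median = L + ((N/2 - F) / f) * w, interpolated inside the median class.

    Assumes contiguous classes with values spread evenly within each class.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty frequency distribution")
    half = n / 2
    before = 0  # F: cumulative frequency of all classes before the current one
    for lower, upper, f in classes:
        if before + f >= half:  # first class whose cumulative frequency reaches N/2
            return lower + (half - before) / f * (upper - lower)
        before += f
    raise ValueError("no median class found")


print(grouped_median(classes))  # 32.5
```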

Properties of the Median

1. Unaffected by Outliers: Making the largest or smallest values more extreme does not change the median, which makes it a robust measure of central tendency for skewed distributions.

2. Uniqueness: The median is unique for a given dataset. No other single value will produce the same middle point.

3. Simplicity: The median is simple to understand and calculate, especially for small datasets.

4. Applicability to Ordinal Data: The median can be used with ordinal data, where values can be ordered, but arithmetic operations are not meaningful.

5. Positional Measure: The median is a positional measure, which means it is determined by the position of values in the ordered dataset rather than their magnitude.
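A quick way to see the first property is to compare the mean and median of the earlier dataset before and after making its largest value extreme. This Python sketch uses the standard library's `statistics` module; the value 1000 is an arbitrary outlier chosen for illustration.

```python
from statistics import mean, median

data = [3, 5, 7, 8, 10]
with_outlier = [3, 5, 7, 8, 1000]  # largest value replaced by an extreme one

print(mean(data), median(data))                  # 6.6 7
print(mean(with_outlier), median(with_outlier))  # 204.6 7
```

The mean jumps by roughly 198, while the median stays at 7 because the middle value's position in the ordered data has not changed.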

Mode

The mode is a measure of central tendency that identifies the most frequently occurring value in a dataset. Unlike the mean and median, the mode can be used with nominal data, where values are categories rather than numbers. The mode is particularly useful for understanding the most common value in a dataset and is not influenced by extreme values (outliers).

Formula

For ungrouped data, the mode has no formula; it is found by identifying the value(s) that occur most frequently in the dataset. For grouped data, the mode is estimated with an interpolation formula within the modal class.

Example Calculation (Ungrouped Data)

Consider the following dataset: 3, 7, 8, 5, 10, 8.

1. Identify the frequency of each value:

3: 1 time
5: 1 time
7: 1 time
8: 2 times
10: 1 time

2. The mode is the value with the highest frequency:

$\text{Mode} = 8$

Therefore, the mode of the dataset is 8.

Consider another dataset: 3, 7, 8, 5, 10, 8, 5.

1. Identify the frequency of each value:

3: 1 time
5: 2 times
7: 1 time
8: 2 times
10: 1 time

2. Since there are two values with the highest frequency (5 and 8), this dataset is bimodal:

$\text{Mode} = 5 \text{ and } 8$

Therefore, the modes of the dataset are 5 and 8.
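Counting frequencies and keeping every value tied for the highest count handles both the single-mode and the bimodal case. Below is a minimal Python sketch built on `collections.Counter`; the function name `modes` and the color labels are hypothetical, the latter showing that the same approach works for nominal data.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes([3, 7, 8, 5, 10, 8]))     # [8]
print(modes([3, 7, 8, 5, 10, 8, 5]))  # [5, 8] (bimodal)
print(modes(["red", "blue", "red"]))  # ['red'] (nominal data)
```

Python 3.8 and later also provide `statistics.multimode`, which returns the tied values in the order they first appear rather than sorted.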

Example Calculation (Grouped Data)

Given the following frequency distribution for a dataset:

Class Interval	Frequency	Cumulative Frequency
10-20	5	5
20-30	8	13
30-40	12	25
40-50	7	32

1. Identify the modal class (the class interval with the highest frequency):

The highest frequency is 12, so the modal class is 30-40.

2. Using the formula for mode for grouped data:

Mode = $L + \frac{(f_1 - f_0)}{(2f_1 - f_0 - f_2)} \times w$

Where:

$L$ = Lower boundary of the modal class = 30

$f_1$ = Frequency of the modal class = 12

$f_0$ = Frequency of the class before the modal class = 8

$f_2$ = Frequency of the class after the modal class = 7

$w$ = Width of the modal class = 10

3. Calculation:

Mode = $30 + \frac{(12 - 8)}{(2 \times 12 - 8 - 7)} \times 10$

Mode = $30 + \frac{4}{24 - 15} \times 10$

Mode = $30 + \frac{4}{9} \times 10$

Mode = $30 + \frac{40}{9}$

Mode $\approx 30 + 4.44$

Mode $\approx 34.44$

Therefore, the mode of the grouped data is approximately 34.44.
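The grouped-mode formula can be sketched in Python as follows. It assumes equal-width, contiguous classes given as (lower boundary, upper boundary, frequency) tuples and a single modal class. Treating a missing neighboring class as frequency 0 is our own assumption for the edge case where the modal class is first or last, and the function name `grouped_mode` is hypothetical.

```python
# Frequency distribution from the example, as (lower boundary, upper boundary, frequency).
classes = [(10, 20, 5), (20, 30, 8), (30, 40, 12), (40, 50, 7)]


def grouped_mode(classes):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * w for the modal class.

    Assumes equal-width, contiguous classes and a single modal class; a
    neighboring class that does not exist is treated as having frequency 0.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # first class with the highest frequency
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


print(round(grouped_mode(classes), 2))  # 34.44
```

The formula breaks down (division by zero) when the modal class and both of its neighbors have equal frequencies; the sketch does not handle that case.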

Properties of the Mode

1. Unaffected by Outliers: The mode is not influenced by extreme values, making it a robust measure of central tendency for skewed distributions.

2. Simplicity: The mode is simple to understand and calculate, especially for nominal data.

3. Applicability to Categorical Data: The mode is the only measure of central tendency that can be used with nominal data, where values are categories rather than numbers.

4. Multiple Modes: A dataset can have more than one mode (bimodal or multimodal) if multiple values have the highest frequency.

5. Not Always Unique or Defined: Unlike the mean and median, the mode need not be a single value, and when every value occurs equally often, the dataset is usually said to have no mode.