Correlation and Coefficient - Probability and Statistics

Correlation and Coefficient

Correlation refers to the statistical relationship between two variables. It measures the strength and direction of the linear relationship between them. The correlation coefficient quantifies this relationship, providing a numerical value that ranges from -1 to 1.

Formula:

The correlation coefficient (often denoted by r) is calculated using the following formula:

Pearson Correlation Coefficient (r):

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

where:

  • n is the number of observations
  • x and y are the two variables
  • $\sum xy$ is the sum of the products of the paired observations
  • $\sum x$ and $\sum y$ are the sums of the individual observations
  • $\sum x^2$ and $\sum y^2$ are the sums of the squares of the individual observations
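The formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the check for a zero denominator are assumptions added here, since the formula itself is undefined when either variable is constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from the raw sums in the formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)                                   # number of observations
    sum_x = sum(xs)                               # sum of x
    sum_y = sum(ys)                               # sum of y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of paired products
    sum_x2 = sum(x * x for x in xs)               # sum of squared x
    sum_y2 = sum(y * y for y in ys)               # sum of squared y

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Assumption: treat a constant variable as an error rather than returning a value.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Points lying exactly on the line y = 2x give a perfect positive correlation.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```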

Example:

Suppose we want to determine the correlation between the hours of study and exam scores of students. Below are the hours of study (x) and corresponding exam scores (y) for a sample of 5 students:

| Hours of Study (x) | Exam Scores (y) |
|---|---|
| 3 | 75 |
| 5 | 85 |
| 7 | 90 |
| 4 | 80 |
| 6 | 88 |

Using the formula, we can calculate the correlation coefficient:

$$r = \frac{5(3 \times 75 + 5 \times 85 + 7 \times 90 + 4 \times 80 + 6 \times 88) - (3 + 5 + 7 + 4 + 6)(75 + 85 + 90 + 80 + 88)}{\sqrt{[5(3^2 + 5^2 + 7^2 + 4^2 + 6^2) - (3 + 5 + 7 + 4 + 6)^2][5(75^2 + 85^2 + 90^2 + 80^2 + 88^2) - (75 + 85 + 90 + 80 + 88)^2]}}$$

Carrying out the arithmetic gives r = 190 / √37300 ≈ 0.984. Because r is positive, hours of study and exam scores are positively correlated; a negative r would indicate a negative correlation. The closer r is to 1 or -1, the stronger the linear relationship, so a value of about 0.984 indicates a very strong positive correlation in this sample.
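The arithmetic for this example can be checked step by step with a short script. This is only a sketch: the variable names (`hours`, `scores`, `sum_xy`, and so on) are chosen here for illustration, and the data are the five pairs from the example.

```python
import math

# Data from the example: hours of study (x) and exam scores (y).
hours = [3, 5, 7, 4, 6]
scores = [75, 85, 90, 80, 88]

n = len(hours)                                        # 5
sum_x = sum(hours)                                    # 25
sum_y = sum(scores)                                   # 418
sum_xy = sum(x * y for x, y in zip(hours, scores))    # 2128
sum_x2 = sum(x * x for x in hours)                    # 135
sum_y2 = sum(y * y for y in scores)                   # 35094

numerator = n * sum_xy - sum_x * sum_y                # 10640 - 10450 = 190
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)   # sqrt(50 * 746)
)
r = numerator / denominator

print(round(r, 3))  # 0.984
```

With only five observations, a value this close to 1 describes this sample well but says little about students in general.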

Characteristics:

The correlation coefficient ranges from -1 to 1. A value of 1 implies a perfect positive linear relationship, -1 implies a perfect negative linear relationship, and 0 implies no linear relationship.
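The three boundary cases can be illustrated with small, made-up datasets. The sketch below assumes a hypothetical helper `pearson_r` that applies the same raw-sums formula; the data are invented for illustration only. The last case is worth noting: y = x² is completely determined by x, yet r is 0, because r only measures the linear part of a relationship.

```python
import math

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson's r from raw sums (assumes non-constant inputs)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [-2, -1, 0, 1, 2]

rising = [2 * x + 1 for x in xs]    # points on an increasing line
falling = [-3 * x for x in xs]      # points on a decreasing line
curved = [x * x for x in xs]        # symmetric parabola, no linear trend

print(pearson_r(xs, rising))   # 1.0
print(pearson_r(xs, falling))  # -1.0
print(pearson_r(xs, curved))   # 0.0
```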