NEOCODE

Unit-2: Correlation and Linear Regression

1. Scatter Plots

A scatter plot is a graphical representation of the relationship between two variables. It displays data points for two numerical variables, with one variable on the x-axis and the other on the y-axis.

2. Correlation Coefficient

The correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1.

Properties of Correlation Coefficient

3. Karl Pearson’s Correlation Coefficient

Karl Pearson’s correlation coefficient (r) is a measure of the linear correlation between two variables X and Y.

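$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
$$r = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
$$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
$$\mathrm{Cov}(X,Y) = E(XY) - E(X)\,E(Y)$$
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$
$$\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2$$
$$E(X) = \frac{\sum X}{N}$$
$$E(XY) = \frac{\sum XY}{N}$$
$$E(X^2) = \frac{\sum X^2}{N}$$
$$\mathrm{Cov}(X,Y) = E[(X - \mu_x)(Y - \mu_y)] = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$$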

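Where:

$x_i, y_i$ are the paired observations of X and Y.
$\bar{x}, \bar{y}$ are the means of X and Y (written $\mu_x, \mu_y$ in the expectation form).
$\sigma_x, \sigma_y$ are the standard deviations of X and Y, and $\sigma_{xy}$ is their covariance.
$N$ is the number of paired observations.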
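
The expectation formulas above can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function name `pearson_r` and the sample data (`hours`, `marks`) are hypothetical and chosen only for illustration. It computes Cov(X,Y), Var(X) and Var(Y) with divisor N and combines them into r.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as Cov(X,Y) / sqrt(Var(X) * Var(Y)), all with divisor N."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    e_x = sum(xs) / n                                # E(X)
    e_y = sum(ys) / n                                # E(Y)
    e_xy = sum(x * y for x, y in zip(xs, ys)) / n    # E(XY)
    e_x2 = sum(x * x for x in xs) / n                # E(X^2)
    e_y2 = sum(y * y for y in ys) / n                # E(Y^2)
    cov = e_xy - e_x * e_y                           # Cov(X,Y)
    var_x = e_x2 - e_x ** 2                          # Var(X)
    var_y = e_y2 - e_y ** 2                          # Var(Y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample data, for illustration only.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(hours, marks)
print(round(r, 4))  # about 0.7746, a fairly strong positive linear relationship
```

For this data Cov(X,Y) = 1.2, Var(X) = 2 and Var(Y) = 1.2, so r = 1.2 / sqrt(2.4), which is about 0.7746.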

4. Spearman’s Rank Correlation Coefficient

Spearman’s rank correlation coefficient rs measures the strength and direction of the monotonic relationship between two variables. It is based on the ranks of the data.

When no ranks are tied:

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

When some ranks are tied:

$$r_s = 1 - \frac{6 \left( \sum d_i^2 + \sum \frac{m(m^2 - 1)}{12} \right)}{n(n^2 - 1)}$$

Correction factor (added once for each group of tied ranks):

$$\frac{m(m^2 - 1)}{12}$$

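Where:

$d_i$ is the difference between the ranks of the i-th pair of values.
$n$ is the number of paired observations.
$m$ is the number of observations that share a tied rank.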
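
A short Python sketch of the rank formulas follows. It rests on two assumptions that the notes do not spell out: tied values receive the average of the rank positions they occupy, and the correction factor is summed over every tied group in both variables. The helper names (`average_ranks`, `tie_correction`, `spearman_rs`) are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rs(xs, ys):
    """Spearman's r_s; the correction term is zero when there are no ties."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

# Hypothetical data without ties: sum of d^2 is 4, so r_s = 1 - 24/120.
print(spearman_rs([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))
# Hypothetical data with one tied pair (m = 2): sum of d^2 is 0.5, correction is 0.5.
print(spearman_rs([1, 2, 2, 3], [1, 2, 3, 4]))
```

The first call gives approximately 0.8 and the second approximately 0.9. Ranking in descending order instead would give the same result, provided both variables are ranked the same way.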

5. Linear Regression

Linear regression is a statistical method used to model the relationship between a dependent variable Y and one or more independent variables X.

For Y on X:

$$Y = a + bX$$
$$(y - \bar{y}) = r \frac{\sigma_y}{\sigma_x} (x - \bar{x})$$
$$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}$$

For X on Y:

$$X = a + bY$$
$$(x - \bar{x}) = r \frac{\sigma_x}{\sigma_y} (y - \bar{y})$$
$$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)}$$

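Where:

$b_{yx}$ is the regression coefficient of Y on X (the slope of the Y-on-X line).
$b_{xy}$ is the regression coefficient of X on Y.
$a$ and $b$ are the intercept and slope of the respective line.
$r$ is the correlation coefficient, $\sigma_x, \sigma_y$ are the standard deviations, and $\bar{x}, \bar{y}$ are the means of X and Y.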
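
The two regression lines can be sketched in Python as follows. This is an illustrative sketch only: the function names and the sample data are hypothetical, and Cov and Var are computed with divisor N to match the formulas above.

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) from Cov(X,Y), Var(X) and Var(Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    b_yx = cov / var_x  # slope of the Y-on-X line
    b_xy = cov / var_y  # slope of the X-on-Y line
    return mean_x, mean_y, b_yx, b_xy

def predict_y(x, mean_x, mean_y, b_yx):
    """Y on X: (y - mean_y) = b_yx * (x - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

def predict_x(y, mean_x, mean_y, b_xy):
    """X on Y: (x - mean_x) = b_xy * (y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)

# Hypothetical sample data, for illustration only.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
mean_x, mean_y, b_yx, b_xy = regression_coefficients(hours, marks)
print(b_yx, b_xy)                             # about 0.6 and 1.0
print(predict_y(6, mean_x, mean_y, b_yx))     # estimate of Y at x = 6, about 5.8
print(predict_x(5, mean_x, mean_y, b_xy))     # estimate of X at y = 5, about 4.0
```

Because both lines are written in deviation form, each one passes through the point of means: `predict_y(mean_x, ...)` returns `mean_y` and `predict_x(mean_y, ...)` returns `mean_x`.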

Properties of Linear Regression

Angle Between Two Lines of Regression

$$\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$

For perfect correlation:

$$r = \pm 1$$
$$\tan\theta = 0 \quad (\theta = 0^\circ)$$

This means the lines coincide.


For no correlation:

$$r = 0$$
$$\tan\theta = \infty \quad (\theta = 90^\circ)$$

This means the lines are perpendicular, and their equations are:

$$y = \bar{y} \quad \text{(for Y on X)}$$
$$x = \bar{x} \quad \text{(for X on Y)}$$

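If the two regression lines are given, with slopes $m_1$ and $m_2$ in the x-y plane:

$$\tan\theta = \left| \frac{m_1 - m_2}{1 + m_1 m_2} \right|$$
$$m_1 = b_{yx}$$
$$m_2 = \frac{1}{b_{xy}}$$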

The ratio of variances is the same as the ratio of regression coefficients:

$$\frac{\sigma_y^2}{\sigma_x^2} = \frac{b_{yx}}{b_{xy}}$$
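
The two angle formulas should agree for the same data. The Python sketch below compares them; the function names are hypothetical, and the numeric inputs (r² = 0.6, σx² = 2, σy² = 1.2, giving b_yx = 0.6 and b_xy = 1.0) are illustrative values, not results from the notes.

```python
import math

def angle_from_r(r, sigma_x, sigma_y):
    """Acute angle in degrees between the regression lines, from r and the sigmas."""
    if r == 0:
        # tan(theta) is unbounded: the lines are y = mean_y and x = mean_x.
        return 90.0
    tan_theta = (1 - r * r) / abs(r) * (sigma_x * sigma_y) / (sigma_x ** 2 + sigma_y ** 2)
    return math.degrees(math.atan(tan_theta))

def angle_from_slopes(b_yx, b_xy):
    """Same angle from the slopes m1 = b_yx and m2 = 1 / b_xy (requires b_xy != 0)."""
    m1 = b_yx
    m2 = 1 / b_xy
    return math.degrees(math.atan(abs((m1 - m2) / (1 + m1 * m2))))

# Illustrative values only.
r = math.sqrt(0.6)
sigma_x, sigma_y = math.sqrt(2.0), math.sqrt(1.2)
print(angle_from_r(r, sigma_x, sigma_y))   # about 14.04 degrees
print(angle_from_slopes(0.6, 1.0))         # same angle, tan(theta) = 0.25
print(angle_from_r(1.0, sigma_x, sigma_y)) # 0.0, the lines coincide
print(angle_from_r(0.0, sigma_x, sigma_y)) # 90.0, the lines are perpendicular
```

With these values the variance ratio σy²/σx² = 1.2 / 2 = 0.6 also equals b_yx / b_xy = 0.6 / 1.0, as the last identity states. The slope form cannot be used when r = 0, since b_xy is then zero.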

Formulas for Linear Regression

Slope b:

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

Intercept a:

$$a = \bar{y} - b\bar{x}$$
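
As a minimal sketch of the slope and intercept formulas, the Python function below fits Y = a + bX from sums of deviations and then uses the fitted line for a prediction. The name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = s_xy / s_xx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical sample data, for illustration only.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
a, b = fit_line(hours, marks)
predicted = a + b * 6
print(a, b, predicted)  # about 2.2, 0.6 and 5.8
```

Here the sum of products of deviations is 6 and the sum of squared x deviations is 10, so b = 0.6 and a = 4 - 0.6 × 3 = 2.2.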

MCQ Questions

1. What does a correlation coefficient of 0 indicate?

Answer: c) No correlation

2. In Spearman’s rank correlation, what does di represent?

Answer: a) Difference between ranks of corresponding values