Unit-2: Correlation and Linear Regression
1. Scatter Plots
A scatter plot is a graphical representation of the relationship between two variables. It displays data points for two numerical variables, with one variable on the x-axis and the other on the y-axis.
- Purpose: To visualize the correlation or relationship between two variables.
- Interpretation:
  - Positive correlation: Points trend upward.
  - Negative correlation: Points trend downward.
  - No correlation: Points are scattered randomly.
2. Correlation Coefficient
The correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1.
- $r = 1$: Perfect positive correlation.
- $r = -1$: Perfect negative correlation.
- $r = 0$: No linear correlation (a nonlinear relationship may still exist).
Properties of Correlation Coefficient
- It is symmetric: $r_{xy} = r_{yx}$.
- It is unitless and unchanged by a shift of origin or a positive change of scale in either variable.
- It measures only linear relationships.
- It is sensitive to outliers.
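The properties above can be checked numerically. The following is a minimal sketch, assuming a hand-written helper `pearson_r` (a hypothetical name) that applies the standard product-moment formula, and small made-up datasets chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                              # symmetry: swap the roles of x and y
r_rescaled = pearson_r([10 * v + 3 for v in x], y)  # shift the origin and rescale x

# Only linear association is measured: y = x^2 is an exact but nonlinear relationship.
x_sym = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(x_sym, [v * v for v in x_sym])

# A single extreme point can change r drastically.
r_outlier = pearson_r(x + [20], y + [0])

print(r_xy, r_yx, r_rescaled, r_parabola, r_outlier)
```

On this data the first three values agree (about 0.77), the parabola gives 0 despite the exact relationship, and adding one outlier turns the coefficient negative.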
3. Karl Pearson’s Correlation Coefficient
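Karl Pearson’s correlation coefficient ($r$) is a measure of the linear correlation between two variables $X$ and $Y$.
$$
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
$$
Where:
- $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$, respectively.
- $x_i$ and $y_i$ are individual data points.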
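A short worked example may make the computation concrete. The dataset is an assumption chosen for easy arithmetic, not taken from these notes: $x = (1, 2, 3, 4, 5)$ and $y = (2, 4, 5, 4, 5)$, so that $\bar{x} = 3$ and $\bar{y} = 4$.

$$
\sum (x_i - \bar{x})(y_i - \bar{y}) = 4 + 0 + 0 + 0 + 2 = 6, \qquad
\sum (x_i - \bar{x})^2 = 10, \qquad
\sum (y_i - \bar{y})^2 = 6
$$

$$
r = \frac{6}{\sqrt{10 \times 6}} = \frac{6}{\sqrt{60}} \approx 0.775
$$

This indicates a fairly strong positive linear relationship for the illustrative data.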
4. Spearman’s Rank Correlation Coefficient
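Spearman’s rank correlation coefficient ($\rho$) measures the strength and direction of the monotonic relationship between two variables. It is based on the ranks of the data.
$$
\rho = 1 - \frac{6\left[\sum d_i^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}
$$
Where:
- $d_i$ is the difference between the ranks of corresponding values of $X$ and $Y$.
- $n$ is the number of observations.
- $m$ is the number of times a tied rank is repeated; the correction term $\frac{m(m^2 - 1)}{12}$ is added once for each group of ties and vanishes when there are no ties.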
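The rank-based computation can be sketched in code. This is a minimal illustration under stated assumptions: tied values receive the average of the ranks they occupy, and the tie correction $\frac{m(m^2 - 1)}{12}$ is added once per group of repeated values in either variable. The helper names `average_ranks` and `spearman_rho` and the sample data are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share the mean of their ranks.

    Ranking from the largest value instead gives the same coefficient,
    as long as both variables are ranked in the same direction.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's coefficient using the d^2 formula with the tie correction."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = 0.0
    for data in (xs, ys):
        for m in Counter(data).values():
            if m > 1:  # m = number of times a value (and so its rank) repeats
                correction += m * (m * m - 1) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

rho_no_ties = spearman_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
rho_with_tie = spearman_rho([1, 2, 2, 3], [1, 2, 3, 4])
rho_monotonic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
print(rho_no_ties, rho_with_tie, rho_monotonic)
```

The last call uses $y = x^3$, which is nonlinear but strictly increasing, so the ranks agree exactly and the coefficient is 1.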
5. Linear Regression
Linear regression is a statistical method used to model the relationship between a dependent variable $Y$ and one or more independent variables $X$.
$$
Y = a + bX
$$
Where:
- $Y$: Dependent variable.
- $X$: Independent variable.
- $a$: Y-intercept.
- $b$: Slope of the line.
- $b_{yx}$: Regression coefficient of $Y$ on $X$.
- $b_{xy}$: Regression coefficient of $X$ on $Y$.
Properties of Linear Regression
- The point $(\bar{x}, \bar{y})$ is the intersection point of the two regression lines.
- For perfect correlation, both regression lines coincide.
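- The correlation coefficient is the geometric mean of the regression coefficients: $r = \pm\sqrt{b_{yx}\, b_{xy}}$, taking the common sign of $b_{yx}$ and $b_{xy}$.
- If one regression coefficient is greater than unity in magnitude, the other must be less than unity in magnitude, since $b_{yx}\, b_{xy} = r^2 \le 1$.
- Regression coefficients are independent of changes in origin but not of scale: if $u = \frac{x - A}{h}$ and $v = \frac{y - B}{k}$, then $b_{yx} = \frac{k}{h}\, b_{vu}$ and $b_{xy} = \frac{h}{k}\, b_{uv}$.
- The modulus of the arithmetic mean of the regression coefficients is not less than the modulus of the correlation coefficient ($r$): $\left|\frac{b_{yx} + b_{xy}}{2}\right| \ge |r|$.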
- The regression line minimizes the sum of squared errors (SSE).
- The slope $b$ represents the change in $Y$ for a unit change in $X$.
- The intercept $a$ represents the value of $Y$ when $X = 0$.
- The regression line passes through the point $(\bar{x}, \bar{y})$.
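Several of these properties can be verified on a small dataset. The sketch below assumes illustrative data (not from these notes) and the usual definitions $b_{yx} = S_{xy}/S_{xx}$ and $b_{xy} = S_{xy}/S_{yy}$, where $S$ denotes a sum of products of deviations from the means; all variable names are hypothetical.

```python
import math

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b_yx = sxy / sxx                 # regression coefficient of Y on X
b_xy = sxy / syy                 # regression coefficient of X on Y
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

# Geometric-mean property: r = +/- sqrt(b_yx * b_xy), with the sign of the coefficients.
geometric_mean = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Arithmetic-mean property: |(b_yx + b_xy) / 2| >= |r|.
arithmetic_mean = (b_yx + b_xy) / 2

# Intersection of the two regression lines (valid only when r^2 != 1):
#   Y on X:  y = c1 + b_yx * x,  with c1 = mean_y - b_yx * mean_x
#   X on Y:  x = c2 + b_xy * y,  with c2 = mean_x - b_xy * mean_y
c1 = mean_y - b_yx * mean_x
c2 = mean_x - b_xy * mean_y
y_cross = (c1 + b_yx * c2) / (1 - b_yx * b_xy)
x_cross = c2 + b_xy * y_cross

print(b_yx, b_xy, r, geometric_mean, arithmetic_mean)
print((x_cross, y_cross), (mean_x, mean_y))
```

Here $b_{yx} = 0.6$ and $b_{xy} = 1.0$, so their product equals $r^2 = 0.6$, and the computed crossing point matches the point of means up to floating-point rounding.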
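Angle Between Two Lines of Regression
If $\theta$ is the acute angle between the two regression lines, then
$$
\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}
$$
where $\sigma_x$ and $\sigma_y$ are the standard deviations of $X$ and $Y$. When $r = \pm 1$ the lines coincide ($\theta = 0$); when $r = 0$ they are perpendicular.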
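As a numerical check, the sketch below evaluates the standard textbook expression $\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$ and compares it with the angle obtained directly from the slopes of the two lines. The data and variable names are assumptions for illustration only.

```python
import math

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

var_x, var_y = sxx / n, syy / n              # population variances
sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
r = sxy / math.sqrt(sxx * syy)

# Formula in terms of r and the standard deviations (requires r != 0).
tan_theta = (1 - r ** 2) / abs(r) * (sd_x * sd_y) / (var_x + var_y)

# Cross-check from the slopes in the x-y plane:
# the line of Y on X has slope b_yx; the line of X on Y has slope 1 / b_xy.
slope_y_on_x = sxy / sxx
slope_x_on_y = syy / sxy
tan_check = abs((slope_y_on_x - slope_x_on_y) / (1 + slope_y_on_x * slope_x_on_y))

theta_degrees = math.degrees(math.atan(tan_theta))
print(tan_theta, tan_check, theta_degrees)
```

Both routes give $\tan\theta = 0.25$ for this data, an angle of roughly 14 degrees.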
Formulas for Linear Regression
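Slope $b$:
$$
b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
$$
Intercept $a$:
$$
a = \bar{y} - b\,\bar{x}
$$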
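The slope and intercept can be computed directly from the least-squares formulas $b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $a = \bar{y} - b\,\bar{x}$. The following is a minimal sketch, assuming hypothetical helpers `fit_line` and `sse` and made-up data; it also checks that nudging the fitted coefficients increases the sum of squared errors.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the point of means
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a + b * x on the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best = sse(x, y, a, b)

# Any other line should fit worse; try four nearby perturbations.
perturbed = [sse(x, y, a + da, b + db) for da in (-0.1, 0.1) for db in (-0.1, 0.1)]

print(a, b, best, min(perturbed))
```

For this data the fitted line is approximately $Y = 2.2 + 0.6X$, and every perturbed line has a larger SSE than the fitted one.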
MCQ Questions
1. What does a correlation coefficient of 0 indicate?
- a) Perfect positive correlation
- b) Perfect negative correlation
- c) No correlation
- d) None of the above
Answer: c) No correlation
2. In Spearman’s rank correlation, what does $d_i$ represent?
- a) Difference between ranks of corresponding values
- b) Sum of ranks of corresponding values
- c) Product of ranks of corresponding values
- d) None of the above
Answer: a) Difference between ranks of corresponding values