Hypothesis Testing
Introduction to Hypothesis Testing
Statistical Hypothesis: An assumption about a population parameter which may or may not be true.
Null Hypothesis (H₀): A statement of no effect or no difference, which we test for possible rejection.
Alternative Hypothesis (H₁ or Hₐ): A statement that contradicts the null hypothesis, representing what we suspect might be true.
Key Points:
- Hypothesis testing is a method to make decisions using data
- We always test the null hypothesis
- The alternative hypothesis can be one-tailed or two-tailed
- Significance level (α) is the probability of rejecting H₀ when it's true
Types of Errors in Hypothesis Testing
| Error Type | Definition | Probability | Also Known As |
|---|---|---|---|
| Type I Error | Rejecting H₀ when it's actually true | α (significance level) | False positive |
| Type II Error | Failing to reject H₀ when it's false | β | False negative |
Power of Test (1-β): The probability of correctly rejecting H₀ when it's false.
Important Relationships:
- α and β are inversely related for a fixed sample size
- For a fixed α, increasing the sample size decreases β (and so increases power)
- Critical region is the set of values that leads to rejecting H₀
Example: In a medical test, H₀: "Patient is healthy".
Type I error = Diagnosing a healthy patient as diseased (false positive).
Type II error = Failing to detect a diseased patient (false negative).
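The trade-off between α, β, and power can be made concrete with a small simulation. The sketch below uses hypothetical numbers (μ₀ = 100, σ = 15, n = 30, α = 0.05, a shifted true mean of 108): it repeatedly draws samples and counts how often a two-tailed z-test rejects H₀, once when H₀ is true (estimating the Type I error rate) and once when H₀ is false (estimating the power).

```python
# Monte Carlo sketch (hypothetical numbers): estimate Type I error and power
# of a two-tailed one-sample z-test with known sigma and alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sigma, alpha, trials = 30, 15, 0.05, 10_000
z_crit = stats.norm.ppf(1 - alpha / 2)          # two-tailed critical value, ~1.96

def reject_rate(true_mean, mu0=100):
    """Fraction of simulated samples whose |z| exceeds the critical value."""
    rejections = 0
    for _ in range(trials):
        sample = rng.normal(true_mean, sigma, n)
        z = (sample.mean() - mu0) / (sigma / np.sqrt(n))
        rejections += abs(z) > z_crit
    return rejections / trials

print("Type I error rate (H0 true):", reject_rate(true_mean=100))  # close to alpha
print("Power (true mean = 108):    ", reject_rate(true_mean=108))  # estimates 1 - beta
```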
Z-test for Single Mean
Z-test: Used when we want to test whether the sample mean differs significantly from the population mean, given that the population variance is known and sample size is large (n ≥ 30).
Example: A company claims its light bulbs last 1000 hours. We test 50 bulbs and find x̄ = 980 hours. Population σ = 80 hours. Test at α = 0.05.
H₀: μ = 1000
H₁: μ ≠ 1000 (two-tailed)
Z = (980 - 1000)/(80/√50) = -20/11.31 ≈ -1.77
Critical Z = ±1.96
Since |Z| = 1.77 < 1.96, we fail to reject H₀.
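The same calculation can be checked in Python. This is a minimal sketch with the figures from the example (1000, 980, 80, 50, α = 0.05); scipy is used only for the normal critical value and p-value, since scipy.stats has no built-in one-sample z-test.

```python
# Sketch of the light-bulb z-test above (summary statistics from the example).
import math
from scipy import stats

mu0, xbar, sigma, n, alpha = 1000, 980, 80, 50, 0.05
z = (xbar - mu0) / (sigma / math.sqrt(n))            # approx -1.77
p_value = 2 * stats.norm.sf(abs(z))                  # two-tailed p, approx 0.077
z_crit = stats.norm.ppf(1 - alpha / 2)               # approx 1.96

print(f"z = {z:.2f}, p = {p_value:.3f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```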
When to use Z-test:
- Population variance is known
- Sample size is large (n ≥ 30)
- Data is normally distributed or sample size is very large
Z-test for Difference of Means
Z-test for two means: Used to compare two population means when the population variances are known and samples are independent.
Example: Test if two teaching methods differ in effectiveness. Method A (n=50, x̄=78, σ=8), Method B (n=60, x̄=75, σ=7). α=0.05.
H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂
Z = (78-75)/√(8²/50 + 7²/60) = 3/√(1.28 + 0.82) ≈ 3/1.45 ≈ 2.07
Critical Z = ±1.96
Since 2.07 > 1.96, we reject H₀.
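The two-sample calculation follows the same pattern. This sketch reuses the numbers from the teaching-methods example (σ₁ = 8, σ₂ = 7 treated as known, α = 0.05 assumed).

```python
# Sketch of the two-sample z-test for the teaching-methods example.
import math
from scipy import stats

x1, sigma1, n1 = 78, 8, 50      # Method A
x2, sigma2, n2 = 75, 7, 60      # Method B

se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)      # standard error of the difference
z = (x1 - x2) / se                                   # approx 2.07
p_value = 2 * stats.norm.sf(abs(z))                  # approx 0.038

print(f"z = {z:.2f}, p = {p_value:.3f}")             # p < 0.05 -> reject H0
```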
Student's t-test for Single Mean
t-test: Used when the population variance is unknown and the sample size is small (n < 30); the sample standard deviation s replaces σ in the test statistic.
Example: A manufacturer claims a car model gives 20 km/liter. A test with 10 cars gives x̄=18.5, s=1.5. α=0.05.
H₀: μ = 20
H₁: μ < 20 (one-tailed)
t = (18.5-20)/(1.5/√10) = -1.5/0.474 ≈ -3.16
Critical t (df=9, α=0.05, one-tailed) ≈ -1.833
Since -3.16 < -1.833, we reject H₀.
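With only summary statistics given, the statistic is computed by hand; scipy.stats.t supplies the one-tailed critical value and p-value. This is a sketch using the mileage numbers above.

```python
# Sketch of the one-sample t-test for the mileage example (H1: mu < 20).
import math
from scipy import stats

mu0, xbar, s, n, alpha = 20, 18.5, 1.5, 10, 0.05
t = (xbar - mu0) / (s / math.sqrt(n))                # approx -3.16
df = n - 1
p_value = stats.t.cdf(t, df)                         # lower-tail p, approx 0.006
t_crit = stats.t.ppf(alpha, df)                      # approx -1.833

print(f"t = {t:.2f}, p = {p_value:.3f}")
print("reject H0" if t < t_crit else "fail to reject H0")
```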
Key Points:
- Use when σ is unknown and n < 30
- t-distribution is similar to normal but with heavier tails
- As n increases, t approaches z
- Always check degrees of freedom
t-test for Difference of Means
Independent Samples t-test
Compares means from two independent groups when variances are unknown and assumed equal.
Test Statistic:
t = (x̄₁ - x̄₂) / [sₚ√(1/n₁ + 1/n₂)]
Where pooled standard deviation sₚ = √[((n₁-1)s₁² + (n₂-1)s₂²)/(n₁+n₂-2)]
df = n₁ + n₂ - 2
Paired t-test
Used when samples are dependent (before/after measurements on same subjects).
Example (Independent): Compare two teaching methods. Method A (n=15, x̄=75, s=8), Method B (n=12, x̄=70, s=6). α=0.05.
sₚ = √[(14×64 + 11×36)/25] = √[(896+396)/25] ≈ √51.68 ≈ 7.19
t = (75-70)/[7.19√(1/15 + 1/12)] ≈ 5/(7.19×0.387) ≈ 5/2.78 ≈ 1.80
Critical t (df=25, two-tailed) ≈ ±2.06
Fail to reject H₀.
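Because only summary statistics are given, scipy's ttest_ind_from_stats reproduces the pooled (equal-variance) test above without raw data; this is a sketch, not part of the original worked example. For paired data, scipy.stats.ttest_rel on the two raw samples would be used instead.

```python
# Sketch of the pooled two-sample t-test for the teaching-methods example.
from scipy import stats

t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=75, std1=8, nobs1=15,      # Method A
    mean2=70, std2=6, nobs2=12,      # Method B
    equal_var=True)                  # pooled variance, df = n1 + n2 - 2 = 25

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")   # t approx 1.80, p > 0.05 -> fail to reject H0
```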
F-test
F-test: Used to compare two population variances or in ANOVA to compare multiple means.
Test Statistic for Variance Comparison:
F = s₁²/s₂² (where s₁² > s₂²)
df₁ = n₁ - 1 (numerator)
df₂ = n₂ - 1 (denominator)
Example: Compare variability of two machines. Machine A (n=10, s=5), Machine B (n=8, s=3). α=0.05.
H₀: σ₁² = σ₂²
H₁: σ₁² ≠ σ₂²
F = 5²/3² = 25/9 ≈ 2.78
Critical F (df₁=9, df₂=7, upper 5% point) ≈ 3.68
Since 2.78 < 3.68, fail to reject H₀.
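scipy has no built-in two-sample variance F-test, so the ratio is computed by hand and stats.f supplies the critical value used above. A short sketch with the machine data:

```python
# Sketch of the variance-ratio F-test for the two machines.
from scipy import stats

s1, n1 = 5, 10      # Machine A (larger variance goes in the numerator)
s2, n2 = 3, 8       # Machine B
alpha = 0.05

F = s1**2 / s2**2                                    # approx 2.78
df1, df2 = n1 - 1, n2 - 1
f_crit = stats.f.ppf(1 - alpha, df1, df2)            # upper 5% point, approx 3.68

print(f"F = {F:.2f}, critical F = {f_crit:.2f}")
print("reject H0" if F > f_crit else "fail to reject H0")
```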
Important Points:
- Always put larger variance in numerator
- F-distribution is right-skewed
- Used to check assumption of equal variances before t-test
- Fundamental to ANOVA
Chi-square Test for Goodness of Fit
Chi-square test: Tests whether observed frequencies differ significantly from expected frequencies.
Example: Test if die is fair (60 rolls: 8 ones, 12 twos, 9 threes, 11 fours, 10 fives, 10 sixes). α=0.05.
H₀: Die is fair (Eᵢ=10 for all)
H₁: Die is not fair
χ² = (8-10)²/10 + (12-10)²/10 + ... + (10-10)²/10 = 4/10 + 4/10 + 1/10 + 1/10 + 0 + 0 = 1.0
Critical χ² (df=5, α=0.05) ≈ 11.07
Since 1.0 < 11.07, fail to reject H₀.
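scipy.stats.chisquare performs exactly this goodness-of-fit calculation; by default it takes the expected frequencies to be equal, which matches the fair-die hypothesis. A short sketch with the observed counts from the example:

```python
# Sketch of the die goodness-of-fit test.
from scipy import stats

observed = [8, 12, 9, 11, 10, 10]
chi2, p_value = stats.chisquare(observed)            # E_i = 60/6 = 10 for each face

print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}") # 1.00, p approx 0.96 -> fail to reject H0
```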
Key Points:
- Used for categorical data
- Expected frequencies should be ≥5 for each category
- Also used for independence tests in contingency tables
- Right-tailed test only
Summary Table of Hypothesis Tests
| Test | Purpose | Assumptions | Test Statistic |
|---|---|---|---|
| Z-test (single mean) | Compare sample mean to population mean | σ known, n ≥ 30 or normal population | Z = (x̄ - μ₀)/(σ/√n) |
| Z-test (two means) | Compare two population means | σ₁, σ₂ known, independent samples | Z = (x̄₁ - x̄₂)/√(σ₁²/n₁ + σ₂²/n₂) |
| t-test (single mean) | Compare sample mean to population mean | σ unknown, n < 30, normal population | t = (x̄ - μ₀)/(s/√n), df = n-1 |
| t-test (two means) | Compare two population means | σ unknown, independent samples, equal variances | t = (x̄₁ - x̄₂)/[sₚ√(1/n₁ + 1/n₂)], df = n₁+n₂-2 |
| Paired t-test | Compare means from paired measurements | Differences normally distributed | t = d̄/(s_d/√n), df = n-1 |
| F-test | Compare two variances | Normal populations, independent samples | F = s₁²/s₂², df₁ = n₁-1, df₂ = n₂-1 |
| Chi-square | Goodness of fit or independence | Categorical data, expected frequencies ≥ 5 | χ² = Σ[(Oᵢ - Eᵢ)²/Eᵢ] |
Decision Making in Hypothesis Testing
p-value approach: Compare the p-value (the probability, under H₀, of a test statistic at least as extreme as the one observed) with α.
- If p-value ≤ α, reject H₀
- If p-value > α, fail to reject H₀
Critical value approach: Compare test statistic with critical value from distribution table.
- If test statistic falls in rejection region, reject H₀
- Otherwise, fail to reject H₀
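Both decision rules can be written in a few lines and always agree. The sketch below reuses z = -1.77 from the light-bulb example and assumes α = 0.05.

```python
# Minimal sketch contrasting the two decision rules for a two-tailed z-test.
from scipy import stats

z, alpha = -1.77, 0.05
p_value = 2 * stats.norm.sf(abs(z))          # p-value approach
z_crit = stats.norm.ppf(1 - alpha / 2)       # critical-value approach

print("p-value rule:       ", "reject H0" if p_value <= alpha else "fail to reject H0")
print("critical-value rule:", "reject H0" if abs(z) > z_crit else "fail to reject H0")
```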
Important Notes:
- Never "accept" H₀ - we either reject or fail to reject
- Statistical significance ≠ practical significance
- Effect size measures the magnitude of the difference
- Confidence intervals can provide more information than tests
Common Mistakes in Hypothesis Testing
- Using wrong test for given data and assumptions
- Ignoring assumptions of the test (normality, equal variance, etc.)
- Confusing statistical significance with practical importance
- Performing multiple tests without adjustment (increases Type I error)
- Stopping data collection as soon as a significant result appears (a form of p-hacking)
- Misinterpreting the p-value as the probability that H₀ is true
- Using one-tailed test when two-tailed is appropriate
- Not reporting effect size along with p-value
Exam Tips for Hypothesis Testing Questions
Problem-Solving Strategy:
- Identify the type of problem (mean, proportion, variance, etc.)
- Check given information (sample size, known/unknown variance)
- Formulate H₀ and H₁ correctly
- Choose appropriate test based on conditions
- Verify test assumptions
- Calculate test statistic carefully
- Find critical value or p-value
- Make decision and state conclusion in context
Common Exam Questions:
- Identify type of error (Type I/II) in given scenario
- Interpret p-value in context
- Calculate power of test
- Determine required sample size for given power
- Choose appropriate test for given research question
- Interpret confidence interval in relation to hypothesis test
Practice Problems
Problem 1: A sample of 25 students has a mean IQ of 110. The population mean is 100 with σ=15. Test whether this group scores significantly higher at α=0.01.
Solution:
Z-test (σ known). H₀: μ = 100, H₁: μ > 100 (one-tailed)
Z = (110-100)/(15/5) = 3.33
Critical Z (one-tailed) = 2.326
Reject H₀; the sample's mean IQ is significantly higher than 100.
Problem 2: Two machines produce components. Machine A (n=10, s=0.8), Machine B (n=12, s=0.5). Test if variances differ at α=0.05.
Solution:
F-test. H₀: σ₁² = σ₂², H₁: σ₁² ≠ σ₂²
F = (0.8)²/(0.5)² = 0.64/0.25 = 2.56
Critical F (df₁=9, df₂=11, upper 5% point) ≈ 2.90
Fail to reject H₀, no significant difference in variances.
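As a cross-check, both practice problems can be verified with scipy; this sketch simply recomputes the statistics and critical values quoted above.

```python
# Sketch verifying the two practice problems (numbers taken from the problems).
import math
from scipy import stats

# Problem 1: one-tailed z-test at alpha = 0.01
z = (110 - 100) / (15 / math.sqrt(25))               # approx 3.33
print("P1:", "reject H0" if z > stats.norm.ppf(0.99) else "fail to reject H0")

# Problem 2: variance-ratio F-test, upper 5% point
F = 0.8**2 / 0.5**2                                  # 2.56
f_crit = stats.f.ppf(0.95, 9, 11)                    # approx 2.90
print("P2:", "reject H0" if F > f_crit else "fail to reject H0")
```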