NEOCODE

Testing of Hypothesis MCQs

Types of Error

1. In hypothesis testing, what is a Type I error?

Correct Answer: c) Rejecting a null hypothesis when it is true

Explanation:
A Type I error occurs when we incorrectly reject a true null hypothesis (H₀). It's a false positive result.

In hypothesis testing, we start with a null hypothesis (H₀) and an alternative hypothesis (H₁). The decision process can lead to four possible outcomes:
1. Correctly fail to reject H₀ when it is true (correct decision)
2. Incorrectly reject H₀ when it is true (Type I error)
3. Correctly reject H₀ when it is false (correct decision)
4. Incorrectly fail to reject H₀ when it is false (Type II error)

Type I error is considered more serious in many applications, which is why the significance level (α) is usually set to small values like 0.05 or 0.01.
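To see what the significance level means operationally, here is a minimal Python sketch (the population parameters and sample size are made-up assumptions): it repeatedly samples from a population where H₀ is actually true and counts how often a two-tailed z-test rejects at α = 0.05. The empirical rejection rate, i.e. the Type I error rate, comes out close to 0.05.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
mu0, sigma, n = 100, 15, 30        # H0 is true: the population mean really is 100
n_sims = 10_000

rejections = 0
for _ in range(n_sims):
    sample = rng.normal(mu0, sigma, n)
    z = (sample.mean() - mu0) / (sigma / np.sqrt(n))
    p_value = 2 * stats.norm.sf(abs(z))        # two-tailed p-value
    if p_value < alpha:
        rejections += 1                        # rejecting a true H0 = Type I error

print(f"Empirical Type I error rate: {rejections / n_sims:.3f}")   # close to 0.05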

2. The probability of making a Type I error is denoted by:

Correct Answer: b) α

Explanation:
The probability of making a Type I error is denoted by α (alpha), which is also known as the significance level of the test.

α = P(Reject H₀ | H₀ is true)

Important relationships:
• α (alpha) = Probability of Type I error (rejecting a true null hypothesis)
• β (beta) = Probability of Type II error (failing to reject a false null hypothesis)
• 1-β = Power of the test (probability of correctly rejecting a false null hypothesis)
• 1-α = Confidence level (probability of correctly failing to reject a true null hypothesis)

In practice, we typically set α in advance (e.g., 0.05) to control the risk of making a Type I error.
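The relationships above can be made concrete with a short sketch (all numbers are illustrative assumptions): for a one-sided z-test of H₀: μ = 100 against H₁: μ > 100 with known σ, it computes β and the power when the true mean is actually 104.

import numpy as np
from scipy import stats

mu0, mu_true, sigma, n, alpha = 100, 104, 15, 50, 0.05
se = sigma / np.sqrt(n)

crit = mu0 + stats.norm.ppf(1 - alpha) * se      # reject H0 if the sample mean exceeds crit
beta = stats.norm.cdf((crit - mu_true) / se)     # P(fail to reject H0 | true mean = 104)
power = 1 - beta

print(f"alpha = {alpha}, beta = {beta:.3f}, power = {power:.3f}")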

3. A Type II error occurs when:

Correct Answer: d) We incorrectly fail to reject the null hypothesis

Explanation:
A Type II error occurs when we fail to reject a null hypothesis (H₀) that is actually false. It's a false negative result.

In terms of probability, Type II error is represented as:
β = P(Fail to reject H₀ | H₀ is false)

To reduce Type II errors, we can:
1. Increase the sample size
2. Use a larger significance level (α), though this increases the risk of Type I errors
3. Choose a more efficient test statistic
4. Ensure proper experimental design

There is always a trade-off between Type I and Type II errors: for a given sample size, decreasing one typically increases the other.

Z-test

4. A Z-test for a single mean is appropriate when:

Correct Answer: c) The population standard deviation is known and the sample size is large or the distribution is normal

Explanation:
A Z-test for a single mean is appropriate when both of the following conditions are met:

1. The population standard deviation (σ) is known, AND
2. Either:
• The sample size is large (typically n ≥ 30), OR
• The population follows a normal distribution

When the population standard deviation is unknown, we typically use a t-test instead of a Z-test.

The central limit theorem tells us that for large samples, the sampling distribution of the sample mean approaches a normal distribution regardless of the population distribution. This is why Z-tests can be used for large samples even when the population distribution is not normal.

For small samples (n < 30), if the population standard deviation is known and the population is normally distributed, a Z-test remains appropriate.
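A small simulation illustrates the central limit theorem point (the exponential population and the sample sizes are arbitrary choices for the sketch): even though the population is strongly skewed, the distribution of sample means becomes close to normal as n grows, which is what justifies the z-test for large samples.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in (5, 30, 200):
    # 10,000 sample means drawn from a skewed (exponential) population
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:3d}, skewness of sample means = {stats.skew(means):.3f}")   # shrinks toward 0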

5. The test statistic for a Z-test for a single mean is given by:

Correct Answer: b) Z = (x̄ - μ₀)/(σ/√n)

Explanation:
The test statistic for a Z-test for a single mean is:

Z = (x̄ - μ₀)/(σ/√n)

Where:
• x̄ = sample mean
• μ₀ = hypothesized population mean (from the null hypothesis)
• σ = known population standard deviation
• n = sample size
• σ/√n = standard error of the mean

This formula standardizes the difference between the sample mean and the hypothesized population mean by dividing by the standard error of the mean. The resulting Z-statistic follows a standard normal distribution N(0,1) under the null hypothesis.

Note that option c) shows the formula for a t-test, which is used when the population standard deviation is unknown and must be estimated using the sample standard deviation s.
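As a worked example (x̄, μ₀, σ and n below are made-up values), the z-statistic and its two-tailed p-value can be computed directly from the formula:

import math
from scipy import stats

x_bar, mu0, sigma, n = 52.3, 50.0, 8.0, 64
z = (x_bar - mu0) / (sigma / math.sqrt(n))       # (52.3 - 50) / (8 / 8) = 2.3
p_two_tailed = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p_two_tailed:.4f}")    # z = 2.300, p ≈ 0.0214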

6. In a Z-test for the difference between two means, the test statistic is:

Correct Answer: b) Z = (x̄₁ - x̄₂ - (μ₁ - μ₂))/(σ₁²/n₁ + σ₂²/n₂)^(1/2)

Explanation:
The test statistic for a Z-test for the difference between two independent means is given by:

Z = (x̄₁ - x̄₂ - (μ₁ - μ₂))/(σ₁²/n₁ + σ₂²/n₂)^(1/2)

Where:
• x̄₁, x̄₂ = sample means
• μ₁, μ₂ = population means (from the null hypothesis)
• σ₁, σ₂ = known population standard deviations
• n₁, n₂ = sample sizes

In most hypothesis tests for two means, we test whether the difference between population means is zero (H₀: μ₁ - μ₂ = 0), which simplifies the formula to:

Z = (x̄₁ - x̄₂)/(σ₁²/n₁ + σ₂²/n₂)^(1/2)

Option c) shows the formula for a two-sample t-test, where the population standard deviations are unknown and estimated by sample standard deviations s₁ and s₂.
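A brief sketch of the simplified formula (the sample means, population standard deviations, and sample sizes are all illustrative assumptions):

import math
from scipy import stats

x_bar1, x_bar2 = 75.2, 72.8          # sample means
sigma1, sigma2 = 6.0, 5.5            # known population standard deviations
n1, n2 = 40, 45

z = (x_bar1 - x_bar2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
p = 2 * stats.norm.sf(abs(z))        # two-tailed p-value under H0: mu1 - mu2 = 0
print(f"z = {z:.3f}, p = {p:.4f}")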

Student's t-test

7. When is a t-test appropriate instead of a Z-test?

Correct Answer: c) When the population standard deviation is unknown and must be estimated from the sample

Explanation:
A t-test is appropriate instead of a Z-test when the population standard deviation is unknown and must be estimated using the sample standard deviation.

The t-test was developed by William Gosset (under the pseudonym "Student") to address the problem of hypothesis testing with small samples and unknown population variance. The t-distribution accounts for the additional uncertainty introduced by estimating the population standard deviation.

Key conditions for using a t-test:
1. The population standard deviation is unknown
2. The sample size is small (typically n < 30), although t-tests can be used for larger samples too
3. The population is normally distributed (especially important for small sample sizes)

As the sample size increases, the t-distribution approaches the standard normal distribution, so for large samples, Z-tests and t-tests yield similar results. However, the t-test remains technically correct when using the sample standard deviation regardless of sample size.
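The convergence of the t-distribution to the standard normal can be seen by comparing two-sided 5% critical values (a quick sketch using scipy's distribution functions):

from scipy import stats

for df in (5, 10, 30, 100, 1000):
    print(f"df = {df:4d}, t critical value = {stats.t.ppf(0.975, df):.3f}")
print(f"z critical value = {stats.norm.ppf(0.975):.3f}")   # about 1.960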

8. The test statistic for a t-test for a single mean is:

Correct Answer: a) t = (x̄ - μ₀)/(s/√n)

Explanation:
The test statistic for a t-test for a single mean is:

t = (x̄ - μ₀)/(s/√n)

Where:
• x̄ = sample mean
• μ₀ = hypothesized population mean (from the null hypothesis)
• s = sample standard deviation (estimating the unknown population standard deviation)
• n = sample size
• s/√n = estimated standard error of the mean

This formula is similar to the Z-test formula, but it uses the sample standard deviation s instead of the population standard deviation σ. The resulting t-statistic follows a t-distribution with (n-1) degrees of freedom under the null hypothesis.

Option d) shows the formula for a Z-test, which is used when the population standard deviation is known.
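A minimal sketch with made-up data: the t-statistic from the formula matches what scipy.stats.ttest_1samp reports.

import numpy as np
from scipy import stats

sample = np.array([50.2, 49.1, 51.8, 52.5, 48.7, 50.9, 51.3, 49.8])
mu0 = 50.0

x_bar, s, n = sample.mean(), sample.std(ddof=1), len(sample)
t_manual = (x_bar - mu0) / (s / np.sqrt(n))         # formula above
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-tailed, df = n - 1

t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=mu0)
print(f"manual: t = {t_manual:.3f}, p = {p_manual:.4f}")   # agrees with scipy's result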

9. The degrees of freedom for a t-test for a single mean with sample size n is:

Correct Answer: b) n-1

Explanation:
For a t-test for a single mean with sample size n, the degrees of freedom (df) is n-1.

The degrees of freedom represent the number of independent pieces of information available for estimating a parameter. When calculating the sample variance, we lose one degree of freedom because we've already used one piece of information to calculate the sample mean.

Mathematically, the sum of deviations from the mean is always zero: Σ(xᵢ - x̄) = 0. This constraint reduces the degrees of freedom by 1.

The degrees of freedom determine the shape of the t-distribution used to find critical values or p-values. As df increases, the t-distribution approaches the standard normal distribution.

For small samples, the t-distribution has heavier tails than the normal distribution, reflecting the additional uncertainty from estimating the population standard deviation.

10. For a t-test comparing two independent samples with sizes n₁ and n₂, the degrees of freedom when assuming equal variances is:

Correct Answer: c) n₁ + n₂ - 2

Explanation:
For a t-test comparing two independent samples with sample sizes n₁ and n₂, the degrees of freedom when assuming equal variances (pooled t-test) is:

df = n₁ + n₂ - 2

This can be understood as follows:
• Each sample contributes (n-1) degrees of freedom for estimating its variance
• Total degrees of freedom = (n₁-1) + (n₂-1) = n₁ + n₂ - 2

We lose 2 degrees of freedom because we're estimating two parameters: the mean of the first sample and the mean of the second sample.

Note: For a t-test with unequal variances (Welch's t-test), the degrees of freedom is calculated using a more complex formula (Welch-Satterthwaite equation) and is typically not an integer.
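For contrast, here is a small sketch of the Welch-Satterthwaite degrees of freedom next to the pooled value n₁ + n₂ - 2 (the sample variances and sizes are illustrative assumptions):

s1_sq, s2_sq = 22.0, 9.5     # sample variances
n1, n2 = 12, 15

a, b = s1_sq / n1, s2_sq / n2
df_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))   # Welch-Satterthwaite equation
df_pooled = n1 + n2 - 2

print(f"Welch df = {df_welch:.2f}, pooled df = {df_pooled}")    # Welch df is usually smaller and non-integer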

11. The pooled variance in a two-sample t-test with equal variances is calculated as:

Correct Answer: b) s²p = [(n₁-1)s₁² + (n₂-1)s₂²]/(n₁ + n₂ - 2)

Explanation:
The pooled variance in a two-sample t-test with equal variances is calculated as:

s²p = [(n₁-1)s₁² + (n₂-1)s₂²]/(n₁ + n₂ - 2)

This formula is a weighted average of the two sample variances, where the weights are the degrees of freedom for each sample. This weighting gives more influence to the sample with the larger size.

Why this formula works:
• (n₁-1) and (n₂-1) are the degrees of freedom for each sample
• s₁² and s₂² are the individual sample variances
• The denominator (n₁ + n₂ - 2) is the total degrees of freedom

The pooled variance is used in the two-sample t-test formula:
t = (x̄₁ - x̄₂)/(sp·√(1/n₁ + 1/n₂))

Where sp is the square root of the pooled variance s²p.

A simple average of the variances (option a) would be incorrect because it doesn't account for different sample sizes.
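A short sketch with made-up data that applies the pooled-variance formulas above and checks them against scipy.stats.ttest_ind with equal_var=True:

import numpy as np
from scipy import stats

x = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 23.9])
y = np.array([21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 20.5])

n1, n2 = len(x), len(y)
s1_sq, s2_sq = x.var(ddof=1), y.var(ddof=1)

sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)     # pooled variance
t_manual = (x.mean() - y.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))

t_scipy, p_scipy = stats.ttest_ind(x, y, equal_var=True)
print(f"manual t = {t_manual:.3f}, scipy t = {t_scipy:.3f}")      # the two agree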

F-test

12. An F-test is primarily used to:

Correct Answer: b) Compare two population variances

Explanation:
An F-test is primarily used to compare two population variances (or standard deviations). It tests the null hypothesis that the two populations have equal variances against the alternative that they have different variances.

The F-test is named after the F-distribution, which is the distribution of the ratio of two independent chi-square random variables, each divided by its degrees of freedom.

Common applications of the F-test include:
1. Testing homogeneity of variances before conducting a two-sample t-test that assumes equal variances
2. Analysis of Variance (ANOVA), where the F-test compares the variation between group means to the variation within groups
3. Testing the significance of regression models by comparing the explained variance to the unexplained variance

It's important to note that F-tests are sensitive to departures from normality. The samples should come from normally distributed populations for the test to be valid.

13. The F-statistic is calculated as:

Correct Answer: b) F = s₁²/s₂²

Explanation:
The F-statistic for comparing two population variances is calculated as the ratio of the sample variances:

F = s₁²/s₂²

Where:
• s₁² = first sample variance
• s₂² = second sample variance

Under the null hypothesis that the population variances are equal (σ₁² = σ₂²), this F-statistic follows an F-distribution with degrees of freedom df₁ = n₁-1 and df₂ = n₂-1, where n₁ and n₂ are the sample sizes.

Option a) shows the ratio of population variances, which is the parameter we're testing, not the test statistic itself.

Options c) and d) involve means rather than variances and are not related to the F-test for variances.

14. When calculating the F-statistic, we typically:

Correct Answer: b) Place the larger variance in the numerator

Explanation:
By convention, when performing an F-test, we typically place the larger sample variance in the numerator and the smaller sample variance in the denominator. This ensures that the F-statistic is greater than or equal to 1, which is the standard approach when referring to F-distribution tables.

This convention makes it easier to conduct one-tailed tests and simplifies the interpretation of results. When the larger variance is in the numerator, we're effectively testing whether the first population variance is significantly greater than the second population variance.

If we were to place the smaller variance in the numerator, we would get an F-value between 0 and 1, which would require different critical values for hypothesis testing.
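A minimal sketch of the F-test for two variances, following the larger-variance-in-numerator convention (the sample variances and sizes are made-up values):

from scipy import stats

s1_sq, n1 = 18.4, 16         # larger sample variance
s2_sq, n2 = 7.9, 13          # smaller sample variance

F = s1_sq / s2_sq                          # larger variance in the numerator, so F >= 1
df1, df2 = n1 - 1, n2 - 1
p_one_tailed = stats.f.sf(F, df1, df2)     # P(F >= observed) under H0: equal variances
p_two_tailed = min(2 * p_one_tailed, 1.0)

print(f"F = {F:.3f}, one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")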

Chi-square Test

15. The chi-square test for goodness of fit is used to:

Correct Answer: c) Test if sample data fits a specific probability distribution

Explanation:
The chi-square goodness of fit test is specifically designed to determine whether observed frequency data fits (or "conforms to") a particular theoretical distribution or expected pattern.

This test compares observed frequencies in different categories with the frequencies that would be expected under a specified probability distribution or hypothesis. It helps researchers determine if any observed differences between the data and the expected distribution are due to chance or if they represent a significant deviation.

Option a) refers to tests for comparing means (like t-tests).
Option b) refers to F-tests for comparing variances.
Option d) refers to the chi-square test of independence, which is different from the goodness of fit test.

16. The chi-square statistic for goodness of fit is calculated as:

Correct Answer: c) χ² = Σ[(O - E)²/E]

Explanation:
The chi-square statistic is calculated using the formula:

χ² = Σ[(O - E)²/E]

Where:
• O = Observed frequency in each category
• E = Expected frequency in each category under the null hypothesis

This formula measures the squared difference between observed and expected frequencies, standardized by dividing by the expected frequency. The division by E accounts for the relative magnitude of the differences and ensures that categories with larger expected frequencies don't dominate the statistic.

Option a) would only sum the raw differences, which could cancel each other out.
Option b) would sum squared differences without standardizing by the expected frequencies.
Option d) would not square the differences, which wouldn't properly account for both positive and negative differences.
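As a concrete sketch (the counts are made up), testing whether 120 rolls of a die are consistent with a fair die; the formula and scipy.stats.chisquare give the same statistic:

import numpy as np
from scipy import stats

observed = np.array([18, 22, 16, 25, 20, 19])      # 120 rolls of a die
expected = np.full(6, observed.sum() / 6)          # 20 per face under H0: fair die

chi2_manual = ((observed - expected) ** 2 / expected).sum()
chi2_scipy, p_value = stats.chisquare(observed, f_exp=expected)

print(f"chi-square = {chi2_manual:.3f}, p = {p_value:.4f}")   # df = 6 - 1 = 5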

17. For a chi-square goodness of fit test with k categories, the degrees of freedom is:

Correct Answer: c) k-1-m (where m is the number of parameters estimated from the data)

Explanation:
For a chi-square goodness of fit test, the degrees of freedom is calculated as:

df = k - 1 - m

Where:
• k = number of categories or cells
• m = number of parameters estimated from the sample data

We subtract 1 from the number of categories because once we know the frequencies for k-1 categories, the frequency of the last category is determined (due to the constraint that all frequencies must sum to the total sample size).

We further subtract m (the number of parameters estimated from the data) because each parameter estimated introduces an additional constraint. For example, if we estimate the mean of a normal distribution from the data before conducting a goodness of fit test, we lose one more degree of freedom.

In the simplest case where no parameters are estimated (m=0), the degrees of freedom would be k-1.
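A sketch of the m > 0 case (the counts are made up): the expected frequencies come from a Poisson distribution whose mean is estimated from the data, so one extra degree of freedom is removed via scipy.stats.chisquare's ddof argument, giving df = k - 1 - 1.

import numpy as np
from scipy import stats

observed = np.array([30, 38, 21, 11])               # counts for 0, 1, 2, and 3+ events
values = np.array([0, 1, 2, 3])                     # treating "3+" as 3 is a simplification for this sketch
lam = (values * observed).sum() / observed.sum()    # m = 1 parameter estimated from the data

probs = stats.poisson.pmf(values[:-1], lam)         # P(0), P(1), P(2)
probs = np.append(probs, 1 - probs.sum())           # lump the remaining tail into "3+"
expected = probs * observed.sum()

chi2, p = stats.chisquare(observed, f_exp=expected, ddof=1)   # df = 4 - 1 - 1 = 2
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")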