Lesson 13: Proportion Z test
Monday, January 15, 2024
In this lesson, we will learn the proportion Z test.
First, load the packages we will use in this lesson.
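The loading code itself is not shown on this page. A minimal sketch, assuming only that the carData package is installed (the lesson may load further packages, for example for tabulating data):

```r
# Assumption: carData is installed, e.g. via install.packages("carData").
library(carData)

# BEPS is the British Election Panel Study data set shipped with carData.
data("BEPS", package = "carData")

# Quick look at the variable used in this lesson.
str(BEPS$gender)
```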
Proportion Z-test
The proportion Z test is a hypothesis test for a population proportion. It is used with categorical data.
Counts and proportions of gender in the BEPS dataset of carData:
gender    n   percent
female  812  0.532459
male    713  0.467541
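The counts and proportions of gender can be reproduced in base R. This is only a sketch: the column names `n` and `percent` are chosen to match the output shown, and the lesson itself may have used a different tabulating function.

```r
library(carData)  # provides the BEPS data set (assumed installed)

# Count each gender, then convert the counts to proportions.
counts <- table(BEPS$gender)
gender_table <- data.frame(
  gender  = names(counts),
  n       = as.vector(counts),
  percent = as.vector(prop.table(counts))
)
print(gender_table)
```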
Hypothesis test for one proportion
The null hypothesis (\(H_0\)) is that the population proportion of females is equal to \(0.5\).
The alternative hypothesis (\(H_1\)) is that the population proportion of females is not equal to \(0.5\).
Examples
Here Z is the test statistic, \(p_1\) is the sample proportion, \(p_0\) is the hypothesized proportion, and \(n\) is the sample size. \(se\) is the standard error of the sample proportion, and \(\sigma\) is the standard deviation of the sampling distribution of the sample proportion.
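To make these symbols concrete, the statistic can be computed by hand. The sketch below assumes the usual one-sample form \(Z = (p_1 - p_0)/se\) with \(se = \sqrt{p_0(1 - p_0)/n}\), which is an assumption here because the formula itself is not shown. The variable names are illustrative.

```r
x  <- 812    # number of females in the sample
n  <- 1525   # sample size
p0 <- 0.5    # hypothesized proportion

p1 <- x / n                       # sample proportion
se <- sqrt(p0 * (1 - p0) / n)     # standard error under the null hypothesis
z  <- (p1 - p0) / se              # test statistic
p_value <- 2 * pnorm(-abs(z))     # two-sided p-value from the normal distribution

# prop.test() reports X-squared rather than Z. Without the continuity
# correction, X-squared is the square of the Z statistic computed above.
uncorrected <- prop.test(x, n, p = p0, correct = FALSE)
c(z = z, z_squared = z^2, x_squared = unname(uncorrected$statistic))
```

Because `prop.test()` applies a continuity correction by default, its default X-squared is slightly smaller than \(Z^2\).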
1-sample proportions test with continuity correction
data: 812 out of 1525, null probability 0.5
X-squared = 6.2977, df = 1, p-value = 0.01209
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5070389 0.5577139
sample estimates:
p
0.532459
The first argument is the number of females (the successes), the second argument is the sample size, and the third argument is the hypothesized proportion. The fourth argument is the alternative hypothesis; by default, it is that the population proportion is not equal to the hypothesized proportion. The default confidence level is 0.95, which corresponds to a significance level of 0.05.
At this significance level, we reject the null hypothesis because the p-value (0.01209) is less than 0.05. Consistently, the 95% confidence interval does not contain the hypothesized proportion 0.5.
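The call that these arguments describe can be written out with every argument named. This is a sketch using `prop.test()` from base R's stats package; the last two arguments simply restate the defaults.

```r
result <- prop.test(
  x = 812,                     # number of females (successes)
  n = 1525,                    # sample size
  p = 0.5,                     # hypothesized proportion
  alternative = "two.sided",   # default alternative
  conf.level = 0.95            # default confidence level
)
print(result)

# Individual pieces of the result can be extracted by name.
result$p.value
result$conf.int
```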
To test at the 0.01 significance level, request a 99% confidence interval by adding the argument conf.level = 0.99.
1-sample proportions test with continuity correction
data: 812 out of 1525, null probability 0.5
X-squared = 6.2977, df = 1, p-value = 0.01209
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:
0.4991511 0.5654829
sample estimates:
p
0.532459
At the 0.01 significance level, we fail to reject the null hypothesis because the p-value (0.01209) is greater than 0.01. Consistently, the 99% confidence interval contains the hypothesized proportion 0.5.
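The effect of conf.level can be checked directly: it changes the width of the interval but not the p-value. A small sketch, where `contains()` is a hypothetical helper defined only for this illustration:

```r
ci95 <- prop.test(812, 1525, p = 0.5, conf.level = 0.95)
ci99 <- prop.test(812, 1525, p = 0.5, conf.level = 0.99)

# Hypothetical helper: is `value` inside the interval `ci`?
contains <- function(ci, value) ci[1] <= value && value <= ci[2]

# The p-value is the same in both calls; only the interval changes.
all.equal(ci95$p.value, ci99$p.value)

contains(ci95$conf.int, 0.5)   # 95% interval versus the hypothesized 0.5
contains(ci99$conf.int, 0.5)   # 99% interval versus the hypothesized 0.5
```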
We can also change the alternative hypothesis to “less” or “greater” by adding the argument alternative = "less" or alternative = "greater".
Since the sample proportion is greater than the hypothesized proportion, we can use the alternative hypothesis “greater”.
1-sample proportions test with continuity correction
data: 812 out of 1525, null probability 0.5
X-squared = 6.2977, df = 1, p-value = 0.006045
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
0.5110761 1.0000000
sample estimates:
p
0.532459
At the 0.05 significance level, we reject the null hypothesis because the p-value (0.006045) is less than 0.05. Consistently, the one-sided 95% confidence interval does not contain the hypothesized proportion 0.5.
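The three alternatives can be compared side by side. In this sketch, the sample proportion lies above 0.5, so the "greater" p-value is half of the two-sided one, and the "less" p-value is its complement.

```r
two_sided <- prop.test(812, 1525, p = 0.5)
greater   <- prop.test(812, 1525, p = 0.5, alternative = "greater")
less      <- prop.test(812, 1525, p = 0.5, alternative = "less")

# Collect the three p-values in one named vector for comparison.
p_values <- c(
  two_sided = two_sided$p.value,
  greater   = greater$p.value,
  less      = less$p.value
)
print(p_values)
```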
We can also change the hypothesized proportion to 0.6 by adding the argument p = 0.6. The null hypothesis is then that the population proportion of females is equal to \(0.6\).
1-sample proportions test with continuity correction
data: 812 out of 1525, null probability 0.6
X-squared = 28.706, df = 1, p-value = 8.426e-08
alternative hypothesis: true p is not equal to 0.6
95 percent confidence interval:
0.5070389 0.5577139
sample estimates:
p
0.532459
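A sketch of the call with the hypothesized proportion 0.6, compared against the earlier test of 0.5. The p-value in the output above (8.426e-08) is far below 0.05, so this null hypothesis is rejected. The confidence interval matches the one from the test of 0.5, since in both outputs it is reported as 0.5070389 to 0.5577139.

```r
null_05 <- prop.test(812, 1525, p = 0.5)
null_06 <- prop.test(812, 1525, p = 0.6)

# The test statistic and p-value depend on the hypothesized proportion.
c(x_squared_05 = unname(null_05$statistic),
  x_squared_06 = unname(null_06$statistic))
null_06$p.value < 0.05

# The confidence interval is the same in both results.
all.equal(as.numeric(null_05$conf.int), as.numeric(null_06$conf.int))
```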
Two-proportion Z test
The two-proportion Z test is a hypothesis test for the difference between two population proportions. It is also used with categorical data.
Examples
Suppose we have a second sample, taken from voting data for the USA, with 3120 females and 2880 males, for a total sample size of 6000.
2-sample test for equality of proportions with continuity correction
data: c(812, 3120) out of c(1525, 6000)
X-squared = 0.70741, df = 1, p-value = 0.4003
alternative hypothesis: two.sided
95 percent confidence interval:
-0.01600389 0.04092192
sample estimates:
prop 1 prop 2
0.532459 0.520000
The first argument is a vector with the number of successes in group 1 and group 2, and the second argument is a vector with the sample sizes of group 1 and group 2. By default, the alternative hypothesis is “two.sided” and the confidence level is 0.95, which corresponds to a significance level of 0.05.
At the 0.05 significance level, we fail to reject the null hypothesis because the p-value (0.4003) is greater than 0.05. Consistently, the 95% confidence interval for the difference in proportions contains 0.
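The two-sample comparison can be sketched with the counts given in this example. The variable names are illustrative.

```r
females <- c(812, 3120)    # successes: females in BEPS and in the USA sample
totals  <- c(1525, 6000)   # sample sizes of the two groups

two_sample <- prop.test(x = females, n = totals)
print(two_sample)

# Check whether the interval for the difference in proportions covers 0.
diff_ci <- two_sample$conf.int
covers_zero <- diff_ci[1] < 0 && 0 < diff_ci[2]
covers_zero
```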
In this lesson, we learned the one-proportion and two-proportion Z test.
The one-proportion Z test is a hypothesis test for one population proportion. It is used to test categorical data.
The two-proportion Z test is a hypothesis test for the difference between two population proportions.
In the next lesson, we will learn the t-test.
Thank you!