In R, the Chi-Square Test is typically used to assess whether there is a significant association between categorical variables. There are two main types of Chi-Square tests:
- Chi-Square Test for Independence: Used to determine if two categorical variables are independent.
- Chi-Square Goodness of Fit Test: Used to check whether sample data match a hypothesized population distribution.
Here’s how to perform each of these tests in R:
1. Chi-Square Test for Independence
You can use the chisq.test() function to test whether two categorical variables are independent. This test is often applied to a contingency table.
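When the data arrive as one row per observation rather than as ready-made counts, the contingency table can be built first with table(). A minimal sketch, assuming a hypothetical data frame named survey whose 100 rows reproduce the counts used in this section's example (the object and column names are illustrative):

```r
# Hypothetical raw data: one row per respondent, built to match the
# counts 30 / 20 / 10 / 40 used in this section
survey <- data.frame(
  response = rep(c("Yes", "Yes", "No", "No"), times = c(30, 20, 10, 40)),
  gender   = rep(c("Male", "Female", "Male", "Female"), times = c(30, 20, 10, 40))
)

# Cross-tabulate the two categorical variables into a contingency table
tab <- table(Response = survey$response, Gender = survey$gender)
print(tab)

# chisq.test() accepts the table object directly
result_from_raw <- chisq.test(tab)
print(result_from_raw)
```

table() orders the levels alphabetically, so the rows and columns may appear in a different order than in a hand-built matrix; the test result does not depend on that order.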
Example:
Suppose we have a 2×2 contingency table:
| | Male | Female |
|---|---|---|
| Yes | 30 | 20 |
| No | 10 | 40 |
# Create a contingency table
data <- matrix(c(30, 20, 10, 40), nrow = 2, byrow = TRUE)
# Assign row and column names
dimnames(data) <- list("Response" = c("Yes", "No"), "Gender" = c("Male", "Female"))
# Perform Chi-Square Test for Independence
result <- chisq.test(data)
# View the result
print(result)
Output interpretation:
- p-value: If the p-value is less than 0.05, we reject the null hypothesis and conclude that there is a significant association between the two categorical variables.
- X-squared: The test statistic.
- df: Degrees of freedom.
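For a 2×2 table, chisq.test() applies Yates' continuity correction by default, so the printed X-squared is smaller than the uncorrected Pearson statistic. The sketch below reuses the counts from the example to show how individual components of the result can be inspected and how the correction can be switched off with correct = FALSE. Which variant is appropriate depends on your analysis; the variable names corrected and uncorrected are illustrative.

```r
# Same 2x2 table as in the example
data <- matrix(c(30, 20, 10, 40), nrow = 2, byrow = TRUE,
               dimnames = list(Response = c("Yes", "No"),
                               Gender = c("Male", "Female")))

# Default for 2x2 tables: Yates' continuity correction is applied
corrected <- chisq.test(data)

# Uncorrected Pearson Chi-Square statistic
uncorrected <- chisq.test(data, correct = FALSE)

# The result is a list of class "htest"; components can be extracted by name
corrected$statistic    # X-squared with the correction
uncorrected$statistic  # X-squared without the correction
corrected$parameter    # degrees of freedom: (rows - 1) * (columns - 1)
corrected$p.value      # p-value
corrected$expected     # expected counts under independence
```

Looking at the expected counts is also a quick way to check the test's sample-size assumption before interpreting the p-value.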
2. Chi-Square Goodness of Fit Test
This test checks whether an observed distribution matches an expected distribution, for example, whether a die is fair.
Example:
Suppose you roll a die 60 times and get the following counts:
- 1: 12 times
- 2: 10 times
- 3: 8 times
- 4: 15 times
- 5: 7 times
- 6: 8 times
For a fair six-sided die, we expect each face to appear 60 / 6 = 10 times.
# Observed frequencies
observed <- c(12, 10, 8, 15, 7, 8)
# Expected frequencies (for a fair die, each face should appear 10 times)
expected <- rep(10, 6)
# Perform Chi-Square Goodness of Fit Test
result <- chisq.test(observed, p = expected / sum(expected))
# View the result
print(result)
Output interpretation:
- p-value: If the p-value is less than 0.05, the observed data significantly deviates from the expected distribution.
- X-squared: The test statistic.
- df: Degrees of freedom (usually the number of categories minus 1).
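To see where X-squared comes from, the statistic can be recomputed by hand as the sum of (observed - expected)^2 / expected over all categories. A minimal sketch using the die counts above; the names x_squared, df, p_value, and fit are illustrative, and pchisq() is base R's Chi-Square distribution function:

```r
observed <- c(12, 10, 8, 15, 7, 8)
expected <- rep(sum(observed) / length(observed), length(observed))  # 10 per face

# Chi-Square statistic: squared deviations scaled by the expected counts
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
df <- length(observed) - 1

# Upper-tail probability of the Chi-Square distribution gives the p-value
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# chisq.test() assumes equal probabilities when p is omitted,
# so this call should agree with the manual calculation
fit <- chisq.test(observed)
all.equal(unname(fit$statistic), x_squared)
all.equal(fit$p.value, p_value)
```

Because equal probabilities are the default, the p argument is only needed when the hypothesized distribution is not uniform.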
Additional Notes:
- Assumptions: The Chi-Square test assumes that the expected frequency in each cell is large enough (typically ≥ 5) to ensure the validity of the approximation.
- Warnings: If the data do not meet this assumption (e.g., small expected counts), R warns that the "Chi-squared approximation may be incorrect". In that case, consider using Fisher’s Exact Test, which is suited to small samples.
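One way to act on these notes is to check the expected counts before trusting the Chi-Square p-value and to fall back to fisher.test() when any of them is below 5. The sketch below uses a made-up 2×2 table (the counts are illustrative only, not taken from the examples above), and the cutoff of 5 is the rule of thumb mentioned above rather than a hard requirement.

```r
# Hypothetical small-sample table (illustrative counts only)
small <- matrix(c(3, 1, 2, 6), nrow = 2, byrow = TRUE,
                dimnames = list(Response = c("Yes", "No"),
                                Gender = c("Male", "Female")))

# Expected counts under independence; suppressWarnings() hides the
# "Chi-squared approximation may be incorrect" warning raised here
expected_counts <- suppressWarnings(chisq.test(small))$expected

if (any(expected_counts < 5)) {
  # Exact test: does not rely on the large-sample approximation
  result <- fisher.test(small)
} else {
  result <- chisq.test(small)
}
print(result)
```

For goodness-of-fit tests with small expected counts, chisq.test() also accepts simulate.p.value = TRUE, which estimates the p-value by Monte Carlo simulation instead of using the Chi-Square approximation.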