Monday, June 13, 2011

Conduct and Interpret the Chi-Square Test of Independence

The Chi-Square Test of Independence is also known as Pearson’s Chi-Square, Chi-Squared, or χ².  χ is the Greek letter Chi.  The Chi-Square Test has two major fields of application: 1) goodness of fit test and 2) test of independence.

First, the Chi-Square Test can test whether the distribution of a variable in a sample approximates an assumed theoretical distribution (e.g., a normal or Beta distribution).  [Please note that the Kolmogorov-Smirnov test is another test for goodness of fit.  The Kolmogorov-Smirnov test has higher power, but can only be applied to continuous-level variables.]
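
As a rough illustration of the goodness of fit idea, here is a minimal Python sketch that compares observed category counts with the counts expected under a hypothesized distribution.  The counts, the 50/30/20 split, and the function name chi_square_gof are made up for this example.  It assumes no distribution parameters were estimated from the sample, so the degrees of freedom are simply the number of categories minus one.

```python
import math

def chi_square_gof(observed, expected_props):
    """Return (statistic, degrees of freedom) for a goodness of fit test."""
    n = sum(observed)
    # Expected count per category under the hypothesized proportions.
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Assumes no parameters were estimated from the data.
    return stat, len(observed) - 1

# Made-up counts for three categories, tested against a hypothesized 50/30/20 split.
observed = [45, 35, 20]
stat, df = chi_square_gof(observed, [0.5, 0.3, 0.2])

# For df = 2 the Chi-Square upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)
print(round(stat, 3), df, round(p_value, 3))
```

The closed form used for the p-value only holds for two degrees of freedom; for other cases a statistics package would normally supply the Chi-Square distribution.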

Second, the Chi-Square Test can be used as a test of independence between two variables.  That means that it tests whether one variable is independent of another.  In other words, it tests whether or not a statistically significant relationship exists between a dependent and an independent variable.  When used as a test of independence, the Chi-Square Test is applied to a contingency table, or cross tabulation (sometimes called crosstabs for short).
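
To make the test of independence concrete, the following Python sketch computes Pearson's statistic for a contingency table by comparing each observed cell with the count expected if the two variables were independent.  The table values and the function name chi_square_independence are hypothetical, and no continuity correction is applied.

```python
import math

def chi_square_independence(table):
    """Pearson Chi-Square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = children / adults, columns = infected / not infected.
table = [[30, 70],
         [15, 85]]
stat, df = chi_square_independence(table)

# For df = 1 the Chi-Square upper-tail probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(round(stat, 3), df, round(p_value, 4))
```

The erfc shortcut applies only to 2 x 2 tables (one degree of freedom); larger tables need the general Chi-Square distribution from a statistics package.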

Typical questions answered with the Chi-Square Test of Independence are as follows:
  • Medicine - Are children more likely to get infected with virus A than adults?
  • Sociology - Is there a difference between the marital status of men and women in their early 30s?
  • Management - Is customer segment A more likely to make an online purchase than segment B?
  • Economy - Do white-collar employees have a brighter economic outlook than blue-collar workers?
As we can see from these questions and the decision tree, the Chi-Square Test of Independence works with nominal scales for both the dependent and independent variables.  Each example question is answered by a choice on a nominal scale, i.e., a tick mark in one distinct category (e.g., male/female, infected/not infected, buy online/do not buy online).
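
As a small sketch of how such nominal answers turn into a cross tabulation, the Python snippet below counts how often each combination of categories occurs.  The records are invented for illustration and loosely follow the customer segment question above.

```python
from collections import Counter

# Hypothetical survey records: (customer segment, purchase behavior).
records = [
    ("A", "buys online"), ("A", "buys online"), ("A", "does not buy online"),
    ("B", "buys online"), ("B", "does not buy online"), ("B", "does not buy online"),
]

counts = Counter(records)
row_labels = sorted({segment for segment, _ in records})
col_labels = sorted({behavior for _, behavior in records})

# Counter returns 0 for combinations that never occur, so empty cells are handled.
crosstab = [[counts[(r, c)] for c in col_labels] for r in row_labels]

for label, row in zip(row_labels, crosstab):
    print(label, row)
```

Each row of crosstab is one segment and each column one purchase category, which is the layout the test of independence works on.
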
In more academic terms, the test statistic computed from the contingency table has a distribution that approximates a Chi-Square distribution when the two variables are independent.  Pearson’s Chi-Square Test of Independence is therefore an approximate test.  This means that the distribution of the test statistic is only approximately Chi-Square.  This approximation improves with larger sample sizes.  However, it poses a problem with small sample sizes, for which a typical cut-off point is a cell with fewer than five expected occurrences.
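
The cut-off of five expected occurrences can be checked before running the test.  The Python sketch below computes the expected count of every cell and lists the cells that fall below five; the table and the function name expected_counts are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical small-sample table.
table = [[3, 9],
         [7, 6]]
expected = expected_counts(table)

# Flag cells below the conventional cut-off of five expected occurrences.
small_cells = [(i, j) for i, row in enumerate(expected)
               for j, e in enumerate(row) if e < 5]
print(expected)
print(small_cells)
```

Note that the check uses expected counts, not observed counts: a cell with an observed count of 3 is only a concern here because its expected count also falls below five.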

Taking this into consideration, Fisher developed an exact test for contingency tables with small samples.  Exact tests do not rely on an approximation to a theoretical distribution, in this case the Chi-Square distribution.  Fisher’s exact test calculates all needed information from the sample using a hypergeometric distribution.
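
For a 2 x 2 table, the hypergeometric calculation can be sketched in a few lines of Python.  This is an illustrative version under stated assumptions, not a reference implementation: the function name fisher_exact_two_sided and the example table are made up, and the two-sided p-value is taken as the sum of the probabilities of all tables (with the same margins) that are no more likely than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical small-sample table.
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(round(p_value, 4))
```

Because every probability comes directly from counting arrangements of the sample, no Chi-Square approximation is involved at any step.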

What does this mean? Because it is an exact test, a significance value p calculated with Fisher’s Exact Test will be correct; i.e., when p = 0.01 the test (in the long run) will actually reject a true null hypothesis in 1% of all tests conducted.  For an approximate test such as Pearson’s Chi-Square Test of Independence this is only asymptotically the case.  Therefore, the exact test has exactly the Type I error rate (α-error, false positives) that it reports as the p-value.

When applied to a research problem, however, this difference often has only a small impact on the results.  The rule of thumb is to use exact tests with sample sizes less than ten.  Both Fisher’s exact test and Pearson’s Chi-Square Test of Independence can be easily calculated with statistical software such as SPSS.

The Chi-Square Test of Independence is the simplest test of whether a relationship exists between an independent and one or more dependent variables; a significant result shows an association, not proof of causation.  As the decision tree for tests of independence shows, the Chi-Square Test can always be used.
