We use a chi-square test when the data come from a nominal scale of measurement, that is, when the data are frequency counts. In the social and medical sciences, frequency counts are most often the numbers of people or patients (subjects) who fall into two or more mutually exclusive categories. The variable (i.e. what you observe or measure) is counted in at least two discrete categories (classes), and we count how often it occurs in each category. The categories are discrete, which means that an observation falls into one category or another, never between two categories. For example, the variable eye color can be categorized as "red", "green", "blue" and "brown" and counted among a group of people.
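The counting step can be sketched in Python with `collections.Counter`. The list of observations below is hypothetical, made up only to illustrate that each subject is tallied in exactly one category.

```python
from collections import Counter

# Hypothetical eye-color observations: one category per person (illustrative data only).
observations = ["brown", "blue", "brown", "green", "blue", "brown", "red"]

# Tally how many people fall into each category.
counts = Counter(observations)

for color in ["red", "green", "blue", "brown"]:
    print(color, counts[color])

# Each observation is counted in exactly one category, so the counts sum to the sample size.
assert sum(counts.values()) == len(observations)
```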
A chi-square test usually begins with the construction of a table, called a contingency table, which shows how many subjects fall into each category. Each cell of the table must be independent of the other cells, that is, each subject must appear in one and only one cell. The contingency coefficient provides a measure of the strength of the association between the nominal-scaled variables. The coefficient is often corrected for the size of the contingency table so that values from tables of different sizes can be compared.
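A minimal sketch of the contingency coefficient, assuming Pearson's definition C = sqrt(χ² / (χ² + N)), where N is the total number of subjects. The text does not say which size correction it means; here we assume the common choice of dividing C by its largest attainable value for the table shape, C_max = sqrt((k − 1) / k), with k the smaller of the number of rows and columns. The function names are our own, and the counts are those of the stop-sign example later in this section.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def contingency_coefficient(table, corrected=False):
    """Pearson's C; if corrected, divide by C_max = sqrt((k - 1) / k) (assumed correction)."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    c = math.sqrt(chi2 / (chi2 + n))
    if corrected:
        # k = min(rows, columns); the corrected value lies between 0 and 1.
        k = min(len(table), len(table[0]))
        c /= math.sqrt((k - 1) / k)
    return c

drivers = [[30, 10], [20, 40]]  # rows: female, male; columns: full stop, partial stop
c = contingency_coefficient(drivers)
c_corr = contingency_coefficient(drivers, corrected=True)
print(round(c, 3))       # → 0.378
print(round(c_corr, 3))  # → 0.535
```

The uncorrected C of a 2×2 table can never reach 1 (its maximum is about 0.707), which is why dividing by C_max makes tables of different shapes easier to compare.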
Construction of a contingency table
For example, let us assume that the police department wants to test whether female drivers are more likely than male drivers to make a full stop at a stop sign. A police officer observed a busy intersection and collected the following data: 30 female drivers made a full stop and 10 made a partial stop; 20 male drivers made a full stop and 40 made a partial stop. The null hypothesis is that stopping behavior at an intersection is independent of gender. We arrange the data into the following 2×2 contingency table:

            Full stop   Partial stop   Total
Female          30            10          40
Male            20            40          60
Total           50            50         100
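A minimal sketch of what the test computes from this table: under the null hypothesis of independence, the expected count of each cell is (row total × column total) / grand total, and the chi-square statistic sums (observed − expected)² / expected over all cells. The variable names are illustrative.

```python
# Observed counts from the stop-sign example.
observed = [
    [30, 10],  # female: full stop, partial stop
    [20, 40],  # male:   full stop, partial stop
]

row_totals = [sum(row) for row in observed]        # [40, 60]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 50]
n = sum(row_totals)                                # 100

# Expected counts if stopping behavior were independent of gender.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)  # → [[20.0, 20.0], [30.0, 30.0]]

# Pearson chi-square statistic: sum of squared deviations scaled by the expected counts.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(round(chi2, 2))  # → 16.67
```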
For these counts the chi-square statistic is about 16.67 with one degree of freedom, and the p-value is far below the conventionally accepted significance level of 0.05 (p < 0.001), so we can reject the null hypothesis. In other words, the data indicate that stopping behavior at the intersection is not independent of gender.
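For one degree of freedom, the p-value can be obtained without a statistics library, because a chi-square variable with df = 1 is the square of a standard normal variable Z, so P(χ² ≥ x) = P(|Z| ≥ sqrt(x)) = erfc(sqrt(x / 2)). A minimal sketch using only `math.erfc` from the standard library; the function name `chi2_pvalue_df1` is our own, and for tables with more degrees of freedom a library routine such as SciPy's `scipy.stats.chi2.sf` would be the usual choice.

```python
import math

def chi2_pvalue_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

alpha = 0.05
chi2 = 50 / 3  # statistic for the stop-sign counts (about 16.67)
p = chi2_pvalue_df1(chi2)

print(p)
print(p < alpha)  # → True
```

As a sanity check, the familiar critical value 3.84 (more precisely 1.96²) gives p ≈ 0.05 with this formula.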
When there is one degree of freedom (for a contingency table, df = (number of rows − 1) × (number of columns − 1), so a 2×2 table has df = 1), it is recommended to apply the Yates correction for continuity. If there is more than one degree of freedom, the Yates correction should not be applied. Without the continuity correction, the calculated chi-square may be inflated enough to cause us to reject the null hypothesis, whereas the corrected chi-square might not.
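A minimal sketch of the Yates correction for a 2×2 table, using the textbook form in which 0.5 is subtracted from each |observed − expected| before squaring. The function names are our own; for reference, SciPy's `scipy.stats.chi2_contingency` applies this correction by default when df = 1 (its `correction` parameter).

```python
import math

def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a 2x2 table; optionally apply Yates' continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                # Yates' correction: subtract 0.5 from each |observed - expected|.
                diff -= 0.5
            stat += diff ** 2 / expected
    return stat

def p_value_df1(chi2):
    # With one degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

drivers = [[30, 10], [20, 40]]  # rows: female, male; columns: full stop, partial stop
plain = chi_square_2x2(drivers)
corrected = chi_square_2x2(drivers, yates=True)
print(f"uncorrected: chi2 = {plain:.2f}, p = {p_value_df1(plain):.2g}")
print(f"Yates:       chi2 = {corrected:.2f}, p = {p_value_df1(corrected):.2g}")
```

For these counts the correction lowers the statistic from about 16.67 to about 15.04, but both p-values remain well below 0.05, so the conclusion does not change here; the correction matters most when the uncorrected statistic lies close to the critical value.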