Let us start the topic by reviewing the concept of independent events. After all, as the name implies, the “test of independence” tests whether two events are independent or not.
 Independent Events
We previously defined two events A and B as independent if P(A) = P(A | B), that is, the chance of A is the same as the chance of A given B. The example we used involved calculating P(red card) and P(odd number card). We found that P(odd number card) = P(odd number card | red card). The probability of picking an odd number card is the same regardless of whether you pick it 1) from the entire deck of cards, P(odd number card), or 2) from the red cards in the deck, P(odd number card | red card).
You can verify this conclusion by plugging the counts into the equation.
 P(odd number card) = 28/52 = 7/13
 P(odd number card | red card) = 14/26 = 7/13
Table 1. Crosstabulation of Even/Odd Number and Color of Poker Cards
| | Odd number | Even number | Row Total |
|---|---|---|---|
| Red | 14 | 12 | 26 |
| Black | 14 | 12 | 26 |
| Column Total | 28 | 24 | 52 |
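The check on Table 1 can also be sketched in a few lines of Python. This is only an illustration: the variable names are my own, and exact fractions are used so that the two probabilities can be compared without rounding.

```python
from fractions import Fraction

# Counts copied from Table 1 (rows: color, columns: odd/even).
counts = {
    "Red": {"Odd": 14, "Even": 12},
    "Black": {"Odd": 14, "Even": 12},
}

grand_total = sum(sum(row.values()) for row in counts.values())
odd_total = sum(row["Odd"] for row in counts.values())
red_total = sum(counts["Red"].values())

# Marginal probability: P(odd number card)
p_odd = Fraction(odd_total, grand_total)
# Conditional probability: P(odd number card | red card)
p_odd_given_red = Fraction(counts["Red"]["Odd"], red_total)

# The two events are independent when these two probabilities are equal.
print(p_odd, p_odd_given_red, p_odd == p_odd_given_red)
```

Both fractions reduce to 7/13, so the comparison prints True.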
When we say two events are independent, we mean that the occurrence of one event does not change the probability of the other: each conditional probability equals the corresponding marginal probability (i.e., the probability of the event by itself).
In a test of independence, we determine whether two categorical variables are independent by summarizing them in a crosstabulation. This statement implies several boundaries and conditions attached to the test of independence:
 At the current stage, we will only conduct tests that involve two dimensions or events (e.g., black or red and odd or even);
 The variables involved must be categorical;
 You need to summarize your sample data in crosstabulation format.
Below is a snippet of a sample dataset that is well suited to this type of test.
Figure 1. A snippet of sample data
The sample data involves two events (or dimensions), Office and Yes/No. “Office” has three values: Office 1, Office 2, and Office 3. The Yes/No event has two values: Yes and No. Therefore, the crosstabulation for this data can take the following form.
| | Office 1 | Office 2 | Office 3 | Total Yes/No |
|---|---|---|---|---|
| Yes | x_{1yes} | x_{2yes} | x_{3yes} | \Sigma x_{yes} |
| No | x_{1no} | x_{2no} | x_{3no} | \Sigma x_{no} |
| Total Office | \Sigma x_{1} | \Sigma x_{2} | \Sigma x_{3} | \Sigma x |
(* You will be provided with a data file that contains two or more variables that can be used to conduct a test of independence. Please refer to a later section for how to use a pivot table to make a crosstabulation like the one above.)
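For readers who prefer code to a pivot table, here is a minimal sketch of how raw rows can be tallied into the Office by Yes/No crosstabulation. The records below are hypothetical and made up purely for illustration; they are not the course data file, and the variable names are my own.

```python
from collections import Counter

# Hypothetical (office, answer) records; NOT the course data file.
records = [
    ("Office 1", "Yes"), ("Office 1", "No"), ("Office 2", "Yes"),
    ("Office 2", "Yes"), ("Office 3", "No"), ("Office 3", "No"),
    ("Office 1", "Yes"),
]

# Frequency of each (office, answer) combination.
cell_counts = Counter(records)
offices = sorted({office for office, _ in records})
answers = ["Yes", "No"]

# Rows are Yes/No and columns are offices, matching the layout of the table.
crosstab = {a: [cell_counts[(o, a)] for o in offices] for a in answers}
row_totals = {a: sum(crosstab[a]) for a in answers}
col_totals = [sum(crosstab[a][j] for a in answers) for j in range(len(offices))]
grand_total = sum(row_totals.values())

for a in answers:
    print(a, crosstab[a], row_totals[a])
print("Total Office", col_totals, grand_total)
```

A Counter returns 0 for a combination that never occurs, so empty cells need no special handling.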
 Expected Form of Crosstabulation when Two Events (Dimensions) are Independent
Some books say that we are testing the independence of “two variables”. I think this is confusing because sometimes more than two variables are involved (as in the previous “ideal” sample dataset). It is really two events or two dimensions that are under consideration for testing.
The following table is intended to show whether two events, degree and income, are independent. Each x represents the frequency of one of the four degree and income combinations. We know the column and row totals, but not the specific x values.
| | High Income | Low Income | Row Total |
|---|---|---|---|
| Graduate Degree | x_{ghigh} | x_{glow} | 200 |
| Undergraduate Degree | x_{uhigh} | x_{ulow} | 400 |
| Column Total | 240 | 360 | 600 |
WHEN TWO EVENTS ARE INDEPENDENT, the frequencies form a certain pattern: even though we do not know the specific value of each x, as long as we know the column and row totals, we can correctly infer those x values. Please refer to the following calculation.
If degree and income are independent, then based on the probability calculation we learned previously, we have:
 P(High Income) = P(High Income | Graduate Degree), so 240/600 = x_{ghigh}/200
 x_{ghigh} = 200 × 240/600 = 80
 We see that the proportion of the High Income column total (240) over the grand total (600) is the same as the Graduate Degree and High Income count (x_{ghigh} = 80) over the Graduate Degree row total (200). Also, the proportion of the Graduate Degree row total (200) over the grand total (600) is the same as x_{ghigh} over the High Income column total (240).
 You can work out the same pattern for the remaining three x values.
Do you notice the pattern, and do you understand why?
 If income and degree have nothing to do with each other, the distribution of the income groups (low and high) should not differ drastically from one degree group (undergraduate or graduate) to the other.
Based on this expected pattern in the crosstabulation of two independent events, we can conduct the test of independence. In this test, instead of relying on the sample statistics used previously (i.e., mean, standard deviation, and proportion), we use the frequency pattern of each group.
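The inference of each x from the row and column totals can be sketched as follows. Under independence, each expected count is (row total × column total) / grand total, which is the same proportion argument used above for x_{ghigh}. The dictionary names are my own and only illustrative.

```python
# Row and column totals from the degree and income table.
row_totals = {"Graduate Degree": 200, "Undergraduate Degree": 400}
col_totals = {"High Income": 240, "Low Income": 360}
grand_total = sum(row_totals.values())

# Expected count of each cell if degree and income are independent:
# (row total * column total) / grand total.
expected = {
    (degree, income): r_total * c_total / grand_total
    for degree, r_total in row_totals.items()
    for income, c_total in col_totals.items()
}

for (degree, income), count in expected.items():
    print(f"{degree}, {income}: {count:g}")
```

The Graduate Degree and High Income cell comes out as 80, matching the hand calculation, and the other three cells follow the same rule.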
 Measuring the Differences between Expected and Observed Pattern using Chi-square
Here are the completed expected frequencies, shown in crosstabulation form, when the two events are independent.
| | High Income | Low Income | Row Total |
|---|---|---|---|
| Graduate Degree | 80 | 120 | 200 |
| Undergraduate Degree | 160 | 240 | 400 |
| Column Total | 240 | 360 | 600 |
The following table shows the actual (or observed) frequency for each group. (Remark: this is just an example I made up.)
| | High Income | Low Income | Row Total |
|---|---|---|---|
| Graduate Degree | 160 | 40 | 200 |
| Undergraduate Degree | 80 | 320 | 400 |
| Column Total | 240 | 360 | 600 |
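Let us now see how much the observed frequencies deviate from the expected frequencies. To do so, I am using a test statistic called chi-square. For each cell, we square the difference between the observed and the expected frequency, divide it by the expected frequency, and then add up the results.
Chi-square = (160 - 80)^2/80 + (40 - 120)^2/120 + (80 - 160)^2/160 + (320 - 240)^2/240 = 80 + 53.33 + 40 + 26.67 = 200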
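As a cross-check, here is a minimal sketch of the statistic in Python, assuming the usual Pearson form in which each squared deviation is divided by the expected count of that cell. The expected counts are rebuilt from the row and column totals of the observed table; the variable names are my own.

```python
# Observed counts: rows are Graduate / Undergraduate Degree,
# columns are High / Low Income.
observed = [
    [160, 40],
    [80, 320],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence.
        exp = row_totals[i] * col_totals[j] / grand_total
        # Pearson form: squared deviation divided by the EXPECTED count.
        chi_square += (obs - exp) ** 2 / exp

print(round(chi_square, 2))
```

Because the expected counts are derived from the margins, the same loop works for any two-way table of observed counts.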