## Analysis of Frequency Data - Contingency Tables

### Uses

• test for independence between (nominal or ordinal) variables recorded as frequencies (counts)
• test whether one variable (e.g., eye color {blue, green}) is independent of another (e.g., sex {male, female}) using a contingency table
• test for Goodness of fit
• identify those cells that are the main contributors for any overall effect using Cell-wise examination

### How is this done

• calculate expected frequencies under the null hypothesis (Note: use Goodman's Test for quasi-independence if the diagonal or other cells are structurally zero)
• calculate overall significance of a contingency table with
• X2 Statistic: $X^2 = \sum_i (O_i - E_i)^2 / E_i$, where $O_i$ and $E_i$ are the observed and expected frequencies of cell $i$
• Likelihood-Ratio Statistic (also known as Likelihood Ratio Chi-square, LRX2, G2, G-statistic, negative log likelihood, nLL): $G = 2 \sum_i f_i \ln(f_i / \hat{f}_i)$, where $f_i$ is the observed and $\hat{f}_i$ the expected frequency of cell $i$
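• compare these values to a Χ2 distribution with the appropriate degrees of freedom, e.g., (r - 1)(c - 1) for a test of independence in an r × c table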
• These analyses can be performed using Java DataGrinder applets
• If sample sizes are small you can use a Randomization Test (i.e., Monte Carlo Simulation)
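
The calculation steps above can be sketched in Java as follows. This is a minimal illustration, not the DataGrinder implementation: the class name `ContingencyStats`, its method names, and the 2 × 2 counts are hypothetical. It assumes a test of independence, so each expected frequency is row total × column total / n, and it treats cells with an observed count of zero as contributing nothing to G.

```java
final class ContingencyStats {

    /** Expected frequencies under independence: rowTotal * colTotal / n. */
    public static double[][] expected(double[][] obs) {
        int r = obs.length, c = obs[0].length;
        double[] rowTot = new double[r];
        double[] colTot = new double[c];
        double n = 0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowTot[i] += obs[i][j];
                colTot[j] += obs[i][j];
                n += obs[i][j];
            }
        }
        double[][] exp = new double[r][c];
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                exp[i][j] = rowTot[i] * colTot[j] / n;
            }
        }
        return exp;
    }

    /** Pearson X2: sum over cells of (o - e)^2 / e. */
    public static double pearsonX2(double[][] obs, double[][] exp) {
        double sum = 0;
        for (int i = 0; i < obs.length; i++) {
            for (int j = 0; j < obs[i].length; j++) {
                double d = obs[i][j] - exp[i][j];
                sum += d * d / exp[i][j];
            }
        }
        return sum;
    }

    /** G: 2 * sum over cells of o * ln(o / e); cells with o = 0 add nothing. */
    public static double gStatistic(double[][] obs, double[][] exp) {
        double sum = 0;
        for (int i = 0; i < obs.length; i++) {
            for (int j = 0; j < obs[i].length; j++) {
                if (obs[i][j] > 0) {
                    sum += obs[i][j] * Math.log(obs[i][j] / exp[i][j]);
                }
            }
        }
        return 2 * sum;
    }

    public static void main(String[] args) {
        // Made-up counts, e.g. rows = sex, columns = eye color
        double[][] obs = {{20, 30}, {30, 20}};
        double[][] exp = expected(obs);
        int df = (obs.length - 1) * (obs[0].length - 1);
        System.out.println("X2 = " + pearsonX2(obs, exp));
        System.out.println("G  = " + gStatistic(obs, exp));
        System.out.println("df = " + df);
    }
}
```

Both statistics would then be compared to the Χ2 distribution with `df` degrees of freedom. Looking up the critical value is left out here because the Java standard library has no Χ2 distribution function.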

### Note

• make sure that no expected frequency is < 3; if one is, lump that cell with adjacent classes
• X2 and G are sample statistics; Χ2 is a theoretical frequency distribution.
• G statistic is preferable to X2:
• The distribution of G fits a Χ2 distribution more closely than does the distribution of X2
• G-values are additive while X2-values are not. Thus, total G can be partitioned into two components, the G for heterogeneity and the pooled G
• in more complex designs G is easier to compute than X2 as it is calculated from the observed frequencies only
• apply Williams' correction to G so that the actual type I error rate is closer to the nominal level. Yates' correction is too conservative
• perform a cell-wise examination of the matrix using Freeman-Tukey deviates (o - observed cell frequency; e - expected cell frequency) as:
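```
// Freeman-Tukey deviate for one cell (o = observed, e = expected frequency)
double ft = Math.sqrt(o) + Math.sqrt(o + 1) - Math.sqrt((4 * e) + 1);
// compare |ft| to this criterion; c2a[1] is taken to be the critical Χ2 value at level alpha for 1 df
double criterion = Math.sqrt(df * c2a[1] / nCells);
```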
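
To show how the cell-wise examination might be applied to a whole table, here is a hedged Java sketch. The class `FreemanTukey` and its methods are hypothetical names. The sketch assumes that `df` is the table's degrees of freedom, (r - 1)(c - 1), that the caller supplies the critical Χ2 value for 1 df (3.841 at alpha = 0.05), and that a cell is flagged when the absolute value of its deviate exceeds the criterion.

```java
import java.util.ArrayList;
import java.util.List;

final class FreemanTukey {

    /** Freeman-Tukey deviate: sqrt(o) + sqrt(o + 1) - sqrt(4e + 1). */
    public static double deviate(double o, double e) {
        return Math.sqrt(o) + Math.sqrt(o + 1) - Math.sqrt(4 * e + 1);
    }

    /** Criterion: sqrt(df * chi2Crit1 / nCells), chi2Crit1 = critical value for 1 df. */
    public static double criterion(int df, double chi2Crit1, int nCells) {
        return Math.sqrt(df * chi2Crit1 / nCells);
    }

    /** Returns {row, col} index pairs of cells whose |deviate| exceeds the criterion. */
    public static List<int[]> flaggedCells(double[][] obs, double[][] exp, double chi2Crit1) {
        int r = obs.length, c = obs[0].length;
        // Assumption: df of an r x c independence test
        double crit = criterion((r - 1) * (c - 1), chi2Crit1, r * c);
        List<int[]> flagged = new ArrayList<>();
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                if (Math.abs(deviate(obs[i][j], exp[i][j])) > crit) {
                    flagged.add(new int[] {i, j});
                }
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up counts; expected frequencies under independence are all 25
        double[][] obs = {{10, 40}, {40, 10}};
        double[][] exp = {{25, 25}, {25, 25}};
        for (int[] cell : flaggedCells(obs, exp, 3.841)) {
            System.out.println("cell (" + cell[0] + ", " + cell[1] + "): F-T = "
                    + deviate(obs[cell[0]][cell[1]], exp[cell[0]][cell[1]]));
        }
    }
}
```

The sign of each flagged deviate indicates whether the cell holds more (positive) or fewer (negative) observations than expected.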