Lectures for Advanced Statistics

Analysis of Frequency Data - Contingency Tables

test for Independence between (nominal or ordinal) variables containing frequencies
test whether the occurrence of one variable (e.g. eye color {blue, green}) is independent of another (e.g., sex {male, female}) using a Contingency table
test for Goodness of fit
identify those cells that are the main contributors for any overall effect using Cell-wise examination

calculate expected frequencies under the null hypothesis (Note: use Goodman's Test for quasi-independence if the diagonal or other cells are structurally zero)
calculate overall significance of a contingency table with
- X² Statistic: Σ(Observed frequency_i - Expected frequency_i)² / Expected frequency_i
- Likelihood-Ratio Statistic (also known as Likelihood Ratio Chi-square, LRX², G², G-statistic, negative log likelihood, nLL): G = 2 Σ f_i ln(f_i / f_i_exp)
compare these values to a Χ² distribution with (rows - 1) × (columns - 1) degrees of freedom for a test of independence
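The calculation above can be sketched in Java as follows. This is a minimal illustration for a test of independence on an r × c table, assuming expected frequencies of rowTotal × columnTotal / n; the class and method names are made up for this example and are not taken from DataGrinder.

```java
// Sketch only: expected frequencies, Pearson X^2 and likelihood-ratio G
// for an r x c contingency table. All names here are illustrative.
final class ContingencyStats {

    /** Expected frequencies under independence: rowTotal * colTotal / n. */
    static double[][] expected(int[][] obs) {
        int r = obs.length, c = obs[0].length;
        double[] rowTot = new double[r];
        double[] colTot = new double[c];
        double n = 0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowTot[i] += obs[i][j];
                colTot[j] += obs[i][j];
                n += obs[i][j];
            }
        }
        double[][] exp = new double[r][c];
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                exp[i][j] = rowTot[i] * colTot[j] / n;
            }
        }
        return exp;
    }

    /** X^2 = sum over cells of (observed - expected)^2 / expected. */
    static double pearsonX2(int[][] obs) {
        double[][] exp = expected(obs);
        double x2 = 0;
        for (int i = 0; i < obs.length; i++) {
            for (int j = 0; j < obs[i].length; j++) {
                double d = obs[i][j] - exp[i][j];
                x2 += d * d / exp[i][j];
            }
        }
        return x2;
    }

    /** G = 2 * sum over cells of observed * ln(observed / expected). */
    static double gStatistic(int[][] obs) {
        double[][] exp = expected(obs);
        double sum = 0;
        for (int i = 0; i < obs.length; i++) {
            for (int j = 0; j < obs[i].length; j++) {
                // A cell with observed frequency 0 contributes 0 to the sum.
                if (obs[i][j] > 0) {
                    sum += obs[i][j] * Math.log(obs[i][j] / exp[i][j]);
                }
            }
        }
        return 2 * sum;
    }

    /** Degrees of freedom for a test of independence: (r - 1)(c - 1). */
    static int degreesOfFreedom(int[][] obs) {
        return (obs.length - 1) * (obs[0].length - 1);
    }
}
```

For the table {{10, 20}, {20, 10}} every expected frequency is 15, which gives X² ≈ 6.67 and G ≈ 6.80 with 1 degree of freedom; both exceed the 5% critical value of 3.841.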
These analyses can be performed using Java DataGrinder applets
If sample sizes are small you can use a Randomization Test (i.e., Monte Carlo Simulation)
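One way to set up such a randomization test is sketched below. It is an assumption of this example, not necessarily what DataGrinder does: each observation is expanded into a (row label, column label) pair, the column labels are shuffled so that both margins stay fixed, and G is recomputed for every shuffled table. The class name, method names, and the seed parameter are illustrative.

```java
import java.util.Random;

// Sketch only: Monte Carlo randomization test for an r x c table.
final class RandomizationTest {

    /** Likelihood-ratio G with expected = rowTotal * colTotal / n. */
    static double g(int[][] obs) {
        int r = obs.length, c = obs[0].length;
        double[] rowTot = new double[r];
        double[] colTot = new double[c];
        double n = 0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowTot[i] += obs[i][j];
                colTot[j] += obs[i][j];
                n += obs[i][j];
            }
        }
        double sum = 0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                if (obs[i][j] > 0) {
                    double exp = rowTot[i] * colTot[j] / n;
                    sum += obs[i][j] * Math.log(obs[i][j] / exp);
                }
            }
        }
        return 2 * sum;
    }

    /** Share of tables (shuffled plus observed) with G at least as large as observed. */
    static double pValue(int[][] obs, int iterations, long seed) {
        int r = obs.length, c = obs[0].length;
        int n = 0;
        for (int[] row : obs) {
            for (int f : row) {
                n += f;
            }
        }
        // One (row label, column label) pair per observation.
        int[] rowLabel = new int[n];
        int[] colLabel = new int[n];
        int k = 0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                for (int m = 0; m < obs[i][j]; m++) {
                    rowLabel[k] = i;
                    colLabel[k] = j;
                    k++;
                }
            }
        }
        double gObs = g(obs);
        Random rng = new Random(seed);
        int atLeastAsExtreme = 0;
        for (int it = 0; it < iterations; it++) {
            // Fisher-Yates shuffle of the column labels removes any association
            // between rows and columns while keeping both margins fixed.
            for (int a = n - 1; a > 0; a--) {
                int b = rng.nextInt(a + 1);
                int tmp = colLabel[a];
                colLabel[a] = colLabel[b];
                colLabel[b] = tmp;
            }
            int[][] sim = new int[r][c];
            for (int m = 0; m < n; m++) {
                sim[rowLabel[m]][colLabel[m]]++;
            }
            // Small tolerance guards against floating-point ties.
            if (g(sim) >= gObs - 1e-12) {
                atLeastAsExtreme++;
            }
        }
        // The observed table counts as one outcome, so the estimate is never 0.
        return (atLeastAsExtreme + 1.0) / (iterations + 1.0);
    }
}
```

A call such as RandomizationTest.pValue(table, 999, 42L) returns an estimated p-value; more iterations give a finer resolution at the cost of run time.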

make sure that no expected frequency is < 3; if one is, lump that cell with adjacent classes
X² and G are sample statistics; Χ² is a theoretical frequency distribution.
The G statistic is preferable to X²:
- The distribution of G fits a Χ² distribution more closely than does the distribution of X²
- G-values are additive while X²- values are not. Thus, total G can be partitioned into two components, the G for heterogeneity and the pooled G
- in more complex designs G is easier to compute than X² as it is calculated from the observed frequencies only
perform Williams' correction for estimating the actual type I error. Yates' correction is too conservative
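As a rough sketch, Williams' correction divides G by a factor q that depends on the table margins. The formula used below, q = 1 + (n Σ(1/rowTotal) - 1)(n Σ(1/colTotal) - 1) / (6 n (r - 1)(c - 1)), is the form commonly quoted for tests of independence in r × c tables; it is an assumption of this example and should be checked against your textbook. The class and method names are illustrative.

```java
// Sketch only: Williams' correction for G in an r x c test of independence.
final class WilliamsCorrection {

    /** q = 1 + (n*sum(1/rowTotal) - 1)(n*sum(1/colTotal) - 1) / (6n(r-1)(c-1)). */
    static double q(int[][] obs) {
        int r = obs.length, c = obs[0].length;
        double[] rowTot = new double[r];
        double[] colTot = new double[c];
        double n = 0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowTot[i] += obs[i][j];
                colTot[j] += obs[i][j];
                n += obs[i][j];
            }
        }
        double invRows = 0;
        for (double t : rowTot) {
            invRows += 1.0 / t;
        }
        double invCols = 0;
        for (double t : colTot) {
            invCols += 1.0 / t;
        }
        double df = (r - 1) * (c - 1);
        return 1.0 + (n * invRows - 1.0) * (n * invCols - 1.0) / (6.0 * n * df);
    }

    /** Adjusted G = G / q; the adjusted value is compared to the same critical value. */
    static double adjustedG(double g, int[][] obs) {
        return g / q(obs);
    }
}
```

For the table {{10, 20}, {20, 10}} (n = 60, all margins 30) this gives q = 1 + 9/360 = 1.025, so the correction shrinks G only slightly; the effect grows as n gets smaller.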
perform a cell-wise examination of the matrix using Freeman-Tukey deviates (o - observed cell frequency; e - expected cell frequency) as:

F-T = Math.sqrt(o) + Math.sqrt(o + 1) - Math.sqrt((4 * e) + 1);
compare to: Math.sqrt(df * Χ²_α[1] / nCells); (Χ²_α[1] is the critical Χ² value for 1 degree of freedom at significance level α)
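The cell-wise examination can be sketched as a small Java class. This is an illustration under stated assumptions: the critical Χ² value for 1 degree of freedom is passed in by the caller (3.841 for α = 0.05), expected frequencies are supplied as a matrix, and all names are invented for this example.

```java
// Sketch only: Freeman-Tukey deviates for a cell-wise look at a table.
final class FreemanTukey {

    /** Deviate for one cell: sqrt(o) + sqrt(o + 1) - sqrt(4e + 1). */
    static double deviate(double o, double e) {
        return Math.sqrt(o) + Math.sqrt(o + 1) - Math.sqrt(4 * e + 1);
    }

    /** Criterion: sqrt(df * chi2Crit1 / nCells), chi2Crit1 = critical value for 1 df. */
    static double criterion(int df, double chi2Crit1, int nCells) {
        return Math.sqrt(df * chi2Crit1 / nCells);
    }

    /** Marks cells whose absolute deviate exceeds the criterion. */
    static boolean[][] flagCells(int[][] obs, double[][] exp, int df, double chi2Crit1) {
        int r = obs.length, c = obs[0].length;
        double limit = criterion(df, chi2Crit1, r * c);
        boolean[][] flagged = new boolean[r][c];
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                flagged[i][j] = Math.abs(deviate(obs[i][j], exp[i][j])) > limit;
            }
        }
        return flagged;
    }
}
```

With observed {{10, 20}, {20, 10}}, expected frequencies of 15 in every cell, df = 1 and a critical value of 3.841, the criterion is about 0.98 and the deviates are about -1.33 and 1.24, so all four cells are flagged as contributing to the overall effect.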

last modified: 3/18/08