Analysis of Frequency Data - Contingency Tables
Uses
- test for independence between two categorical (nominal or ordinal)
variables recorded as frequencies (counts)
- test whether the occurrence of one variable (e.g., eye color
{blue, green}) is independent of another (e.g., sex {male, female})
using a contingency table
- test for Goodness of fit
- identify those cells that are the main contributors for any
overall effect using Cell-wise examination
How is this done
- calculate expected frequencies under the null hypothesis
(for independence, expected frequency = row total × column total / grand total;
use Goodman's test for quasi-independence if the diagonal or other cells
are structurally zero)
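- calculate overall significance of a contingency table with
- X² statistic: $X^2 = \sum_i (f_i - f_i^{exp})^2 / f_i^{exp}$,
where $f_i$ is the observed and $f_i^{exp}$ the expected frequency of cell i
- Likelihood-Ratio Statistic (also known as Likelihood
Ratio Chi-square, LRX², G², G-statistic, negative log likelihood, nLL):
$G = 2 \sum_i f_i \ln(f_i / f_i^{exp})$
- compare these values to a χ² distribution with the appropriate degrees of
freedom ((r - 1)(c - 1) for a table with r rows and c columns)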
- These analyses can be performed using Java
DataGrinder applets
- If sample sizes are small you can use a Randomization Test (i.e., Monte Carlo Simulation)
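The steps above can be sketched in Java for a two-way table. This is a minimal illustration, not the DataGrinder code: the class name ContingencyStats and its method names are hypothetical, and expected frequencies are assumed to come from the plain independence model (no structural zeros).

```java
// Illustrative sketch only; names are assumptions, not part of DataGrinder.
final class ContingencyStats {

    /** Expected frequencies under independence: rowTotal * colTotal / grandTotal. */
    static double[][] expected(double[][] obs) {
        int r = obs.length, c = obs[0].length;
        double[] rowTot = new double[r];
        double[] colTot = new double[c];
        double n = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowTot[i] += obs[i][j];
                colTot[j] += obs[i][j];
                n += obs[i][j];
            }
        }
        double[][] exp = new double[r][c];
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                exp[i][j] = rowTot[i] * colTot[j] / n;
            }
        }
        return exp;
    }

    /** Pearson X^2 = sum over cells of (observed - expected)^2 / expected. */
    static double pearsonX2(double[][] obs) {
        double[][] exp = expected(obs);
        double x2 = 0.0;
        for (int i = 0; i < obs.length; i++) {
            for (int j = 0; j < obs[i].length; j++) {
                double d = obs[i][j] - exp[i][j];
                x2 += d * d / exp[i][j];
            }
        }
        return x2;
    }

    /** G = 2 * sum over cells of observed * ln(observed / expected). */
    static double gStatistic(double[][] obs) {
        double[][] exp = expected(obs);
        double sum = 0.0;
        for (int i = 0; i < obs.length; i++) {
            for (int j = 0; j < obs[i].length; j++) {
                // A cell with an observed count of 0 contributes 0 to the sum.
                if (obs[i][j] > 0.0) {
                    sum += obs[i][j] * Math.log(obs[i][j] / exp[i][j]);
                }
            }
        }
        return 2.0 * sum;
    }

    /** Degrees of freedom for an r x c table of independence. */
    static int degreesOfFreedom(double[][] obs) {
        return (obs.length - 1) * (obs[0].length - 1);
    }

    public static void main(String[] args) {
        double[][] obs = {{10, 20}, {20, 10}}; // made-up counts for illustration
        System.out.printf("X2 = %.3f, G = %.3f, df = %d%n",
                pearsonX2(obs), gStatistic(obs), degreesOfFreedom(obs));
    }
}
```

For the made-up 2 x 2 table in main, both statistics come out near 6.7 to 6.8 with 1 degree of freedom, which exceeds the χ² critical value of 3.841 at α = 0.05.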
Note
- make sure that no expected frequencies are < 3;
if any are, lump those cells with adjacent classes
- X² and G are sample statistics; χ² is a theoretical frequency distribution.
- The G statistic is preferable to X²:
- The distribution of G fits a χ² distribution more closely
than does the distribution of X²
- G-values are additive while X²-values are not.
Thus, total G can be partitioned into two components, the G for
heterogeneity and the pooled G
- in more complex designs G is easier to compute than X²
as it is calculated from the observed frequencies only
- perform Williams' correction to bring the actual type I error rate
closer to the nominal level. Yates' correction is too conservative
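- perform a cell-wise examination of the matrix using Freeman-Tukey
deviates (o = observed cell frequency; e = expected cell frequency),
calculated as:
ft = Math.sqrt(o) + Math.sqrt(o + 1) - Math.sqrt(4 * e + 1);
and compare the absolute value of each deviate to:
Math.sqrt(df * c2a[1] / nCells);
where df is the degrees of freedom of the table, c2a[1] is the critical χ² value
at significance level α for 1 degree of freedom, and nCells is the number of cells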
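The last two notes can be sketched in Java as follows. This is a hedged illustration under stated assumptions: the class name CellwiseExam and its method names are hypothetical. The Williams factor uses the form commonly given for r x c tables of independence, q = 1 + (n Σ(1/row total) - 1)(n Σ(1/column total) - 1) / (6 n (r - 1)(c - 1)), with adjusted G = G / q. The critical χ² value for 1 degree of freedom is passed in by the caller rather than computed.

```java
// Illustrative sketch only; names are assumptions, not part of DataGrinder.
final class CellwiseExam {

    /** Williams' correction factor q for an r x c table; adjusted G = G / q. */
    static double williamsQ(double[][] obs) {
        int r = obs.length, c = obs[0].length;
        double[] rowTot = new double[r];
        double[] colTot = new double[c];
        double n = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowTot[i] += obs[i][j];
                colTot[j] += obs[i][j];
                n += obs[i][j];
            }
        }
        double sumInvRow = 0.0;
        for (double t : rowTot) sumInvRow += 1.0 / t;
        double sumInvCol = 0.0;
        for (double t : colTot) sumInvCol += 1.0 / t;
        return 1.0 + (n * sumInvRow - 1.0) * (n * sumInvCol - 1.0)
                / (6.0 * n * (r - 1) * (c - 1));
    }

    /** Freeman-Tukey deviate for one cell (o = observed, e = expected). */
    static double freemanTukey(double o, double e) {
        return Math.sqrt(o) + Math.sqrt(o + 1.0) - Math.sqrt(4.0 * e + 1.0);
    }

    /** Criterion for |deviate|: sqrt(df * chi2Crit1df / nCells). */
    static double criticalDeviate(int df, double chi2Crit1df, int nCells) {
        return Math.sqrt(df * chi2Crit1df / nCells);
    }

    public static void main(String[] args) {
        double[][] obs = {{10, 20}, {20, 10}}; // made-up counts; every expected value is 15
        double crit = criticalDeviate(1, 3.841, 4); // 3.841 = chi-square critical value, alpha 0.05, 1 df
        System.out.printf("q = %.4f, criterion = %.4f%n", williamsQ(obs), crit);
        for (double[] row : obs) {
            for (double o : row) {
                double ft = freemanTukey(o, 15.0);
                System.out.printf("o = %.0f, F-T = %.4f, flagged = %b%n",
                        o, ft, Math.abs(ft) > crit);
            }
        }
    }
}
```

Cells whose absolute deviate exceeds the criterion are the ones to inspect as main contributors to the overall effect.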
last modified: 3/18/08