Correspondence Analysis is a descriptive/exploratory technique that examines simple two-way and multi-way crosstabulation tables for some measure of correspondence between the rows and columns. This analysis aims to represent the distances between relative frequencies in individual rows and/or columns in a low-dimensional space.
exploratory technique for frequency data: the aim is to develop models that fit the data, rather than to test hypotheses based on a lack of fit
analyze two-way and multi-way frequency crosstabulation tables containing some measure of correspondence between the rows and columns
represent distances between cases (i.e., differences between the pattern of relative frequencies for rows across columns, and columns across the rows) in a low-dimensional space (note: this is similar to the results produced by Factor Analysis for continuous variables)
Cases are defined in two ways: each row is viewed as an individual case defined in a space with number of dimensions equal to the number of columns, or each column is viewed as an individual case defined in a space with number of dimensions equal to the number of rows
Mass: distribution of relative frequencies, i.e., total mass, row mass, column mass. After standardizing the table so the sum of all cell entries equals 1, this table shows how one unit of mass is distributed across the cells.
Inertia: sum of weighted distances, i.e., the integral of mass times the squared distance to the centroid (total inertia = Pearson Chi-square / total sum); also the variation accounted for by a particular axis.
cases with similar characteristics should be plotted in close proximity, while more dissimilar ones are plotted correspondingly farther apart. The analysis aims to retain as much as possible of the Euclidean distances that separate the cases
The number of eigenvectors (dimensions) that can be extracted is equal to the lesser of (number of columns - 1) and (number of rows - 1). With this many dimensions extracted, all information contained in the table can be reproduced exactly.
Plot represents χ² distances between cases when column points are compared after column standardization, or row points are compared after row standardization
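Quality: ratio of the squared distance of a point from the origin in the simplified (reduced) space over its squared distance from the origin in the original full-dimensional space; values close to 1 mean the point is well represented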
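The mass and inertia definitions above can be checked numerically. The following is a minimal sketch using NumPy; the table `N` holds made-up counts chosen purely for illustration, and all variable names are assumptions, not part of the original notes.

```python
import numpy as np

# Hypothetical 3x4 contingency table (illustrative counts, not real data).
N = np.array([
    [20, 10,  5, 15],
    [10, 25, 10,  5],
    [ 5, 10, 30,  5],
], dtype=float)

n = N.sum()                   # total sum of all cell frequencies
P = N / n                     # relative frequencies; cells sum to 1 (total mass)
row_mass = P.sum(axis=1)      # row mass: share of the total in each row
col_mass = P.sum(axis=0)      # column mass: share of the total in each column

# Relative frequencies expected if rows and columns were independent:
# they depend on the row and column totals alone.
E = np.outer(row_mass, col_mass)

# Total inertia = Pearson Chi-square / total sum.
total_inertia = ((P - E) ** 2 / E).sum()
chi_square = n * total_inertia

# Maximum number of dimensions: lesser of (rows - 1) and (columns - 1).
max_dims = min(N.shape[0] - 1, N.shape[1] - 1)

print("row mass:", row_mass)
print("column mass:", col_mass)
print("total inertia:", total_inertia)
print("max dimensions:", max_dims)
```

Working with `P` rather than the raw counts keeps every quantity on the "one unit of mass" scale, so the inertia does not depend on the sample size.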
How this is done
standardize frequencies (i.e., create relative frequencies with a sum across all cells of 1.0)
calculate a matrix of distance measures (e.g., Euclidean) between all cases from these relative frequencies
if rows and columns are independent, the distribution of mass can be recreated from row and column totals alone
decompose Inertia into a small number of dimensions in which the deviations from the expected values can be represented
report eigenvectors with % Inertia explained
eigenvalue and eigenvector matrices can be standardized in different ways to bring out different features (i.e., row profile, column profile, or canonical standardization)
plot cases in the new coordinate space defined by the newly derived dimensions. Row and column coordinates can be shown in a single plot; however, row points can only be compared to other row points, and column points only to other column points
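The steps above can be sketched in a few lines of NumPy. This is one common way to carry out the decomposition (a singular value decomposition of the standardized deviations from independence), offered as an illustration rather than as the procedure any particular package uses. The function names `correspondence_analysis` and `quality`, the dictionary keys, and the example `table` are all hypothetical.

```python
import numpy as np

def correspondence_analysis(N):
    """Sketch of simple correspondence analysis on a two-way table of counts."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # standardize: cells sum to 1.0
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    E = np.outer(r, c)                   # expected values under independence
    S = (P - E) / np.sqrt(E)             # standardized deviations from expected
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                 # lesser of (rows - 1) and (columns - 1)
    U, sv, Vt = U[:, :k], sv[:k], Vt[:k, :]
    inertia = sv ** 2                    # inertia (eigenvalue) per dimension
    return {
        "row_mass": r,
        "col_mass": c,
        "inertia": inertia,
        "percent_inertia": 100.0 * inertia / inertia.sum(),
        # Coordinates under row-profile / column-profile standardization.
        "row_coords": (U * sv) / np.sqrt(r)[:, None],
        "col_coords": (Vt.T * sv) / np.sqrt(c)[:, None],
    }

def quality(coords, n_dims):
    """Squared distance from the origin in the first n_dims dimensions,
    divided by the squared distance in all extracted dimensions.
    Assumes no point lies exactly at the origin."""
    return (coords[:, :n_dims] ** 2).sum(axis=1) / (coords ** 2).sum(axis=1)

# Hypothetical table (illustrative counts, not real data).
table = np.array([
    [20, 10,  5, 15],
    [10, 25, 10,  5],
    [ 5, 10, 30,  5],
], dtype=float)

result = correspondence_analysis(table)
print("% inertia per dimension:", result["percent_inertia"])
print("row quality in 1 dimension:", quality(result["row_coords"], 1))
```

With all dimensions retained, the Euclidean distance between two row points in `row_coords` equals the χ² distance between the corresponding row profiles, which is why row points should only be compared with other row points (and likewise for columns).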