Advanced Statistics - Biology 603

Bowling Green State University, Spring 2008

Discriminant Function Analysis

Discriminant Analysis (DA) is a statistical technique that examines a set of independent (i.e., predictor) variables in order to predict or explain a nonmetric dependent (i.e., criterion, grouping) variable (with two or more a priori categories).

DA is used when group membership is known for a set of individuals and a variety of measures has been obtained from each individual. Recall our discussions of Multiple Linear Regression Analysis, where we developed an equation that summarized the relationship between a dependent variable and a set of independent variables. Similarly, DA obtains an equation (a linear combination of the measured variables) that best separates the members of the groups.

Examples

Uses

Assumptions

Centroid and multivariate normal distributions

To achieve maximum separation of cases assigned to different groups, DA defines a new variable (the discriminant function) as a linear combination of the independent variables, chosen so that the group centroids plot as far apart from each other as possible relative to the scatter of cases within each group.
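For two groups, this idea can be illustrated with Fisher's classical two-group solution, where the weights are w = Sw^-1 (mean_a - mean_b) and Sw is the pooled within-group sums-of-squares-and-cross-products (SSCP) matrix. The Python below is a minimal sketch on made-up data; the group values and the helper names `fisher_direction` and `separation` are hypothetical. It compares how far apart the group centroids fall on the derived variable with how far apart they fall on each original variable alone, with distance measured relative to the within-group scatter.

```python
# Minimal sketch of Fisher's two-group linear discriminant on HYPOTHETICAL data.
# Each case has two predictor measurements (x1, x2).
group_a = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5), (3.0, 5.0)]
group_b = [(4.0, 3.0), (5.0, 3.5), (4.5, 2.5), (5.5, 4.0), (5.0, 2.0)]


def mean(rows):
    """Centroid (mean vector) of a list of (x1, x2) cases."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def sscp(rows, center):
    """2x2 sums of squares and cross-products of the cases about `center`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - center[0], r[1] - center[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(a, b):
    """Weights w = Sw^-1 (mean_a - mean_b), Sw = pooled within-group SSCP."""
    ma, mb = mean(a), mean(b)
    sa, sb = sscp(a, ma), sscp(b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def separation(w, a, b):
    """(Difference of group means on the score w.x)^2 / within-group SS of the score."""
    da = [w[0] * r[0] + w[1] * r[1] for r in a]
    db = [w[0] * r[0] + w[1] * r[1] for r in b]
    ma, mb = sum(da) / len(da), sum(db) / len(db)
    ss_within = sum((d - ma) ** 2 for d in da) + sum((d - mb) ** 2 for d in db)
    return (ma - mb) ** 2 / ss_within


w = fisher_direction(group_a, group_b)
print("discriminant weights: ", w)
print("separation on D:      ", separation(w, group_a, group_b))
print("separation on x1 only:", separation([1.0, 0.0], group_a, group_b))
print("separation on x2 only:", separation([0.0, 1.0], group_a, group_b))
```

Because the criterion is a ratio, rescaling w leaves the separation unchanged; only the direction of the axis matters.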

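Canonical Correlations: the correlation between a linear combination of one set of variables (Y) and a linear combination of another set (X), with the combinations chosen so that this correlation is as large as possible. In DA, the canonical correlation of a discriminant function measures how strongly the discriminant scores are associated with group membership.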
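A minimal Python sketch, using hypothetical discriminant scores for two groups, computes the canonical correlation of one discriminant function in two equivalent ways: as sqrt(SSbetween / SStotal) and as sqrt(eigenvalue / (1 + eigenvalue)), where eigenvalue = SSbetween / SSwithin for the scores. For two groups it also equals the absolute Pearson correlation between the scores and a 0/1 group code. The score values and helper names are illustrative only.

```python
import math

# HYPOTHETICAL discriminant scores (D) for the cases in each of two groups.
scores_a = [1.2, 0.8, 1.5, 0.9, 1.1]
scores_b = [-0.7, -1.3, -0.2, -0.9, -0.6]


def ss_parts(groups):
    """Return (SS_between, SS_within, SS_total) for scores split into groups."""
    pooled = [d for g in groups for d in g]
    grand = sum(pooled) / len(pooled)
    ss_total = sum((d - grand) ** 2 for d in pooled)
    ss_within = sum(sum((d - sum(g) / len(g)) ** 2 for d in g) for g in groups)
    return ss_total - ss_within, ss_within, ss_total


def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


ss_b, ss_w, ss_t = ss_parts([scores_a, scores_b])
eigenvalue = ss_b / ss_w
rc_from_eigen = math.sqrt(eigenvalue / (1 + eigenvalue))
rc_from_ss = math.sqrt(ss_b / ss_t)
# Two-group case: correlation of the scores with a 0/1 membership code.
codes = [1] * len(scores_a) + [0] * len(scores_b)
rc_point_biserial = abs(pearson(scores_a + scores_b, codes))
print(rc_from_eigen, rc_from_ss, rc_point_biserial)
```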

Wilks' Lambda (U statistic): SSwithin / SStotal, i.e., the proportion of variance not explained by the group differences. Values near 0 indicate well-separated groups; values near 1 indicate little separation. Lambda can be transformed into an F statistic, and in the two-group case the resulting test is equivalent to Hotelling's T² test.
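With several variables, SSwithin and SStotal become the within-groups and total sums-of-squares-and-cross-products (SSCP) matrices W and T, and lambda is the ratio of their determinants, det(W) / det(T). The Python below is a minimal sketch on hypothetical two-group data with p = 2 variables (all names are illustrative). It computes lambda this way, computes Hotelling's T² from the pooled covariance matrix, and checks the two-group relations Lambda = 1 / (1 + T² / (n1 + n2 - 2)) and F = ((1 - Lambda) / Lambda) * (n - p - 1) / p, with p and n - p - 1 degrees of freedom.

```python
# HYPOTHETICAL two-group data; each case has p = 2 measured variables.
group_a = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5), (3.0, 5.0)]
group_b = [(4.0, 3.0), (5.0, 3.5), (4.5, 2.5), (5.5, 4.0), (5.0, 2.0)]


def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def sscp(rows, center):
    """2x2 sums of squares and cross-products about `center`."""
    return [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in rows)
             for j in range(2)] for i in range(2)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


ma, mb = mean(group_a), mean(group_b)
W = [[x + y for x, y in zip(ra, rb)]
     for ra, rb in zip(sscp(group_a, ma), sscp(group_b, mb))]  # within-groups SSCP
T = sscp(group_a + group_b, mean(group_a + group_b))           # total SSCP
wilks = det2(W) / det2(T)

# Hotelling's T^2 = (n1*n2/n) d' S^-1 d, with S = W / (n - 2) the pooled covariance.
n1, n2, p = len(group_a), len(group_b), 2
n = n1 + n2
S = [[w / (n - 2) for w in row] for row in W]
dS = det2(S)
S_inv = [[S[1][1] / dS, -S[0][1] / dS], [-S[1][0] / dS, S[0][0] / dS]]
d = [ma[0] - mb[0], ma[1] - mb[1]]
t2 = n1 * n2 / n * sum(d[i] * S_inv[i][j] * d[j] for i in range(2) for j in range(2))

f_from_t2 = (n - p - 1) / (p * (n - 2)) * t2
f_from_lambda = (1 - wilks) / wilks * (n - p - 1) / p
print("Wilks' lambda:", wilks, " via T^2:", 1 / (1 + t2 / (n - 2)))
print("F:", f_from_t2, f_from_lambda, " df =", (p, n - p - 1))
```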

Linear combinations of the measured (predictor) characteristics are formed and serve as the basis for assigning cases to groups according to each case's discriminant score:

Discriminant Function Coefficients for the linear combinations are chosen so that they produce the "best" separation of groups (minimum Wilks' lambda, i.e., maximum ratio of Between to Within SS)

Summarize group differences in the most efficient way

Rearrange the coordinate system into the one most effective at achieving separation, using the fewest derived linear equations
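Assigning cases on the basis of their scores can be sketched for two groups as follows. This Python example assumes the discriminant coefficients have already been estimated (the coefficient values and data are hypothetical), and it assumes equal prior probabilities and equal misclassification costs. Under those assumptions a case is assigned to the group whose centroid on D is nearer, which puts the cutoff at the midpoint between the two group centroids.

```python
# HYPOTHETICAL, already-estimated discriminant function D = b0 + b1*x1 + b2*x2.
b0, b1, b2 = 1.0, -1.0, 0.5

training = {
    "A": [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5), (3.0, 5.0)],
    "B": [(4.0, 3.0), (5.0, 3.5), (4.5, 2.5), (5.5, 4.0), (5.0, 2.0)],
}


def score(case):
    """Discriminant score of one case."""
    x1, x2 = case
    return b0 + b1 * x1 + b2 * x2


# Group centroids on the discriminant axis.
centroids = {g: sum(score(c) for c in cases) / len(cases) for g, cases in training.items()}


def classify(case):
    """Assign a case to the group with the nearest centroid on D (equal priors assumed)."""
    d = score(case)
    return min(centroids, key=lambda g: abs(d - centroids[g]))


# Hit rate on the training cases themselves (an optimistic estimate of accuracy).
hits = sum(classify(c) == g for g, cases in training.items() for c in cases)
total = sum(len(cases) for cases in training.values())
print("centroids:", centroids)
print("hit rate:", hits / total)
print("new case (3.0, 4.0) ->", classify((3.0, 4.0)))
```

Unequal priors or costs would shift the cutoff away from the midpoint; this sketch does not handle that case.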

Canonical Centroid Plot

How this is done

As in other regression techniques, linear algebra allows us to extract a set of discriminant axes that partition the total sum of squares into its two components, the Between (Model) SS and the Within (Error) SS. The goal of DA is to obtain a latent axis (i.e., canonical root) as a linear combination of the original variables that maximizes the ratio of the Between SS to the Within SS; for a given Total SS, this is the same as minimizing the Within term and thus maximizing the Between term.

Factor Pattern Matrix (i.e., component loadings, factor loadings): characterizes the eigenvectors at their true size, because it combines information on both magnitude (singular value) and direction (unit-length eigenvector matrix). Factor loadings are also the correlation coefficients between the original variables and the derived axes.
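With more than two groups, the discriminant axes are the eigenvectors of W^-1 B, where W and B are the within-groups and between-groups SSCP matrices; each eigenvalue equals the Between SS / Within SS ratio of the scores on its axis. The Python below is a minimal sketch under stated assumptions: three hypothetical groups, two variables, and hand-coded 2x2 algebra so that only the standard library is needed. It extracts the first axis, confirms that ratio, and computes loadings as correlations between each original variable and the discriminant scores. It uses total-sample correlations; some programs report pooled within-groups correlations in their structure matrix instead, so their values may differ.

```python
import math

# HYPOTHETICAL data: three groups, two measured variables per case.
groups = [
    [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.0, 2.5)],
    [(4.0, 3.0), (5.0, 3.5), (4.5, 2.5), (5.5, 4.0)],
    [(3.5, 6.0), (4.0, 6.5), (3.0, 5.5), (4.5, 7.0)],
]
cases = [c for g in groups for c in g]


def mean(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]


def sscp(rows, center):
    """2x2 sums of squares and cross-products about `center`."""
    return [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in rows)
             for j in range(2)] for i in range(2)]


def quad(m, u):
    """Quadratic form u' m u, i.e. the SS of the scores u.x for an SSCP matrix m."""
    return sum(u[i] * m[i][j] * u[j] for i in range(2) for j in range(2))


def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Within-groups SSCP (W), total SSCP (T) and Between = Total - Within (B).
W = [[0.0, 0.0], [0.0, 0.0]]
for g in groups:
    s = sscp(g, mean(g))
    W = [[W[i][j] + s[i][j] for j in range(2)] for i in range(2)]
T = sscp(cases, mean(cases))
B = [[T[i][j] - W[i][j] for j in range(2)] for i in range(2)]

# M = W^-1 B, with the 2x2 inverse written out by hand.
dw = W[0][0] * W[1][1] - W[0][1] * W[1][0]
W_inv = [[W[1][1] / dw, -W[0][1] / dw], [-W[1][0] / dw, W[0][0] / dw]]
M = [[sum(W_inv[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# Largest eigenvalue of M and its eigenvector: the first discriminant axis.
tr = M[0][0] + M[1][1]
det_m = M[0][0] * M[1][1] - M[0][1] * M[1][0]
lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det_m, 0.0))
c1, c2 = (M[0][1], lam - M[0][0]), (lam - M[1][1], M[1][0])
v = c1 if abs(c1[0]) + abs(c1[1]) >= abs(c2[0]) + abs(c2[1]) else c2

# On this axis, Between SS / Within SS of the scores equals the eigenvalue.
ratio = quad(B, v) / quad(W, v)

# Loadings: correlation of each original variable with the discriminant scores
# (the sign of the axis, and hence of the loadings, is arbitrary).
scores = [v[0] * x1 + v[1] * x2 for x1, x2 in cases]
loadings = [corr([c[j] for c in cases], scores) for j in range(2)]
print("eigenvalue:", lam, " Between/Within on axis:", ratio)
print("loadings:", loadings)
```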

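Discriminant coefficients (Bn) of standardized predictor variables (Xn) are analogous to the Beta weights of a multiple regression, except that they are chosen to maximize the distance between the group means (centroids) on the discriminant score, relative to the variation within groups, rather than to predict a metric criterion. In the two-group case the two are directly linked: the coefficients from an ordinary least-squares regression of a 0/1 group code on the predictors are proportional to the discriminant coefficients.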
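The regression analogy can be checked numerically for two groups. The Python below is a minimal sketch on hypothetical data (function names are illustrative). It regresses a 0/1 group code on the predictors by ordinary least squares, compares the slopes with Fisher's discriminant weights W^-1 (mean_a - mean_b), and then standardizes the discriminant weights by multiplying each by the pooled within-group standard deviation of its variable, which is one common convention for standardized discriminant coefficients.

```python
import math

# HYPOTHETICAL two-group data, two predictors per case.
group_a = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5), (3.0, 5.0)]
group_b = [(4.0, 3.0), (5.0, 3.5), (4.5, 2.5), (5.5, 4.0), (5.0, 2.0)]


def mean(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]


def sscp(rows, center):
    """2x2 sums of squares and cross-products about `center`."""
    return [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in rows)
             for j in range(2)] for i in range(2)]


def solve2(m, rhs):
    """Solve the 2x2 linear system m x = rhs by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]


cases = group_a + group_b
y = [1.0] * len(group_a) + [0.0] * len(group_b)   # 0/1 group code
mx, my = mean(cases), sum(y) / len(y)

# OLS slopes from centered cross-products: (X'X) b = X'y with centered X and y.
sxx = sscp(cases, mx)
sxy = [sum((c[j] - mx[j]) * (yy - my) for c, yy in zip(cases, y)) for j in range(2)]
b_ols = solve2(sxx, sxy)

# Fisher discriminant weights: W^-1 (mean_a - mean_b), W = pooled within-group SSCP.
ma, mb = mean(group_a), mean(group_b)
sa, sb = sscp(group_a, ma), sscp(group_b, mb)
W = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
w_fisher = solve2(W, [ma[0] - mb[0], ma[1] - mb[1]])

print("OLS slopes:    ", b_ols)
print("Fisher weights:", w_fisher)
print("ratios:", b_ols[0] / w_fisher[0], b_ols[1] / w_fisher[1])

# Standardized coefficients (pooled within-group SD convention; software varies).
df_within = len(cases) - 2
sd_within = [math.sqrt(W[j][j] / df_within) for j in range(2)]
b_std = [w_fisher[j] * sd_within[j] for j in range(2)]
print("standardized coefficients:", b_std)
```

Equal ratios for both predictors mean the two coefficient sets point along the same axis, so they yield the same ranking of cases.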


last modified: 01/11/05
This material is copyrighted and MAY NOT be used for commercial purposes, © 2001-2008 lobsterman.