## Discriminant Function Analysis

Discriminant Analysis (DA) is a statistical technique that examines a set of independent (i.e., predictor) variables in order to predict or explain a nonmetric dependent (i.e., criterion, grouping) variable (with two or more a priori categories).

DA is used in the situation where group membership is known for a set of individuals and where we have obtained a variety of measures from each individual. Recall our discussions on Multiple Linear Regression Analysis, where we developed an equation that summarized the relationship between a dependent and a set of independent variables. Similarly, DA obtains an equation that best separates members of the groups.
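As a minimal sketch (the symbols here are generic, not tied to any particular dataset), the discriminant function takes the same linear form as the regression equation:

D = b1·X1 + b2·X2 + … + bp·Xp + c

where D is the discriminant score used to separate the groups, the Xi are the predictor variables, the bi are the discriminant coefficients, and c is a constant.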

### Examples

• predict whether individuals will attack another individual on the basis of morphological, behavioral, or neurochemical variables
• predict whether a population will go extinct based on knowledge of parameters related to the habitat, other species present, distribution and amount of particular food items
• predict the most successful therapeutic technique based on symptoms, genetic information, and family history

#### Uses

• classify cases into one of several mutually exclusive categories (dependent variable) based on various characteristics (independent variables)
• isolate those characteristics that are most useful for distinguishing the groups
• test how well our classification works

### Assumptions

• The underlying data must be distributed multivariate normally and relationships must be linear. If such requirements are not met, multi-dimensional scaling may be an alternative.
• Homogeneity of variance/covariances across treatment groups: test assumption with Box's M test
• Absence of outliers
• Independent variables are not collinear (no strong multicollinearity among predictors)
• The number of data points should be the greater of N = 100 or N = 5 × the number of raw data variables used
• Examine whether the data contain sufficient correlations to warrant the analysis by testing with Bartlett's sphericity test (i.e., whether the correlation matrix (variance/covariance matrix) is an identity matrix). Alternatively, the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy indicates the proportion of variance in your variables that is shared among variables; with a KMO between 0.5 and 1, a factor analysis may be of value.
• Matrix ill-conditioning: ill-conditioned matrices produce estimated coefficients that are unstable (i.e., small changes within the range of measurement error of the variables can lead to disproportionately large changes in the estimates)
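The sphericity check can be sketched directly in base R from the textbook chi-square formula; the simulated matrix `X` below is a stand-in, not data from any real study:

```r
# Bartlett's sphericity test: chi^2 = -(n - 1 - (2p + 5)/6) * ln|R|,
# with df = p(p - 1)/2, where R is the correlation matrix of the
# n x p data matrix X. A small p-value means the correlation matrix
# is not an identity matrix, i.e. real correlations are present.
bartlett_sphericity <- function(X) {
  n <- nrow(X); p <- ncol(X)
  R <- cor(X)
  chi2 <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chi2 = chi2, df = df, p.value = pchisq(chi2, df, lower.tail = FALSE))
}

# Simulated stand-in data: two strongly correlated columns, one noise column
set.seed(1)
x1 <- rnorm(100)
X <- cbind(x1, x1 + rnorm(100, sd = 0.3), rnorm(100))
bartlett_sphericity(X)  # small p-value: correlations are present
```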

### How this is done

• As in other regression techniques, linear algebra allows us to extract a set of discriminant axes that partition the total sum of squares into two components, the Between (Model) SS and the Within (Error) SS. The goal of DA is to obtain a latent axis (i.e., canonical root) as a linear combination of the original variables that minimizes the Within SS term (and thus maximizes the Between SS term). The Factor Pattern Matrix (i.e., component loadings, factor loadings) characterizes the actual eigenvectors in true size, as it combines information on both magnitude (singular value) and direction (unit-length eigenvector matrix). Factor loadings are also the correlation coefficients between the original variables and the canonical roots.
• Using ordinary least-squares estimation, the discriminant coefficients (Bn) of the standardized predictor variables (Xn) are analogous to the beta weights of a multiple regression; they are chosen to maximize the distance between the group means on the criterion (dependent) variable.
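The extraction step above can be sketched in R; the built-in `iris` data stand in for real measurements here, and the check at the end confirms that the first eigenvector of W⁻¹B points along the same direction as the first discriminant axis from `MASS::lda`:

```r
# The discriminant axes are the eigenvectors of W^-1 B, where W and B are
# the within- and between-group scatter (SSCP) matrices.
X <- as.matrix(iris[, 1:4])
g <- iris$Species
mu <- colMeans(X)

W <- matrix(0, 4, 4); B <- matrix(0, 4, 4)
for (k in levels(g)) {
  Xk <- X[g == k, , drop = FALSE]
  mk <- colMeans(Xk)
  W  <- W + crossprod(scale(Xk, center = mk, scale = FALSE))  # within-group SSCP
  B  <- B + nrow(Xk) * tcrossprod(mk - mu)                    # between-group SSCP
}

e  <- eigen(solve(W) %*% B)
v1 <- Re(e$vectors[, 1])   # first discriminant direction (largest eigenvalue)

# Compare with MASS::lda: the projections agree up to sign and scale.
library(MASS)
fit <- lda(Species ~ ., data = iris)
abs(cor(X %*% v1, X %*% fit$scaling[, 1]))
```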

### Analysis Detail

• Centroids and multivariate normal distributions: each group is summarized by its centroid, the vector of means of its independent variables
• To achieve maximum separation of cases assigned to different groups, DFA defines a new variable as a linear combination of the independent variables such that the group centroids plot as far apart from each other as possible
• Canonical Correlations: the linear combinations of sets of Y and X variables that achieve maximum correlation
• Wilks' Lambda (U statistic): SSwithin / SStotal. It is the proportion of variance not explained by the group differences. Lambda can be transformed into an F-statistic, and in the two-group scenario it is equivalent to Hotelling's T².
• Linear combinations of the characteristics (independent variables) are formed and serve as the basis for assigning cases to groups based on a particular score:
  • Discriminant function coefficients for the linear combinations are chosen so that they result in the "best" separation of groups (minimum Wilks' lambda)
  • Summarize group differences in the most efficient way
  • Rearrange the coordinate system to one that is most effective in achieving separation using the fewest number of derived linear equations
• Canonical Centroid Plot
• To test how well your classification performs, you can build the discriminant model on a random subset of 70% of your data and hold back the remaining 30% to examine the effectiveness of the resulting classification
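A minimal sketch of that 70/30 hold-out check, with the built-in `iris` data standing in for your own:

```r
library(MASS)

set.seed(42)
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))  # 70% for fitting
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]                                      # 30% held back

fit  <- lda(Species ~ ., data = train)
pred <- predict(fit, newdata = test)$class
acc  <- mean(pred == test$Species)  # proportion of held-back cases classified correctly
acc
```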

In R you first need to import the data file "BodyMeasures.txt", define n to represent the sample size, and group your response variables into a set (here called Ys) using cbind.
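Since the data file itself is not reproduced here, the sketch below fabricates a stand-in `bodyMeasures` data frame; the column names `Height` and `Mass` are hypothetical. With the real file, the first step would instead be `read.table("BodyMeasures.txt", header = TRUE)`:

```r
# Stand-in for "BodyMeasures.txt"; with the real file you would use:
#   bodyMeasures <- read.table("BodyMeasures.txt", header = TRUE)
# Column names Height and Mass are hypothetical.
set.seed(7)
n <- 60  # sample size
bodyMeasures <- data.frame(
  Sex    = factor(rep(c("F", "M"), each = n / 2)),
  Height = c(rnorm(n / 2, 165, 6), rnorm(n / 2, 178, 7)),
  Mass   = c(rnorm(n / 2, 60, 8),  rnorm(n / 2, 78, 9))
)
Ys <- cbind(bodyMeasures$Height, bodyMeasures$Mass)  # group responses with cbind
```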

Make sure you have the MASS package installed, then perform a Linear Discriminant Analysis. CV=TRUE generates jackknifed (i.e., leave-one-out) predictions; note that with CV=TRUE, lda() returns a list of cross-validated classifications (class and posterior) rather than a fitted lda object, so fit the model a second time without CV for printing and plotting:

> library(MASS)
> fit_LDA <- lda(Sex ~ Ys, data = bodyMeasures, CV=TRUE)  # jackknifed predictions
> fit_plot <- lda(Sex ~ Ys, data = bodyMeasures)          # refit for inspection
> fit_plot
> plot(fit_plot)

Now assess the accuracy of the jackknifed predictions: the diagonal of the row-wise proportion table gives the percent of cases classified correctly within each group, and the sum of the diagonal of the overall proportion table gives the total percent correct.

> ct <- table(bodyMeasures$Sex, fit_LDA$class)
> diag(prop.table(ct, 1))     # percent correct within each group
> sum(diag(prop.table(ct)))   # total percent correct

Display the results of the linear classification, two variables at a time.

> install.packages("klaR")
> library(klaR)
> partimat(Sex ~ Ys, data = bodyMeasures, method="lda")