Lectures for Advanced Statistics

Introduction to Multivariate Techniques

In biology we often wish to describe effects that are complex and difficult to characterize with a single measure. In a single experiment we may therefore measure a suite of dependent variables instead of one variable alone, which gives us a better chance of discovering which factor is truly important.

When considering whether a particular treatment affects a set of dependent variables, could we simply run multiple univariate tests and adjust the p-values using the Dunn-Šidák correction? What are the drawbacks of an analysis that considers each variable separately instead of a single analysis that considers patterns across all of them simultaneously? Specifically, separate tests ignore the correlations among the dependent measures, which must be taken into account when performing the significance test. A multivariate analysis can protect against the Type I errors that might occur if multiple univariate analyses were performed independently. Toward this goal, we first create a set of new dependent variables as linear combinations of the measured dependent variables. These artificial dependent variables are chosen to maximize group differences. A correlation between two variables indicates that the two variables are, to some degree, measuring the same thing. In the multivariate model it is assumed that the correlations among a set of observed variables can be explained in terms of a simpler set of derived variables. So, how do we go about deriving a set of new, better-suited, hypothetical variables from the information obtained about the correlations among the observed variables (i.e., the variance/covariance matrix)?
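
For reference, the univariate route mentioned above can be sketched in a few lines. For k tests, the Dunn-Šidák correction lowers the per-test significance level to 1 - (1 - alpha)^(1/k), or equivalently raises each p-value to 1 - (1 - p)^k. The function names and the p-values below are hypothetical and only illustrate the arithmetic; note that the correction assumes the tests are independent, which is exactly what correlated dependent variables violate.

```python
def sidak_alpha(alpha, k):
    """Per-test significance level that holds the family-wise
    Type I error rate at alpha across k independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


def sidak_adjust(p_values):
    """Dunn-Sidak adjusted p-values for a family of tests."""
    k = len(p_values)
    return [1.0 - (1.0 - p) ** k for p in p_values]


# Made-up p-values from three univariate tests (illustration only).
p_raw = [0.01, 0.04, 0.20]

alpha_per_test = sidak_alpha(0.05, len(p_raw))
p_adj = sidak_adjust(p_raw)

print("per-test alpha:", alpha_per_test)
print("adjusted p-values:", p_adj)
```

With three tests, a raw p-value has to fall below roughly 0.017 rather than 0.05 to be declared significant, so power is lost on every individual variable while the pattern across variables is still ignored.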

Multivariate Descriptives

A single, normally distributed variable may be characterized by the population parameters mean and variance. Similarly, a multivariate normal set of variables may be described by:

Centroid: multivariate mean, mean vector
Mean vector: the vector sum of the observations divided by N
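Variance/Covariance Matrix: variances of the individual variables on the diagonal, covariances between pairs of variables off the diagonal (see the Matrix Review; calculations using this worksheet).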
Geometric Distance: a simple Euclidean distance measure that treats all variables as orthogonal to one another. If the distance is computed on standardized variables, it is scale-invariant. Based on Pythagoras' theorem, the distance between two 3D points is calculated as ...
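
The descriptives listed above can be computed directly with NumPy. The sketch below uses a small made-up data set (5 individuals, 3 variables); the variable names are hypothetical. It assumes the sample estimator of the variance/covariance matrix (divisor N - 1) and the standard Euclidean form, i.e. the square root of the summed squared differences between the coordinates of the two points.

```python
import numpy as np

# Hypothetical data: 5 individuals (rows) measured on 3 variables (columns).
X = np.array([
    [4.0, 2.0, 0.60],
    [4.2, 2.1, 0.59],
    [3.9, 2.0, 0.58],
    [4.3, 2.1, 0.62],
    [4.1, 2.2, 0.63],
])
n = X.shape[0]

# Centroid (mean vector): vector sum of the observations divided by N.
centroid = X.sum(axis=0) / n

# Variance/covariance matrix: variances on the diagonal,
# covariances off the diagonal (sample estimator, divisor N - 1).
deviations = X - centroid
S = deviations.T @ deviations / (n - 1)

# Euclidean distance between the first two individuals (raw units).
diff = X[0] - X[1]
d_euclid = np.sqrt(np.sum(diff ** 2))

# The same distance on standardized variables (z-scores) no longer
# depends on the units in which each variable was measured.
Z = deviations / X.std(axis=0, ddof=1)
d_euclid_std = np.sqrt(np.sum((Z[0] - Z[1]) ** 2))

print("centroid:", centroid)
print("variance/covariance matrix:\n", S)
print("Euclidean distance (raw):", d_euclid)
print("Euclidean distance (standardized):", d_euclid_std)
```

Rescaling one column (for example, converting centimeters to millimeters) changes the raw distance but leaves the standardized distance unchanged.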

Mahalanobis Distance: a distance measure that takes into account the correlations between variables, by which different patterns can be identified and analyzed. It is a useful way of determining the similarity of an unknown sample to a known one. It differs from Euclidean distance in that it accounts for the correlations in the data set and is scale-invariant, i.e., not dependent on the scale of measurement. It can be used to calculate the distance between any two vectors and is frequently used in cluster analysis and in the identification of outliers.
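
To make the contrast with Euclidean distance concrete, here is a minimal sketch assuming the usual definition D^2 = (x - y)' S^-1 (x - y), where S is the variance/covariance matrix of the reference sample. The data and the helper names (`mahalanobis`, `euclidean`) are hypothetical. Two test points are placed at the same Euclidean distance from the centroid, one along the trend of the correlation and one against it.

```python
import numpy as np

# Hypothetical reference sample: 6 individuals, 2 positively correlated variables.
reference = np.array([
    [2.0, 2.1],
    [3.0, 2.9],
    [4.0, 4.2],
    [5.0, 4.8],
    [6.0, 6.1],
    [7.0, 6.9],
])

centroid = reference.mean(axis=0)
S = np.cov(reference, rowvar=False)   # variance/covariance matrix
S_inv = np.linalg.inv(S)


def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between vectors x and y, given the inverse
    of the variance/covariance matrix."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ cov_inv @ diff))


def euclidean(x, y):
    """Plain Euclidean distance between vectors x and y."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ diff))


# Two points equally far from the centroid in Euclidean terms.
along = centroid + np.array([1.0, 1.0])     # follows the correlation
against = centroid + np.array([1.0, -1.0])  # runs against the correlation

print("Euclidean:  ", euclidean(along, centroid), euclidean(against, centroid))
print("Mahalanobis:", mahalanobis(along, centroid, S_inv),
      mahalanobis(against, centroid, S_inv))
```

The point that runs against the correlation gets a much larger Mahalanobis distance even though both points are equally far away in Euclidean terms, which is why the measure is useful for flagging multivariate outliers.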

Links of interest

describe matrices as mean vector and var/covariance matrix

last modified: 03/23/15