Advanced Statistics - Biology 6030

Bowling Green State University, Fall 2017

Lab Exercise for R: Multivariate Descriptive Statistics

Exercise 1: Calculate the distance of individual rows from the multivariate mean (centroid), taking the correlational structure of the data into account

In a dataset with multiple continuous variables, we want to calculate how far each individual (row) lies from the multivariate mean. Rather than measuring this distance along the original axes, new axes are derived as linear combinations of the original variables, based on the correlational structure of those variables.

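Use the file "BodyMeasures.txt". Read it in, keep columns 2 through 12 as the numeric variables, and optimize the coordinate space based on the correlations between the original variables. Start by computing the variance/covariance matrix and the correlation matrix.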

> bodyMeasures <- read.table("http://caspar.bgsu.edu/~courses/stats/Labs/Datasets/BodyMeasures.txt", header=TRUE)
> bodyMeasures_numeric <- bodyMeasures[, 2:12]
> covMat <- cov(bodyMeasures_numeric)
> covMat
> corMat <- cor(bodyMeasures_numeric)
> corMat
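
The relationship between the two matrices above can be checked with a short sketch. It is only an illustration: it uses four numeric columns of R's built-in iris dataset as stand-in data (an assumption, so that it runs without the course file), and the names X, covMat_demo, corMat_demo, from_cov and from_scaled are hypothetical.

```r
# Stand-in data (an assumption): four numeric columns of R's built-in iris
# dataset, used only so the sketch runs without the course file.
X <- iris[, 1:4]

covMat_demo <- cov(X)   # variances on the diagonal, covariances elsewhere
corMat_demo <- cor(X)   # ones on the diagonal, correlations elsewhere

# Rescaling the covariance matrix by the standard deviations gives the
# correlation matrix; cov2cor() performs this rescaling.
from_cov <- cov2cor(covMat_demo)

# Equivalently, the covariance matrix of z-scored columns equals the
# correlation matrix of the raw columns.
from_scaled <- cov(scale(X))

# Both comparisons report whether the matrices agree up to rounding error.
print(all.equal(from_cov, corMat_demo))
print(all.equal(from_scaled, corMat_demo))
```

The covariance matrix keeps the original measurement units, while the correlation matrix is unit-free. This is why variables measured on very different scales look comparable in corMat but not in covMat.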

Now calculate the centroid as the mean vector across all columns in the data frame. Then calculate the squared Mahalanobis distance (D2) of each row from the centroid using the variance/covariance matrix.

> means <- colMeans(bodyMeasures_numeric)
> means
> D2 <- mahalanobis(bodyMeasures_numeric,means,covMat)
> D2
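
To see what mahalanobis() computes, the squared distance of each row can be rebuilt by hand as (x - centroid)' S^-1 (x - centroid), where S is the variance/covariance matrix. The sketch below is a minimal illustration, again using the built-in iris dataset as stand-in data (an assumption); the names X, center, S, centered, D2_manual and D2_builtin are hypothetical and not part of the lab.

```r
# Stand-in data (an assumption): four numeric columns of the built-in iris
# dataset, converted to a matrix for the matrix algebra below.
X <- as.matrix(iris[, 1:4])

center <- colMeans(X)   # centroid: the vector of column means
S <- cov(X)             # variance/covariance matrix

# Subtract the centroid from every row (sweep() subtracts by default).
centered <- sweep(X, 2, center)

# Row-wise quadratic form (x - center)' S^-1 (x - center).
D2_manual <- rowSums((centered %*% solve(S)) * centered)

# The built-in function returns the same squared distances.
D2_builtin <- mahalanobis(X, center, S)
print(all.equal(unname(D2_manual), unname(D2_builtin)))

# Row numbers of the observations farthest from the centroid.
print(head(order(D2_builtin, decreasing = TRUE)))
```

Because the values are squared distances, sqrt(D2_builtin) gives the distances themselves. Rows with the largest values lie farthest from the centroid once the correlations among the variables are accounted for.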



last modified: 3/23/15
This material is copyrighted and MAY NOT be used for commercial purposes, 2001-2017 lobsterman.