Advanced Statistics - Biology 6030

Bowling Green State University, Fall 2017

Variance - Covariance Matrix

Error: A statistical "error" is not a mistake but the difference between a computed, estimated, or measured value and the true, specified, or theoretically correct value. The error ε measures the amount by which an observation differs from an expected value, such as the underlying population mean. If the mean height of all male students at BGSU is 1.79 m, then the error of a student who is 1.84 m tall is 0.05 m. The true error can rarely be measured because the underlying population mean μ is usually not known.

Error: ε = x − μ (observation minus population mean)

Residual: The residual serves as an estimate of the underlying error. It measures the difference between an observation and a sample statistic, such as the sample mean, that serves as an estimate of the underlying population parameter. An estimate is indicated by a hat above the symbol (for example, μ̂ estimates μ).

Residual: ε̂ = x − μ̂ (observation minus the sample estimate of the population mean)
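
The distinction between error and residual can be illustrated with simulated data, where the population mean is known by construction. This is a minimal sketch: the population mean of 1.79 m comes from the height example above, while the standard deviation of 0.07 m, the sample size of 25, and all variable names are assumptions made for illustration.

```r
# Simulated heights in meters; mu is from the text, sigma and n are assumed
set.seed(1)
mu <- 1.79
sigma <- 0.07
heights <- rnorm(25, mean = mu, sd = sigma)

# Errors: deviations from the (normally unknown) population mean
errors <- heights - mu

# Residuals: deviations from the sample mean, which estimates mu
muHat <- mean(heights)
resids <- heights - muHat

# Residuals sum to zero (up to rounding); errors generally do not
sum(resids)
sum(errors)
```

Each error differs from the corresponding residual by the same constant, muHat - mu, which is the amount by which the sample mean misses the population mean.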

Uses

How is this done

Worksheet: Variance/Covariance using Matrix Algebra

Import datafile "BodyMeasures.txt".

> bodyMeasures <- read.table("http://caspar.bgsu.edu/~courses/stats/Labs/Datasets/BodyMeasures.txt", header=TRUE)

Calculate the covariance matrix and the correlation matrix for the current data frame, and display each one. Only columns 2-12 are used; column 1 contains an independent variable and is excluded.

> covMat <- cov(bodyMeasures[2:12])
> covMat
> corMat <- cor(bodyMeasures[2:12])
> corMat
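
The same matrices can be obtained with explicit matrix algebra, which is what cov() and cor() compute internally in effect. The sketch below uses a small simulated matrix as a stand-in for the data set (an assumption, so that it runs without the download); the names X, Xc, covManual, and corManual are illustrative only.

```r
# Simulated stand-in for a numeric data set (not BodyMeasures.txt)
set.seed(42)
X <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))
n <- nrow(X)

# Center each column by subtracting its mean
Xc <- sweep(X, 2, colMeans(X))

# Covariance matrix: cross-product of the centered data divided by n - 1
covManual <- (t(Xc) %*% Xc) / (n - 1)

# Correlation matrix: rescale by the standard deviations from the diagonal
sds <- sqrt(diag(covManual))
corManual <- covManual / outer(sds, sds)

# Both comparisons should report agreement with the built-in functions
all.equal(covManual, cov(X))
all.equal(corManual, cor(X))
```

With the worksheet data, X would be as.matrix(bodyMeasures[2:12]), provided all of those columns are numeric.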

Calculate the eigenvalues and eigenvectors of the covariance matrix, save the result, and display it. Then repeat for the correlation matrix.

> PCsolution1 <- eigen(covMat)
> PCsolution1
> PCsolution2 <- eigen(corMat)
> PCsolution2
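
To interpret such a solution, it can help to express each eigenvalue as a proportion of the total variance and to check the result against prcomp(). The following is a sketch on simulated data (an assumption, used so the example is self-contained); the names sol, propVar, and scores are illustrative.

```r
# Simulated stand-in data with correlated columns (not BodyMeasures.txt)
set.seed(7)
z <- rnorm(100)
X <- cbind(a = z + rnorm(100, sd = 0.5),
           b = 2 * z + rnorm(100, sd = 0.5),
           c = rnorm(100))

sol <- eigen(cov(X))

# Eigenvalues are the variances along each principal component;
# their sum equals the total variance (trace of the covariance matrix)
propVar <- sol$values / sum(sol$values)
cumsum(propVar)

# Component scores: centered data projected onto the eigenvectors
scores <- scale(X, center = TRUE, scale = FALSE) %*% sol$vectors

# The variances of the scores should reproduce the eigenvalues
all.equal(unname(apply(scores, 2, var)), sol$values)

# prcomp() should report the same component variances
all.equal(prcomp(X)$sdev^2, sol$values)
```

Running eigen() on the correlation matrix instead corresponds to analyzing standardized variables, so that variables measured on larger scales do not dominate the first components.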

Graphics

The correlation matrix of a number of random variables X1, ..., Xn is the n × n matrix whose (i, j) entry is corr(Xi, Xj). If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables Xi / σ(Xi) for i = 1, ..., n.
The correlation matrix is symmetric because the correlation between Xi and Xj is the same as the correlation between Xj and Xi.
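
Both properties can be checked numerically. The sketch below uses simulated variables on deliberately different scales (an assumption for illustration; the names X and Xstd are not from the worksheet).

```r
# Simulated stand-in data on very different scales (not BodyMeasures.txt)
set.seed(3)
x1 <- rnorm(40, mean = 170, sd = 10)
X <- cbind(x1 = x1,
           x2 = 0.5 * x1 + rnorm(40, sd = 4),
           x3 = rnorm(40, mean = 5, sd = 0.2))

# Divide each column by its standard deviation
Xstd <- sweep(X, 2, apply(X, 2, sd), FUN = "/")

# Covariance of the standardized variables vs. the correlation matrix
all.equal(cov(Xstd), cor(X))

# cov2cor() applies the same rescaling directly to a covariance matrix
all.equal(cov2cor(cov(X)), cor(X))

# Symmetry: corr(Xi, Xj) equals corr(Xj, Xi)
isSymmetric(cor(X))
```

Subtracting the column means before dividing, as scale() does by default, would give the same result, because covariance is unaffected by shifting a variable by a constant.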

Produce a graphical correlation matrix for the data imported above. First confirm that the packages ggplot2 and GGally are installed, then load them.

> require(ggplot2)
> require(GGally)

To plot a matrix of pairwise correlations for columns 2-12, group those columns into their own data frame and pass it to ggpairs. On the Mac the resulting plot is displayed in a Quartz graphics window.

> measures <- bodyMeasures[,2:12]
> ggpairs(measures)


last modified: 2/10/14
This material is copyrighted and MAY NOT be used for commercial purposes, 2001-2017 lobsterman.