Lectures for Advanced Statistics

Variance - Covariance Matrix

Error: A statistical "error" is not a mistake but the difference between a computed, estimated, or measured value and the true, specified, or theoretically correct value. The error ε measures how far an observation lies from an expected value, such as the underlying population mean. If the mean height of all male students at BGSU is 1.79 m, then the error of a student who is 1.84 m tall is 0.05 m. The true error can rarely be measured because the underlying population mean μ is usually not known.

Residual: The residual serves as an estimate of the underlying error. It measures the difference between an observation and a sample statistic, which serves as an estimate of the underlying population parameter. An estimate is indicated by a hat above the symbol of the parameter it estimates.
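
The distinction between error and residual can be illustrated with a small sketch. The five heights below are made-up values, and the population mean of 1.79 m is borrowed from the example above purely for illustration; in practice it would not be known.

```r
# Hypothetical heights (in metres) of five students; these values are made up.
heights <- c(1.84, 1.72, 1.80, 1.91, 1.73)

# Population mean taken from the example in the text (normally unknown).
mu <- 1.79

# Errors: deviations from the true population mean.
errors <- heights - mu

# Residuals: deviations from the sample mean, which estimates mu.
muHat <- mean(heights)
res <- heights - muHat

errors
res

# Residuals sum to zero (up to rounding); errors generally do not.
sum(res)
sum(errors)
```

Because the sample mean is computed from the same observations, the residuals are forced to sum to zero, whereas the errors are not.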

Uses

Measure dispersion or spread of values within a sample
Estimate dispersion or spread of values within the entire population

How is this done

For each value calculate the difference between it and the average value.
Calculate the squares of these differences and sum them.
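To find the average of the squared differences (i.e., the variance), divide the sum by N-1. Dividing by N-1 rather than N makes the result an unbiased estimate of the population variance σ².
To obtain an estimate of the standard deviation, take the square root of the variance.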
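
The four steps above can be sketched in R and checked against the built-in functions var() and sd(). The sample values and the variable names are made up for illustration.

```r
# Hypothetical sample of six measurements; the values are made up.
x <- c(4.1, 5.3, 6.0, 4.8, 5.5, 6.3)
n <- length(x)

# Step 1: difference between each value and the sample mean.
deviations <- x - mean(x)

# Step 2: square the differences and sum them.
sumSquares <- sum(deviations^2)

# Step 3: divide by n - 1 to get the sample variance.
sampleVar <- sumSquares / (n - 1)

# Step 4: the square root gives the sample standard deviation.
sampleSD <- sqrt(sampleVar)

# Both comparisons should return TRUE.
all.equal(sampleVar, var(x))
all.equal(sampleSD, sd(x))
```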

Worksheet: Variance/Covariance using Matrix Algebra

Import datafile "BodyMeasures.txt".

> bodyMeasures <- read.table("http://caspar.bgsu.edu/~courses/stats/Labs/Datasets/BodyMeasures.txt", header=TRUE)

Calculate the covariance and correlation matrices for the current data frame. Use columns 2-12 only; column 1 contains an independent variable and is excluded. Then display each matrix.

> covMat <- cov(bodyMeasures[2:12])
> covMat
> corMat <- cor(bodyMeasures[2:12])
> corMat
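
The functions cov() and cor() hide the matrix algebra. As a rough sketch of what they compute, the same matrices can be built from the centred data matrix. The example uses simulated stand-in data rather than the BodyMeasures file, and all object names are hypothetical.

```r
# Stand-in data: 50 rows of 3 simulated variables (not the BodyMeasures file).
set.seed(1)
X <- matrix(rnorm(150), nrow = 50, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))
n <- nrow(X)

# Centre each column by subtracting its mean.
Xc <- sweep(X, 2, colMeans(X))

# Covariance matrix: cross-product of the centred data divided by n - 1.
covByHand <- t(Xc) %*% Xc / (n - 1)

# Correlation matrix: rescale the covariances by the standard deviations.
D <- diag(1 / sqrt(diag(covByHand)))
corByHand <- D %*% covByHand %*% D
dimnames(corByHand) <- dimnames(covByHand)

# Both comparisons should return TRUE.
all.equal(covByHand, cov(X))
all.equal(corByHand, cor(X))
```

The diagonal of the covariance matrix holds the variances of the individual variables; the off-diagonal entries hold the pairwise covariances.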

Calculate the eigenvalues and eigenvectors of the covariance matrix, save them in a solution object, and display it. Then repeat for the correlation matrix.

> PCsolution1 <- eigen(covMat)
> PCsolution1
> PCsolution2 <- eigen(corMat)
> PCsolution2
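
A few properties of the eigen() output can be checked directly. The sketch below uses the numeric columns of R's built-in iris data set as a stand-in for the body measurements; the object names are hypothetical.

```r
# Stand-in data: the four numeric columns of the built-in iris data set.
X <- iris[, 1:4]
covMatDemo <- cov(X)
corMatDemo <- cor(X)

eigCov <- eigen(covMatDemo)
eigCor <- eigen(corMatDemo)

# Eigenvalues sum to the total variance (the trace of the matrix).
all.equal(sum(eigCov$values), sum(diag(covMatDemo)))

# For a correlation matrix the trace equals the number of variables.
all.equal(sum(eigCor$values), ncol(X))

# Each eigenvector v satisfies  covMat %*% v = lambda * v.
v1 <- eigCov$vectors[, 1]
all.equal(as.vector(covMatDemo %*% v1), eigCov$values[1] * v1)

# Proportion of the total variance associated with each eigenvalue.
eigCov$values / sum(eigCov$values)
```

The two solutions generally differ because the covariance matrix is dominated by variables with large variances, while the correlation matrix gives every variable unit variance.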

Graphics

The correlation matrix of n random variables X1, ..., Xn is the n × n matrix whose (i, j) entry is corr(Xi, Xj). If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables Xi / σ(Xi) for i = 1, ..., n.
The correlation matrix is symmetric because the correlation between Xi and Xj is the same as the correlation between Xj and Xi.
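
Both statements can be verified numerically. This sketch again uses the numeric columns of the built-in iris data set as stand-in data. Note that scale() also centres each column, which does not change the covariances.

```r
# Stand-in data: the four numeric columns of the built-in iris data set.
X <- iris[, 1:4]

# Standardize: subtract each column mean and divide by the column sd.
Z <- scale(X)

# Covariance of the standardized variables equals the correlation matrix.
all.equal(cov(Z), cor(X))

# The correlation matrix is symmetric, with ones on the diagonal.
isSymmetric(cor(X))
all.equal(unname(diag(cor(X))), rep(1, ncol(X)))
```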

Plot a correlation matrix for the data imported above. First confirm that the packages ggplot2 and GGally are installed.

> require(ggplot2)
> require(GGally)

To plot a matrix of pairwise correlations for columns 2-12, collect them in a new data frame and pass it to ggpairs. On a Mac the plot opens in a Quartz graphics window.

> measures <- bodyMeasures[,2:12]
> ggpairs(measures)

last modified: 2/10/14