Advanced Statistics - Biology 6030
Bowling Green State University, Fall 2017
Error: A statistical "error" is not a mistake; it is the difference between a computed, estimated, or measured value and the true, specified, or theoretically correct value. The error ε measures how far an observation lies from an expected value, such as the underlying population mean μ. If the mean height of all male students at BGSU is 1.79 m, then the error for a student who is 1.84 m tall is 0.05 m. The true error can rarely be measured because the population mean μ is usually unknown.
Residual: The residual serves as an estimate of the underlying error. It is the difference between an observation and a sample statistic (for example, the sample mean) that serves as an estimate of the underlying population parameter. An estimate is marked with a hat above the symbol, e.g. μ̂ for an estimate of μ.
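The distinction between error and residual can be sketched in R. The heights below are hypothetical values chosen for illustration, not course data, and the population mean of 1.79 m is taken from the example above as if it were known.

```r
# Hypothetical heights in metres (illustration only, not course data).
heights <- c(1.84, 1.76, 1.81, 1.72, 1.79)

# Population mean from the example above; in practice this is usually unknown.
mu <- 1.79

# Errors: deviation of each observation from the population mean.
errors <- heights - mu

# Residuals: deviation from the sample mean, which estimates mu.
res <- heights - mean(heights)

print(errors)
print(res)
```

Residuals taken from the sample mean always sum to zero (up to rounding), whereas the errors generally do not.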
Worksheet: Variance/Covariance using Matrix Algebra
Import datafile "BodyMeasures.txt".
Calculate the covariance matrix and the correlation matrix for the current data frame. Use columns 2-12 only; column 1 contains an independent variable and is excluded. Then display each matrix.
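A minimal sketch of the import and the matrix-algebra calculation follows. It assumes "BodyMeasures.txt" is whitespace-delimited with a header row, which the worksheet does not state; if the file is not in the working directory, a simulated stand-in data frame (hypothetical, random values) is used so the code still runs. The object names body, X, S and R are choices made here.

```r
path <- "BodyMeasures.txt"
if (file.exists(path)) {
  # Assumption: whitespace-delimited text with a header row.
  body <- read.table(path, header = TRUE)
} else {
  # Stand-in with random values: column 1 plus 11 numeric columns.
  set.seed(1)
  body <- data.frame(id = 1:30, matrix(rnorm(30 * 11), nrow = 30))
}

X <- as.matrix(body[, 2:12])       # columns 2-12 only
n <- nrow(X)

Xc <- sweep(X, 2, colMeans(X))     # centre each column on its mean
S <- crossprod(Xc) / (n - 1)       # covariance: t(Xc) %*% Xc / (n - 1)

D <- diag(1 / sqrt(diag(S)))       # diagonal matrix of 1 / standard deviation
R <- D %*% S %*% D                 # correlation matrix
dimnames(R) <- dimnames(S)

print(round(S, 3))
print(round(R, 3))
```

The built-in calls cov(X) and cor(X) should return the same two matrices and are the shorter route once the algebra is understood.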
Calculate the eigenvalues and eigenvectors of the covariance matrix, save the solution in an object, and display it. Then repeat for the correlation matrix.
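One way to carry out this step is with base R's eigen(), sketched below on a simulated stand-in data frame (random values, not the course file). The object names eig_cov and eig_cor are arbitrary.

```r
# Stand-in data: column 1 plus 11 numeric columns of random values.
set.seed(1)
body <- data.frame(id = 1:30, matrix(rnorm(30 * 11), nrow = 30))
X <- body[, 2:12]

# eigen() returns a list with $values (eigenvalues, in decreasing order)
# and $vectors (one eigenvector per column).
eig_cov <- eigen(cov(X))
print(eig_cov)

eig_cor <- eigen(cor(X))
print(eig_cor)

# Check the defining relation S v = lambda v for the first eigenpair.
S <- cov(X)
v1 <- eig_cov$vectors[, 1]
lhs <- as.vector(S %*% v1)
rhs <- eig_cov$values[1] * v1
print(all.equal(lhs, rhs))
```

As a sanity check, the eigenvalues of the covariance matrix sum to the total variance (the trace of the matrix), and those of the correlation matrix sum to the number of variables, here 11.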
The correlation matrix of a number of random variables X1, ..., Xn is the n × n matrix whose (i, j) entry is corr(Xi, Xj). If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables Xi / σ(Xi) for i = 1, ..., n.
The correlation matrix is symmetric because the correlation between Xi and Xj is the same as the correlation between Xj and Xi.
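Both properties can be checked numerically. The sketch below uses a small simulated matrix (random values, for illustration only) and divides each column by its standard deviation before taking the covariance.

```r
# Simulated data: 30 observations on 4 variables (illustration only).
set.seed(1)
X <- matrix(rnorm(30 * 4), nrow = 30)

# Standardize: divide each column Xi by its standard deviation.
Z <- sweep(X, 2, apply(X, 2, sd), "/")

# Covariance of the standardized variables equals the correlation matrix.
same_matrix <- isTRUE(all.equal(cov(Z), cor(X)))
print(same_matrix)

# The correlation matrix equals its own transpose.
symmetric <- isSymmetric(cor(X))
print(symmetric)
```

Centring the columns is not needed here because the covariance already subtracts the column means.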
Obtain a more readable, graphical correlation matrix for the same data. Confirm that the packages ggplot2 and GGally are installed.
To calculate a matrix of correlations for columns 2-12, select those columns and pass them to ggpairs. On a Mac the resulting plot opens in a Quartz graphics window.
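A minimal sketch of the ggpairs call is given below, again on a simulated stand-in data frame rather than the course file. The columns argument selects columns 2-12; the check with requireNamespace() and the object names body, cols and p are choices made here, not part of the worksheet.

```r
# Stand-in data: column 1 plus 11 numeric columns of random values.
set.seed(1)
body <- data.frame(id = 1:30, matrix(rnorm(30 * 11), nrow = 30))
cols <- 2:12

p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("GGally", quietly = TRUE)) {
  # Scatterplots, densities and correlation coefficients in one grid.
  p <- GGally::ggpairs(body, columns = cols)
  # Draw the plot only in an interactive session (e.g. the Quartz window).
  if (interactive()) print(p)
} else {
  message("Install with install.packages(c('ggplot2', 'GGally'))")
}
```

With 11 variables the grid has 121 panels, so drawing it can take a few seconds.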