Labs for Advanced Stats

Lab Exercise for R: Canonical Correlation Analysis

Exercise 1: Divide a set of variables into two separate matrices

In a dataset with multiple continuous variables we calculate the multivariate distance of individuals. The original axes are used to derive new linear combinations based on the correlational structure of variables.

Use file "mmreg.csv", and divide the variables into measuring psychological traits and academic abilities

> mm <- read.csv("http://www.ats.ucla.edu/stat/data/mmreg.csv")
> colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math", "Science", "Sex")
> psych <- mm[, 1:3]
> acad <- mm[, 4:8]

Using library CCA obtain correlation matrices within and between the two sets of variables

> require(CCA)
> matcor(psych, acad)

Calculate the canonical correlation between the two sets of variables. columns 3 and 4 contain the x and y coefficients

> cc1 <- cc(psych, acad)
> cc1$cor # display the canonical correlation coefficients
> names(cc1)
> cc1[3:4] # display the x and y coefficients

Test for significance of the number of dimensions. This is not available in a single package and needs to be calculated. To do this first intialize a couple of the dimensions (with sincere thanks to idre @UCLA)

> n <- dim(psych)[1]
> p <- length(psych)
> q <- length(acad)

then perform the basic calculations

> {
   ev <- (1 - cc1$cor^2)
   k <- min(p, q)
   m <- n - 3/2 - (p + q)/2
   w <- rev(cumprod(rev(ev)))
   d1 <- d2 <- f <- vector("numeric", k)
   for (i in 1:k) {
       s <- sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
       si <- 1/s
     d1[i] <- p * q
     d2[i] <- m * s - p * q/2 + 1
     r <- (1 - w[i]^si)/w[i]^si
       f[i] <- r * d2[i]/d1[i]
   p <- p - 1
   q <- q - 1
   }
   pv <- pf(f, d1, d2, lower.tail = FALSE)
}

then perform and report the tests of significance for the canonical axes

> dmat <- cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv)
> dmat

to standardize canonical coefficients for psych SDs

> s1 <- diag(sqrt(diag(cov(psych))))
> s1 %*% cc1$xcoef

to standardize canonical coefficients for acad SDs

> s2 <- diag(sqrt(diag(cov(acad))))
> s2 %*% cc1$ycoef

last modified: 3/23/15