Labs for Advanced Stats

Lab Exercise 1 for R: Setup

Exercise 1: Open R, read in a text file and do some basic data management

There are many ways to slice and dice a data file. The concepts below cover the most common ones.

In R, download and read the content of the UCLA file "mmreg.csv", which holds data for 600 students. The psychological variables are measures of control, self_concept and motivation. The academic variables are standardized tests in reading (read), writing (write), math (math) and science (science). Additionally, the variable sex codes female students as "1" and male students as "0".

> mm <- read.csv("http://www.ats.ucla.edu/stat/data/mmreg.csv")
> colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math", "Science", "Sex")
> summary(mm)

First, let us create subsets of variables (i.e., columns 1-3 for psychological measures, 4-7 for academic metrics). Rows and columns are designated using the syntax [rows,columns]; a blank indicates the entire range.

> psych_mm <- mm[,1:3]
> acad_mm <- mm[,4:7]
> sex_mm <- mm[,8]
> summary(acad_mm)
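
The [rows,columns] syntax can be tried out on a small data frame first. The sketch below uses a made-up data frame called toy (its name and values are assumptions for illustration, not part of mmreg.csv). It also shows that selecting a single column, as in mm[,8], returns a plain vector rather than a data frame unless drop = FALSE is given.

```r
# Toy data frame standing in for mm; the values are invented for illustration.
toy <- data.frame(Control = c(0.1, -0.4, 0.7),
                  Read    = c(52, 47, 60),
                  Sex     = c(1, 0, 1))

toy[, 1:2]                   # all rows, columns 1 and 2: a data frame
toy[2, ]                     # row 2, all columns: a one-row data frame
toy[, c("Control", "Read")]  # columns selected by name instead of position
toy[, 3]                     # a single column drops to a plain vector
toy[, 3, drop = FALSE]       # drop = FALSE keeps a one-column data frame
```

Selecting by name is usually more robust than selecting by position, because it keeps working if the column order of the file changes.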

To subset specific rows we can use a boolean vector that tests for a specific match, e.g., all female students.

> is_female <- mm$Sex == 1
> female_mm <- mm[is_female,]
> is_male <- mm$Sex == 0
> male_mm <- mm[is_male,]
> summary(male_mm)

... and yes, you guessed it, the psychological measures for all male students can then be obtained as ...

> psych_male_mm <- mm[is_male,1:3]
> summary(psych_male_mm)
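
One detail of boolean subsetting is worth checking on a small example: a missing value in the tested column produces NA in the boolean vector, and indexing with NA returns a row filled with NA. The sketch below uses a made-up data frame toy (an assumption for illustration; it makes no claim that mmreg.csv contains missing values) and shows two common ways to avoid that.

```r
# Toy data with one missing Sex value; the values are invented for illustration.
toy <- data.frame(Read = c(52, 47, 60, 55),
                  Sex  = c(1, 0, NA, 1))

is_female <- toy$Sex == 1   # TRUE FALSE NA TRUE

toy[is_female, ]            # 3 rows: the NA produces an extra all-NA row
toy[which(is_female), ]     # 2 rows: which() keeps only the TRUE positions
subset(toy, Sex == 1)       # 2 rows: subset() also drops rows where the test is NA
```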

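Exercise 2: Create a new variable from an existing one

Create a new column/variable that contains values calculated from an existing column. Here the new variable addedMotivation holds each Motivation score multiplied by 10, and its mean is then computed over the non-missing values.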

> mm$addedMotivation <- mm$Motivation * 10
> mean(mm$addedMotivation[!is.na(mm$addedMotivation)])
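
The same two steps can be followed on a small example where the result is easy to verify by hand. The data frame toy below is made up for illustration. The sketch also shows that mean() has an na.rm argument, which gives the same result as filtering with !is.na() first.

```r
# Toy data with one missing score; the values are invented for illustration.
toy <- data.frame(Motivation = c(0.5, NA, 1.0))

toy$addedMotivation <- toy$Motivation * 10   # NA stays NA: 5, NA, 10

mean(toy$addedMotivation)                                # NA, because one value is missing
mean(toy$addedMotivation[!is.na(toy$addedMotivation)])   # 7.5
mean(toy$addedMotivation, na.rm = TRUE)                  # 7.5, same result
```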

Exercise 3: Read in a text file and substitute missing values with the mean

In some cases it may be appropriate to replace a missing value with the mean for the variable. You can open your text file in a text editor and fill in the missing values with the mean by hand or, alternatively, you can do this in R.

In R, download and read the content of the cichlid brain data file.

> CB <- read.table("http://caspar.bgsu.edu/~courses/stats/Labs/Datasets/CichlidBrains.txt", sep="\t", header=TRUE)

The following code iterates through all columns of type numeric (the loop counter is i): sapply() tests each column with is.numeric, and which() returns the positions of the columns that pass. The code inside the curly braces is executed once for each of them. In our case it prints the name of the variable to the console.

> for (i in which(sapply(CB, is.numeric))) { print(names(CB[i])) }
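
To see what the loop iterates over, it can help to look at the intermediate values separately. This is a minimal sketch on a made-up data frame toy (the column names are assumptions for illustration and are not taken from the cichlid file).

```r
# Toy data frame with one text column and two numeric columns (invented values).
toy <- data.frame(species = c("A", "B"),
                  brain   = c(1.2, 1.5),
                  body    = c(30L, 42L),
                  stringsAsFactors = FALSE)

is_num <- sapply(toy, is.numeric)  # named logical vector, one entry per column
is_num                             # species FALSE, brain TRUE, body TRUE
which(is_num)                      # positions of the TRUE entries: 2 and 3
names(toy)[is_num]                 # the same names, obtained without a loop
```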

To access the values in a particular column (in this case column 9) one by one use ...

> for (i in seq_len(nrow(CB))) { print(CB[i,9]) }

To substitute missing values in a single column (in this case column 9) with the value 3.14159 ...

> CB[9][is.na(CB[9])] <- 3.14159

... and yes, you can combine all these clauses in any way you wish. For instance, to replace all missing values in every numeric variable of the data frame with that variable's mean use ...

> for (i in which(sapply(CB, is.numeric))) { CB[i][is.na(CB[i])] <- mean(CB[,i], na.rm = TRUE) }
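
The mean-substitution loop can be checked on a small data frame where the expected means are easy to work out by hand. The data frame toy and its column names below are made up for illustration. The text column id is left untouched because it is not numeric.

```r
# Toy data with one missing value per numeric column (invented values).
toy <- data.frame(id     = c("a", "b", "c", "d"),
                  length = c(10, NA, 14, 12),     # mean of 10, 14, 12 is 12
                  weight = c(2.0, 3.0, NA, 4.0),  # mean of 2, 3, 4 is 3
                  stringsAsFactors = FALSE)

for (i in which(sapply(toy, is.numeric))) {
  col_mean <- mean(toy[, i], na.rm = TRUE)  # mean over the non-missing values
  toy[i][is.na(toy[i])] <- col_mean         # fill only the missing cells
}

toy               # length[2] is now 12, weight[3] is now 3
any(is.na(toy))   # FALSE: no missing values remain
```

Keep in mind that replacing missing values with the mean leaves the column mean unchanged but reduces its variance, so whether it is appropriate depends on the analysis that follows.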

Exercise 4: Define a function and use it to reformat a data file from column into stacked format

The function make.rm() has been published on the CRAN server. Copy it from the webpage and paste it into your R console to define it in your runtime environment.

Define function make.rm()

> make.rm<-function(constant,repeated,data,contrasts) {
+ if(!missing(constant) && is.vector(constant)) {
+   if(!missing(repeated) && is.vector(repeated)) {
+    if(!missing(data)) {
+     dd<-dim(data)
+     replen<-length(repeated)
+     if(missing(contrasts))
+      contrasts<-
+       ordered(sapply(paste("T",1:length(repeated),sep=""),rep,dd[1]))
+     else
+      contrasts<-matrix(sapply(contrasts,rep,dd[1]),ncol=dim(contrasts)[2])
+     if(length(constant) == 1) cons.col<-rep(data[,constant],replen)
+     else cons.col<-lapply(data[,constant],rep,replen)
+     new.df<-data.frame(cons.col,
+      repdat=as.vector(data.matrix(data[,repeated])),
+      contrasts)
+     return(new.df)
+    }
+   }
+ }
+ cat("Usage: make.rm(constant, repeated, data [, contrasts])\n")
+ cat("\tWhere 'constant' is a vector of indices of non-repeated data and\n")
+ cat("\t'repeated' is a vector of indices of the repeated measures data.\n")
+ }

You can now use the make.rm function to reformat your data from column to stacked format. Read a data file that contains repeated measures as multiple columns, then stack it.

> groceries <- read.table("http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/groceries.txt", header=TRUE)
> groceries

> groceries.stacked <- make.rm(constant="subject", repeated=c("storeA","storeB","storeC","storeD"), data=groceries)
> groceries.stacked
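
To make clear what "stacked format" means, the reshaping can be sketched by hand on a small data frame using only base R. This is not the make.rm() implementation, just an illustration of the same idea. The data frame wide, its values, and the output column names price and store are assumptions for illustration.

```r
# Toy wide-format data: one row per subject, one column per repeated measure.
wide <- data.frame(subject = c("s1", "s2", "s3"),
                   storeA  = c(1.0, 2.0, 3.0),
                   storeB  = c(1.5, 2.5, 3.5),
                   stringsAsFactors = FALSE)

repeated <- c("storeA", "storeB")   # names of the repeated-measures columns

# Stacked format: one row per subject-by-measure combination.
long <- data.frame(
  subject = rep(wide$subject, times = length(repeated)),   # s1 s2 s3 s1 s2 s3
  price   = unlist(wide[repeated], use.names = FALSE),     # storeA values, then storeB values
  store   = rep(repeated, each = nrow(wide)),              # which column each value came from
  stringsAsFactors = FALSE)

long   # 6 rows: 3 subjects x 2 stores
```

Each subject now appears once per store, which is the layout that repeated-measures functions in R generally expect.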

last modified: 4/21/15