## Lab Exercise 1 for R: Setup

### Exercise 1: Open R, read in a text file and do some basic data management

There are a bunch of ways in which we can slice and dice a data file. Here are some concepts that apply.
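As a rough sketch of what such slicing can look like, the example below reads a small made-up table from inline text, so it runs without any file. The data frame `trees` and its columns are hypothetical and are not part of the lab data.

```r
# Hypothetical data, read from inline text instead of a file
trees <- read.table(text = "
id species diameter
1  oak     30.5
2  pine    22.0
3  oak     41.2
4  birch   18.7
", header = TRUE)

# Select rows with a logical condition
oaks <- trees[trees$species == "oak", ]

# Select columns by name
dims <- trees[, c("id", "diameter")]

# Combine both: ids of all trees with a diameter above 20
big_ids <- trees$id[trees$diameter > 20]

# Sort the rows by diameter, largest first
sorted <- trees[order(trees$diameter, decreasing = TRUE), ]
```

The same pattern, `data[rows, columns]`, underlies most of the data management steps in the exercises that follow.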

### Exercise 2:

• Create a new column/variable that contains measures of the trunk's circumference calculated from each diameter.
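One possible way to do this, assuming the trunk is treated as a circle so that circumference = π × diameter. The data frame `trunks` and its column names are hypothetical placeholders for whatever your own data file contains.

```r
# Hypothetical data frame with two trunk diameter measurements per tree
trunks <- data.frame(
  tree      = c("A", "B", "C"),
  diameter1 = c(10, 20, 35),
  diameter2 = c(12, 25, 40)
)

# Circumference of a circle: C = pi * d
# Assigning to a new name with $ adds a column to the data frame
trunks$circumference1 <- pi * trunks$diameter1
trunks$circumference2 <- pi * trunks$diameter2

print(trunks)
```

Because arithmetic in R is vectorized, each assignment computes the circumference for every row at once, with no loop needed.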

### Exercise 3: Read in a text file and substitute missing values with the mean

• In some cases it may be appropriate to replace a missing value with the mean of the variable. You can open your text file in a text editor and fill in the missing values with the mean or, alternatively, you can do this in R.
In R, download and read the content of the cichlid brain data file:

    > CB <- read.table("http://caspar.bgsu.edu/~courses/stats/Labs/Datasets/CichlidBrains.txt", sep="\t", header=TRUE)

The following code tests every column for being of type numeric and loops over the indices of those that are (the loop counter is i). For each of them the code inside the curly braces is executed. In our case it prints the name of the variable to the console.

    > for (i in which(sapply(CB, is.numeric))) {
    +   print(names(CB[i]))
    + }

To access the values in a particular column (in this case column 9) one at a time, use ...

    > for (i in 1:length(CB[,9])) {
    +   print(CB[i,9])
    + }

To substitute missing values in a single column (in this case column 9) with the value 3.14159 ...

    > CB[is.na(CB[,9]), 9] <- 3.14159

... and yes, you can combine all these clauses in any way you wish. For instance, to replace all missing values in every numeric variable of the data frame with that variable's mean, use ...

    > for (i in which(sapply(CB, is.numeric))) {
    +   CB[i][is.na(CB[i])] <- mean(CB[,i], na.rm = TRUE)
    + }
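To check that a mean substitution of this kind behaves as expected, it can help to try it first on a tiny data frame where the answer is obvious. The sketch below uses a hypothetical stand-in called `toy` (not the cichlid data) and counts the missing values before and after.

```r
# Hypothetical stand-in for a data file with missing values
toy <- data.frame(
  species = c("a", "b", "c", "d"),
  mass    = c(1.0, NA, 3.0, 5.0),
  length  = c(10, 20, NA, 30),
  stringsAsFactors = FALSE
)

# Count missing values per column before the substitution
na_before <- colSums(is.na(toy))

# Replace NAs in each numeric column with that column's mean
num_cols <- which(sapply(toy, is.numeric))
for (i in num_cols) {
  toy[is.na(toy[, i]), i] <- mean(toy[, i], na.rm = TRUE)
}

# Count missing values again; the numeric columns should now have none
na_after <- colSums(is.na(toy))
```

Keep in mind that filling gaps with the mean leaves the column mean unchanged but reduces its variance, which is one reason this is appropriate only in some cases.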

### Exercise 4: Define a function and use it to reformat a data file from column into stacked format

• The function make.rm() has been published on the CRAN server. Copy it from the webpage and paste it into your R console to define it in your runtime environment.
Define the function make.rm():

    > make.rm<-function(constant,repeated,data,contrasts) {
    +  if(!missing(constant) && is.vector(constant)) {
    +   if(!missing(repeated) && is.vector(repeated)) {
    +    if(!missing(data)) {
    +     dd<-dim(data)
    +     replen<-length(repeated)
    +     if(missing(contrasts))
    +      contrasts<-
    +       ordered(sapply(paste("T",1:length(repeated),sep=""),rep,dd[1]))
    +     else
    +      contrasts<-matrix(sapply(contrasts,rep,dd[1]),ncol=dim(contrasts)[2])
    +     if(length(constant) == 1) cons.col<-rep(data[,constant],replen)
    +     else cons.col<-lapply(data[,constant],rep,replen)
    +     new.df<-data.frame(cons.col,
    +      repdat=as.vector(data.matrix(data[,repeated])),
    +      contrasts)
    +     return(new.df)
    +    }
    +   }
    +  }
    +  cat("Usage: make.rm(constant, repeated, data [, contrasts])\n")
    +  cat("\tWhere 'constant' is a vector of indices of non-repeated data and\n")
    +  cat("\t'repeated' is a vector of indices of the repeated measures data.\n")
    + }

You can now use the make.rm function to reformat your data from column to stacked format. Read a data file that contains repeated measures as multiple columns, inspect it, then stack it and inspect the result:

    > groceries = read.table("http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/groceries.txt", header=T)
    > groceries
    > groceries.stacked <- make.rm(constant="subject", repeated=c("storeA","storeB","storeC","storeD"), data=groceries)
    > groceries.stacked
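For comparison, a similar column-to-stacked conversion can be sketched with base R's `stack()` function instead of make.rm(). This is an alternative technique, not the method used in this exercise, and the data frame `wide` with its values is made up for illustration.

```r
# Hypothetical wide-format data: one row per subject, one column per store
wide <- data.frame(
  subject = c("item1", "item2", "item3"),
  storeA  = c(1.17, 1.77, 1.49),
  storeB  = c(1.78, 1.98, 1.69),
  storeC  = c(1.29, 1.99, 1.79)
)

store_cols <- c("storeA", "storeB", "storeC")

# stack() concatenates the selected columns into 'values' and
# records the name of the source column in the factor 'ind'
long <- stack(wide[, store_cols])

# Repeat the constant column once per stacked measure
long$subject <- rep(wide$subject, times = length(store_cols))

# Give the stacked columns descriptive names
names(long) <- c("price", "store", "subject")
```

The result has one row per subject and store, which is the layout that repeated measures analyses in R generally expect.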