Advanced Statistics - Biology 6030

Bowling Green State University, Fall 2017

Lab Exercise 1 for R: Setup

Exercise 1: Open R, read in a text file and do some basic data management

There are many ways to slice and dice a data file. The exercises below introduce the basic concepts.

In R, download and read the contents of the UCLA file "mmreg.csv", which holds data for 600 students. The psychological variables are measures of control, self_concept and motivation. The academic variables are standardized tests in reading (read), writing (write), math (math) and science (science). Additionally, the variable sex codes female students as "1" and male students as "0".

> mm <- read.csv("http://www.ats.ucla.edu/stat/data/mmreg.csv")
> colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math", "Science", "Sex")
> summary(mm)

First, let us create subsets of variables (i.e., columns 1-3 for the psychological measures, 4-7 for the academic metrics). We designate rows and columns using the syntax [rows,columns]; a blank indicates the entire range.

> psych_mm <- mm[,1:3]
> acad_mm <- mm[,4:7]
> sex_mm <- mm[,8]
> summary(acad_mm)
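
The [rows,columns] syntax also accepts column names instead of positions. The following is a minimal sketch on a small made-up data frame (toy and its values are hypothetical and only stand in for mm); it also shows that selecting a single column returns a plain vector unless drop = FALSE is given.

```r
# Hypothetical toy data frame standing in for mm (all values are made up)
toy <- data.frame(
  Control    = c(0.5, -0.2, 1.1, 0.0),
  Concept    = c(0.3, 0.9, -0.4, 0.2),
  Motivation = c(1.0, 0.67, 0.33, 0.67),
  Read       = c(54.8, 62.7, 60.6, 62.7),
  Sex        = c(1, 0, 1, 0)
)

# Columns by position: the blank row index means "all rows"
psych_by_index <- toy[, 1:3]

# The same columns by name, which keeps working if columns are reordered
psych_by_name <- toy[, c("Control", "Concept", "Motivation")]

identical(psych_by_index, psych_by_name)    # TRUE

# Selecting a single column with [, j] returns a plain vector ...
sex_vec <- toy[, 5]
is.data.frame(sex_vec)                      # FALSE

# ... unless drop = FALSE is given, which keeps a one-column data frame
sex_df <- toy[, 5, drop = FALSE]
is.data.frame(sex_df)                       # TRUE
```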

To subset specific rows, we can use a logical (Boolean) vector that tests for a match, e.g., all female students:

> is_female <- mm$Sex == 1
> female_mm <- mm[is_female,]
> is_male <- mm$Sex == 0
> male_mm <- mm[is_male,]
> summary(male_mm)

... and yes, you guessed it, the psychological measures for all male students can then be obtained as ...

> psych_male_mm <- mm[is_male,1:3]
> summary(psych_male_mm)
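
One caveat of logical row selection is its behavior when the tested variable contains missing values. The sketch below uses a small made-up data frame (toy is hypothetical; nothing here implies that mmreg.csv contains missing values) to compare plain logical indexing with which() and subset(), both of which drop rows where the test evaluates to NA.

```r
# Hypothetical toy data with one missing Sex value (all values are made up)
toy <- data.frame(
  Motivation = c(1.0, 0.67, 0.33, 0.67, 0.33),
  Read       = c(54.8, 62.7, 60.6, 62.7, 41.6),
  Sex        = c(1, 0, 1, 0, NA)
)

is_male <- toy$Sex == 0            # FALSE TRUE FALSE TRUE NA

# Plain logical indexing: the NA in the mask produces an extra all-NA row
male_idx <- toy[is_male, ]
nrow(male_idx)                     # 3

# which() converts the mask to row numbers and silently drops the NA
male_which <- toy[which(is_male), ]
nrow(male_which)                   # 2

# subset() also drops rows where the condition is NA; select picks columns
male_subset <- subset(toy, Sex == 0, select = Motivation)
nrow(male_subset)                  # 2
```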

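Exercise 2: Create a new variable and calculate its mean

A new variable is added to a data frame by assigning to a column name that does not yet exist. Here we create addedMotivation as ten times Motivation and calculate its mean, ignoring missing values.

> mm$addedMotivation <- mm$Motivation * 10
> mean(mm$addedMotivation, na.rm = TRUE)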
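
Missing values propagate through arithmetic and through mean(), which is why they have to be excluded explicitly. As a minimal sketch on made-up values (toy is a hypothetical data frame), filtering with !is.na() and passing na.rm = TRUE give the same result:

```r
# Hypothetical toy data with one missing value (values are made up)
toy <- data.frame(Motivation = c(1.0, 0.5, NA, 0.25))

# Arithmetic on NA yields NA, so the derived column is also missing there
toy$addedMotivation <- toy$Motivation * 10     # 10, 5, NA, 2.5

# Without excluding NA the mean itself is NA
mean(toy$addedMotivation)                      # NA

# Two equivalent ways of excluding the missing value
m_filter <- mean(toy$addedMotivation[!is.na(toy$addedMotivation)])
m_narm   <- mean(toy$addedMotivation, na.rm = TRUE)

all.equal(m_filter, m_narm)                    # TRUE
```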


Exercise 3: Read in a text file and substitute missing values with the mean

In R, download and read the contents of the cichlid brain data file:

> CB <- read.table("http://caspar.bgsu.edu/~courses/stats/Labs/Datasets/CichlidBrains.txt", sep="\t", header=TRUE)

The following code first tests every column for whether it is of type numeric and then iterates over the numeric ones (the loop variable i holds the column index). For each of these columns, the code inside the curly braces is executed. In our case it prints the name of the variable to the console.

> for (i in which(sapply(CB, is.numeric))) { print(names(CB[i])) }
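
To see what the loop iterates over, it helps to look at the intermediate results of sapply() and which(). The sketch below uses a small made-up data frame (toy and its column names are hypothetical, not taken from the cichlid file).

```r
# Hypothetical toy data frame (column names and values are made up)
toy <- data.frame(
  species = c("A", "B", "C"),
  brain   = c(1.2, NA, 0.9),
  body    = c(10L, 12L, 9L),
  stringsAsFactors = FALSE
)

# sapply() applies is.numeric to each column and returns a named logical vector
is_num <- sapply(toy, is.numeric)    # species FALSE, brain TRUE, body TRUE

# which() turns the TRUE entries into column positions (names are kept)
num_idx <- which(is_num)             # brain = 2, body = 3

# The loop then visits only the numeric columns
for (i in num_idx) {
  print(names(toy)[i])
}
```

Note that integer columns such as body also count as numeric.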

To access the individual values in a particular column (in this case column 9) use ...

> for (i in seq_along(CB[,9])) { print(CB[i,9]) }

To substitute missing values in a single column (in this case column 9) with the value 3.14159 ...

> CB[9][is.na(CB[9])] <- 3.14159

... and yes, you can combine all these clauses in any way you wish. For instance, to replace all missing values in every numeric variable of the data frame with that variable's mean use ...

> for (i in which(sapply(CB, is.numeric))) { CB[i][is.na(CB[i])] <- mean(CB[,i], na.rm = TRUE) }
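
The effect of mean substitution is easy to check on a small example. The following sketch uses a made-up data frame (toy is hypothetical) and the vector-extracting [[i]] form of the same loop; non-numeric columns are left untouched and the mean of each numeric column is unchanged by the substitution.

```r
# Hypothetical toy data frame with missing values (values are made up)
toy <- data.frame(
  species = c("A", "B", "C", "D"),
  brain   = c(1.0, NA, 3.0, 2.0),
  body    = c(10, 20, NA, 30),
  stringsAsFactors = FALSE
)

# Replace NA in every numeric column with that column's mean
for (i in which(sapply(toy, is.numeric))) {
  col_mean <- mean(toy[[i]], na.rm = TRUE)
  toy[[i]][is.na(toy[[i]])] <- col_mean
}

toy$brain    # 1 2 3 2
toy$body     # 10 20 20 30
```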

Exercise 4: Define a function and use it to reformat a data file from column into stacked format

Define function make.rm()

> make.rm<-function(constant,repeated,data,contrasts) {
+  if(!missing(constant) && is.vector(constant)) {
+   if(!missing(repeated) && is.vector(repeated)) {
+    if(!missing(data)) {
+     dd<-dim(data)
+     replen<-length(repeated)
+     if(missing(contrasts))
+      contrasts<-
+       ordered(sapply(paste("T",1:length(repeated),sep=""),rep,dd[1]))
+     else
+      contrasts<-matrix(sapply(contrasts,rep,dd[1]),ncol=dim(contrasts)[2])
+     if(length(constant) == 1) cons.col<-rep(data[,constant],replen)
+     else cons.col<-lapply(data[,constant],rep,replen)
+     new.df<-data.frame(cons.col,
+      repdat=as.vector(data.matrix(data[,repeated])),
+      contrasts)
+     return(new.df)
+    }
+   }
+  }
+  cat("Usage: make.rm(constant, repeated, data [, contrasts])\n")
+  cat("\tWhere 'constant' is a vector of indices of non-repeated data and\n")
+  cat("\t'repeated' is a vector of indices of the repeated measures data.\n")
+ }

You can now use the make.rm function to reformat your data from column to stacked format. Read a data file that contains repeated measures as multiple columns:

> groceries <- read.table("http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/groceries.txt", header=TRUE)
> groceries

> groceries.stacked <- make.rm(constant="subject", repeated=c("storeA","storeB","storeC","storeD"), data=groceries)
> groceries.stacked
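
For comparison, the same column-to-stacked idea can be sketched with the base R function stack(). This is an alternative technique, not what make.rm does internally, and it is shown on a made-up data frame (wide and its values are hypothetical, not the groceries data). The repeated-measures columns are stacked into one column of values plus a factor naming the source column, and the constant column is repeated once per stacked column.

```r
# Hypothetical wide-format data: one row per subject, one column per store
wide <- data.frame(
  subject = c("s1", "s2", "s3"),
  storeA  = c(1.0, 2.0, 3.0),
  storeB  = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

repeated <- c("storeA", "storeB")

# stack() returns the columns 'values' (the data) and 'ind' (source column)
stacked <- stack(wide[, repeated])

# Repeat the constant column once for each stacked column
stacked$subject <- rep(wide$subject, times = length(repeated))

stacked
```

Each subject now occupies one row per store, which is the layout that repeated-measures analyses typically expect.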


last modified: 4/21/15
This material is copyrighted and MAY NOT be used for commercial purposes, 2001-2017 lobsterman.