Advanced Statistics - Biology 6030

Bowling Green State University, Fall 2017

Lab Exercise 1 for R: Setup

Exercise 1: Choosing and Setting up the R framework

  1. Get the latest version of R and install it on your computer from the downloaded package. Be aware that R is not very user friendly and has a steep learning curve. A primer for doing statistics with R is available at Quick-R. Alternatively, you can consult the Primer on using R or Using R for Introductory Statistics.
  2. There are several excellent Graphical User Interfaces (GUIs) for R, such as RStudio, RCommander, JGR, or RKWard, which make working with R much easier.
  3. Additional extensions can be loaded from within R/JGR. For example, Deducer adds a number of useful menus to R/JGR.

Go to the R console, then install and load the packages for RCommander (Rcmdr), Deducer, etc.:

> install.packages("Rcmdr")
> library(Rcmdr)

> install.packages("Deducer")
> library(Deducer)
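
Because install.packages() downloads the package every time it is run, it can be convenient to install only when a package is missing. The following is a minimal sketch, assuming a helper named load_or_install (a hypothetical name, not part of R or of the course material). It is demonstrated with the stats package, which ships with R, so nothing is downloaded here.

```r
# Hypothetical helper: install a package only if it is not already available,
# then attach it. Returns TRUE (invisibly) if the package could be attached.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs a network connection and a CRAN mirror
  }
  ok <- require(pkg, character.only = TRUE)
  invisible(ok)
}

# Demonstration with a package that is present in every standard R install
loaded <- load_or_install("stats")
print(loaded)
```

In the lab you would call load_or_install("Rcmdr") or load_or_install("Deducer") instead.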

Exercise 2: Open R, read in a text file and do some basic data management

There are a bunch of ways in which we can slice and dice a data file. Here are some concepts that apply.

In R, download and read the contents of the UCLA file "mmreg.csv", which holds information on 600 students. The psychological variables are measures of control, self_concept and motivation. The academic variables are standardized tests in reading (read), writing (write), math (math) and science (science). Additionally, the variable female codes female students as "1" and male students as "0".

> mm <- read.csv("http://stats.idre.ucla.edu/stat/data/mmreg.csv")
> summary(mm)

First, let us create subsets of variables (i.e., columns 1-3 for the psychological measures and 4-7 for the academic metrics). We designate rows and columns using the syntax [rows,columns]; a blank indicates the entire range.

> psych_mm <- mm[,1:3]
> acad_mm <- mm[,4:7]
> sex_mm <- mm[,8]
> summary(acad_mm)
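
Column positions can shift if a file is rearranged, so the same subsets can also be selected by column name. The sketch below uses a small stand-in data frame called toy with made-up values. Its column names are an assumption about how the UCLA file is laid out, so check them with names(mm) before relying on them.

```r
# Toy stand-in for mm: the values are invented for illustration only, and the
# column names are assumed to match those in mmreg.csv.
toy <- data.frame(
  locus_of_control = c(-0.84, -0.38, 0.89, 0.71),
  self_concept     = c(-0.24, -0.47, 0.59, 0.28),
  motivation       = c(1.00, 0.67, 0.67, 0.67),
  read             = c(54.8, 62.7, 60.6, 62.7),
  write            = c(64.5, 43.7, 56.7, 56.7),
  math             = c(44.5, 44.7, 70.5, 54.7),
  science          = c(52.6, 52.6, 58.0, 58.0),
  female           = c(1, 1, 0, 0)
)

# Select the academic variables by name instead of by position
acad_cols <- c("read", "write", "math", "science")
acad_toy  <- toy[, acad_cols]

# Compare with the positional selection used above
same_as_positional <- identical(acad_toy, toy[, 4:7])
print(names(acad_toy))
print(same_as_positional)
```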

To subset specific rows, we can use a boolean vector that tests for a specific match, e.g., all female students:

> # The mask is named Fem rather than F, because F is R's built-in shorthand for FALSE
> Fem <- mm$female == 1
> female_mm <- mm[Fem,]
> M <- mm$female == 0
> male_mm <- mm[M,]
> summary(male_mm)

... and yes, you guessed it, the psychological measures for all male students can then be obtained as ...

> psych_male_mm <- mm[M,1:3]
> summary(psych_male_mm)
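
Row tests can also be combined with & (and) or | (or), and base R's subset() expresses the same selection without repeating the data frame name. The sketch below uses a toy data frame with made-up values and assumed column names; it is an illustration, not the course data.

```r
# Toy data: the values are invented for illustration only
toy <- data.frame(
  motivation = c(1.00, 0.67, 0.33, 0.67, 1.00),
  read       = c(54.8, 62.7, 60.6, 47.1, 52.0),
  female     = c(1, 1, 0, 0, 0)
)

# Male students with a reading score above 50, keeping two columns.
# Version 1: logical mask in the row slot, column names in the column slot
male_high <- toy[toy$female == 0 & toy$read > 50, c("motivation", "read")]

# Version 2: the same selection written with subset()
male_high2 <- subset(toy, female == 0 & read > 50, select = c(motivation, read))

print(male_high)
print(male_high2)
```

Both versions keep the original row numbers, which makes it easy to trace a selected row back to the full data frame.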

Exercise 3: Read in a text file and substitute missing values with the mean

In some cases it may be appropriate to replace a missing value with the mean for the variable. You can open your text file in a text editor and fill in the missing values with the mean or, alternatively, you can do this in R.

In R, download and read the contents of the cichlid brain data file:

> CB <- read.table("http://caspar.bgsu.edu/~courses/stats/Labs/Datasets/CichlidBrains.txt", sep="\t", header=TRUE)

The following code iterates through all columns (the loop counter is i) and tests whether each variable is of type numeric. If that is the case, the code inside the curly braces is executed. In our case it prints the name of the variable to the console.

> for (i in which(sapply(CB, is.numeric))) { print(names(CB[i])) }
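
To see what the loop iterates over, the expression which(sapply(..., is.numeric)) can be taken apart step by step. The sketch below uses a toy data frame whose column names (species, brain, body) are made up for illustration and are not taken from the cichlid file.

```r
# Toy data frame with one text column and two numeric columns (invented values)
toy <- data.frame(
  species = c("A", "B", "C"),
  brain   = c(1.2, NA, 1.5),
  body    = c(10L, 12L, NA),
  stringsAsFactors = FALSE
)

# Step 1: apply is.numeric to every column, giving a named logical vector
is_num <- sapply(toy, is.numeric)

# Step 2: which() converts the TRUE entries into their column positions
num_idx <- which(is_num)

# Step 3: the positions can be used to look up the column names
num_names <- names(toy)[num_idx]

print(is_num)
print(num_idx)
print(num_names)
```

Integer columns count as numeric here, so both brain and body are selected while the text column is skipped.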

To access the values in a particular column (in this case column 9) use ...

> for (i in seq_along(CB[,9])) { print(CB[i,9]) }

To substitute missing values in a single column (in this case column 9) with the value 3.14159 ...

> CB[9][is.na(CB[9])] <- 3.14159

... and yes, you can combine all these clauses in any way you wish. For instance, to replace all missing values in every numeric variable of the data frame with that variable's mean use ...

> for (i in which(sapply(CB, is.numeric))) { CB[i][is.na(CB[i])] <- mean(CB[,i], na.rm = TRUE) }
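
The replacement loop above can be checked on a small example where the means are easy to work out by hand. This is a sketch on a toy data frame with invented values and hypothetical column names, not the cichlid data. It also counts the missing values first, so that you know how many entries were filled in.

```r
# Toy data frame: the values are invented so that the means are easy to verify
toy <- data.frame(
  species = c("A", "B", "C", "D"),
  brain   = c(1.0, NA, 2.0, 3.0),    # mean of the observed values is 2
  body    = c(10, 20, NA, NA),       # mean of the observed values is 15
  stringsAsFactors = FALSE
)

# Count missing values per column before replacing them
n_missing <- colSums(is.na(toy))
print(n_missing)

# Replace missing values in each numeric column with that column's mean
for (i in which(sapply(toy, is.numeric))) {
  col_mean <- mean(toy[, i], na.rm = TRUE)
  toy[is.na(toy[, i]), i] <- col_mean
}

print(toy)
print(anyNA(toy))
```

The text column is left untouched because the loop only visits the numeric columns.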


last modified: 3/24/15
This material is copyrighted and MAY NOT be used for commercial purposes, 2001-2017 lobsterman.