Advanced Statistics - Biology 6030
Bowling Green State University, Fall 2017
Review of Basics in Statistics
The Normal Distribution
The normal distributions are a family of distributions with the same general shape (i.e., Gaussian distributions, bell curves): a single peak at the precise center, symmetry about that center, and density that decreases the further you go into the tails. A binomial distribution approaches a normal distribution at very large sample sizes. Normal distributions are centered on the population mean (µ) and dispersed around it with a given population variance (σ²). A Standard Normal Distribution is one that has the following parameters: µ=0, σ²=1, g1=0, g2=0. Any general normal distribution can be converted to a Standard Normal Distribution (µ=0, σ²=1) using the Standard Normal Deviate or z-Score, z = (Y - µ)/σ, which expresses each value Y as its distance from the mean in units of standard deviation.
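As a minimal sketch of the z-score conversion, the following Python snippet standardizes one value and looks up the area below it under the standard normal curve, which is the quantity a z-table lists. The population mean of 100 and standard deviation of 15 are hypothetical numbers chosen only for illustration, and the function name z_score is an assumption, not part of the course material.

```python
from statistics import NormalDist


def z_score(y, mu, sigma):
    """Standard normal deviate: distance of y from the mean in SD units."""
    return (y - mu) / sigma


# Hypothetical population parameters, chosen only for illustration
mu, sigma = 100.0, 15.0

z = z_score(130.0, mu, sigma)

# Area under the standard normal curve to the left of z
std_normal = NormalDist(mu=0.0, sigma=1.0)
area_below = std_normal.cdf(z)

# The same area, computed directly on the unstandardized distribution
area_below_raw = NormalDist(mu=mu, sigma=sigma).cdf(130.0)

print(f"z = {z:.2f}, area below = {area_below:.4f}")
```

Standardizing first and then using the standard normal curve gives the same area as working with the original distribution, which is why a single z-table serves every normal distribution.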
- Measures of Central Tendency
- Mean (arithmetic, 1st moment): sum of values divided by the number of values
- Median: midpoint of values after they have been arranged from highest to lowest (i.e., 50th percentile)
- Mode: midpoint of class interval with largest frequency (i.e., most common value and thus the highest point in a distribution). Suitable for all types of data, including nominal, but examine the data for bi- or multimodality
- Measures of Dispersion
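- Sum of squares: sum of the squared deviations of each value from the mean
- Mean squares (Variance, 2nd moment): sum of squares divided by the degrees of freedom (N - 1 for a sample)
- Standard Deviation: square root of the variance, expressed in the same units as the data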
- Range and Interquartile Range (i.e., the difference between the 75th and 25th percentiles; an underutilized, stable measure of dispersion)
- Measures of Asymmetry
- Skewness (g1, 3rd moment): In a symmetric distribution (g1 = 0), exactly one half of all measures lies above the mean and the other half below. Positive skewness indicates a longer right tail, negative skewness a longer left tail. Be concerned about skewness and kurtosis values >1 or <-1.
- Measures of Peakedness
- Kurtosis (g2, 4th moment): leptokurtic - high-peaked; mesokurtic - normal; platykurtic - flat-topped
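The measures listed above can be sketched with Python's standard library. The data values are hypothetical, and g1 and g2 are computed here as simple moment ratios without small-sample bias correction, which is an assumption: statistical packages often report corrected versions, so their numbers may differ slightly.

```python
import statistics

# Hypothetical sample, chosen only for illustration
data = [2, 3, 3, 4, 5, 6, 9]
n = len(data)

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion
sum_of_squares = sum((y - mean) ** 2 for y in data)
variance = sum_of_squares / (n - 1)   # mean squares (sample estimate)
std_dev = variance ** 0.5
value_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)   # 25th, 50th, 75th percentiles
iqr = q3 - q1

# Asymmetry and peakedness from central moments (uncorrected estimators)
m2 = sum_of_squares / n
m3 = sum((y - mean) ** 3 for y in data) / n
m4 = sum((y - mean) ** 4 for y in data) / n
g1 = m3 / m2 ** 1.5        # skewness: > 0 means a longer right tail
g2 = m4 / m2 ** 2 - 3      # excess kurtosis: 0 for a normal distribution

print(f"mean={mean:.3f} median={median} mode={mode}")
print(f"SS={sum_of_squares:.3f} var={variance:.3f} SD={std_dev:.3f}")
print(f"range={value_range} IQR={iqr}")
print(f"g1={g1:.3f} g2={g2:.3f}")
```

Note that percentile conventions differ between tools; statistics.quantiles uses its default "exclusive" method here, so the quartiles may not match other software exactly.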
- z-Tables list the area under the probability density function for a standard normal distribution
- Central limit theorem: explains why many distributions tend towards normality when the random variable being observed is the sum or mean of many independent identically distributed random variables.
- If multiple samples are obtained from a population, the Central limit theorem states that their means will be approximately normally distributed around the true underlying population mean (µ). Moreover, this holds regardless of the shape of the population from which items are sampled (provided its variance is finite). The distribution of sample means approaches a normal probability distribution when sample size is sufficiently large (a common rule of thumb is N >= 30).
- Standard Error of the Mean: Standard deviation for multiple sample means drawn from a particular population. The uncertainty about how close your sample mean is to the underlying population mean increases with the sample's standard deviation and decreases with the square root of the sample size. SE = SQRT(Var/N) = SD/SQRT(N)
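A small simulation can illustrate both the Central limit theorem and the standard error. The sketch below repeatedly draws samples of N = 30 from a strongly right-skewed exponential population and compares the standard deviation of the resulting sample means with SD/SQRT(N). The choice of population, the number of repeated samples, and the random seed are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(6030)    # arbitrary seed so the run is repeatable

N = 30               # items per sample
N_SAMPLES = 5000     # number of repeated samples (hypothetical choice)

# Exponential population with rate 1: right-skewed, mean 1, SD 1
POP_MEAN, POP_SD = 1.0, 1.0

sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

grand_mean = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)   # empirical standard error
theoretical_se = POP_SD / math.sqrt(N)         # SE = SD / SQRT(N)

print(f"mean of sample means: {grand_mean:.3f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"SD / SQRT(N):         {theoretical_se:.3f}")
```

Even though individual values from this population are far from normal, the sample means cluster around the population mean with a spread close to SD/SQRT(N).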
- Confidence intervals: Range within which the population parameter is expected to fall for a given level of confidence
- individual measures (e.g., 95%: µ ± 1.96σ; 99%: µ ± 2.58σ). Plug in your sample estimates for mean and standard deviation
- sample means (SE = SD/SQRT(N); 95%: µ ± 1.96 SE; 99%: µ ± 2.58 SE)
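The two kinds of interval can be sketched as follows, plugging sample estimates into the normal multipliers given above. The data are hypothetical and the helper name normal_interval is an assumption. For a sample this small, multipliers from the t distribution would usually be more appropriate; the normal values are used here only to mirror the formulas in the text.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample, chosen only for illustration
data = [4.8, 5.1, 5.4, 4.9, 5.6, 5.0, 5.3, 5.2, 4.7, 5.5]
n = len(data)

mean = statistics.mean(data)
sd = statistics.stdev(data)
se = sd / math.sqrt(n)        # standard error of the mean

# Two-sided multipliers from the standard normal distribution
z95 = NormalDist().inv_cdf(0.975)   # about 1.96
z99 = NormalDist().inv_cdf(0.995)   # about 2.58


def normal_interval(center, spread, z):
    """Return (lower, upper) = center -/+ z * spread."""
    return center - z * spread, center + z * spread


individual_95 = normal_interval(mean, sd, z95)   # range for single measures
mean_95 = normal_interval(mean, se, z95)         # range for the sample mean
mean_99 = normal_interval(mean, se, z99)

print(f"95% individual measures: {individual_95[0]:.2f} to {individual_95[1]:.2f}")
print(f"95% sample mean:         {mean_95[0]:.2f} to {mean_95[1]:.2f}")
print(f"99% sample mean:         {mean_99[0]:.2f} to {mean_99[1]:.2f}")
```

The interval for the mean is narrower than the one for individual measures by a factor of SQRT(N), and the 99% interval is wider than the 95% interval.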
- Compare a sample mean to a population mean in order to judge how likely it is that the sample was drawn from that population.
- Paired t-Test: Calculate the distribution of differences between the paired measures and compare that distribution to a population mean of 0
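- 1 sample t-test: t = (sample mean - µ) / SE, where SE = SD/SQRT(N), with N - 1 degrees of freedom
- 2 sample t-test: t = (difference between the two sample means) / (standard error of that difference)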
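The t statistics for these tests can be sketched in plain Python. The function names and data are hypothetical. For the 2 sample case the sketch uses Welch's unequal-variance form of the standard error, which is an assumption: the course may instead use the pooled-variance form. Only the t statistic and (where simple) the degrees of freedom are computed; converting t to a probability requires the t distribution, which is not in the standard library.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for a sample mean versus mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1


def paired_t(first, second):
    """Paired t-test: one-sample test of the differences against 0."""
    diffs = [b - a for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)


def welch_t(sample_a, sample_b):
    """Two-sample t statistic with Welch's (unequal-variance) standard error."""
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(var_a + var_b)


# Hypothetical data, chosen only for illustration
t_one, df_one = one_sample_t([1, 2, 3, 4, 5], mu0=2)
t_pair, df_pair = paired_t([10, 12, 9, 11], [12, 13, 11, 12])
t_two = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])

print(f"1 sample: t = {t_one:.3f}, df = {df_one}")
print(f"paired:   t = {t_pair:.3f}, df = {df_pair}")
print(f"2 sample: t = {t_two:.3f}")
```

Writing the paired test as a one-sample test on the differences makes explicit that the pairing, not the two raw groups, is what gets compared against a mean of 0.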
- Additional graphics
- Estimated Probability Density Function
- Quantile Box Plot
- Normal Quantile Plot
last modified: 2/3/15
This material is copyrighted and MAY NOT be used for commercial purposes, © 2001-2017 lobsterman.