## Advanced Statistics - Biology 6030

## Bowling Green State University, Fall 2017

- **Characterize the relationship** between a dependent (continuous) variable and an independent (nominal or ordinal) variable to determine the existence and strength of the association
- **Characterize the difference between sample means** to test whether they could have been drawn from the same underlying distribution

**ANOVA**: *F* = s^{2}_{(between)} / s^{2}_{(within)}

Consider the situation where you wish to compare a series of **k** samples, each containing **n** values. You want to evaluate statistically whether these samples could all have been drawn from the same underlying distribution or whether this scenario is unlikely. Specifically, you test H_{0}: µ_{1} = µ_{2} = µ_{3} = ... = µ_{k}. To test the null hypothesis that the k population means are equal, we compare two different estimates of variance: one based on the variation of individual data points around their own sample means [s^{2}_{(within)}], and the other based on the variation among the sample means [s^{2}_{(between)}], that is, the variance of the sample means scaled by n.

The logic behind this is that s^{2}_{(within)} is always an estimate of the true σ^{2} (assuming that the samples have equal variance). In contrast, s^{2}_{(between)} is an estimate of the true σ^{2} only if H_{0} is correct. If we calculate the ratio s^{2}_{(between)}/s^{2}_{(within)}, this value should therefore be close to one under the null hypothesis. We can reject the null hypothesis if the ratio is particularly high, indicating that the variance estimate derived from the sample means is disproportionately large compared to the one derived from the individual data points around their own sample means.
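The comparison of the two variance estimates can be sketched in a few lines of Python. This is a minimal illustration, assuming equal sample sizes; the function name `f_ratio` and the three small samples are made up for the example and are not course data.

```python
from statistics import mean, variance


def f_ratio(samples):
    """Return (F, s2_between, s2_within) for k samples of equal size n."""
    k = len(samples)
    n = len(samples[0])
    if k < 2 or n < 2 or any(len(s) != n for s in samples):
        raise ValueError("need at least two samples of equal size n >= 2")

    sample_means = [mean(s) for s in samples]

    # Within: average of the k sample variances (pooled estimate for equal n).
    s2_within = mean([variance(s) for s in samples])

    # Between: variance of the sample means, scaled back up by n.
    s2_between = n * variance(sample_means)

    return s2_between / s2_within, s2_between, s2_within


# Hypothetical data: k = 3 samples, n = 3 values each.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f, s2_b, s2_w = f_ratio(groups)
print(f"s2_between = {s2_b:.3f}, s2_within = {s2_w:.3f}, F = {f:.3f}")
```

For these made-up samples the means (5, 7, 9) are spread far apart relative to the scatter inside each sample, so the ratio is well above one. Turning F into a P-value additionally requires the F distribution with k-1 and k(n-1) degrees of freedom, which is not part of this sketch.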

**Step-by-step:** Note that when you collect small data sets from the same underlying normal distribution, the means of these samples will all vary slightly due to chance differences in the actual values sampled. The variances of different samples will also differ from each other due to chance alone. Just as data points are normally distributed around their sample mean with a given variance s^{2} = Σ(Y_{i} - Ȳ)^{2}/(n-1), the sample means will be normally distributed around a mean of means with a given standard error SE = s/√n
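A small simulation can make the behavior of sample means concrete. The sketch below assumes a normal population with an arbitrarily chosen mean of 10 and standard deviation of 2, draws many samples of n = 10, and compares the observed spread of the sample means with σ/√n. All names and parameter values are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(mu, sigma, n, n_samples, seed=1):
    """Draw n_samples samples of size n from N(mu, sigma); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        mean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_samples)
    ]


mu, sigma, n = 10.0, 2.0, 10  # assumed population parameters
means = simulate_sample_means(mu, sigma, n, n_samples=2000)

observed_se = stdev(means)      # spread of the simulated sample means
expected_se = sigma / sqrt(n)   # standard error predicted by sigma / sqrt(n)

print(f"mean of means     = {mean(means):.3f}")
print(f"observed SE       = {observed_se:.3f}")
print(f"sigma / sqrt(n)   = {expected_se:.3f}")
```

The two standard error values should agree closely, and the agreement tightens as the number of simulated samples grows.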

**Synonyms**: Note that SS_{between} is also referred to as SS_{model} or SS_{regression}. SS_{within} is the same as SS_{residual} or SS_{error}.

- **Independence** of data points
- **Normality** and the **Central Limit Theorem**: the distribution of sample means approaches a normal probability distribution as sample size increases, regardless of the shape of the population from which items are sampled. A sample size of 30 is often regarded as sufficient to invoke the central limit theorem
- **Homoscedasticity**: homogeneity of variances
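The central limit theorem can be illustrated with a strongly skewed population. The sketch below assumes an exponential population (chosen only because it is clearly non-normal), draws samples of size 30, and compares the skewness of the raw values with the skewness of the sample means. The helper `skewness` is a simple moment-based estimate written for this example.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: mean of cubed standardized deviations."""
    m = mean(values)
    sd = pstdev(values)
    return mean([((v - m) / sd) ** 3 for v in values])


rng = random.Random(42)      # fixed seed so the run is repeatable
n, n_samples = 30, 2000      # sample size of 30, as in the rule of thumb

# Raw values from a right-skewed (exponential) population.
raw = [rng.expovariate(1.0) for _ in range(n * n_samples)]

# Split the raw values into samples of size n and take each sample's mean.
sample_means = [mean(raw[i * n:(i + 1) * n]) for i in range(n_samples)]

raw_skew = skewness(raw)
means_skew = skewness(sample_means)
print(f"skewness of raw values   = {raw_skew:.2f}")
print(f"skewness of sample means = {means_skew:.2f}")
```

The sample means are far more symmetric than the raw values, although some skew remains at n = 30, which is why the threshold is a rule of thumb and not a guarantee.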

Considering an ANOVA table, understand and develop an intuitive feeling for the derivation and meaning of all terms listed below:

- **Coefficient of Determination** (r^{2}) is the proportion of the total variation explained by the model: SS_{M}/SS_{T}. The value ranges from 0 to 1 (i.e., 0-100%)
- **Adjusted Coefficient of Determination** (r^{2}_{adj.}) is often more comparable across models with different numbers of parameters: 1 - (MS_{E}/MS_{T})
- **Correlation Coefficient** (r) is a non-directional measure of the association: SQRT(SS_{M}/SS_{T})
- **Standard Deviation of the Residuals** or **Root Mean Square Error** (s) estimates the standard deviation of the random error. It is used in power analysis and post-hoc tests: SQRT(MS_{E})
- **Raw Effect Size** (d) is estimated from population values: SQRT(SS_{M}/N)
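To see how these terms follow from the sums of squares, the sketch below builds the entries of a one-way ANOVA table by hand and derives each quantity using the formulas listed above. The function name `anova_table`, the dictionary keys, and the three small samples are assumptions made for illustration. MS_{T} is taken as SS_{T}/(N-1).

```python
from math import sqrt
from statistics import mean


def anova_table(samples):
    """Compute one-way ANOVA terms from a list of samples (lists of numbers)."""
    all_values = [y for s in samples for y in s]
    N = len(all_values)   # total number of observations
    k = len(samples)      # number of groups
    grand_mean = mean(all_values)
    group_means = [mean(s) for s in samples]

    # Sums of squares: model (between), error (within), total.
    ss_m = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    ss_e = sum((y - m) ** 2 for s, m in zip(samples, group_means) for y in s)
    ss_t = sum((y - grand_mean) ** 2 for y in all_values)

    # Degrees of freedom and mean squares.
    df_m, df_e, df_t = k - 1, N - k, N - 1
    ms_m, ms_e, ms_t = ss_m / df_m, ss_e / df_e, ss_t / df_t

    return {
        "SS_M": ss_m, "SS_E": ss_e, "SS_T": ss_t,
        "MS_M": ms_m, "MS_E": ms_e, "MS_T": ms_t,
        "F": ms_m / ms_e,
        "r2": ss_m / ss_t,           # coefficient of determination
        "r2_adj": 1 - ms_e / ms_t,   # adjusted coefficient of determination
        "r": sqrt(ss_m / ss_t),      # correlation coefficient
        "rmse": sqrt(ms_e),          # standard deviation of the residuals
        "d": sqrt(ss_m / N),         # raw effect size
    }


# Hypothetical data: k = 3 samples, n = 3 values each.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
table = anova_table(groups)
for name, value in table.items():
    print(f"{name:7s} {value:.4f}")
```

Note that SS_{M} and SS_{E} add up to SS_{T}, whereas the mean squares do not add up, which is why the adjusted coefficient of determination differs from r^{2}.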

**Worksheet:** ANOVA

last modified: 2/10/14
