Advanced Statistics - Biology 6030
Bowling Green State University, Fall 2019
Review: Probability
Probability
- <Probability>: likelihood that a particular event will occur relative to a total number of attempts (i.e., number of favorable outcomes divided by the total number of outcomes). 0 <= P(A) <= 1. Estimates about probabilities can be obtained in several ways:
- Random Events: Do truly random events exist in Biology? If probabilities for a particular outcome are sampled from an underlying theoretical distribution, then an estimate can be obtained through theoretical considerations (e.g., the odds a gambler works out for a card game). RIP on a grave in Tombstone, AZ: "Played 5 aces, now plays the harp"
- Deterministic Events:
- Reductionist Thinking: As we gain an understanding of precisely how a process controls a particular outcome (e.g., the genetics of a disease state such as Huntington's), our probability estimates for these outcomes become increasingly reliable and approach the extremes of 0 or 1.
- Complex Systems: Even complete knowledge of all the individual elements and levels of control for a particular outcome will not allow us to predict future events (e.g., violent behavior), especially if they are contingent on past events. This is due to unavoidable inaccuracies in our ability to define the precise starting point (e.g., kneading dough with a dab of color).
- <Empirical Probability Estimate>: In the absence of theoretical knowledge that determines an outcome, we can use knowledge of past events as a predictor of future outcomes (e.g., insurance companies). To this end we divide the number of times an event has occurred in the past by the total number of observations.
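The empirical estimate described above can be sketched in a few lines of Python. This is only an illustration: the "past events" are simulated die rolls, and the function name `empirical_probability`, the seed, and the sample size are assumptions made for the example, not part of the course material.

```python
import random

def empirical_probability(observations, event):
    """Number of observations in which the event occurred / total observations."""
    observations = list(observations)
    if not observations:
        raise ValueError("need at least one observation")
    favorable = sum(1 for obs in observations if event(obs))
    return favorable / len(observations)

# Hypothetical record of past outcomes: simulated rolls of a fair six-sided die.
rng = random.Random(6030)            # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]

estimate = empirical_probability(rolls, lambda roll: roll == 6)
theoretical = 1 / 6                  # known here only because the die is simulated
print(f"empirical P(six) = {estimate:.4f}, theoretical = {theoretical:.4f}")
```

With more observations the empirical estimate tends to settle closer to the theoretical value, which is why long records of past events make better predictors.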
Probability Distribution
Depending on the number of possible outcomes, we can obtain measures for the likelihood of a particular outcome from a distribution. Such distributions can take on many shapes:
- Binary outcome:
- Uniform Distribution (e.g., flipping a coin): all values within a given range have an equal probability of occurring.
- Ordinal outcome:
- Uniform Distribution (e.g., rolling dice): all values within a given range have an equal probability of occurring.
- Ordinal outcomes are also generated by multiple, independent runs of binary outcomes.
- Binomial Distribution (n, p): discrete probability distribution (i.e., only specific values are possible) for obtaining exactly k successes out of n trials, where success occurs with a given probability p on each trial (e.g., 10 heads out of 20 tosses of a coin). Frequency data are discrete whereas normal distributions are applied to continuous data. However, as the mean count increases, the normal distribution can provide a good approximation of count data.
- Poisson Distribution (λ): the probability of a given number of events per interval, if events occur on average λ times per interval (e.g., predict the number of mutations per given time period or the number of trees within an area of a particular size). The Poisson distribution fits best when the sample size is large and the probability of occurrence is small.
- Negative Binomial Distribution (r, p) (i.e., Pascal Distribution): used when we want to find the number of failures that will likely occur before we reach a fixed number of successes (r). It gives the probability of r-1 successes and x failures in x+r-1 trials, and success on the (x+r)th trial. Useful for the analysis of rare events.
- Hypergeometric Distribution (N, R, k): describes a random selection (without replacement) from objects of two distinct types. It is described by three parameters: N, the total number of objects; R, the number of objects of the first type; and k, the number of objects to be chosen (e.g., choose a committee of 5 from a legislature consisting of 50 Republicans and 50 Democrats).
- Geometric Distribution (p): the number of independent trials needed to obtain the first success, where each trial succeeds with probability p. For example, what are the odds that you see a particular bird on a trip to the marsh? Let X be the number of trips you need to make in order to see the bird. If you see one the first time out, then X=1. If you don't see one the first time but are in luck on your second try, then X=2, etc.
- Exponential Distribution: strictly a continuous distribution, describing the waiting time between events that occur at a constant average rate; it allows you to predict failure rates and estimate reliability (e.g., mutations in a particular stretch of DNA).
- Continuous outcome: As the sample size for ordinal outcomes approaches infinity, probabilities can be obtained for an infinite number of possible outcomes. <Probability Density Function>: a function defined on a continuous interval so that the area under the curve (and above the x-axis) described by the function = 1.
- Normal Distribution (i.e., Gaussian Distribution): symmetric, bell-shaped, with scores concentrated around the mean compared to the tails. The probability function is described by two parameters: the mean (μ) and the standard deviation (σ).
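As a rough numerical check of the statement that the area under a probability density function equals 1, the sketch below integrates the normal density with the trapezoidal rule. The mean and standard deviation (written `mu` and `sigma` in the code) are hypothetical values chosen for illustration, and the helper names are assumptions of this example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_pdf(mu, sigma, lo, hi, steps=20_000):
    """Trapezoidal estimate of the area under the normal curve on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * width, mu, sigma)
    return total * width

mu, sigma = 100.0, 15.0   # hypothetical mean and standard deviation
total_area = area_under_pdf(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)
within_one_sd = area_under_pdf(mu, sigma, mu - sigma, mu + sigma)

print(f"area over mean +/- 8 sd: {total_area:.6f}")
print(f"area within 1 sd of the mean: {within_one_sd:.4f}")
```

The probability of an outcome falling in any interval is the area over that interval, so the second number is the probability of landing within one standard deviation of the mean.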
Calculations
- Subsequent events are independent
- Sum Rule: the probability that at least one of several mutually exclusive categories will occur is obtained by adding the individual probabilities
- Product Rule: the probability of independent events occurring together (joint and conditional probabilities) is obtained by multiplying the individual probabilities
- Multiple Outcomes
- Multiplications: Choice consists of multiple steps: If the first choice can be made in m ways and the second in n ways, then the total number of different choices = m * n
- Permutations: set of N objects taken r at a time where order is important - dog != god; with replacement: $N^r$; without replacement: $\frac{N!}{(N-r)!}$
- Combinations: set of N objects taken r at a time, where order is not important - dog == god; with replacement: $\frac{(N+r-1)!}{r!\,(N-1)!}$; without replacement: $\frac{N!}{r!\,(N-r)!}$. The binomial coefficient $\binom{n}{k}$ refers to the number of ways one can pick k unordered outcomes out of n possibilities (read "n choose k"). The actual number is obtained by $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$. The coefficients form the staggered rows in Pascal's (i.e., Yang Hui's) triangle, where each entry in a row is obtained by adding the two entries diagonally above it.
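The calculation rules in this list can be checked with a short Python sketch. It applies the sum and product rules to a fair die, then compares the standard counting formulas (N^r, N!/(N-r)!, and the binomial coefficient) against brute-force enumeration of the letters in "dog". The function names and the die example are assumptions made for illustration.

```python
import math
from fractions import Fraction
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

# Sum and product rules for a fair six-sided die (illustrative example).
p_face = Fraction(1, 6)
p_one_or_two = p_face + p_face   # sum rule: mutually exclusive outcomes
p_two_sixes = p_face * p_face    # product rule: two independent rolls

def count_permutations(n, r, replacement=False):
    """Ordered selections of r objects out of n."""
    return n ** r if replacement else math.perm(n, r)

def count_combinations(n, r, replacement=False):
    """Unordered selections of r objects out of n."""
    return math.comb(n + r - 1, r) if replacement else math.comb(n, r)

def next_pascal_row(row):
    """Each inner entry is the sum of the two entries diagonally above it."""
    return [1] + [a + b for a, b in zip(row, row[1:])] + [1]

letters = "dgo"                  # N = 3 objects, taken r = 2 at a time
n, r = len(letters), 2
enumerated = {
    "permutations with replacement": len(list(product(letters, repeat=r))),
    "permutations without replacement": len(list(permutations(letters, r))),
    "combinations with replacement": len(list(combinations_with_replacement(letters, r))),
    "combinations without replacement": len(list(combinations(letters, r))),
}
by_formula = {
    "permutations with replacement": count_permutations(n, r, replacement=True),
    "permutations without replacement": count_permutations(n, r),
    "combinations with replacement": count_combinations(n, r, replacement=True),
    "combinations without replacement": count_combinations(n, r),
}

# Build the first rows of Pascal's triangle by repeated addition.
pascal_rows = [[1]]
for _ in range(5):
    pascal_rows.append(next_pascal_row(pascal_rows[-1]))

print("P(1 or 2) =", p_one_or_two, " P(two sixes) =", p_two_sixes)
for name in enumerated:
    print(f"{name}: enumerated {enumerated[name]}, formula {by_formula[name]}")
for row in pascal_rows:
    print(row)
```

Row n of the triangle lists the binomial coefficients for n choose 0 through n choose n, so the additive construction and the factorial formula should agree.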
Discrete Distributions
A variety of distributions are used to describe discrete outcomes that model numbers of events (e.g., coin-flipping, dice throwing, or the number and probabilities of DNA base pair changes in molecular biology).
- Uniform Distribution for two (e.g., flipping a coin) or multiple possible outcomes (e.g., rolling dice) where all values within a given range have an equal probability of occurring.
- Bernoulli Trials: outcomes are generated from multiple, independent runs where probabilities stay constant and subsequent outcomes are independent.
- Binomial Distribution (n, p): for obtaining exactly k successes out of n trials, where success occurs with a given probability p on each trial (e.g., 10 heads out of 20 tosses of a coin).
- Poisson Distribution (λ): the probability of a given number of events per interval in a given temporal or spatial dimension, if events occur on average λ times per interval (e.g., predict the number of mutations per given time period or the number of trees within an area of a particular size). The Poisson distribution fits best when the sample size is large and the probability of occurrence is small.
- Negative Binomial Distribution (r, p) (i.e., Pascal Distribution): used when we want to find the number of failures that will likely occur before we reach a fixed number of successes (r). It gives the probability of r-1 successes and x failures in x+r-1 trials, and success on the (x+r)th trial. Useful for the analysis of rare events.
- Hypergeometric Distribution (N, R, k): describes a random selection (without replacement) from objects of two distinct types. It is described by three parameters: N, the total number of objects; R, the number of objects of the first type; and k, the number of objects to be chosen (e.g., choose a committee of 5 from a legislature consisting of 50 Republicans and 50 Democrats). Other typical examples include choosing a team of 8 from a group of 10 boys and 7 girls, or a committee from a legislature consisting of 52 Republicans and 48 Democrats. The probability function is f(x) = C(R,x)*C(N-R,k-x) / C(N,k) for x = max(0,k+R-N)..min(R,k), where C(n,k) is the binomial coefficient.
- Geometric Distribution (p): the number of independent trials needed to obtain the first success, where each trial succeeds with probability p. For example, what are the odds that you see a particular bird on a trip to the marsh? Let X be the number of trips you need to make in order to see the bird. If you see one the first time out, then X=1. If you don't see one the first time but are in luck on your second try, then X=2, etc.
- Exponential Distribution: strictly a continuous distribution, describing the waiting time between events that occur at a constant average rate; it allows you to predict failure rates and estimate reliability (e.g., mutations in a particular stretch of DNA).
- Weibull Distribution: a family of continuous distributions defined by a shape parameter and a scale parameter (a location parameter is sometimes added). It is used extensively in reliability applications to model failure times, survival functions, and inverse survival functions.
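The probability functions for the discrete distributions in this list can be written out directly with Python's standard library. The sketch below is illustrative only: the function names are assumptions, `lam` stands for the Poisson mean per interval, and the parameter values in the examples are taken from the examples in the text or chosen for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with mean lam events)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def negative_binomial_pmf(x, r, p):
    """P(x failures before the r-th success)."""
    return math.comb(x + r - 1, r - 1) * p ** r * (1 - p) ** x

def hypergeometric_pmf(x, N, R, k):
    """P(x objects of the first type among k drawn without replacement)."""
    if x < max(0, k + R - N) or x > min(R, k):
        return 0.0
    return math.comb(R, x) * math.comb(N - R, k - x) / math.comb(N, k)

def geometric_pmf(x, p):
    """P(first success occurs on trial x), x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

# 10 heads out of 20 tosses of a fair coin.
p_ten_heads = binomial_pmf(10, 20, 0.5)

# Committee of 5 from 50 Republicans and 50 Democrats: exactly 2 Republicans.
p_two_republicans = hypergeometric_pmf(2, 100, 50, 5)

# Team of 8 from 10 boys and 7 girls: probabilities over all feasible counts of boys.
team_total = sum(hypergeometric_pmf(x, 17, 10, 8) for x in range(0, 9))

# Many trials with a small success probability: binomial is close to Poisson.
binom_rare = binomial_pmf(2, 1000, 0.002)
poisson_rare = poisson_pmf(2, 1000 * 0.002)

print(f"P(10 heads in 20 tosses) = {p_ten_heads:.4f}")
print(f"P(2 Republicans on the committee) = {p_two_republicans:.4f}")
print(f"binomial {binom_rare:.5f} vs Poisson {poisson_rare:.5f}")
```

With r = 1 the negative binomial reduces to the geometric distribution counted in failures rather than trials, which gives a quick consistency check between the two definitions.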
Continuous Distribution
Continuous outcome: As the sample size for ordinal outcomes approaches infinity, probabilities can be obtained for an infinite number of possible outcomes. <Probability Density Function>: a function defined on a continuous interval so that the area under the curve (and above the x-axis) described by the function = 1.
- Normal Distribution (i.e., Gaussian Distribution): symmetric, bell-shaped, with scores concentrated around the mean compared to the tails. The probability function is described by two parameters: the mean (μ) and the standard deviation (σ). As the number of trials approaches infinity, the distribution of the number of successes in Bernoulli trials (i.e., the binomial distribution) approaches a normal distribution.
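The convergence of Bernoulli-trial counts toward the normal distribution can be illustrated numerically. The sketch below compares the binomial probability of each count with the normal density that has the same mean (np) and standard deviation (sqrt(np(1-p))). The function names and the chosen trial counts are assumptions for this example.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli trials)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def max_abs_gap(n, p):
    """Largest difference between the binomial pmf and the matching normal density."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma))
               for k in range(n + 1))

gaps = {n: max_abs_gap(n, 0.5) for n in (5, 20, 100, 500)}
for n, gap in gaps.items():
    print(f"n = {n:3d}: largest gap = {gap:.6f}")
```

The gap shrinks as the number of trials grows, which is the sense in which the normal curve approximates count data with a large mean.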
last modified: 01/25/02
This material is copyrighted and MAY NOT be used for commercial purposes, © 2001-2019 lobsterman.