Lectures for Advanced Statistics

Curve Fitting with Parametric Regression Analysis

Is a straight line the model best suited to describe our data, or is there something to be gained by including curves in our equation? Each additional polynomial term of the predictor variable adds one more curve to the model. Y modeled on X produces a straight-line model, Y modeled on X + X² tests for a model with one curve, and Y modeled on X + X² + X³ gives a model with two curves.
The fit of the model (i.e., the SS_M) cannot decrease, and will almost always increase, with each additional polynomial term. However, it is not clear whether the new equation is significantly better. Remember that we lose one df for each term added to our equation, so the MS_M term in the numerator of the F-statistic may or may not increase.
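For reference, the quantities named in this paragraph can be written out as follows. The symbols p (number of predictor terms in the model) and n (number of observations) are notation assumed here for illustration; they are not used elsewhere in these notes.

```latex
\[
MS_M = \frac{SS_M}{df_M}, \qquad
MS_E = \frac{SS_E}{df_E}, \qquad
F = \frac{MS_M}{MS_E}, \qquad
df_M = p, \quad df_E = n - p - 1
\]
```

Adding a polynomial term raises p by one, so SS_M is divided by a larger df_M. MS_M therefore rises only if the gain in SS_M is large enough to offset the larger divisor.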

Build a linear model of the data by regressing Y on X
Center the independent variable X by subtracting the mean of X from each value, giving Xc. The main purpose of this is to reduce collinearity between the predictor terms (Xc, Xc², ...).
Create the polynomial terms by raising each value in Xc to a power: Xc² (quadratic term), Xc³ (cubic term), etc.
Build regression models of increasing complexity by including additional polynomial terms as predictor variables
Test whether a higher degree model significantly improves the model's fit
- calculate F = (SS_M for higher degree model - SS_M for lower degree model) / (MS_E for higher degree model), where the two models differ by one polynomial term
- compare to the critical value in an F table with numerator df = 1 and denominator df = residual df of the higher degree model
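
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the helper names (solve, fit_ss, added_term_f) are hypothetical, and the least-squares fit is done with plain normal equations so that nothing beyond the standard library is needed. In practice a statistics package would do the fitting.

```python
# Sketch of the curve-fitting steps using only the standard library.
# Data and helper names are illustrative assumptions, not part of the lecture.

def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_ss(xc, y, degree):
    """Regress y on xc, xc**2, ..., xc**degree.

    Returns (SS_M, SS_E, residual df).
    """
    n = len(y)
    k = degree + 1  # number of coefficients, including the intercept
    design = [[v ** p for p in range(k)] for v in xc]  # column 0 is the intercept
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    y_mean = sum(y) / n
    ss_m = sum((f - y_mean) ** 2 for f in fitted)
    ss_e = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return ss_m, ss_e, n - k


def added_term_f(xc, y, higher_degree):
    """F for the term that takes the model from higher_degree - 1 to higher_degree."""
    ss_m_low, _, _ = fit_ss(xc, y, higher_degree - 1)
    ss_m_high, ss_e_high, df_resid = fit_ss(xc, y, higher_degree)
    ms_e_high = ss_e_high / df_resid
    return (ss_m_high - ss_m_low) / ms_e_high, df_resid


# Made-up data with a roughly quadratic trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 7.2, 11.8, 18.1, 25.9, 35.2, 46.1, 57.8, 72.0]

# Center X.
x_mean = sum(x) / len(x)
xc = [v - x_mean for v in x]

# Test quadratic vs. linear, then cubic vs. quadratic.
for degree in (2, 3):
    f_value, df_resid = added_term_f(xc, y, degree)
    print(f"degree {degree} vs {degree - 1}: F = {f_value:.2f}, df = (1, {df_resid})")
```

Each printed F is then compared with the tabled critical value for the listed df, exactly as in the last step above.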

You can always run this analysis on raw, standardized (i.e., z-transformed), centered (i.e., mean-subtracted), or ranked data to make sure they give you essentially similar results.
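
As a rough sketch, the four versions of the predictor can be produced as shown here. The example also prints the correlation between each version and its own square, which illustrates the collinearity that centering is meant to reduce. The data values and the helper names (pearson, rank) are assumptions made for this illustration.

```python
# Sketch: raw, centered, standardized, and ranked versions of a predictor,
# with the correlation between each version and its quadratic term.
# Data and helper names are illustrative assumptions.
from statistics import mean, stdev


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def rank(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


x = [3.0, 5.0, 6.0, 8.0, 9.0, 12.0, 14.0, 15.0]  # made-up predictor values
x_mean, x_sd = mean(x), stdev(x)

versions = {
    "raw": x,
    "centered": [v - x_mean for v in x],
    "standardized": [(v - x_mean) / x_sd for v in x],
    "ranked": rank(x),
}

for name, values in versions.items():
    squared = [v ** 2 for v in values]
    print(f"{name:>12}: r(X, X^2) = {pearson(values, squared):.3f}")
```

With these values the raw and ranked predictors are strongly correlated with their own squares, while the centered and standardized versions are only weakly so. The centered and standardized versions give the same correlation because standardizing only rescales Xc.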

last modified: 2/1/08