Lectures for Advanced Statistics

Multi-dimensional Scaling

Uses

Find a way to "re-arrange" objects in an efficient manner based on their similarities or dissimilarities.
Detect meaningful underlying dimensions that allow an efficient explanation of the observed distances.
Find a low-dimensional space in which distances among objects match the true (i.e., higher-order) proximities among them as closely as possible.
Reduce the observed complexity of nature and explain the distance matrix in terms of fewer underlying dimensions. Larger differences should be represented by larger distances.

Examples

Analyze observer bias by determining similarities in measures obtained from different observers.
Examine similarities in food preference among a group of closely related species. What type of prey traits can explain the detected patterns? What does that tell us about sensory perceptions of predators?
Examine patterns in the development or evolution of brains. In what way are brains shaped?
Assess the goodness of fit of proximity data to spatial distance models.

How this is done

MDS constructs a configuration of points in space from information about differences between the points.
Like cluster analysis, this is a technique that makes no predictions but helps us interpret how different items relate to each other. It creates a lower-dimensional representation of the data set in which the original, multidimensional distances are represented as closely as possible.
Given enough dimensions we can represent the distances among points exactly. A distance between two objects can be mapped with a single line segment, but an increasing number of objects demands a space with more and more dimensions (i.e., up to n-1 dimensions for n objects). This becomes difficult to visualize and interpret. Thus, the emphasis is on finding a spatial representation in a low-dimensional form.
MDS is similar to factor analysis, but it can use any general similarity measure instead of a specific covariance matrix. For each unit we need to know its nearest neighbor, and then the next, and so on in rank order.
Objects with known rank-differences between them are moved around within a given space (composed of a particular number of dimensions) until "lack of fit" has been minimized (i.e., goodness-of-fit has been maximized).
Examine how well the original distances between objects can be reproduced by a newly derived configuration using stress measures, e.g., raw stress φ = Σ(d_ij - δ_ij)², where δ_ij is the observed distance between objects i and j and d_ij is the distance between them in the current configuration. Stress measures how well the actual distribution of distances (nearest neighbors) is represented by the current solution. The smaller this measure, the better the fit of the simplified MDS distance matrix to the observed, complex distance matrix.
A scree test (a plot of the stress value against the number of dimensions in the solution) helps you decide how many dimensions to use: pick the point where the graph begins to level off.
Interpret the dimensions through scatterplots of the objects in three dimensions; rotate the space to visualize.
Use multiple regression techniques to regress variables on the coordinates in different dimensions (MD scores).
Factor rotation: The actual orientation of axes in the final solution is arbitrary. For example, one can rotate a map, yet the distances between locations on it remain the same. You can rotate the factor space for improved interpretation by minimizing the number of variables with high loadings on each axis.
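
The stress, scree, and rotation steps above can be tried out with a minimal sketch. It rests on several assumptions that are not part of these notes: the data are made-up random numbers, the helper names (pairwise_distances, classical_mds, raw_stress) are hypothetical, and the configuration is obtained with classical (metric) MDS via an eigendecomposition rather than with the iterative rank-based procedure described above. Raw stress is computed as the sum of squared differences between observed and reproduced distances.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(delta, k):
    """Classical (metric) MDS: embed a distance matrix in k dimensions.

    Assumption: this stands in for the iterative, rank-based fitting
    described in the notes; it is not the same algorithm.
    """
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (delta ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)

def raw_stress(delta, X):
    """Sum of squared differences between observed and reproduced distances."""
    d = pairwise_distances(X)
    iu = np.triu_indices_from(delta, k=1)        # count each pair once
    return float(((delta[iu] - d[iu]) ** 2).sum())

# Hypothetical data: 8 objects measured on 5 variables.
rng = np.random.default_rng(0)
objects = rng.normal(size=(8, 5))
delta = pairwise_distances(objects)

# Stress for 1 to n-1 dimensions: the values one would plot in a scree test.
stress_by_dim = {k: raw_stress(delta, classical_mds(delta, k)) for k in range(1, 8)}
for k, s in stress_by_dim.items():
    print(k, s)

# Rotating a 2-D configuration leaves every inter-point distance unchanged.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
config2 = classical_mds(delta, 2)
rotated = config2 @ R.T
```

In this toy setup the stress should fall as dimensions are added and reach essentially zero once the space has as many dimensions as the data occupy, and the rotated configuration should have the same stress as the unrotated one, which is why the orientation of the axes is arbitrary.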

last modified: 4/14/10