Clustering and Classification methods for Biologists



Discriminant Analysis


Discriminant Analysis : A brief mathematical background

This example uses data from 10 males and 10 females. Three variables were recorded: height (inches!), weight (pounds!) and age (years). Sex is coded 1 for males and 2 for females, as the group means in the Data Summary confirm.

Sex   Height   Weight   Age
 2      67      143      43
 2      66      137      46
 2      64      125      19
 2      65      130      52
 2      69      152      49
 2      68      140      35
 2      62      102      33
 2      66      132      21
 2      67      117      20
 2      68      129      21
 1      71      170      54
 1      72      164      56
 1      72      165      51
 1      68      163      26
 1      72      172      58
 1      66      135      35
 1      71      190      42
 1      74      178      38
 1      68      152      36
 1      69      170      27

Data Summary

Variable    Male    Female   Difference
height      70.3     66.2       4.1
weight     165.9    130.7      35.2
age         42.3     33.9       8.4
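
The summary can be reproduced directly from the raw data. The following is a minimal sketch, assuming NumPy is available and that sex is coded 1 = male, 2 = female (an inference from the group means, not something the table states); the variable names are illustrative.

```python
import numpy as np

# Columns: sex (assumed 1 = male, 2 = female), height (in), weight (lb), age (yr)
data = np.array([
    [2, 67, 143, 43], [2, 66, 137, 46], [2, 64, 125, 19], [2, 65, 130, 52],
    [2, 69, 152, 49], [2, 68, 140, 35], [2, 62, 102, 33], [2, 66, 132, 21],
    [2, 67, 117, 20], [2, 68, 129, 21], [1, 71, 170, 54], [1, 72, 164, 56],
    [1, 72, 165, 51], [1, 68, 163, 26], [1, 72, 172, 58], [1, 66, 135, 35],
    [1, 71, 190, 42], [1, 74, 178, 38], [1, 68, 152, 36], [1, 69, 170, 27],
])
sex = data[:, 0]
X = data[:, 1:].astype(float)

# Group means for each variable, and the male minus female difference
male_mean = X[sex == 1].mean(axis=0)
female_mean = X[sex == 2].mean(axis=0)
d = male_mean - female_mean

for name, m, f, diff in zip(["height", "weight", "age"], male_mean, female_mean, d):
    print(f"{name:7s} {m:7.1f} {f:7.1f} {diff:6.1f}")
```

The vector d computed here is the same vector of group differences used later in the calculation.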

A method is needed that combines these three potentially discriminating variables into a single discriminating variable in the way that maximises the separation between the groups. This is achieved by calculating a Discriminant Function of the type:

score = w1 × height + w2 × weight + w3 × age

The problem is finding suitable values for the weights w1, w2 and w3.

First calculate the pooled within-group variance-covariance matrix A:

               height    weight       age
A =   height     5.21     24.49     10.68
      weight    24.49    207.72     64.56
      age       10.68     64.56    155.17
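
A minimal sketch of this calculation, assuming NumPy and assuming that A is the pooled within-group matrix (within-group sums of squares and cross-products divided by n - 2 = 18). That reading is an inference: it is the one that reproduces the printed values from the raw data. The function name `pooled_covariance` is hypothetical.

```python
import numpy as np

# Columns: sex (assumed 1 = male, 2 = female), height (in), weight (lb), age (yr)
data = np.array([
    [2, 67, 143, 43], [2, 66, 137, 46], [2, 64, 125, 19], [2, 65, 130, 52],
    [2, 69, 152, 49], [2, 68, 140, 35], [2, 62, 102, 33], [2, 66, 132, 21],
    [2, 67, 117, 20], [2, 68, 129, 21], [1, 71, 170, 54], [1, 72, 164, 56],
    [1, 72, 165, 51], [1, 68, 163, 26], [1, 72, 172, 58], [1, 66, 135, 35],
    [1, 71, 190, 42], [1, 74, 178, 38], [1, 68, 152, 36], [1, 69, 170, 27],
])
sex = data[:, 0]
X = data[:, 1:].astype(float)


def pooled_covariance(X, labels):
    """Pooled within-group covariance: summed within-group SSCP / (n - groups)."""
    n, p = X.shape
    groups = np.unique(labels)
    sscp = np.zeros((p, p))
    for g in groups:
        dev = X[labels == g] - X[labels == g].mean(axis=0)  # deviations from group mean
        sscp += dev.T @ dev
    return sscp / (n - len(groups))


A = pooled_covariance(X, sex)
print(np.round(A, 2))
```

Because a covariance matrix is symmetric, the weight-age and age-weight entries must be equal; computing A from the data is a quick way to check a printed matrix for transcription slips.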

 

Let w be a vector containing the unknown weights (we need these for the Discriminant Function):

w = [w1 w2 w3]

and d be a vector of the group differences (as shown above):

d = [ 4.1 35.2 8.4]

It can be shown that A.w = d
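
One standard route to this result, offered here as a sketch and as an assumption about the derivation the page has in mind, is Fisher's criterion: choose w to maximise the squared difference between the group mean scores relative to the within-group variance of the scores. The symbol J is introduced only for this illustration.

```latex
\begin{align*}
J(\mathbf{w}) &= \frac{(\mathbf{w}^{\mathsf{T}}\mathbf{d})^{2}}
                      {\mathbf{w}^{\mathsf{T}} A\,\mathbf{w}} \\[4pt]
\frac{\partial J}{\partial \mathbf{w}} = \mathbf{0}
  \;&\Longrightarrow\;
  (\mathbf{w}^{\mathsf{T}} A\,\mathbf{w})\,\mathbf{d}
  = (\mathbf{w}^{\mathsf{T}}\mathbf{d})\, A\,\mathbf{w} \\[4pt]
  \;&\Longrightarrow\;
  A\,\mathbf{w} = c\,\mathbf{d},
  \qquad c = \frac{\mathbf{w}^{\mathsf{T}} A\,\mathbf{w}}{\mathbf{w}^{\mathsf{T}}\mathbf{d}}
\end{align*}
```

The scalar c only rescales w and does not change J, so it can be set to 1, which gives A.w = d.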

This is a relatively simple matrix algebra calculation, since the equation can be rewritten as w = A^-1.d, where A^-1 is the inverse of matrix A.
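
As a minimal sketch of this step, assuming NumPy, the system can be solved either with a linear solver or with an explicit inverse. The matrix is entered as a symmetric matrix (64.56 in both weight-age positions), and the resulting weights apply to the raw, unstandardised variables, so they are on a different scale from the standardised weights quoted below.

```python
import numpy as np

# Variance-covariance matrix A (order: height, weight, age), entered symmetrically
A = np.array([
    [5.21,   24.49,  10.68],
    [24.49, 207.72,  64.56],
    [10.68,  64.56, 155.17],
])

# Vector of group differences (male mean minus female mean)
d = np.array([4.1, 35.2, 8.4])

# Solve A.w = d directly; this is usually preferred numerically
w = np.linalg.solve(A, d)

# Equivalent route through the explicit inverse, w = A^-1.d
w_via_inverse = np.linalg.inv(A) @ d

print("raw weights (height, weight, age):", w)
```

Both routes give the same answer here; `np.linalg.solve` avoids forming the inverse and is the more common choice for larger problems.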

Using standardised variables, the weights are:

          weight     variable
          0.0029     height
w =       1.028      weight
         -0.096      age

Hence:

discriminant score = 0.0029 height + 1.028 weight - 0.096 age

The group centroids ('mean scores') are -1.23 for females and +1.23 for males.

Consequently, a positive score (> 0) indicates a male and a negative score (< 0) indicates a female. The centroids would not be symmetrical about zero if the group sizes differed.
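
The page does not say how the standardised weights were scaled. The sketch below, assuming NumPy, uses one convention that appears to reproduce the weight and age coefficients and the centroids quoted above: the raw solution of A.w = d is divided by the distance D between the group centroids (so the scores have unit within-group variance) and each weight is multiplied by the pooled standard deviation of its variable. The scaling, the coding 1 = male, 2 = female, and all variable names are assumptions.

```python
import numpy as np

# Columns: sex (assumed 1 = male, 2 = female), height (in), weight (lb), age (yr)
data = np.array([
    [2, 67, 143, 43], [2, 66, 137, 46], [2, 64, 125, 19], [2, 65, 130, 52],
    [2, 69, 152, 49], [2, 68, 140, 35], [2, 62, 102, 33], [2, 66, 132, 21],
    [2, 67, 117, 20], [2, 68, 129, 21], [1, 71, 170, 54], [1, 72, 164, 56],
    [1, 72, 165, 51], [1, 68, 163, 26], [1, 72, 172, 58], [1, 66, 135, 35],
    [1, 71, 190, 42], [1, 74, 178, 38], [1, 68, 152, 36], [1, 69, 170, 27],
])
sex = data[:, 0]
X = data[:, 1:].astype(float)
males, females = X[sex == 1], X[sex == 2]

# Group differences and pooled within-group covariance (n - 2 degrees of freedom)
d = males.mean(axis=0) - females.mean(axis=0)
dev = np.vstack([males - males.mean(axis=0), females - females.mean(axis=0)])
A = dev.T @ dev / (len(X) - 2)
pooled_sd = np.sqrt(np.diag(A))

# Raw weights, then the assumed rescaling to standardised weights
w_raw = np.linalg.solve(A, d)
D = np.sqrt(d @ w_raw)            # distance between the two group centroids
w_std = w_raw * pooled_sd / D

# Standardise each variable (grand mean, pooled SD) and compute the scores
Z = (X - X.mean(axis=0)) / pooled_sd
scores = Z @ w_std

male_centroid = scores[sex == 1].mean()
female_centroid = scores[sex == 2].mean()
predicted = np.where(scores > 0, 1, 2)   # positive score -> male, negative -> female

print("standardised weights:", np.round(w_std, 4))
print("centroids (male, female):", round(male_centroid, 2), round(female_centroid, 2))
print("cases classified as recorded:", int((predicted == sex).sum()), "of", len(sex))
```

With equal group sizes the grand mean lies midway between the two group means, which is why the centroids come out symmetrical about zero; with unequal groups the same code would give centroids of different magnitude.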
