Discriminant Analysis: A brief mathematical background
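This example uses data from 10 males and 10 females. Three variables were recorded: height (inches!), weight (pounds!) and age (years). In the table, Sex is coded 1 = male and 2 = female, as the group means in the Data Summary confirm.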
Sex | Height | Weight | Age |
---|---|---|---|
2 | 67 | 143 | 43 |
2 | 66 | 137 | 46 |
2 | 64 | 125 | 19 |
2 | 65 | 130 | 52 |
2 | 69 | 152 | 49 |
2 | 68 | 140 | 35 |
2 | 62 | 102 | 33 |
2 | 66 | 132 | 21 |
2 | 67 | 117 | 20 |
2 | 68 | 129 | 21 |
1 | 71 | 170 | 54 |
1 | 72 | 164 | 56 |
1 | 72 | 165 | 51 |
1 | 68 | 163 | 26 |
1 | 72 | 172 | 58 |
1 | 66 | 135 | 35 |
1 | 71 | 190 | 42 |
1 | 74 | 178 | 38 |
1 | 68 | 152 | 36 |
1 | 69 | 170 | 27 |
Data Summary
Variable | Male | Female | Difference |
---|---|---|---|
height | 70.3 | 66.2 | 4.1 |
weight | 165.9 | 130.7 | 35.2 |
age | 42.3 | 33.9 | 8.4 |
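The summary figures can be checked directly from the raw data. The snippet below is a minimal sketch using NumPy; the sex coding (1 = male, 2 = female) is inferred from the group means, and the variable names are illustrative only.

```python
import numpy as np

# Columns: sex (assumed 1 = male, 2 = female), height (in), weight (lb), age (yr)
data = np.array([
    [2, 67, 143, 43], [2, 66, 137, 46], [2, 64, 125, 19], [2, 65, 130, 52],
    [2, 69, 152, 49], [2, 68, 140, 35], [2, 62, 102, 33], [2, 66, 132, 21],
    [2, 67, 117, 20], [2, 68, 129, 21],
    [1, 71, 170, 54], [1, 72, 164, 56], [1, 72, 165, 51], [1, 68, 163, 26],
    [1, 72, 172, 58], [1, 66, 135, 35], [1, 71, 190, 42], [1, 74, 178, 38],
    [1, 68, 152, 36], [1, 69, 170, 27],
], dtype=float)

sex, X = data[:, 0], data[:, 1:]

# Group means for each variable, and the male minus female difference
male_mean = X[sex == 1].mean(axis=0)
female_mean = X[sex == 2].mean(axis=0)
d = male_mean - female_mean

for name, m, f, diff in zip(["height", "weight", "age"], male_mean, female_mean, d):
    print(f"{name:6s} male={m:.1f} female={f:.1f} difference={diff:.1f}")
```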
A method is needed that combines these three potentially discriminating variables into a single discriminating variable in a way that maximises the difference between the groups. This is achieved by calculating a discriminant function of the form:
$$\text{score} = w_1 \cdot \text{height} + w_2 \cdot \text{weight} + w_3 \cdot \text{age}$$
The problem is finding suitable values for the weights $w_i$.
First calculate the pooled within-group variance-covariance matrix A:
A | height | weight | age |
---|---|---|---|
height | 5.21 | 24.49 | 10.68 |
weight | 24.49 | 207.72 | 64.56 |
age | 10.68 | 64.56 | 155.17 |
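The text does not say how A was computed. The tabulated values are reproduced, to two decimal places, by pooling the within-group sums of squares and cross-products and dividing by n1 + n2 - 2 = 18, which gives 64.56 for both weight-age entries. The sketch below assumes that definition; the helper `sscp` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Columns: sex (assumed 1 = male, 2 = female), height (in), weight (lb), age (yr)
data = np.array([
    [2, 67, 143, 43], [2, 66, 137, 46], [2, 64, 125, 19], [2, 65, 130, 52],
    [2, 69, 152, 49], [2, 68, 140, 35], [2, 62, 102, 33], [2, 66, 132, 21],
    [2, 67, 117, 20], [2, 68, 129, 21],
    [1, 71, 170, 54], [1, 72, 164, 56], [1, 72, 165, 51], [1, 68, 163, 26],
    [1, 72, 172, 58], [1, 66, 135, 35], [1, 71, 190, 42], [1, 74, 178, 38],
    [1, 68, 152, 36], [1, 69, 170, 27],
], dtype=float)

sex, X = data[:, 0], data[:, 1:]
males, females = X[sex == 1], X[sex == 2]

def sscp(group):
    """Sums of squares and cross-products about the group's own means."""
    centred = group - group.mean(axis=0)
    return centred.T @ centred

# Pool the two within-group SSCP matrices and divide by the pooled
# degrees of freedom (10 + 10 - 2 = 18).
dof = len(males) + len(females) - 2
A = (sscp(males) + sscp(females)) / dof

print(np.round(A, 2))
```

Using the total covariance of all 20 cases instead would give different values, because it also includes the between-group differences.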
Let $w$ be a vector containing the unknown weights (we need these for the discriminant function):
$$w = [w_1 \;\; w_2 \;\; w_3]$$
and let $d$ be a vector of the group differences (as shown above):
$$d = [4.1 \;\; 35.2 \;\; 8.4]$$
It can be shown that $A w = d$.
This is a relatively simple matrix algebra calculation, since the equation can be rewritten as $w = A^{-1} d$, where $A^{-1}$ is the inverse of matrix $A$.
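As a rough illustration, the system can be solved numerically with the rounded A and d given in the text, taking 64.56 for both weight-age entries so that A is symmetric. The sketch uses `numpy.linalg.solve` rather than forming the inverse explicitly, which is the usual numerical practice. The result is in raw units (inches, pounds, years), so it is not directly comparable with standardised weights.

```python
import numpy as np

# Variance-covariance matrix (rounded values from the text)
A = np.array([[5.21, 24.49, 10.68],
              [24.49, 207.72, 64.56],
              [10.68, 64.56, 155.17]])

# Group differences (male mean minus female mean)
d = np.array([4.1, 35.2, 8.4])

# Solve A w = d for w; equivalent to w = inv(A) @ d
w = np.linalg.solve(A, d)

print(np.round(w, 4))  # raw-unit weights for height, weight, age
```

With these rounded inputs the weights come out at roughly 0.003 for height, 0.175 for weight and -0.019 for age. The height weight is a small difference between large terms, so it is sensitive to rounding in A.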
Using standardised variables, the weights are:
Variable | Weight (w) |
---|---|
height | 0.0029 |
weight | 1.028 |
age | -0.096 |
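Hence:
$$\text{discriminant score} = 0.0029 \cdot \text{height} + 1.028 \cdot \text{weight} - 0.096 \cdot \text{age}$$
where height, weight and age are the standardised values.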
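The text does not state how the standardised weights were scaled. One convention that appears to reproduce them is to multiply each raw weight from $A^{-1} d$ by the pooled standard deviation of its variable and divide by $D = \sqrt{d^{T} A^{-1} d}$, the Mahalanobis distance between the group means. The sketch below applies that assumed convention to the rounded A and d from the text.

```python
import numpy as np

# Rounded variance-covariance matrix and group differences from the text
A = np.array([[5.21, 24.49, 10.68],
              [24.49, 207.72, 64.56],
              [10.68, 64.56, 155.17]])
d = np.array([4.1, 35.2, 8.4])

# Raw-unit weights
w_raw = np.linalg.solve(A, d)

# Mahalanobis distance between the two group means
D = np.sqrt(d @ w_raw)

# Assumed standardisation: scale by pooled SD, normalise by D so the
# score has unit within-group variance
pooled_sd = np.sqrt(np.diag(A))
w_std = pooled_sd * w_raw / D

print(np.round(w_std, 3))  # standardised weights for height, weight, age
print(round(D / 2, 2))     # distance of each centroid from zero
```

Under this assumption the weight and age coefficients agree with the quoted 1.028 and -0.096, and D/2 agrees with the quoted centroid distance of 1.23. The height coefficient comes out near 0.003; the small gap from 0.0029 is attributable to rounding in A and d.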
The group centroids ('mean scores') are -1.23 for females and +1.23 for males.
Consequently, a positive score (> 0) indicates a male and a negative score (< 0) indicates a female. The centroids would not be symmetrical about zero if the group sizes differed.
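To see the centroids and the zero cut-off in action, the whole calculation can be run on the raw data. This is a sketch under stated assumptions, not necessarily the procedure used to produce the figures above: sex is taken as 1 = male and 2 = female, A is the pooled within-group covariance, and each variable is standardised by subtracting its grand mean and dividing by its pooled standard deviation.

```python
import numpy as np

# Columns: sex (assumed 1 = male, 2 = female), height (in), weight (lb), age (yr)
data = np.array([
    [2, 67, 143, 43], [2, 66, 137, 46], [2, 64, 125, 19], [2, 65, 130, 52],
    [2, 69, 152, 49], [2, 68, 140, 35], [2, 62, 102, 33], [2, 66, 132, 21],
    [2, 67, 117, 20], [2, 68, 129, 21],
    [1, 71, 170, 54], [1, 72, 164, 56], [1, 72, 165, 51], [1, 68, 163, 26],
    [1, 72, 172, 58], [1, 66, 135, 35], [1, 71, 190, 42], [1, 74, 178, 38],
    [1, 68, 152, 36], [1, 69, 170, 27],
], dtype=float)

sex, X = data[:, 0], data[:, 1:]
males, females = X[sex == 1], X[sex == 2]
d = males.mean(axis=0) - females.mean(axis=0)

# Pooled within-group covariance matrix (18 degrees of freedom)
centred = np.vstack([males - males.mean(axis=0),
                     females - females.mean(axis=0)])
A = centred.T @ centred / (len(X) - 2)

# Raw weights, then the assumed standardisation (pooled SD, unit variance)
pooled_sd = np.sqrt(np.diag(A))
w_raw = np.linalg.solve(A, d)
w_std = pooled_sd * w_raw / np.sqrt(d @ w_raw)

# Standardise each variable about its grand mean, then score every case
z = (X - X.mean(axis=0)) / pooled_sd
scores = z @ w_std

male_centroid = scores[sex == 1].mean()
female_centroid = scores[sex == 2].mean()

# Positive score -> male (1), otherwise female (2)
predicted = np.where(scores > 0, 1, 2)
accuracy = (predicted == sex).mean()

print(np.round(w_std, 4))
print(round(male_centroid, 2), round(female_centroid, 2))
print(f"proportion classified correctly: {accuracy:.2f}")
```

Because the two groups are the same size, the grand mean lies midway between the group means, which is why the centroids are symmetrical about zero.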