Clustering and Classification methods for Biologists



Discriminant Analysis



Stepwise Statistics

The previous analysis is repeated here using a stepwise variable selection procedure. Note that stepwise selection suffers from the same problems as all variable selection procedures.

A range of selection methods is available. This example uses the SPSS default method, which is based on minimizing Wilks' lambda.

In this analysis pyruvate kinase was entered first, followed by hemopexin. No other variables were entered because they would not significantly improve the discrimination between the groups. Using the F values and correlation coefficients that you noted earlier, can you understand why only these two variables were selected? (See below for the explanation.)

Variables Entered/Removed(a,b,c,d)

                         Wilks' Lambda                      Exact F
Step  Entered            Statistic  df1  df2  df3     Statistic  df1  df2     Sig.
1     pyruvate kinase    0.631      1    1    71.000  41.458     1    71.000  0.000
2     hemopexin          0.524      2    1    71.000  31.781     2    70.000  0.000

At each step, the variable that minimizes the overall Wilks' Lambda is entered.
a Maximum number of steps is 8.
b Minimum partial F to enter is 3.84.
c Maximum partial F to remove is 2.71.
d F level, tolerance, or VIN insufficient for further computation.

 

Recall that pyruvate kinase had the largest F value, and hence the two groups differed most with respect to this variable. Hemopexin had the second largest F value and was also essentially uncorrelated with pyruvate kinase (tolerance 0.995). The other two variables were both correlated with pyruvate kinase, so they could do little to improve the separation of the groups.

A summary of the final model shows that at step 1 the pyruvate kinase predictor was selected. In the second, and last, step hemopexin was added.

Variables in the Analysis

Step  Variable           Tolerance  F to Remove  Wilks' Lambda
1     pyruvate kinase    1.000      41.458
2     pyruvate kinase    0.995      26.142       0.720
      hemopexin          0.995      14.324       0.631
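The tolerance values in the table above can be turned back into a correlation. The sketch below assumes the usual definition of tolerance as 1 - R², where R² comes from regressing a predictor on the other predictors already in the model; with only two predictors R² is simply the squared (pooled within-groups) correlation between them. The function name is hypothetical and the sign of the correlation cannot be recovered.

```python
import math


def abs_correlation_from_tolerance(tolerance):
    """Absolute correlation between two predictors, assuming tolerance = 1 - r**2.

    Only valid when exactly two predictors are in the model.
    """
    return math.sqrt(1.0 - tolerance)


# Tolerance of pyruvate kinase and hemopexin at step 2 (from the table above).
r = abs_correlation_from_tolerance(0.995)
print(f"|r| between pyruvate kinase and hemopexin is about {r:.2f}")
```

A value this close to zero is consistent with hemopexin adding information that pyruvate kinase does not already carry.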

 

The next table lists the predictors that were not in the model at each step. At step 0 no predictors had been entered, and the 'F to Enter' statistics are used to rank the candidates.

Variables Not in the Analysis (at each step)

Step  Variable                Tolerance  Min. Tolerance  F to Enter  Wilks' Lambda
0     creatine kinase         1.000      1.000           18.958      0.789
      hemopexin               1.000      1.000           27.634      0.720
      lactate dehydrogenase   1.000      1.000           16.258      0.814
      pyruvate kinase         1.000      1.000           41.458      0.631
1     creatine kinase         0.857      0.857           2.682       0.608
      hemopexin               0.995      0.995           14.324      0.524
      lactate dehydrogenase   0.754      0.754           0.583       0.626
2     creatine kinase         0.854      0.850           2.858       0.503
      lactate dehydrogenase   0.751      0.748           0.812       0.518
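The 'F to Enter' column can be approximately reproduced from the Wilks' Lambda column. The following is a minimal sketch, assuming the standard textbook partial F based on the ratio of the lambda after entry to the lambda before entry; it is not taken from SPSS itself, and the function and constant names are hypothetical. The inputs are the rounded values from the table, so the results agree only to about one decimal place.

```python
def f_to_enter(lambda_before, lambda_after, n_cases, n_groups, n_in):
    """Partial F for adding one variable to a model that already holds n_in variables."""
    ratio = lambda_after / lambda_before
    df2 = n_cases - n_groups - n_in  # denominator degrees of freedom
    return (df2 / (n_groups - 1)) * (1.0 - ratio) / ratio


N_CASES, N_GROUPS = 73, 2
F_ENTER_THRESHOLD = 3.84  # footnote b of the Variables Entered/Removed table
LAMBDA_AFTER_STEP_1 = 0.631  # pyruvate kinase alone

# Wilks' lambda that would result if each candidate were added at step 1.
candidates_step_1 = {
    "creatine kinase": 0.608,
    "hemopexin": 0.524,
    "lactate dehydrogenase": 0.626,
}

for name, lam in candidates_step_1.items():
    f = f_to_enter(LAMBDA_AFTER_STEP_1, lam, N_CASES, N_GROUPS, n_in=1)
    verdict = "can enter" if f >= F_ENTER_THRESHOLD else "stays out"
    print(f"{name}: F to enter is about {f:.2f} ({verdict})")
```

Under these assumptions only hemopexin clears the 3.84 threshold at step 1, and repeating the calculation for step 2 leaves both remaining candidates below it, which is why the procedure stops with two predictors.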

 

Wilks' Lambda

      Number of                                Exact F
Step  Variables  Lambda  df1  df2  df3   Statistic  df1  df2     Sig.
1     1          0.631   1    1    71    41.458     1    71.000  <0.0001
2     2          0.524   2    1    71    31.781     2    70.000  <0.0001
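For two groups the exact F in this table follows directly from lambda, the number of cases and the number of predictors in the model. The sketch below assumes the standard two-group conversion, F = ((1 - lambda) / lambda) × ((n - p - 1) / p); the function name is hypothetical, and because the lambdas are rounded to three decimals the F values match the table only approximately.

```python
def exact_f_two_groups(wilks_lambda, n_cases, n_vars):
    """Exact F and its degrees of freedom for Wilks' lambda when there are two groups."""
    df1 = n_vars
    df2 = n_cases - n_vars - 1
    f_stat = (1.0 - wilks_lambda) / wilks_lambda * df2 / df1
    return f_stat, df1, df2


# Lambda at each step of the stepwise analysis (73 cases in total).
for step, (lam, n_vars) in enumerate([(0.631, 1), (0.524, 2)], start=1):
    f_stat, df1, df2 = exact_f_two_groups(lam, n_cases=73, n_vars=n_vars)
    print(f"step {step}: F({df1}, {df2}) is about {f_stat:.1f}")
```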

 



Summary of Canonical Discriminant Functions

The remainder of the output is similar to the full model, so only the additional aspects are described further.

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         0.908(a)    100.0          100.0         0.690

a First 1 canonical discriminant functions were used in the analysis.
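The eigenvalue, the canonical correlation and Wilks' lambda are three views of the same quantity. As a short sketch using the standard relationships (function names are hypothetical): the canonical correlation is the square root of eigenvalue / (1 + eigenvalue), and Wilks' lambda is the product of 1 / (1 + eigenvalue) over the discriminant functions.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation for a single discriminant function."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda(eigenvalues):
    """Wilks' lambda from the eigenvalues of all discriminant functions."""
    result = 1.0
    for ev in eigenvalues:
        result *= 1.0 / (1.0 + ev)
    return result


EIGENVALUE = 0.908  # from the Eigenvalues table
print(f"canonical correlation: {canonical_correlation(EIGENVALUE):.3f}")
print(f"Wilks' lambda:         {wilks_lambda([EIGENVALUE]):.3f}")
```

Both results agree with the values SPSS reports for the two-predictor model (0.690 and 0.524).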

 

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    0.524          45.225      2   0.000
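The chi-square in this table can be approximated from lambda alone. The sketch below assumes Bartlett's approximation, chi-square = -(n - 1 - (p + g) / 2) × ln(lambda) with p(g - 1) degrees of freedom, which is the usual textbook form; the function name is hypothetical, and the rounded lambda means the result matches the table only to about one decimal place.

```python
import math


def bartlett_chi_square(wilks_lambda, n_cases, n_vars, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda, with its df."""
    multiplier = n_cases - 1 - (n_vars + n_groups) / 2.0
    chi_square = -multiplier * math.log(wilks_lambda)
    df = n_vars * (n_groups - 1)
    return chi_square, df


chi_square, df = bartlett_chi_square(0.524, n_cases=73, n_vars=2, n_groups=2)
print(f"chi-square is about {chi_square:.1f} on {df} df")
```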

 

Standardized Canonical Discriminant Function Coefficients

                   Function
                   1
hemopexin          0.599
pyruvate kinase    0.758

 

Although only two predictors were used in the discriminant function, it is still possible for others to be correlated with the discriminant score. From the correlation coefficients listed below it is apparent that the difference between the groups is largely due to the pyruvate kinase and hemopexin values.

Structure Matrix

                            Function
                            1
pyruvate kinase             0.802
hemopexin                   0.655
lactate dehydrogenase(a)    0.366
creatine kinase(a)          0.270

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
a This variable not used in the analysis.

 

Functions at Group Centroids

Duchenne Muscular    Function
Dystrophy            1
NonCarrier           -0.877
Carrier              1.006

 



Classification Statistics

Prior Probabilities for Groups

Duchenne Muscular             Cases Used in Analysis
Dystrophy            Prior    Unweighted  Weighted
NonCarrier           0.500    39          39.000
Carrier              0.500    34          34.000
Total                1.000    73          73.000
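The group centroids and the prior probabilities together determine how a case is classified from its discriminant score. The sketch below is an illustration rather than the SPSS procedure itself: it assumes a single discriminant function whose scores have unit pooled within-group variance, so that each group's log posterior is, up to a constant, -0.5 × (score - centroid)² + ln(prior). All names are hypothetical.

```python
import math

CENTROIDS = {"NonCarrier": -0.877, "Carrier": 1.006}  # Functions at Group Centroids
PRIORS = {"NonCarrier": 0.5, "Carrier": 0.5}          # Prior Probabilities for Groups


def classify(score, centroids=CENTROIDS, priors=PRIORS):
    """Return the group with the largest log posterior for a discriminant score."""
    def log_posterior(group):
        distance_sq = (score - centroids[group]) ** 2
        return -0.5 * distance_sq + math.log(priors[group])
    return max(centroids, key=log_posterior)


def equal_prior_cutoff(centroids=CENTROIDS):
    """With two groups and equal priors the cutoff is the midpoint of the centroids."""
    low, high = sorted(centroids.values())
    return (low + high) / 2.0


print(f"cutoff with equal priors: {equal_prior_cutoff():.4f}")
for score in (-1.0, 0.0, 0.1, 1.5):
    print(f"score {score:+.1f} -> {classify(score)}")
```

With equal priors a case is simply assigned to the nearer centroid. Unequal priors would shift the cutoff towards the centroid of the less probable group.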

 

Classification Results(b,c)

                             Duchenne Muscular   Predicted Group Membership
                             Dystrophy           NonCarrier  Carrier  Total
Original            Count    NonCarrier          34          5        39
                             Carrier             5           29       34
                    %        NonCarrier          87.2        12.8     100.0
                             Carrier             14.7        85.3     100.0
Cross-validated(a)  Count    NonCarrier          33          6        39
                             Carrier             5           29       34
                    %        NonCarrier          84.6        15.4     100.0
                             Carrier             14.7        85.3     100.0

a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b 86.3% of original grouped cases correctly classified.
c 84.9% of cross-validated grouped cases correctly classified.
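The percentages in footnotes b and c, and the row percentages in the table, come straight from the counts. A short sketch of the arithmetic (the variable and function names are hypothetical; rows are actual groups and columns are predicted groups, both in the order NonCarrier, Carrier):

```python
ORIGINAL = [[34, 5], [5, 29]]
CROSS_VALIDATED = [[33, 6], [5, 29]]


def percent_correct(matrix):
    """Overall percentage of cases on the diagonal of a confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def row_percentages(matrix):
    """Each row expressed as percentages of that row's total."""
    return [[100.0 * count / sum(row) for count in row] for row in matrix]


for label, matrix in (("original", ORIGINAL), ("cross-validated", CROSS_VALIDATED)):
    print(f"{label}: {percent_correct(matrix):.1f}% correctly classified")
    for row in row_percentages(matrix):
        print("   ", [round(value, 1) for value in row])
```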

Despite using fewer predictors than the full model, there has been only a marginal decline in prediction accuracy. Indeed, it could be argued that it has improved, since fewer carriers are misclassified.

