Factor Analysis

In this analysis all five variables are subjected to a PCA (SPSS Factor Analysis). Five components are extracted and there is no rotation of the axes. Factor scores have been saved.

Because all five components (as many as there are variables) are extracted, every communality is 1.0: together the components account for all of the variance in each variable.

Communalities
                         Initial   Extraction
AGE                      1.000     1.000
creatine kinase          1.000     1.000
hemopexin                1.000     1.000
lactate dehydrogenase    1.000     1.000
pyruvate kinase          1.000     1.000
Extraction Method: Principal Component Analysis.
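The communality of a variable is the sum of its squared loadings over the extracted components. As a rough check, the sketch below applies that definition to the unrotated loadings reported in the Component Matrix table further down the page. The dictionary and function names are illustrative and are not part of the SPSS output, and because the loadings are rounded to three decimals the totals come out only approximately equal to 1.

```python
# Unrotated loadings copied from the Component Matrix table (rounded to 3 d.p.).
LOADINGS = {
    "AGE": [0.402, 0.739, 0.536, 0.048, 0.051],
    "creatine kinase": [0.851, -0.325, 0.141, 0.290, -0.256],
    "hemopexin": [0.525, 0.611, -0.564, 0.181, -0.001],
    "lactate dehydrogenase": [0.857, -0.405, 0.014, 0.116, 0.296],
    "pyruvate kinase": [0.824, 0.008, -0.061, -0.559, -0.068],
}


def communality(loadings, n_components):
    """Sum of squared loadings over the first n_components components."""
    return sum(a * a for a in loadings[:n_components])


for name, row in LOADINGS.items():
    # With all five components the communality is ~1; with two it is lower.
    print(f"{name:22s} five: {communality(row, 5):.3f}  two: {communality(row, 2):.3f}")
```

Calling `communality(row, 2)` shows how much of each variable would be retained if only the first two components were kept, which is the situation the rest of the page concentrates on.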

 

Total Variance Explained
             Initial Eigenvalues                      Extraction Sums of Squared Loadings
Component    Total    % of Variance   Cumulative %    Total    % of Variance   Cumulative %
1            2.576    51.515          51.515          2.576    51.515          51.515
2            1.189    23.779          75.295          1.189    23.779          75.295
3            0.630    12.594          87.889          0.630    12.594          87.889
4            0.445    8.909           96.798          0.445    8.909           96.798
5            0.160    3.202           100.000         0.160    3.202           100.000
Extraction Method: Principal Component Analysis.

The analysis looks potentially useful because the first two components retain about 75% of the total variability. In other words, the effective dimensionality of the data is somewhat less than the original five variables.
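The percentage columns follow directly from the eigenvalues: a PCA of standardized variables (a correlation matrix, which is assumed here) has total variance equal to the number of variables, so each percentage is the eigenvalue divided by 5. A minimal sketch, with illustrative names, that rebuilds the table from the rounded eigenvalues:

```python
# Eigenvalues copied from the Total Variance Explained table (rounded to 3 d.p.).
EIGENVALUES = [2.576, 1.189, 0.630, 0.445, 0.160]
N_VARIABLES = 5  # total variance when every variable is standardized (assumed)


def variance_table(eigenvalues, n_variables):
    """Return (component, eigenvalue, % of variance, cumulative %) tuples."""
    rows = []
    cumulative = 0.0
    for component, eigenvalue in enumerate(eigenvalues, start=1):
        percent = 100.0 * eigenvalue / n_variables
        cumulative += percent
        rows.append((component, eigenvalue, percent, cumulative))
    return rows


for component, eigenvalue, percent, cumulative in variance_table(EIGENVALUES, N_VARIABLES):
    print(f"{component}  {eigenvalue:.3f}  {percent:6.2f}  {cumulative:6.2f}")
```

Small differences in the last decimal place relative to the SPSS table are expected, because SPSS works from unrounded eigenvalues.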

Because the third eigenvalue is much smaller than the second, and only the first two eigenvalues are greater than 1.0, we will concentrate on these two components. The other three sets of loadings have been 'grayed out' to help in the interpretation.
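The "eigenvalue greater than 1.0" rule used here can be written down in a couple of lines. The sketch below is only an illustration with made-up names: it applies the rule to the rounded eigenvalues and also lists the drop from each eigenvalue to the next, which is the information a scree plot conveys.

```python
# Eigenvalues copied from the Total Variance Explained table (rounded to 3 d.p.).
EIGENVALUES = [2.576, 1.189, 0.630, 0.445, 0.160]

# Components whose eigenvalue exceeds 1.0, i.e. that explain more variance
# than a single standardized variable.
retained = [k for k, ev in enumerate(EIGENVALUES, start=1) if ev > 1.0]

# Decrease from component k to component k + 1.
drops = [
    (k, k + 1, EIGENVALUES[k - 1] - EIGENVALUES[k])
    for k in range(1, len(EIGENVALUES))
]

print("retained components:", retained)
for first, second, drop in drops:
    print(f"drop from {first} to {second}: {drop:.3f}")
```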

Component Matrix(a)
                         Component
                         1        2        3        4        5
AGE                      0.402    0.739    0.536    0.048    0.051
creatine kinase          0.851    -0.325   0.141    0.290    -0.256
hemopexin                0.525    0.611    -0.564   0.181    -0.001
lactate dehydrogenase    0.857    -0.405   0.014    0.116    0.296
pyruvate kinase          0.824    0.008    -0.061   -0.559   -0.068
Extraction Method: Principal Component Analysis.
a. 5 components extracted.

All five variables load positively on the first component. This is not uncommon in a PCA: the first component measures overall variability and is traditionally viewed as a 'size' or 'magnitude' component. The second component involves every variable except pyruvate kinase, whose loading (0.008) is essentially zero. The remaining four variables split into two groups.

  1. Age and hemopexin: they have positive loadings (correlations), hence large scores on component 2 tend to be associated with larger values for age and hemopexin.
  2. Lactate dehydrogenase and creatine kinase: they have negative loadings (correlations), hence large scores on component 2 tend to be associated with smaller values for lactate dehydrogenase and creatine kinase.
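The grouping above can be reproduced mechanically by splitting the component 2 loadings by sign. The cut-off of 0.3 for treating a loading as negligible is an assumption made for this sketch, not a value taken from the analysis, and the function name is illustrative.

```python
# Component 2 loadings copied from the Component Matrix table.
PC2_LOADINGS = {
    "AGE": 0.739,
    "creatine kinase": -0.325,
    "hemopexin": 0.611,
    "lactate dehydrogenase": -0.405,
    "pyruvate kinase": 0.008,
}

THRESHOLD = 0.3  # assumed cut-off below which a loading is treated as negligible


def split_by_sign(loadings, threshold):
    """Group variable names into positive, negative and negligible loadings."""
    positive = sorted(n for n, a in loadings.items() if a >= threshold)
    negative = sorted(n for n, a in loadings.items() if a <= -threshold)
    negligible = sorted(n for n, a in loadings.items() if abs(a) < threshold)
    return positive, negative, negligible


positive, negative, negligible = split_by_sign(PC2_LOADINGS, THRESHOLD)
print("positive:  ", positive)
print("negative:  ", negative)
print("negligible:", negligible)
```

Note that creatine kinase (-0.325) only just clears the assumed cut-off, so a stricter threshold would move it into the negligible group.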

Go back to the data description and examine the correlation coefficients. How do these relate to the structure of PC1 & PC2?
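One way to approach this comparison is to note that, when all components are kept, the correlation between two variables equals the sum of the products of their loadings. The sketch below uses that identity with the rounded loadings from the Component Matrix, so its results only approximate the coefficients in the data description. The names are illustrative.

```python
# Unrotated loadings copied from the Component Matrix table (rounded to 3 d.p.).
LOADINGS = {
    "AGE": [0.402, 0.739, 0.536, 0.048, 0.051],
    "creatine kinase": [0.851, -0.325, 0.141, 0.290, -0.256],
    "hemopexin": [0.525, 0.611, -0.564, 0.181, -0.001],
    "lactate dehydrogenase": [0.857, -0.405, 0.014, 0.116, 0.296],
    "pyruvate kinase": [0.824, 0.008, -0.061, -0.559, -0.068],
}


def reproduced_correlation(var_a, var_b, n_components=5):
    """Correlation implied by the first n_components sets of loadings."""
    row_a = LOADINGS[var_a][:n_components]
    row_b = LOADINGS[var_b][:n_components]
    return sum(a * b for a, b in zip(row_a, row_b))


pairs = [
    ("creatine kinase", "lactate dehydrogenase"),
    ("AGE", "hemopexin"),
]
for var_a, var_b in pairs:
    full = reproduced_correlation(var_a, var_b)
    two = reproduced_correlation(var_a, var_b, n_components=2)
    print(f"{var_a} / {var_b}: all five = {full:.3f}, first two = {two:.3f}")
```

Comparing the five-component value with the two-component value for a pair shows how much of that correlation is carried by PC1 and PC2 alone.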

Now we will see what, if anything, the component scores tell us about the differences between carriers and non-carriers.

Plot of PC1 v PC2

Several things are apparent from this plot.

  1. There is some separation of the two groups.
  2. Carriers tend to have larger values for PC1 & PC2. How do these relate to the original variables? (Hint: look at the loadings.)
  3. The carriers are much more heterogeneous, with some obvious outliers.
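To see how a case's position on the plot depends on the original variables, it can help to write out how a standardized component score is built: each standardized variable is weighted by its loading, and the sum is divided by the component's eigenvalue. The sketch below assumes the saved scores are standardized principal component scores of this kind. The case used is hypothetical and does not come from the data set, and the names are illustrative.

```python
# Loadings on components 1 and 2, copied from the Component Matrix table.
LOADINGS = {
    "AGE": [0.402, 0.739],
    "creatine kinase": [0.851, -0.325],
    "hemopexin": [0.525, 0.611],
    "lactate dehydrogenase": [0.857, -0.405],
    "pyruvate kinase": [0.824, 0.008],
}
EIGENVALUES = [2.576, 1.189]  # eigenvalues of components 1 and 2


def component_score(z_values, component):
    """Standardized score on a component (1 or 2) from standardized values."""
    k = component - 1
    weighted_sum = sum(LOADINGS[name][k] * z_values[name] for name in LOADINGS)
    return weighted_sum / EIGENVALUES[k]


# Hypothetical case: one standard deviation above the mean on the two
# enzymes with negative PC2 loadings, average on everything else.
hypothetical_case = {
    "AGE": 0.0,
    "creatine kinase": 1.0,
    "hemopexin": 0.0,
    "lactate dehydrogenase": 1.0,
    "pyruvate kinase": 0.0,
}

pc1 = component_score(hypothetical_case, 1)
pc2 = component_score(hypothetical_case, 2)
print(f"PC1 = {pc1:.3f}, PC2 = {pc2:.3f}")
```

Changing the entries of `hypothetical_case` one at a time shows which variables push a case towards larger PC1 or PC2 values.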
