Clustering and Classification methods for Biologists

Introduction and organisation

Intended Learning Outcomes

After completing this unit you should be able to:


It is assumed that you have some prior understanding of the basics of statistical analyses and interpretation.


Unit organisation

This unit is based on material developed for Masters students from the School of Biology, Chemistry and Health Science at Manchester Metropolitan University. Hopefully, it will also be a useful resource for other postgraduate, and final-year undergraduate, students.

The unit examines the background to, and applications of, a range of clustering and classification techniques across biology.

Studying online

The material is presented in five blocks.

The following two documents provide a broad context for the unit.

The analyses are not based on a particular statistical program. Examples are presented for a range of commercial and free packages. The data sets are provided in a range of formats that should enable them to be used with most packages.


Suggested bibliography

  1. Chatfield, C. and Collins, A. J. 1980. Introduction to multivariate analysis. Science Paperbacks.
  2. Field, A. 2000. Discovering Statistics using SPSS for Windows. Sage Publications, London. - an excellent comprehensive text about a wide range of 'difficult analyses.'
  3. Flury, B. and Riedwyl, H. 1988. Multivariate statistics: a practical approach. Chapman and Hall.
  4. Jongman, R. H. et al. 1995. Data analysis in community and landscape ecology. Pudoc Wageningen.
  5. Kinnear, P. R. and Gray, C. D. 2000. SPSS for Windows made simple. Psychology Press, Andover - £14.95 - an excellent and very clear book.
  6. Legendre, P. and Legendre, L. 1998. Numerical Ecology (2nd English Edition). Elsevier, Amsterdam.
  7. Tabachnick, B. G. and Fidell, L. S. 1996. Using multivariate statistics. 3rd edition. Harper.

Do not treat this list as comprehensive. It is wise to search out other texts that you may find more suitable to your needs.


General web resources

The following Web sites contain links to free or shareware software, most of which are relevant to multivariate analyses.

  1. The makers of STATISTICA (a commercial software package) have a very useful set of notes about many statistical methods, including some that are only briefly covered in this course.
  2. Pierre Legendre's (Université de Montréal) site has links to many useful programs (particularly those involving spatial analyses). Much of this software is written for Apple Mac computers, but there are also some Windows versions.
  3. PopTools is a very versatile Excel add-in from CSIRO. In addition to Mantel tests it also incorporates a range of matrix methods and resampling techniques.
  4. The ADE-4 site is an online multivariate statistical package. You submit your data, it does the analyses and returns your results. You can also download the entire package to run on your own computer.
  5. The ordination methods for ecologists web site has links to many multivariate statistical techniques.
  6. PAST is a free data analysis package which, although aimed at paleontologists, has great potential for ecological analyses. In addition to many other techniques, PAST provides linear regression (standard and reduced major axis), lin-log (exponential), log-log (allometric) and logistic regression; diversity statistics and rarefaction; Dice, Jaccard and Raup-Crick similarity indices; principal components (with minimal spanning tree), principal coordinates and correspondence analysis with detrending; cluster analysis (three algorithms, nine distance measures); discriminant analysis; time series and spectral analysis; directional statistics, rose plots and point distribution statistics.
  7. The R package is a free, open-source 'clone' of the very powerful S-Plus package. Although it is very powerful it is not for the faint-hearted! Its interface betrays its Unix heritage. If you wish to find a version for the Mac or PC, follow the download link and choose the nearest site. Note that this is a completely different package from the R statistics package distributed from Pierre Legendre's site!
  8. Warren Kovach's MVSP software does most common multivariate analyses, including cluster analysis, PCA and PCO. The Windows version also does CA and CCA. This is shareware, so you can try it before you buy it.
  9. Bill Miller has been developing a comprehensive and free statistics package called Openstat that offers a number of multivariate analyses including multiple regression, discriminant analysis, cluster analysis, principal components (factor) analysis and logistic regression.
  10. WinIDAMS is a software package for the validation, manipulation and statistical analysis of data, developed by the UNESCO Secretariat. It has a wide range of techniques including regression analysis, one-way analysis of variance, discriminant analysis, cluster analysis, principal components factor analysis and correspondence analysis. It is distributed free-of-charge upon request.

Do not treat this list as comprehensive. If you discover another interesting site please let me know.