Outline of multivariate methods
Background
The following two documents provide a broad context for the unit. The other links introduce the topics covered in the unit and link to more detailed pages.
- A short review of multivariate methods
- A short introduction to classification and clustering methods
Cluster Analysis
Detailed description of Cluster Analysis methods
Cluster analysis is a broad collection of methods that are used to group data into classes that share similar characteristics. Formal significance tests are generally not used; instead, the analysis is judged by the 'quality' of the outcome, i.e. how useful you find the results.
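As an illustration only, the sketch below implements k-means, which is just one of the many clustering methods this unit covers. It was chosen because its two alternating steps are easy to follow. The function name `k_means`, the seed and the data values are hypothetical. Note that nothing in the output is a significance test: you judge the grouping by whether it is useful.

```python
import math
import random

def k_means(points, k, n_iter=100, seed=0):
    """Minimal k-means: assign each point to its nearest centre, then move each
    centre to the mean of its assigned points; repeat until assignments stop changing."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)          # start from k randomly chosen data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centre by Euclidean distance
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break                            # converged
        labels = new_labels
        # Update step: each centre becomes the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                      # keep the old centre if a cluster is empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres

# Hypothetical two-variable data with two obvious groups
data = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
labels, centres = k_means(data, k=2)
print(labels, centres)
```

The cluster numbers themselves are arbitrary. Only the grouping matters, and different starting centres can lead to different solutions on less clear-cut data.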
Principal Components Analysis (PCA)
PCA is a dimension reduction technique that exploits the correlations between variables to derive a smaller set of components (composite variables) that retain a large proportion of the original information in fewer dimensions.
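To show how correlation between variables lets PCA compress the information, here is a minimal sketch for the two-variable case. It uses the closed-form eigen-decomposition of the 2x2 sample covariance matrix. The function name `pca_2d` is hypothetical, and with more variables a library eigen-solver would normally be used. Each eigenvalue is the variance carried by one component, so the first eigenvalue's share of the total is the proportion of information retained by a single dimension.

```python
import math

def pca_2d(xs, ys):
    """PCA of two variables from their 2x2 sample covariance matrix [[a, b], [b, c]]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                     # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                     # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)   # cov(x, y)
    # Eigenvalues (component variances) of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Unit-length direction (loadings) of the first principal component
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    loadings = (vx / norm, vy / norm)
    explained = lam1 / (lam1 + lam2)     # proportion of total variance on PC1
    return (lam1, lam2), loadings, explained

# Hypothetical, strongly correlated measurements
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
print(pca_2d(xs, ys))
```

If the variables are measured on different scales, they are usually standardised first. This is equivalent to working from the correlation matrix rather than the covariance matrix.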
Discriminant Analysis
Detailed description of Discriminant Analysis
Discriminant analysis is a technique that can be used to (a) find how two or more classes differ with respect to a set of predictor variables and/or (b) predict the class of an object from the values of its predictor variables. The outcome, but not the algorithm, is similar to that of logistic regression.
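The sketch below is a minimal two-class, two-predictor version of Fisher's linear discriminant, which is one common form of discriminant analysis. It covers both uses listed above. The weights `w` describe how the classes differ; when predictors are standardised, their relative sizes indicate which predictors contribute most to the separation. The score and cut-off then predict the class of a new object. Equal prior probabilities are assumed. The function names and data are hypothetical.

```python
def lda_score(row, w):
    """Linear discriminant score: a weighted sum of the predictors."""
    return w[0] * row[0] + w[1] * row[1]

def fisher_lda(class_a, class_b):
    """Two-class Fisher discriminant for two predictors.

    Returns w = Sw^-1 (mean_a - mean_b), with Sw the pooled within-class
    covariance matrix, and a cut-off halfway between the projected means."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    ma, mb = mean(class_a), mean(class_b)
    # Pooled within-class covariance (2x2), divisor n_a + n_b - 2
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class_a, ma), (class_b, mb)):
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
    dof = len(class_a) + len(class_b) - 2
    sw = [[v / dof for v in row] for row in sw]
    # Explicit inverse of the 2x2 matrix
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    cutoff = (lda_score(ma, w) + lda_score(mb, w)) / 2
    return w, cutoff

def classify(row, w, cutoff):
    """Assign to class A if the score exceeds the cut-off, otherwise to B."""
    return "A" if lda_score(row, w) > cutoff else "B"

# Hypothetical measurements on two predictors for two classes
A = [(4.0, 2.0), (4.5, 2.5), (5.0, 2.1), (4.2, 2.8)]
B = [(1.0, 1.0), (1.5, 0.8), (2.0, 1.4), (1.2, 1.6)]
w, cutoff = fisher_lda(A, B)
print(w, cutoff, classify((3.8, 2.2), w, cutoff))
```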
Logistic Regression
Detailed description of Logistic Regression.
Logistic regression is a type of generalised linear model that is typically used to model the relationship between a binary (0/1) response variable and one or more predictor variables. In many, but not all, analyses it produces results very similar to those of discriminant analysis.
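The following is a minimal sketch of fitting a logistic regression with a single predictor. It models P(y = 1) = 1 / (1 + exp(-(b0 + b1·x))) and finds the maximum-likelihood estimates by Newton-Raphson, which for this model is the same as iteratively reweighted least squares. The function name and data are hypothetical. The data deliberately overlap between the classes; with perfectly separated classes the estimates do not converge.

```python
import math

def fit_logistic(xs, ys, n_iter=25, tol=1e-10):
    """Maximum-likelihood fit of logit P(y=1) = b0 + b1*x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Gradient (g) and negative Hessian (h) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system h * step = g
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data: predictor value and 0/1 outcome
xs = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
      2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]
ys = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, math.exp(b1))   # exp(b1): odds ratio per unit increase in x
```

The coefficient b1 is on the log-odds scale, so exp(b1) is the factor by which the odds of y = 1 are multiplied for each unit increase in x.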
Generalised additive models
Description of, and example analysis using, a Generalised Additive Model.
Generalised Additive Models (GAMs) are related to generalised linear models (e.g. logistic regression). However, they are not fully parametric models, because the regression coefficients are replaced by non-parametric smoothing functions which model, to a user-defined level of complexity, the relationships between the response (e.g. class) variable and the predictors.
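To make "coefficients replaced by smoothing functions" concrete, here is a minimal sketch of the simplest case: an additive model with a continuous response and identity link, fitted by backfitting. For a binary class variable the same idea is wrapped in a logit link (via local scoring), which is not shown here. The Gaussian-kernel smoother and its `bandwidth` are assumptions standing in for the user-defined level of complexity, where a smaller bandwidth gives a wigglier curve. All names and data are hypothetical.

```python
import math

def kernel_smooth(xs, rs, bandwidth):
    """Nadaraya-Watson smoother: a Gaussian-weighted local average of rs at each x."""
    fitted = []
    for x0 in xs:
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        fitted.append(sum(w * r for w, r in zip(weights, rs)) / sum(weights))
    return fitted

def backfit_additive(x1, x2, y, bandwidth=0.5, n_iter=20):
    """Fit y = alpha + f1(x1) + f2(x2), each f a smooth curve, by backfitting."""
    n = len(y)
    alpha = sum(y) / n
    f1 = [0.0] * n
    f2 = [0.0] * n
    for _ in range(n_iter):
        # Smooth the partial residuals of each predictor in turn
        r1 = [yi - alpha - v for yi, v in zip(y, f2)]
        f1 = kernel_smooth(x1, r1, bandwidth)
        m1 = sum(f1) / n
        f1 = [v - m1 for v in f1]        # centre so the intercept stays identifiable
        r2 = [yi - alpha - v for yi, v in zip(y, f1)]
        f2 = kernel_smooth(x2, r2, bandwidth)
        m2 = sum(f2) / n
        f2 = [v - m2 for v in f2]
    return alpha, f1, f2

# Hypothetical data: y depends non-linearly on both predictors
x1 = [i / 10 for i in range(40)]
x2 = [((i * 7) % 40) / 10 for i in range(40)]
y = [math.sin(a) + 0.5 * (b - 2) ** 2 for a, b in zip(x1, x2)]
alpha, f1, f2 = backfit_additive(x1, x2, y)
print(alpha, f1[:5], f2[:5])
```

Plotting f1 against x1 and f2 against x2 shows the fitted shape of each relationship. These plots are how GAM results are usually interpreted, rather than through a table of coefficients.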
Decision Trees
Detailed description of Decision Trees
Decision trees predict the class of an object through a series of (usually binary) decisions. In many respects they are similar to the familiar species identification keys. Each decision identifies a threshold on a predictor that maximally separates the groups. A more recent and more robust development is the random forest (implemented, for example, in the R package randomForest), which combines the predictions of many decision trees.
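A single decision in a tree can be sketched as a search for the threshold that best separates the groups. The minimal example below scores each candidate threshold by weighted Gini impurity, where 0 means each side contains only one class. Gini impurity is one common criterion (used by CART); others such as entropy exist. The function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 = all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Threshold on one predictor that minimises the weighted Gini impurity
    of the two groups it creates, i.e. one binary decision of a tree."""
    pairs = sorted(zip(values, labels))
    best_impurity, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_impurity, best_threshold = impurity, threshold
    return best_threshold, best_impurity

# Hypothetical measurements for two species
values = [1.4, 1.3, 1.5, 4.5, 4.7, 4.0, 5.1]
labels = ["a", "a", "a", "b", "b", "b", "b"]
print(best_split(values, labels))   # (2.75, 0.0): a split that separates the species perfectly
```

A full tree repeats this search over every predictor and then recursively within each resulting group. A random forest grows many such trees, each on a bootstrap sample of the data with a random subset of predictors considered at each split, and combines their votes.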
Artificial neural networks
Description of Artificial Neural Networks
Artificial neural networks belong to a class of methods variously known as parallel distributed processing or connectionist techniques. They are an attempt to simulate a real neural network, which is composed of a large number of interconnected, but independent, neurones. However, most artificial neural networks only simulate this parallel processing, since they are implemented in software running on a single CPU.
It is generally considered that neural networks do well when data structures are not well understood, because they can combine and transform the raw data without the user having to specify those transformations. One disadvantage of most artificial neural networks is that the learned relationships are distributed amongst the connections, which can make them difficult to interpret.
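The small network below illustrates both the strength and the interpretability problem described above. It computes the XOR ("one or the other, but not both") response, which no single linear threshold on the raw inputs can produce. The hidden layer transforms the inputs into intermediate features that make the problem solvable. The weights here are hand-chosen for illustration, whereas a real network would learn them, typically by backpropagation. No single weight "means" XOR; the rule exists only in the combination of connections. All names and values are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic activation function of a single unit."""
    return 1.0 / (1.0 + math.exp(-z))

# Hand-chosen (not learned) weights for a 2-input, 2-hidden-unit, 1-output network
HIDDEN = [((20.0, 20.0), -10.0),    # unit 1: active if x1 OR x2
          ((-20.0, -20.0), 30.0)]   # unit 2: active unless x1 AND x2
OUTPUT = ((20.0, 20.0), -30.0)      # output: active if both hidden units are active

def forward(x1, x2):
    """Propagate the inputs through the hidden layer to the output unit."""
    hidden = [sigmoid(w[0] * x1 + w[1] * x2 + b) for w, b in HIDDEN]
    w, b = OUTPUT
    return sigmoid(w[0] * hidden[0] + w[1] * hidden[1] + b)

for a in (0, 1):
    for c in (0, 1):
        print(a, c, round(forward(a, c)))   # prints 0 0 0, 0 1 1, 1 0 1, 1 1 0
```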
Other methods
Link to other multivariate methods
This page contains links to a variety of other methods, covered in less detail. They are not core to this resource but may be useful in your studies.
Measuring accuracy
Measuring the accuracy of predictions
This page contains a description of problems and solutions associated with the measurement of prediction accuracy in a technique such as logistic regression (or any method that makes binary predictions).
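As a starting point for that discussion, the sketch below summarises binary predictions as a confusion matrix. It then computes overall accuracy, sensitivity (the proportion of observed 1s predicted correctly), specificity (the proportion of observed 0s predicted correctly) and Cohen's kappa, which corrects agreement for that expected by chance. The function name is hypothetical, and it assumes both classes are present in the observed data. The example shows one well-known problem: when one class is rare, a model that always predicts the common class looks highly accurate while never detecting the rare class.

```python
def confusion_metrics(observed, predicted):
    """Summarise 0/1 predictions against 0/1 observations."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)   # true positives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)   # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)   # false positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)   # false negatives
    n = len(pairs)
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Chance agreement: P(pred=1)P(obs=1) + P(pred=0)P(obs=0)
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy,
            "sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}

# Hypothetical data: 5 presences among 100 cases; the "model" always predicts 0
observed = [1] * 5 + [0] * 95
predicted = [0] * 100
print(confusion_metrics(observed, predicted))
# accuracy 0.95, but sensitivity 0.0 and kappa 0.0
```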