In this lesson, we’ll cover ordination methods. These are methods that help us summarize multivariate data, and detect potential associations between many variables.

We often have large data sets of different observations or measurements from the same sampling units or sites.

For example, we may have a network of quadrats, plots, or sites with incidence or abundance data on taxa present at each site, along with environmental variables (soil nutrients, elevation, aspect, …). Or, we may have measurements of many different variables from the same individuals or sampling units (e.g., individual sparrows).

There are several reasons to use ordination:

  1. Reducing the number of potentially correlated dependent variables (e.g., soil nutrients, multiple variables on the same individuals).

  2. Modelling occurance of a large number of species that would be hard to model individually.

There are a large number of different ordination methods that are each slightly and subtley different, and best-used in specific situations. These methods include principle components analysis, principle coordinates analysis, non-metric multi-dimensional scaling, detrended correspondance analysis, canonical correspondance analysis, …

Basic ordination methods are available in base R, and there are a number of specialist packages that extend these capabilities, including the vegan and labdsv package.

Now, on to an actual example!

First, look at the head of the sparrow dataframe we’ve loaded with the lesson for you.

head(sparrow)
##   Species    Sex Wingcrd Tarsus Head Culmen Nalospi   Wt Observer Age
## 1    SSTS   Male    58.0   21.7 32.7   13.9    10.2 20.3        2   0
## 2    SSTS Female    56.5   21.1 31.4   12.2    10.1 17.4        2   0
## 3    SSTS   Male    59.0   21.0 33.3   13.8    10.0 21.0        2   0
## 4    SSTS   Male    59.0   21.3 32.5   13.2     9.9 21.0        2   0
## 5    SSTS   Male    57.0   21.0 32.5   13.8     9.9 19.8        2   0
## 6    SSTS Female    57.0   20.7 32.5   13.3     9.9 17.5        2   0

Base R (in the stats package) has two functions to calculate a Principle Components Analysis (PCA), prcomp() and princomp().

Most multivariate analyses also require you to decide whether you want to scale the variables or not. As with multiple regression, this makes it easier to make direct comparisons between variables. In the case of prcomp(), the argument scale = takes a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with S, but in general scaling is advisable.

Now run prcomp on the 3rd through the 8th columns of the sparrow data, and assign it to the object pca1. Be sure to set the scale argument to TRUE so that our variables have roughly equal variance.

pca1 <- prcomp(sparrow[, 3:8], scale = TRUE)

Lets visualize your PCA. We can do so in multiple ways. First, we can look at how much variance is explained by each axis, and what variables are driving that.

Plot pca1 now.

plot(pca1)

Each bar represents our principle components axes, ordered by the principle component explaining the most variation in our data, to the least.

Here we can see that Axis 1 explains the majority of variation in our data, indicating that a handful of correlated variables are explaining our data.

Enter pca1 to see the weights of each of your variables in each principle components axis.

pca1
## Standard deviations (1, .., p=6):
## [1] 1.9952947 0.8924830 0.6077801 0.5767803 0.5350063 0.4837036
## 
## Rotation (n x k) = (6 x 6):
##                PC1         PC2         PC3           PC4         PC5
## Wingcrd -0.3617745  0.63271912  0.50662096  0.3258822675 -0.31354286
## Tarsus  -0.4214011  0.07322892 -0.75093073  0.4624788082 -0.04496309
## Head    -0.4459458 -0.19032228  0.01334900 -0.0002966949 -0.09030289
## Culmen  -0.4099956 -0.39641642  0.38030979  0.2694261583  0.63356601
## Nalospi -0.4049720 -0.46052289  0.09015465 -0.3618441591 -0.58920702
## Wt      -0.4007168  0.43457354 -0.16277793 -0.6902118236  0.37807906
##                 PC6
## Wingcrd -0.08725000
## Tarsus  -0.19301118
## Head     0.86981428
## Culmen  -0.23690880
## Nalospi -0.37106919
## Wt      -0.06884131

It appears that all of the variables have roughly equal weight in Axis 1. This makes sense given that they’re all anatomical variables, and thus highly correlated.

We could use the columns PC1 (and maybe PC2) as predictors in a regular lm() or glm() instead of a single anatomical variable, to account for correlation between all the variables (this is often done for soil nutrient data).

Often, a plot of the first two principle components axes is an effective way to represent the multi-dimensional correlation of our variables. The function biplot() depicts PC1 and PC2 points for each sparrow and adds arrows corresponding to the direction and strength of each variable.

Create a biplot of our pca1 object now.

biplot(pca1)