Testing Populations

Testing_Populations

In this lesson, we will look at methods to test whether samples are drawn from populations with different means, or test whether one sample is drawn from a population with a mean different from some theoretical mean.

We are moving from data that was wholly categorical (i.e., counts of the number of individuals in different groups) in the previous lesson, to comparing continuous data that come from individuals in different groups or categories.

Our overarching question is whether the mean value of individuals in one group is different from the mean value of individuals from another group. Continuing with our previous example, we could ask whether female trees tend to be larger than male trees, on average.

As with the last lesson, we will progress from simple data and analyses to more complex data. First, we will consider comparing the mean of one sample to a known mean. Second, we will compare the means of two independent groups. Third, we will compare the means of paired samples. Finally, we will compare the means of more than two groups.

With all four comparisons, we will illustrate the analysis of parametric data (that follow a Normal, or Gaussian, distribution) and non-parametric (from a non-Normal distribution).

Data

We will use the data from the 2017 New Haven Road Race. Look at the first few rows of the data, in the race object.

head(race)

##   Place  No.                Name         City Div Sex Class Nettime_mins
## 1     1 4606        Jake Gurzler   Manchester   1   M 30-39        15.82
## 2     2 4384      Patrick Galvin     Stamford   1   M 20-29        15.85
## 3     3 4598       Aidan Pillard   Washington   2   M 20-29        15.92
## 4     4 3745          Omar Perez Poughkeepsie   3   M 20-29        15.98
## 5     5 5807 Timothy Milenkevich      Ansonia   2   M 30-39        16.20
## 6     6 3954     Tim Foldy-Porto  Northampton   1   M 13-19        16.37
##   Time_mins Pace_mins
## 1     15.82      5.10
## 2     15.87      5.12
## 3     15.93      5.13
## 4     16.00      5.15
## 5     16.23      5.23
## 6     16.38      5.28

Testing assumptions: Normality

The t-test and ANOVA assume that the data (or the residuals from the model) follow a Normal distribution. Lets test this assumption visually using a a histogram and/or a quantile-quantile plot, and statistically using a Shapiro-Wilk test (or Kolmogorov-Smirnov test).

Make a histogram of Pace (minutes).

hist(race$Pace_mins)

We appear to have a decent right-skew in our data, but lets diagnose using a few other methods.

The quantile-quantile plot

Plot a q-q plot of the $Pace_mins data (Remember the function qqnorm()).

qqnorm(race$Pace_mins)

The data look ok, but it can be hard to tell without a reference … let’s add a line to the plot using qqline()…

qqnorm(race$Pace_mins)
qqline(race$Pace_mins)

The data fall away from the line at both ends, suggesting the data are not normally disitributed. D’oh.

The Shapiro-Wilk test

Check if the $Pace column in the race data is normally distributed using a Shapiro-Wilk test of normality (we would expect not, given that it is a vector of integers from 1 to 2655 giving the final place in the race of each runner).

shapiro.test(race$Place)

## 
##  Shapiro-Wilk normality test
## 
## data:  race$Place
## W = 0.9549, p-value < 2.2e-16

Our data is clearly not normally distributed. This does not inherently mean that we can not use parametric tests for our data - the degree to which our data deviates from normality is more important. We should take caution in interpreting model results however. We will forge ahead anyway.

A. Compare the mean of one sample to a known mean

A1. Normal data: the one-sample t-test

First, we can test whether one sample is drawn from a population with a mean different from a known mean, using a one-sample t-test.

This known mean value could come from one of several sources. The mean could come from theory or a theoretical model; it could be data that come from a previous experiment or study; or from an experiment where you have a control and treatment conditions. If you calculate the difference between the treatment and control, you can test whether the mean % difference of the treatment differs significantly from 100.

Similar to the ratio and proportion tests, our null hypothesis would be one of the sample mean is equal to/greater then/less than the theoretical mean.

Let’s test if the mean net time for 2016 was different from the mean net time of 30 minutes and 56 seconds that we have for 2015.

As good data analysts, we should always examine the data before we do anything. Check a summary of the Nettime_mins column.

summary(race$Nettime_mins)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.82   25.23   29.18   30.22   33.85  107.33

And now we should plot the data, again, to look at the distribution and to check for odd data points and outliers. We have continuous data from a group, thus a boxplot is probably the best kind of plot. Using the boxplot() function, make a boxplot of theNettime_mins` column.

boxplot(race$Nettime_mins)

It looks like there are some folks who took a long time to finish the race, but the data look like they are reasonably equally distributed around the median (the central line of the box).

Assumptions: The t-test assumes that the data are normally-distributed. We can test this assumption visually with a histogram, q-q plot, and statistically with a Shapiro-Wilk test. First, make a histogram of $Nettime_mins.

hist(race$Nettime_mins)

The histogram looks decidedly right-skewed (i.e., there is a longer tail to the right), suggesting non-Normal data.

Now make a qqplot of the data.

qqnorm(race$Nettime_mins)

The data are curved, again suggesting some departure from Normality.

Now run a Shapiro-Wilk test on the data.

shapiro.test(race$Nettime_mins)

## 
##  Shapiro-Wilk normality test
## 
## data:  race$Nettime_mins
## W = 0.92426, p-value < 2.2e-16

Ok, so the data are not normal and the t-test is probably not appropriate. A large sample size can go some way to offsetting this assumption (large samples tend to approach normality, according to the Central Limit Theorem) However, we will proceed with these data to illustrate the test.

For a one-sample t-test, we use the function t.test and provide the data (x =), the known mean that we want to compare the data against (mu =), and the alternative hypothesis (’alternative = `).

Compare the $Nettime_mins to a mean of 30 minutes 58 seconds.

t.test(race$Nettime_mins, mu = 30.96)

## 
##  One Sample t-test
## 
## data:  race$Nettime_mins
## t = -5.0383, df = 2641, p-value = 5.014e-07
## alternative hypothesis: true mean is not equal to 30.96
## 95 percent confidence interval:
##  29.93720 30.51029
## sample estimates:
## mean of x 
##  30.22374

The p-value is less than 0.05, suggesting that we can reject the null hypothesis, and infer than people ran at a different time to 2015, on average.

A2. Non-normal data: one-sample Wilcoxon test

The one-sample Wilcoxon signed rank test is a non-parametric alternative to a one-sample t-test when the data does not follow a normal distribution. It is used to determine whether the median of the sample is equal to a specified value.

The one-sample Wilcoxon signed-rank test, wilcox.test(), requires the same same three arguments as the t-test: the data (x =), the specified value (mu =), and the alternative hypothesis (alternative =; two-sided is the default).

Test if the median of $Nettime_mins was less than a median time of 30 minutes.

wilcox.test(x = race$Nettime_mins, mu  = 30, alternative = 'less')

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  race$Nettime_mins
## V = 1610400, p-value = 0.000559
## alternative hypothesis: true location is less than 30

The p-value less than 0.05 suggests that we can reject the null hypothesis, and infer that people ran faster than this time.

B. Compare the mean of one sample to the mean of another sample

B1. Normal data: the two-sample t-test

The unpaired two-samples t-test is used to compare the mean of two independent groups. For example, we could ask if men or women ran the New Haven Road Race faster?

Assumptions: The t-test has several assumptions. The two samples are: (i) independent, (ii) normally distributed, and (iii) have equal variance.

Independent samples depends on good experimental design. We can test for normal distribution graphically (histogram, q-q plot) and statistically (Shapiro-Wilk test). We can check for equal variance with an F test.

First, draw a boxplot of the data. Use boxplot() to plot $Nettime_mins as a function of $Sex.

boxplot(race$Nettime_mins ~ race$Sex)

There are a few folks who didn’t fill in the box, but it looks like males tended to run faster than females.

Check that male $Nettime_mins are normally distributed.

shapiro.test(race$Nettime_mins[race$Sex == 'M'])

## 
##  Shapiro-Wilk normality test
## 
## data:  race$Nettime_mins[race$Sex == "M"]
## W = 0.87864, p-value < 2.2e-16

Check that female $Nettime_mins are normally distributed.

shapiro.test(race$Nettime_mins[race$Sex == 'F'])

## 
##  Shapiro-Wilk normality test
## 
## data:  race$Nettime_mins[race$Sex == "F"]
## W = 0.92101, p-value < 2.2e-16

Ok, well in this case, neither are normally distributed (the p-values are less than 0.05), and thus we should consider using the non-parametric version (wilcoxon test). No matter, let’s carry on and check for equal variance.

We can test for equal variance using the function var.test() that runs an F test to compare the variances of two samples from normal populations. var.test requires withx = andy = `, or a formula.

Compare the variances of the male and female runners, using x = and y =.

var.test(race$Nettime_mins[race$Sex == 'M'], race$Nettime_mins[race$Sex == 'F'])

## 
##  F test to compare two variances
## 
## data:  race$Nettime_mins[race$Sex == "M"] and race$Nettime_mins[race$Sex == "F"]
## F = 1.0407, num df = 1249, denom df = 1391, p-value = 0.4687
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.934212 1.159675
## sample estimates:
## ratio of variances 
##           1.040691

The p-value is greater than 0.05, suggesting that we cannot reject the null hypothesis of a difference between the variance of the two samples. Moreover, the t.test() function can also make a correction for unequal variances if needed.

Ok, lets proceed with a two-sample t-test.

As above, we use the t.test() function, and give it an extra data argument. We need to provide two data arguments (x = and y =), provide the alternative hypothesis (alternative =) if it is not the default of two-sided.

The t.test() function also includes a correction for unequal variance. By default, this correction is set to TRUE, so we need to disable it. If the variances are equal (as we found above), we need to set var.equal = TRUE.

Check if males ran faster than females using a t-test, using subsetting to pull out males for x and females for y.

t.test(race$Nettime_mins[race$Sex == 'M'], race$Nettime_mins[race$Sex == 'F'], alternative = 'less', var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  race$Nettime_mins[race$Sex == "M"] and race$Nettime_mins[race$Sex == "F"]
## t = -16.737, df = 2640, p-value < 2.2e-16
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##     -Inf -4.2006
## sample estimates:
## mean of x mean of y 
##  27.76925  32.42785

The p-value suggests that there is a significant difference between the mean values of male and female net time.

You can also use the formula method to set up the model within t.test(), modelling net time as a function of sex. Try that method with the data argument and the default alternative hypothesis.

t.test(Nettime_mins ~ Sex, data = race, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  Nettime_mins by Sex
## t = 16.737, df = 2640, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  4.112800 5.204408
## sample estimates:
## mean in group F mean in group M 
##        32.42785        27.76925

Hopefully you got a similar answer! We have extracted parts of the answer from these tests before, first assigning the result to an object. We can short-cut that process, however, and extract directly from the call to t.test(). Scroll up one step with the arrow, and add $p.value to the end of your previous t.test, to extract the p-value.

t.test(Nettime_mins ~ Sex, data = race, var.equal = TRUE)$p.value

## [1] 7.731017e-60

B2. Non-normal data: two-sample Wilcoxon test

The two-sample Wilcoxon test, wilcox.test(), parallels the two-sample t-test. As before, we need to provide two data arguments and specify if we want an alternative hypothesis different to a two-sided test. Again, we can use x = and y = data arguments, or use a formula.

Run a Wilcoxon test on net time of males and females, using the formula approach with a data argument, with the default alternative hypothesis.

wilcox.test(Nettime_mins ~ Sex, data = race)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Nettime_mins by Sex
## W = 1234400, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

The p-value less than 0.05 suggests that there is a difference between net time of males and females.

C. Compare the means of paired samples

If we have observations before and after a treatment, or of two matched subjects with different treatments, we can run a paired t-test. R relies on the relative position to determine the pairing, so make sure that this is correct.

Pairing the data gives us more statistical power, because random inbetween subject variation has been eliminated. But, we lose degrees of freedom because each pair is now the test unit, rather than the individual.

Again, we can run a parametric t-test, or a non-parametric Wilcoxon test. In both cases to run a paired test, set the argument paired = TRUE.

Look at the head of the rats dataset. It describes measurements taken from rats before and after some horrible experiment. [made up].

head(rats)

##   before after
## 1     20    23
## 2     15    16
## 3     10    10
## 4      5     4
## 5     20    22
## 6     15    15

Now, run a t-test on the rats data, setting x and y (but, t.test() has no data argument), and the default alternative hypthesis (i.e., the difference between before and after could be greater or lesser).

t.test(x = rats$after, y = rats$before, paired = TRUE)

## 
##  Paired t-test
## 
## data:  rats$after and rats$before
## t = 3.0502, df = 15, p-value = 0.0081
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2635612 1.4864388
## sample estimates:
## mean of the differences 
##                   0.875

D. Compare the means of more than two groups

In its simplest form, ANOVA is a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. It is akin to running multiple two-sample t-tests.

ANOVA has the following assumptions: (i) independence of observations, (ii) Normality - the distributions of the residuals are normal, (iii) Equality (or ‘homogeneity’) of variances (called homoscedasticity) - the variance of data in groups should be the same. We can test these assumptions using the tests described previously.

The R function aov() is designed for balanced designs, and the results can be hard to interpret without balance: beware that missing values in the response(s) will likely lose the balance.

D1. One-way Analysis of Variance

One-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.

It tests the null hypothesis that the means of all groups are equal. Details of the specifics of ANOVA are described (here)[http://www.simonqueenborough.info/R/anova-explained.html].

If it returns a statistically significant result, we accept the alternative hypothesis that at least two group means are statistically significantly different from each other. However, ANOVA cannot tell you which specific group/s are significantly different from which other. To determine this, a post-hoc test must be carried out.

Let’s test different age groups in our race data and see which ran faster. First, make a boxplot of net time on age (Class), using a formula and data argument in boxplot().

boxplot(Nettime_mins ~ Class, data = race)

To run an ANOVA, we can use the function aov(), and give it a formula. In contrast to t.test(), wilcox.test() and other functions that take a maximum of two data arguments, those that can take more than two data arguments (e.g. in a two-way ANOVA, or multiple regression) always use a formula.

Run an ANOVA of net time on age class with a data argument.

aov(Nettime_mins ~ Class, data = race)

## Call:
##    aov(formula = Nettime_mins ~ Class, data = race)
## 
## Terms:
##                     Class Residuals
## Sum of Squares   16599.49 132403.32
## Deg. of Freedom        13      2628
## 
## Residual standard error: 7.098013
## Estimated effects may be unbalanced

R prints some items to the screen. We can do better. Assign the ANOVA code to an object, called m0 (our first model).

m0 <- aov(Nettime_mins ~ Class, data = race)

Now we can delve into the details of the output. Look at the structure of m0.

str(m0)

## List of 13
##  $ coefficients : Named num [1:14] 31.183 -3.457 -2.818 -1.054 -0.451 ...
##   ..- attr(*, "names")= chr [1:14] "(Intercept)" "Class13-19" "Class20-29" "Class30-39" ...
##  $ residuals    : Named num [1:2642] -14.3 -12.5 -12.4 -12.4 -13.9 ...
##   ..- attr(*, "names")= chr [1:2642] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:2642] -1553.5 -71.6 61.3 -34.3 -24 ...
##   ..- attr(*, "names")= chr [1:2642] "(Intercept)" "Class13-19" "Class20-29" "Class30-39" ...
##  $ rank         : int 14
##  $ fitted.values: Named num [1:2642] 30.1 28.4 28.4 28.4 30.1 ...
##   ..- attr(*, "names")= chr [1:2642] "1" "2" "3" "4" ...
##  $ assign       : int [1:14] 0 1 1 1 1 1 1 1 1 1 ...
##  $ qr           :List of 5
##   ..$ qr   : num [1:2642, 1:14] -51.4004 0.0195 0.0195 0.0195 0.0195 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2642] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:14] "(Intercept)" "Class13-19" "Class20-29" "Class30-39" ...
##   .. ..- attr(*, "assign")= int [1:14] 0 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "contrasts")=List of 1
##   .. .. ..$ Class: chr "contr.treatment"
##   ..$ qraux: num [1:14] 1.02 1.01 1.04 1 1 ...
##   ..$ pivot: int [1:14] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ tol  : num 1e-07
##   ..$ rank : int 14
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 2628
##  $ contrasts    :List of 1
##   ..$ Class: chr "contr.treatment"
##  $ xlevels      :List of 1
##   ..$ Class: chr [1:14] "01-12" "13-19" "20-29" "30-39" ...
##  $ call         : language aov(formula = Nettime_mins ~ Class, data = race)
##  $ terms        :Classes 'terms', 'formula'  language Nettime_mins ~ Class
##   .. ..- attr(*, "variables")= language list(Nettime_mins, Class)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "Nettime_mins" "Class"
##   .. .. .. ..$ : chr "Class"
##   .. ..- attr(*, "term.labels")= chr "Class"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(Nettime_mins, Class)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "factor"
##   .. .. ..- attr(*, "names")= chr [1:2] "Nettime_mins" "Class"
##  $ model        :'data.frame':   2642 obs. of  2 variables:
##   ..$ Nettime_mins: num [1:2642] 15.8 15.8 15.9 16 16.2 ...
##   ..$ Class       : Factor w/ 14 levels "01-12","13-19",..: 4 3 3 3 4 2 3 2 3 3 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language Nettime_mins ~ Class
##   .. .. ..- attr(*, "variables")= language list(Nettime_mins, Class)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "Nettime_mins" "Class"
##   .. .. .. .. ..$ : chr "Class"
##   .. .. ..- attr(*, "term.labels")= chr "Class"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(Nettime_mins, Class)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "factor"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "Nettime_mins" "Class"
##  - attr(*, "class")= chr [1:2] "aov" "lm"

A list of 13 different things. It looks pretty complex. We can use the functionsummary() to pull out the main results in an easy to digest way. Do so.

summary(m0)

##               Df Sum Sq Mean Sq F value Pr(>F)    
## Class         13  16599  1276.9   25.34 <2e-16 ***
## Residuals   2628 132403    50.4                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we have our traditional ANOVA table, with degrees of freedom (Df), sum of squares, etc. The p-value tells us that there are significant differences between age classes.

Ok. But what if we want to actually see the means of each category… We can use summary.lm() to return those. Try that.

summary.lm(m0)

## 
## Call:
## aov(formula = Nettime_mins ~ Class, data = race)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.799  -4.778  -0.961   3.603  75.822 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  31.1825     0.4701  66.335  < 2e-16 ***
## Class13-19   -3.4574     0.5490  -6.297 3.54e-10 ***
## Class20-29   -2.8177     0.5822  -4.840 1.38e-06 ***
## Class30-39   -1.0540     0.5998  -1.757  0.07900 .  
## Class40-44   -0.4509     0.6663  -0.677  0.49861    
## Class45-49    0.3377     0.6764   0.499  0.61759    
## Class50-54    0.3254     0.6797   0.479  0.63218    
## Class55-59    1.3265     0.7621   1.740  0.08190 .  
## Class60-64    1.9987     0.8513   2.348  0.01897 *  
## Class65-69    5.1361     1.0511   4.886 1.09e-06 ***
## Class70-74    8.1371     1.3994   5.815 6.81e-09 ***
## Class75-79    5.6756     2.1911   2.590  0.00964 ** 
## Class80-89   14.4129     2.1911   6.578 5.74e-11 ***
## Class90-99   21.8025     5.0410   4.325 1.58e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.098 on 2628 degrees of freedom
## Multiple R-squared:  0.1114, Adjusted R-squared:  0.107 
## F-statistic: 25.34 on 13 and 2628 DF,  p-value: < 2.2e-16

What is going on? In reality, aov() is a wrapper to lm() (the function for simple linear regression), i.e., the function aov() calls the function lm() for fitting linear models to balanced or unbalanced experimental designs. The main difference from lm() is in the way print(), summary() and so on handle the fit: This is expressed in the traditional language of the analysis of variance rather than that of linear models. This also means that it is easy to include both categorical and continuous variables as predictors in a model (we will come to this later).

What is returned by summary.lm() is the mean values of each category … almost. In fact, the values displayed are the differences between the base level (in this case 01-12) and each other level. So, the value of -3.4574 is the difference between 01-12 and 13-19. To calulate the actual value for age class 13-19, take the value for 01-12 (31.1825) and add -3.4575 (31.1825 + -3.4575 == 31.1825 - 3.4575 = 27.725).

The p-values are testing that this difference is 0. Thus, the p-value for age class 13-19 is less than 0.05, indicating that it is significantly different from 0 and therefore a different mean from the mean of age class 01-12.

This way of testing for differences is specific to R. Other software will have a different set of default contrasts. It is possible to change the contrasts in R (i.e., to test each group against the pooled grand mean, for example).

We can check the assumption of normal distribution of the residuals (not the data; once you move to linear models (ANOVA, lm, etc), it is the residuals that are important (specifically, in a t-test the data and residuals are the same), either graphically or with a shapiro-wilk test.

If you run plot() on the result of any model, it will return a series of diagnostic plots. To see only the q-q plot, run plot(m0, 2).

plot(m0, 2)

The residuals still fall off the 1:1 line, suggesting they are not normally-distributed. Age class has not accounted for this variation in the data. You can use the shapiro-wilk test to do this more formally.

We can test for homogeneity of variance another diagnostic plot. To display the plot of residuals on the fitted values, run plot(m0, 1).

plot(m0, 1)

This plot should show no relationship between the residuals and the fitted values. There is no obvious correlation here, although we may have lower variance at higher values, which may be a problem. You can use Levene’s or Bartlett’s test to examine this more formally.

A non-parametric alternative to ANOVA is the Kruskal-Wallis rank sum test, kruskal.test().

D2. Two-way ANOVA

A two-way ANOVA models the response as a function of two categorical variables, and can also include an interaction between them. It has the same assumptions as a one-way ANOVA.

We can run a two-way ANOVA easily, using aov(), and adding in another part to the formula. We will go over formulas in more detail later, but for now, know that you write them out in R much as you would algebraically. To model net time as a function of age class and sex, type: Nettime_mins ~ Class + Sex. Run that two-way ANOVA.

aov(Nettime_mins ~ Class + Sex, data = race)

## Call:
##    aov(formula = Nettime_mins ~ Class + Sex, data = race)
## 
## Terms:
##                     Class       Sex Residuals
## Sum of Squares   16599.49  19189.00 113214.32
## Deg. of Freedom        13         1      2627
## 
## Residual standard error: 6.564787
## Estimated effects may be unbalanced

This model is an additive model. The effect of age is equal across both sexes, and vice versa. You could allow the effects to vary across the other variable, by making the model multiplicative or including an interaction, by replacing the plus sign with a star (+ changes to *).

Anyway, let’s stick with our additive model, and assign it to m1. Do that.

m1 <- aov(Nettime_mins ~ Class + Sex, data = race)

Now look at the summary of m1.

summary(m1)

##               Df Sum Sq Mean Sq F value Pr(>F)    
## Class         13  16599    1277   29.63 <2e-16 ***
## Sex            1  19189   19189  445.26 <2e-16 ***
## Residuals   2627 113214      43                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This displays the ANOVA table for the two-way model. It looks like there are significant differences among age class and sex.

D3. Post-hoc tests: TukeyHSD

To compare the significant difference across all the group means, Tukeys Honest Significant Differences test is often used. Tukey’s test is essentially a t-test, except that it corrects for the fact that you are making multiple comparisons (when making multiple comparisons, the chance of making a Type I error—a false positive—increases). Tukey’s test corrects for that, so is more suitable single test for multiple comparisons than running multiple t-tests.

To run a Tukey HSD test, use the function TukeyHSD() on the model object (in this case m1). Try that.

TukeyHSD(m1)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Nettime_mins ~ Class + Sex, data = race)
## 
## $Class
##                    diff          lwr       upr     p adj
## 13-19-01-12 -3.45736814  -5.16195951 -1.752777 0.0000000
## 20-29-01-12 -2.81771950  -4.62524831 -1.010191 0.0000160
## 30-39-01-12 -1.05397637  -2.91614473  0.808192 0.8268390
## 40-44-01-12 -0.45090669  -2.51939096  1.617578 0.9999801
## 45-49-01-12  0.33773783  -1.76220760  2.437683 0.9999995
## 50-54-01-12  0.32539872  -1.78491047  2.435708 0.9999997
## 55-59-01-12  1.32645614  -1.03967280  3.692585 0.8365412
## 60-64-01-12  1.99865614  -0.64445507  4.641767 0.3816995
## 65-69-01-12  5.13605263   1.87270366  8.399402 0.0000123
## 70-74-01-12  8.13711131   3.79254852 12.481674 0.0000000
## 75-79-01-12  5.67563796  -1.12705898 12.478335 0.2245726
## 80-89-01-12 14.41291069   7.61021375 21.215608 0.0000000
## 90-99-01-12 21.80245614   6.15198429 37.452928 0.0002638
## 20-29-13-19  0.63964864  -0.74346868  2.022766 0.9576256
## 30-39-13-19  2.40339177   0.94959532  3.857188 0.0000028
## 40-44-13-19  3.00646145   1.29635025  4.716573 0.0000004
## 45-49-13-19  3.79510597   2.04707167  5.543140 0.0000000
## 50-54-13-19  3.78276686   2.02229590  5.543238 0.0000000
## 55-59-13-19  4.78382428   2.72362544  6.844023 0.0000000
## 60-64-13-19  5.45602428   3.08286394  7.829185 0.0000000
## 65-69-13-19  8.59342077   5.54460139 11.642240 0.0000000
## 70-74-13-19 11.59447945   7.40866066 15.780298 0.0000000
## 75-79-13-19  9.13300610   2.43057827 15.835434 0.0004260
## 80-89-13-19 17.87027883  11.16785100 24.572707 0.0000000
## 90-99-13-19 25.25982428   9.65267457 40.866974 0.0000054
## 30-39-20-29  1.76374314   0.19051308  3.336973 0.0125122
## 40-44-20-29  2.36681281   0.55407759  4.179548 0.0010126
## 45-49-20-29  3.15545733   1.30690314  5.004012 0.0000010
## 50-54-20-29  3.14311823   1.28279925  5.003437 0.0000014
## 55-59-20-29  4.14417564   1.99802841  6.290323 0.0000000
## 60-64-20-29  4.81637564   2.36822967  7.264522 0.0000000
## 65-69-20-29  7.95377214   4.84622837 11.061316 0.0000000
## 70-74-20-29 10.95483082   6.72604758 15.183614 0.0000000
## 75-79-20-29  8.49335746   1.76401370 15.222701 0.0019083
## 80-89-20-29 17.23063019  10.50128643 23.959974 0.0000000
## 90-99-20-29 24.62017564   9.00144808 40.238903 0.0000117
## 40-44-30-39  0.60306968  -1.26415275  2.470292 0.9984423
## 45-49-30-39  1.39171420  -0.51030137  3.293730 0.4394069
## 50-54-30-39  1.37937509  -0.53407658  3.292827 0.4658024
## 55-59-30-39  2.38043251   0.18806881  4.572796 0.0191767
## 60-64-30-39  3.05263251   0.56387191  5.541393 0.0031453
## 65-69-30-39  6.19002900   3.05038907  9.329669 0.0000000
## 70-74-30-39  9.19108768   4.93866270 13.443513 0.0000000
## 75-79-30-39  6.72961433  -0.01461121 13.473840 0.0511478
## 80-89-30-39 15.46688705   8.72266152 22.211113 0.0000000
## 90-99-30-39 22.85643251   7.23128734 38.481578 0.0000845
## 45-49-40-44  0.78864452  -1.31578399  2.893073 0.9932207
## 50-54-40-44  0.77630542  -1.33846490  2.891076 0.9944317
## 55-59-40-44  1.77736283  -0.59274576  4.147471 0.3961784
## 60-64-40-44  2.44956283  -0.19711158  5.096237 0.1043031
## 65-69-40-44  5.58695932   2.32072371  8.853195 0.0000009
## 70-74-40-44  8.58801800   4.24128654 12.934749 0.0000000
## 75-79-40-44  6.12654465  -0.67753752 12.930627 0.1307848
## 80-89-40-44 14.86381738   8.05973521 21.667900 0.0000000
## 90-99-40-44 22.25336283   6.60228882 37.904437 0.0001661
## 50-54-45-49 -0.01233911  -2.15789197  2.133214 1.0000000
## 55-59-45-49  0.98871831  -1.40889683  3.386333 0.9837532
## 60-64-45-49  1.66091831  -1.01041639  4.332253 0.7104247
## 65-69-45-49  4.79831480   1.51206486  8.084565 0.0000884
## 70-74-45-49  7.79937348   3.43758279 12.161164 0.0000002
## 75-79-45-49  5.33790013  -1.47581236 12.151613 0.3217010
## 80-89-45-49 14.07517286   7.26146037 20.888885 0.0000000
## 90-99-45-49 21.46471831   5.80945526 37.119981 0.0003734
## 55-59-50-54  1.00105742  -1.40564000  3.407755 0.9824768
## 60-64-50-54  1.67325742  -1.00623194  4.352747 0.7041320
## 65-69-50-54  4.81065391   1.51777177  8.103536 0.0000872
## 70-74-50-54  7.81171259   3.44492291 12.178502 0.0000002
## 75-79-50-54  5.35023923  -1.46667443 12.167153 0.3186675
## 80-89-50-54 14.08751196   7.27059830 20.904426 0.0000000
## 90-99-50-54 21.47705742   5.82040083 37.133714 0.0003695
## 60-64-55-59  0.67220000  -2.21307719  3.557477 0.9999568
## 65-69-55-59  3.80959649   0.34719356  7.271999 0.0161301
## 70-74-55-59  6.81065517   2.31465546 11.306655 0.0000348
## 75-79-55-59  4.34918182  -2.55121478 11.249578 0.6905004
## 80-89-55-59 13.08645455   6.18605795 19.986851 0.0000000
## 90-99-55-59 20.47600000   4.78281496 36.169185 0.0010260
## 65-69-60-64  3.13739649  -0.51988457  6.794678 0.1870038
## 70-74-60-64  6.13845517   1.49071549 10.786195 0.0008084
## 75-79-60-64  3.67698182  -3.32322846 10.677192 0.8922753
## 80-89-60-64 12.41425455   5.41404427 19.414465 0.0000003
## 90-99-60-64 19.80380000   4.06647090 35.541129 0.0020128
## 70-74-65-69  3.00105868  -2.02536089  8.027478 0.7653077
## 75-79-65-69  0.53958533  -6.71757345  7.796744 1.0000000
## 80-89-65-69  9.27685805   2.01969928 16.534017 0.0015123
## 90-99-65-69 16.66640351   0.81310931 32.519698 0.0285822
## 75-79-70-74 -2.46147335 -10.26481449  5.341868 0.9987801
## 80-89-70-74  6.27579937  -1.52754176 14.079141 0.2794115
## 90-99-70-74 13.66534483  -2.44529502 29.775985 0.2018126
## 80-89-75-79  8.73727273  -0.65919430 18.133740 0.1002056
## 90-99-75-79 16.12681818  -0.81290366 33.066540 0.0811725
## 90-99-80-89  7.38954545  -9.55017639 24.329267 0.9736619
## 
## $Sex
##          diff      lwr       upr p adj
## M-F -5.327116 -5.82872 -4.825513     0

This function displays the model and the results for each pairwise comparison (we have a lot of age classes, so the table is long!). It tells us the difference between the means of each comparison, some confidence intervals, and an adjusted p-value.

Well done! That was a whistle-stop tour of comparing the means of single, pairs, and multiple groups.

Please submit the log of this lesson to Google Forms so that Simon may evaluate your progress.

I am a robot.

I am a robot.