Here, we illustrate the variety of graphs possible in R. Most are made with the base R installation alone; the exceptions use the vioplot and plotrix packages. Additional packages such as lattice and ggplot2 provide many more options and are worth exploring once you are familiar with these.

*Functions:* hist(), barplot(), pie(), stripchart(), boxplot(), dotchart(), coplot()

*Packages:* vioplot, plotrix

A. Foundational Knowledge

- Modify default plots using the graphical parameters function, `par()`

B. Application

- Make plots according to specific publisher or thesis requirements

C. Integration & Human Dimension

- Understand the presentation of data
- Consider how best to present your data to tell its story

We will use the following small bird dataset:

```
BirdData <- data.frame(
Tarsus = c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5),
Head = c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6),
Weight = c(9.5, 13.8, 14.8, 15.2, 15.5, 15.6, 15.6, 15.7),
Wingcrd = c(59, 55, 53.5, 55, 52.5, 57.5, 53, 55),
# Species is stored as a factor so that col = Species (used below) picks colours 1 and 2
Species = factor(c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'))
)
```

Choosing the most appropriate graph for the data is one of the most important decisions you can make in presenting your results.

Here, we will look at some basic graphs, based on whether the x (predictor/independent variable) and y (response/dependent variable) are categorical or continuous:

- Single continuous
- Predictor and response are continuous
- Single categorical
- Categorical predictor and continuous response

Stephen Few's Graph Selection Matrix is a useful guide to this choice.

For each graph type, we will look at the default appearance and some of the more useful arguments we can alter.

**data:** single continuous variable

**use:** illustrating counts or distributions of data

**example:** bird data

Histograms present the frequency (count) or probability density distribution of a vector of data. Some useful arguments:

- `breaks`: the number of bins, or a vector of breakpoints between bins. A single number is only a suggestion; R adjusts the breakpoints to 'pretty' values.
- `freq`: `TRUE` plots counts, `FALSE` plots the probability density. It defaults to `TRUE` when the bins have equal widths.
- `plot = TRUE`: the default, draws the histogram. Set to `FALSE` to skip plotting and store the counts (and breaks) in an object for later use.

```
# set layout and margins
par(mfrow = c(2, 2), mar = c(3, 3, 2, 1), lwd = 2)
hist(BirdData$Head, main = "default histogram")
hist(BirdData$Head, breaks = 10, main = "breaks = 10")
hist(BirdData$Head, breaks = seq(from = 30, to = 33, by = 0.25), main = "breaks = seq()" )
# plot probability density distribution
hist(BirdData$Head, breaks = seq(from = 30, to = 33, by = 0.25), freq = FALSE, main = "probability density" )
```
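With `plot = FALSE`, `hist()` returns the binned data instead of drawing it. The sketch below recreates the `Head` values from `BirdData` so it runs on its own; the object names `h` and `h10` are only illustrative.

```r
# Head measurements from BirdData (recreated so the example runs on its own)
Head <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

# Compute the histogram without drawing it
h <- hist(Head, breaks = seq(from = 30, to = 33, by = 0.25), plot = FALSE)

h$breaks   # bin edges
h$counts   # number of observations in each bin
h$density  # probability density in each bin

# A single number for breaks is only a suggestion: R picks 'pretty' breakpoints,
# so check how many bins were actually used
h10 <- hist(Head, breaks = 10, plot = FALSE)
length(h10$counts)
```

The stored object can later be drawn with `plot(h)`, so the binning is computed once and reused.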

**data:** continuous response; continuous predictor.

**use:** illustrating relationships and correlations between continuous variables, …

**example:** BirdData

We have already used scatter plots (see Advanced Graphics).

Here, however, we add a legend to two of the panels using `legend()`.

```
par(mfrow = c(2, 2), mar = c(4, 4, 2, 1), lwd = 2)
plot(Head ~ Tarsus, data = BirdData,
xlab = 'Tarsus (mm)',
ylab = 'Head Size (mm)',
main = 'Head vs Tarsus',
pch = 20,
col = Species,
cex = 3)
plot(Head ~ Wingcrd, data = BirdData,
xlab = 'Wing chord (mm)',
ylab = 'Head Size (mm)',
main = 'Head vs Wing',
pch = 20,
col = Species,
cex = 3)
plot(Head ~ Weight, data = BirdData,
xlab = 'Weight (g)',
ylab = 'Head Size (mm)',
main = 'Head vs Weight',
pch = 20,
col = Species,
cex = 3)
# Here, we add a legend to this plot
legend('topleft',
pch = c(20, 20), col = c(1, 2),
legend = c('Species A', 'Species B')
)
plot(Head ~ Weight, data = BirdData,
xlab = 'Weight (g)',
ylab = 'Head Size (mm)',
main = 'Head vs Weight (size = Tarsus)',
pch = c(20, 23)[Species], # one symbol per species (indexed by factor codes)
col = Species,
cex = Tarsus/5) # point size scales with tarsus length
# Another legend
legend('topleft',
pch = c(20, 23), col = c(1, 2),
legend = c('Species A', 'Species B')
)
```
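Hard-coding `pch = c(20, 20), col = c(1, 2)` in `legend()` works, but the legend can easily drift out of step with the points. A minimal sketch of one alternative, assuming `Species` is a factor: keep one colour and one symbol per level in small vectors (`sp_cols` and `sp_pch` are illustrative names), index them by the factor codes for the points, and reuse the same vectors for the legend.

```r
# A cut-down BirdData, with Species stored as a factor
BirdData <- data.frame(
  Head    = c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6),
  Weight  = c(9.5, 13.8, 14.8, 15.2, 15.5, 15.6, 15.6, 15.7),
  Species = factor(c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'))
)

# One colour and one symbol per species level (in level order: A, B)
sp_cols <- c('black', 'red')
sp_pch  <- c(20, 23)

# Index the colour and symbol vectors by the factor's integer codes
plot(Head ~ Weight, data = BirdData,
     xlab = 'Weight', ylab = 'Head Size (mm)',
     pch = sp_pch[as.integer(BirdData$Species)],
     col = sp_cols[as.integer(BirdData$Species)],
     cex = 2)

# Build the legend from the same vectors and the factor levels,
# so the points and legend entries always match
legend('topleft',
       legend = paste('Species', levels(BirdData$Species)),
       pch = sp_pch, col = sp_cols)
```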

A pie chart is a circular representation of counts or proportions, in which the angle of each slice is proportional to the value of its category.

**data**: usually counts or proportion response; categorical predictor

**use:** never! (unless it involves real pie)

**example:** BirdFlu

We have annual data for the years 2003-2008. Has the total number of bird flu cases increased over time?

Load the data and select only the cases columns.

```
BFdata <- read.table(file = 'http://www.simonqueenborough.info/R/basic/data/birdflu_corrected.txt', header = TRUE, sep = '\t')
# select cases columns only
BFcases <- BFdata[, c('cases03', 'cases04', 'cases05', 'cases06', 'cases07', 'cases08')]
names(BFdata)
```

```
## [1] "Country" "cases03" "deaths03" "cases04" "deaths04" "cases05"
## [7] "deaths05" "cases06" "deaths06" "cases07" "deaths07" "cases08"
## [13] "deaths08"
```
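Typing each column name is error-prone. A sketch of an alternative, assuming the columns follow the `casesYY`/`deathsYY` naming shown above: `grep()` with a regular expression returns the matching names. Here it is applied to the printed names rather than re-downloading the file; `bf_names` and `case_cols` are illustrative names.

```r
# Column names of BFdata, as printed by names(BFdata) above
bf_names <- c("Country", "cases03", "deaths03", "cases04", "deaths04", "cases05",
              "deaths05", "cases06", "deaths06", "cases07", "deaths07", "cases08",
              "deaths08")

# '^cases' matches names that start with "cases"; value = TRUE returns the names
case_cols <- grep("^cases", bf_names, value = TRUE)
case_cols

# With the real data frame, the same idea would be:
# BFcases <- BFdata[, grep("^cases", names(BFdata))]
```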

```
str(BFdata)
```

```
## 'data.frame': 15 obs. of 13 variables:
## $ Country : Factor w/ 15 levels "Azerbaijan","Bangladesh",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ cases03 : int 0 0 0 1 0 0 0 0 0 0 ...
## $ deaths03: int 0 0 0 1 0 0 0 0 0 0 ...
## $ cases04 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ deaths04: int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases05 : int 0 0 4 8 0 0 20 0 0 0 ...
## $ deaths05: int 0 0 4 5 0 0 13 0 0 0 ...
## $ cases06 : int 8 0 2 13 1 18 55 3 0 0 ...
## $ deaths06: int 5 0 2 8 0 10 45 2 0 0 ...
## $ cases07 : int 0 0 1 5 0 25 42 0 2 1 ...
## $ deaths07: int 0 0 1 3 0 9 37 0 2 0 ...
## $ cases08 : int 0 1 0 3 0 7 18 0 0 0 ...
## $ deaths08: int 0 0 0 3 0 3 15 0 0 0 ...
```

Now we make a vector of the sum of each year’s cases using `colSums()`, and add names to each element using `names()`.

```
Cases <- colSums(BFcases)
names(Cases) <- 2003:2008
Cases
```

```
## 2003 2004 2005 2006 2007 2008
## 4 46 98 115 88 34
```

The `pie()` function requires a vector of non-negative numbers. The names associated with each element are used as slice labels.

By default, `pie()` draws the slices anti-clockwise; setting `clockwise = TRUE` reverses this (the second and third pie charts).

For those who will sink so low, there is also a 3-D pie chart function in the ‘plotrix’ package…

```
par(mfrow = c(2, 2), mar = c(3, 3, 2, 1))
pie(Cases, main = "Ordinary pie chart")
pie(Cases, col = gray(seq(0.4, 1.0, length.out = 6)), clockwise = TRUE, main = "Grey colours")
pie(Cases, col = rainbow(6), clockwise = TRUE, main = "Rainbow colours")
#install.packages('plotrix')
library(plotrix)
pie3D(Cases, labels = names(Cases), explode = 0.1, main = "3D pie chart", labelcex = 0.6)
```
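If you must use a pie chart, one way to ease the angle-reading problem is to print each slice's percentage in its label. A sketch, re-entering the yearly totals printed by `colSums()` above so it runs on its own (`pct` is an illustrative name):

```r
# Yearly totals of bird flu cases, as printed above
Cases <- c(`2003` = 4, `2004` = 46, `2005` = 98, `2006` = 115, `2007` = 88, `2008` = 34)

# Percentage of all cases in each year, rounded to whole numbers
pct <- round(100 * Cases / sum(Cases))

# Put the year and its percentage in each slice label
pie(Cases, labels = paste0(names(Cases), " (", pct, "%)"),
    clockwise = TRUE, main = "Cases by year (%)")
```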

Most statistical graphics experts recommend against pie charts, for the following reasons:

- Values of all categories need to sum to a meaningful whole

- Categories need to be mutually exclusive

- Category ‘other’ may hide important information

- It is hard for the human eye to measure angles, and therefore the size of each slice

- Pie charts often require extra labels and legends to interpret (low data:ink ratio)

See here for a greater discussion of these issues.

An improvement on the pie chart is the stacked bar plot. Because it has an axis, the viewer can compare the sizes of the sections more readily. However, there are no labels (although we could add them with `text()` or `legend()`).

```
# To get the 'stacked' bars, we need to convert our vector to a matrix (barplot() will only stack a matrix).
barplot(as.matrix(Cases), horiz = TRUE, xlim = c(0, 400))
```
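As noted, the stacked bar has no labels. A sketch of adding them with `text()`: `barplot()` returns the position of the bar, and the centre of each segment along the axis is the running total minus half that segment's value. The yearly totals are re-entered from the output above; `bp` and `mids` are illustrative names.

```r
# Yearly totals of bird flu cases, as printed above
Cases <- c(`2003` = 4, `2004` = 46, `2005` = 98, `2006` = 115, `2007` = 88, `2008` = 34)

# barplot() returns the bar midpoint; with one horizontal bar this is its y position
bp <- barplot(as.matrix(Cases), horiz = TRUE, xlim = c(0, 400))

# Centre of each segment along x: cumulative total minus half the segment
mids <- cumsum(Cases) - Cases / 2

# Write the year inside each segment (the 2003 segment is very narrow)
text(x = mids, y = bp, labels = names(Cases), cex = 0.7)
```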

The `barplot()` function is discussed in more detail below.

There are various ways to plot continuous data as a function of categorical predictors.

**data:** continuous or count response; categorical or binned continuous predictor

**use:** comparing categories, showing distributions, …

**example:** BirdFluDeath

We use the bird flu data again, but this time select the deaths columns.

```
BFdata <- read.table(file = 'http://www.simonqueenborough.info/R/basic/data/birdflu_corrected.txt', header = TRUE, sep = '\t')
# select deaths columns only
BFdeaths <- BFdata[, c('deaths03', 'deaths04', 'deaths05', 'deaths06', 'deaths07', 'deaths08')]
names(BFdeaths)
```

```
## [1] "deaths03" "deaths04" "deaths05" "deaths06" "deaths07" "deaths08"
```

```
str(BFdeaths)
```

```
## 'data.frame': 15 obs. of 6 variables:
## $ deaths03: int 0 0 0 1 0 0 0 0 0 0 ...
## $ deaths04: int 0 0 0 0 0 0 0 0 0 0 ...
## $ deaths05: int 0 0 4 5 0 0 13 0 0 0 ...
## $ deaths06: int 5 0 2 8 0 10 45 2 0 0 ...
## $ deaths07: int 0 0 1 3 0 9 37 0 2 0 ...
## $ deaths08: int 0 0 0 3 0 3 15 0 0 0 ...
```

```
# Make a vector as before
Deaths <- colSums(BFdeaths)
names(Deaths) <- 2003:2008
Deaths
```

```
## 2003 2004 2005 2006 2007 2008
## 4 32 43 79 59 26
```

How have the numbers of cases and deaths changed over time?

```
par(mfrow = c(2, 2), mar = c(3, 3, 2, 1), lwd = 2)
barplot(Cases, main = "default barplot of Cases")
# Create a new variable
Counts <- cbind(Cases, Deaths) # makes a matrix
barplot(Counts, main = "default with 2-column matrix" )
barplot(t(Counts), col = gray(c(0.5, 1)), main = "stacked")
barplot(t(Counts), beside = TRUE, main = "beside")
```
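The stacked and side-by-side panels above have no legend saying which shade is cases and which is deaths. A sketch, re-entering the yearly totals from the outputs above: `barplot()` can draw the legend itself with `legend.text = TRUE`, which takes the labels from the row names of the matrix, and `args.legend` passes further arguments on to `legend()`.

```r
# Yearly totals, as printed above
Cases  <- c(`2003` = 4, `2004` = 46, `2005` = 98, `2006` = 115, `2007` = 88, `2008` = 34)
Deaths <- c(`2003` = 4, `2004` = 32, `2005` = 43, `2006` = 79, `2007` = 59, `2008` = 26)

# cbind() gives a 6 x 2 matrix; t() makes rows = Cases/Deaths and columns = years
Counts <- cbind(Cases, Deaths)

# legend.text = TRUE labels the legend with the row names of t(Counts);
# args.legend moves it to the top left and drops the box
bp <- barplot(t(Counts), beside = TRUE,
              col = gray(c(0.5, 1)),
              legend.text = TRUE,
              args.legend = list(x = "topleft", bty = "n"),
              ylab = "Count", main = "Cases and deaths per year")
```

With `beside = TRUE`, the returned `bp` is a matrix of bar midpoints (one row per series, one column per year), which is handy for adding values above the bars with `text()`.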