Plotting_More_Data

Welcome back :) OK! We finished the last lesson using the cex = arguments to change the size of various parts of our plots of the bird data.

Let’s continue where we left off, and recreate that plot. We will begin this lesson by adding some colour to the plot, and then look at the common arguments to par(), the function that sets all the graphical parameters.

By the end of this lesson, you should be able to change the appearance of your figures in many and various ways.

We will continue to use the same bird dataset as before, ‘BirdData’. To look at the data again, type BirdData.

BirdData <- data.frame(
  Tarsus  = c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5),
  Head    = c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6),
  Weight  = c(9.5, 13.8, 14.8, 15.2, 15.5, 15.6, 15.6, 15.7),
  Wingcrd = c(59, 55, 53.5, 55, 52.5, 57.5, 53, 55),
  Species = factor(c('A', 'A', 'A', 'A', 'A',  'B', 'B', 'B'))  # stored as a factor so that col = Species works later
)
BirdData
##   Tarsus Head Weight Wingcrd Species
## 1   22.3 31.2    9.5    59.0       A
## 2   19.7 30.4   13.8    55.0       A
## 3   20.8 30.6   14.8    53.5       A
## 4   20.3 30.3   15.2    55.0       A
## 5   20.8 30.3   15.5    52.5       A
## 6   21.5 30.8   15.6    57.5       B
## 7   20.6 32.5   15.6    53.0       B
## 8   21.5 31.6   15.7    55.0       B

Ok, remember from the last lesson how to plot Head size as a function of Tarsus? Just plot the data, don’t worry about labels, but do use the formula notation and data = argument.

plot(Head ~ Tarsus, data = BirdData)

We will look very briefly at the use of colours in figures now, before going into colour in more detail in another lesson. We will also use colour to illustrate the usefulness of using vectors as inputs.

Anyway, first things first. Colour is very useful for various things in graphics, mostly in terms of illustrating different groups of data. In the past, most journals were only printed in black and white, so different shades of grey were important.

R has 102 unique shades of grey. See this blog post.

R also has access to a very wide range of colours (including grey). Colours can be accessed in a variety of ways:

  1. by number, e.g., col = 1 for black, or col = 2 for red;

  2. by name, e.g., col = 'black';

  3. by hexadecimal code, e.g., col = '#FF0000';

  4. or RGB value, using the rgb() function, e.g., col = rgb(0, 0, 0).
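As a quick check that these notations really do name the same colours, here is a minimal sketch using col2rgb(), which converts any colour specification to its red, green and blue components. The variable names (by_name, by_hex, by_rgb) are just illustrative.

```r
# Three ways of writing pure red, converted to RGB components (0-255 scale).
by_name <- col2rgb('red')
by_hex  <- col2rgb('#FF0000')
by_rgb  <- col2rgb(rgb(1, 0, 0))   # rgb() takes values between 0 and 1 by default

by_name                  # a 3-row matrix: red, green, blue
all(by_name == by_hex)   # TRUE
all(by_name == by_rgb)   # TRUE

# rgb() simply builds the hexadecimal string for you.
rgb(0, 0, 0)             # "#000000", i.e. black

# Numbers index the current palette; the first entry is black.
palette()[1]             # "black"
```

Note that rgb() is a function call, so it is not wrapped in quotes when passed to col =.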

Try changing the colour of the data points in the figure to red, using the name. The colours will be more obvious if you also use a different plotting character, so use pch = 20.

plot(Head ~ Tarsus, data = BirdData, pch = 20, col = 'red')

When used in a scatterplot, the argument col = changes the colour of the plotting symbols. What does this argument do when applied to a boxplot? Add the colour red to a plot of Head on Species.

plot(Head ~ Species, data = BirdData, col = 'red')

Adding the argument col = 'red' to a boxplot fills in the boxes.

You can use col.lab =, col.main =, and col.axis = to change the colour of these external parts of the plot, if needed.
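Here is a small sketch of those three arguments in action. To keep it self-contained it copies the Tarsus and Head columns into two vectors (tarsus and head_len, names made up for this example); the particular colours and title are arbitrary choices.

```r
# Tarsus and Head values copied from BirdData.
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

plot(head_len ~ tarsus, pch = 20, col = 'red',
     main = 'Head against Tarsus', xlab = 'Tarsus', ylab = 'Head',
     col.main = 'darkblue',    # colour of the main title
     col.lab  = 'darkgreen',   # colour of the x and y axis titles
     col.axis = 'grey40')      # colour of the tick labels
```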

We will consider colour in more detail in a later lesson.

All of the parameters we have dealt with so far (colour, symbols, size, etc.) can be changed using vectors, rather than addressing each individual element.

For example, instead of specifying that all the points should be red, we can use a vector of colours to make each point a different colour. How might we do this? Well, we have eight data points. We know that we can specify colours with a number. How about setting col = 1:8 in our plot of Head on Tarsus? Try that. (Remember to keep pch = 20, and also set cex = 3 to make each point larger.)

plot(Head ~ Tarsus, data = BirdData, col = 1:8, pch = 20, cex = 3)
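The same idea works for the other parameters, and for vectors shorter than the data. The sketch below is one possible illustration, not part of the original exercise: it scales each point by the bird's weight and passes only two colours, which R recycles along the eight points. The vector names and the divisor of 5 are arbitrary.

```r
# Columns copied from BirdData so the example stands alone.
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)
weight   <- c(9.5, 13.8, 14.8, 15.2, 15.5, 15.6, 15.6, 15.7)

# One size per point: heavier birds get bigger symbols.
point_sizes <- weight / 5

# Only two colours for eight points: they are recycled
# (red, blue, red, blue, ...) in row order.
plot(head_len ~ tarsus, pch = 20, cex = point_sizes, col = c('red', 'blue'))
```

Recycling in row order is rarely what you want for real data, which is why the next step ties the colour to a grouping variable instead.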

Ok super. We can use a vector to set colours… how might this actually be useful?

We could use colour to indicate different groups. We know the species of each sparrow in our data. This Species column is a factor variable, not numeric, but try setting col = Species, modifying the previous code.

plot(Head ~ Tarsus, data = BirdData, col = Species, pch = 20, cex = 3)

Woo hoo! Colours correspond to species. R converted the factors to integers according to their level (first level = ‘A’ = black; second level = ‘B’ = red) and coloured their points. Cool beans!
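To see what happened behind the scenes, the sketch below reproduces the conversion by hand: a factor is stored as integer codes, and those codes index the colour palette. It also shows a more explicit alternative, a named lookup vector, which is an addition for illustration (the name species_cols and the legend position are arbitrary).

```r
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)
species  <- factor(c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'))

levels(species)               # "A" "B"
codes <- as.integer(species)  # the integer codes behind the factor
codes                         # 1 1 1 1 1 2 2 2
palette()[codes]              # the palette entries those codes pick out

# Explicit mapping from level to colour, so the result does not
# depend on the current palette.
species_cols <- c(A = 'black', B = 'red')[as.character(species)]

plot(head_len ~ tarsus, col = species_cols, pch = 20, cex = 3)
legend('topleft', legend = levels(species), col = c('black', 'red'), pch = 20)
```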

In the previous class, to modify elements of the graphic we added various arguments to the function plot().

However, if you look at the help file (?plot), there are only three arguments specified, x, y, and ...: plot(x, y, ...).

The ellipsis argument ... is special and allows arguments for one function to be passed through another function. We discussed this briefly when we wrote our own functions. For now, the point is that arguments such as cex, pch, etc. are not named arguments of plot() itself. They are graphical parameters, the same ones handled by par(), the function used to modify the appearance and layout of graphics.
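A minimal sketch of the mechanism, using a made-up wrapper function (titled_scatter is hypothetical, not part of R): anything the wrapper does not name itself travels through ... to plot().

```r
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

# Hypothetical wrapper: it only knows about x, y and title.
titled_scatter <- function(x, y, title, ...) {
  plot(x, y, main = title, ...)   # everything in ... is handed on to plot()
  invisible(list(...))            # return what was passed through, for inspection
}

passed <- titled_scatter(tarsus, head_len, 'Head against Tarsus',
                         pch = 20, cex = 2, col = 'red')
names(passed)   # "pch" "cex" "col"
```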

There are a lot of parameters that you can control. Some you will use frequently, others will not be touched.

All the arguments are listed on the par() help page (‘?par’). You should spend some time looking over this page to see what options are available. Open it up now, so you can refer to it as we progress through the lesson.

?par

As we said above, all the arguments discussed below can be placed within a call to par(), e.g., par(pch = 13, cex = 2).

One important thing to note is that any changes to the defaults of par() will result in these changes being applied to all the subsequent plots that you create.

There are several ways to reset the graphics window(s) to the default parameters. Within RStudio, the easiest way is to go to the Plots menu above, and select ‘Clear All’. This will close all the open plot windows and reset the graphics parameters to their defaults.
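Another way to reset, which works in a script as well as in RStudio, is to save the old values when you change them. When par() is used to set parameters it returns the previous values, so you can hand them back afterwards. A minimal sketch (the variable name old is arbitrary):

```r
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

# Change two parameters and keep their previous values.
old <- par(las = 1, tcl = 0.5)
old                      # a list holding the values that were replaced

plot(head_len ~ tarsus)  # drawn with the new settings

# Restore the previous values; later plots are unaffected.
par(old)
par('las')               # back to 0
par('tcl')               # back to -0.5
```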

In terms of working with and creating graphics, a useful approach is to run code iteratively within the RStudio graphics window until the code generates your figure correctly. Then, use a separate graphics device to create the final plot (more on this specific operation later!).

Alternatively, as we did in the previous lesson, these arguments can be placed within the call to plot(). They are treated as graphical parameters in the same way, but apply only to that one plot. In this lesson it will be easier to put these arguments within plot().

OK, let’s get back to work … First, we will look at some common modifications to the axes of plots. In many cases the default range of the axes, tick mark position and orientation, and numbering scheme are not exactly what you (or your preferred publication) want.

We will use our example plot of Head on Tarsus again to illustrate these changes. First, remind yourself of the basic code to make that plot using the formula approach and data argument …

plot(Head ~ Tarsus, data = BirdData)

Great. Look carefully at the axes. The tick marks point outwards from the plot. Their length and direction are controlled by the argument tcl =, expressed as a fraction of the height of a line of text. The default is tcl = -0.5.

You can flip the direction (i.e., have the tick marks point into the plot) by making that value positive and also change the length of the tick marks. Try setting the tick marks to the full height of a line of text and pointing inwards. Remember there is no need to call par() first, just put the argument inside the main plot() call.

plot(Head ~ Tarsus, data = BirdData, tcl = 1)

Another common modification is to change the orientation of the numbers along the axes. The default is that the numbers are parallel to their respective axis. You can see this in the current plot that is displayed… along the x-axis the numbers are horizontal and along the y-axis the numbers are rotated 90 degrees anti-clockwise. You can change the style of these axis labels with the argument las =. The default is parallel to the axis (las = 0), but you may often want to set them to be always horizontal (las = 1) or always perpendicular (las = 2), or even always vertical (las = 3). Try setting the axes to always horizontal, in a new call to our basic plot command.

plot(Head ~ Tarsus, data = BirdData, las = 1)
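If you would like to compare all four styles at once, the sketch below draws them in a 2 x 2 grid. It relies on par(mfrow = ), a layout parameter this lesson has not introduced, so treat it as an optional aside; in a small plotting pane you may need to enlarge the window first.

```r
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

old <- par(mfrow = c(2, 2))   # four panels, filled row by row
for (style in 0:3) {
  # 0 = parallel to axis, 1 = horizontal, 2 = perpendicular, 3 = vertical
  plot(head_len ~ tarsus, las = style, main = paste('las =', style))
}
par(old)                      # back to one plot per page
```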

Ok, now let’s combine these two arguments and see what happens. Put the previous new values of las and tcl in our plot command.

plot(Head ~ Tarsus, data = BirdData, las = 1, tcl = 1)

What do you see? We now have tick marks pointing inwards and horizontal tick labels. These changes have left some extra white space around the edges of our plot, where the tick marks used to be. For someone as persnickety about graphics as me, this does not look quite right … ‘How might we decrease this space?’ I hear you ask … Well …

As you might expect, there is an argument for that. Within par, the argument mgp = is a vector of three elements that controls the number of lines out from the plot edge where the axis title (mgp[1]), axis labels (mgp[2]) and axis line itself (mgp[3]) are placed. The default is mgp = c(3, 1, 0). In our scenario, a more appealing distance between the axis labels and the axis itself is mgp = c(2.5, 0.5, 0). Try that, adding the mgp argument to the previous code.

plot(Head ~ Tarsus, data = BirdData, las = 1, tcl = 1, mgp = c(2.5, 0.5, 0))
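To judge the difference for yourself, the sketch below looks up the default value and then draws the default and the tightened version side by side. The side-by-side layout uses par(mfrow = ), which is an addition for this comparison only.

```r
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

default_mgp <- par('mgp')
default_mgp                    # 3 1 0: axis title, axis labels, axis line

old <- par(mfrow = c(1, 2))    # two panels in one row
plot(head_len ~ tarsus, las = 1, tcl = 1,
     main = 'default mgp')
plot(head_len ~ tarsus, las = 1, tcl = 1, mgp = c(2.5, 0.5, 0),
     main = 'mgp = c(2.5, 0.5, 0)')
par(old)
```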

Great, that moved the axis labels closer in, which (I think!) looks better. See what fine-scale control you can have over the appearance of your figures? It’s wonderful!

R generally sets the limits of the axes just wider than the data. There are several cases when you may want to change this, for example to set the lower limit to 0, or to make both sides wider if you are plotting confidence limits and the default range is not wide enough. Setting the arguments xlim = and ylim = (x- and y-axis limits) will resolve this issue. Try setting the x-axis limits to 19 and 25. As with mgp =, you need to pass a vector here, in this case of only two elements setting the min and max of the axis. Add this argument to the previous code.

plot(Head ~ Tarsus, data = BirdData, las = 1, tcl = 1, mgp = c(2.5, 0.5, 0), xlim = c(19, 25))

As an aside, xlim and ylim are not arguments to par; it would not make sense to have them apply to every subsequent plot. These two arguments are sent by the plot() function to the plot.window() function, which sets up the coordinate system for that graphics window.
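You can inspect the coordinate system that was set up with par('usr'), which reports the extremes of the plotting region. A minimal sketch, assuming the default axis style (xaxs = 'r'), under which R pads the requested limits by 4% at each end:

```r
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

plot(head_len ~ tarsus, xlim = c(19, 25))

# x extremes of the plotting region: the requested range is 6 units wide,
# and 4% of that (0.24) is added at each end.
x_range <- par('usr')[1:2]
x_range                        # 18.76 25.24

# xlim is not one of the parameters that par() knows about.
'xlim' %in% names(par())       # FALSE
```

This padding is the reason the axes are "just wider than the data" by default.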

R tries very hard to make the axis tick positions and labels as pretty as possible. These are recomputed internally, so if you resize your graphics window, they may even change in number and position. In particular, if you make the window too narrow, R will drop some of the labels. Try playing with the size of the graphics window now.
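If you are curious about where those "pretty" values come from, base R exposes the idea through the function pretty(), and axTicks() reports the tick positions of the current plot. This is only a sketch of the idea: the exact internal calculation for a given window may differ from what pretty() returns on the raw data.

```r
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

# Round, equally spaced values that cover the range of the data.
candidate_ticks <- pretty(range(tarsus))
candidate_ticks

# Tick positions actually used on the x axis (side 1) of the current plot.
plot(head_len ~ tarsus)
x_ticks <- axTicks(1)
x_ticks
```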

What if you wanted to ignore R, and decide on exactly everything about the axis labels, including the position of the tick marks and what labels went there? You can do it! However, first we would need to remove the default axis and/or not have an axis plotted at all. We can suppress plotting the x axis with the argument xaxt = 'n', and the y axis with yaxt = 'n'. Try suppressing the x axis of our default basic plot (i.e., ignoring all the recent additions).

plot(Head ~ Tarsus, data = BirdData, xaxt = 'n')

Great … this suppresses the axis tick marks and labels, but not the axis title. To suppress the axis title, we have to modify the xlab = argument in plot(). You can do this with xlab = '' (i.e., a text string with nothing in it). Try that, too, adding to the previous code.

plot(Head ~ Tarsus, data = BirdData, xaxt = 'n', xlab = '')
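With the default x axis and title out of the way, you are free to draw your own. One way to do that, sketched here with the base functions axis() and title(), is shown below; the tick positions and label text are arbitrary choices for illustration.

```r
tarsus   <- c(22.3, 19.7, 20.8, 20.3, 20.8, 21.5, 20.6, 21.5)
head_len <- c(31.2, 30.4, 30.6, 30.3, 30.3, 30.8, 32.5, 31.6)

# Plot with no x axis and no x axis title.
plot(head_len ~ tarsus, xaxt = 'n', xlab = '')

# Draw the x axis (side 1 = bottom) with ticks and labels of our choosing.
drawn_at <- axis(side = 1, at = c(20, 21, 22),
                 labels = c('20.0', '21.0', '22.0'))
drawn_at                       # axis() returns the tick positions it drew

# Add our own axis title.
title(xlab = 'Tarsus')
```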