In this lesson you will learn the how and why of loops.

What Are Loops?

Looping is taking a multi-step process and automating it. It allows you to ‘batch’ sets of repeated parts of code and cycle through them. Loops are integral to all modern programming languages.

There are two main types of loops: (1) The ‘for’ loop type which exeute for a prescribed number of times and are controlled by a counter or an index. (2) The ‘while’ or ‘repeat’ loop type are based on verifying a logical condition which is tested at the start and/or end of each loop.

For loops

Suppose you wanted to print out six sentences of the form ‘The year is [i]’, where [i] is one of 2010, 2011, 2012, 2013, 2014, and 2015.

We could do this with six lines of code, using the print() and paste() commands, as follows: print(paste('The year is', 2010)). Try that out.

print(paste('The year is', 2010))
## [1] "The year is 2010"

What is this code doing? First, the function print() prints whatever is inside it to the console. If we ran print('Hullo, Simon'), we would get ‘Hullo, Simon’ displayed in the console.

Second, the function paste() is a very useful function for sticking bits of text together with other things (which are converted to text. In this case, paste() takes the text string ‘The year is’, concatenates it with the integer 2010, and gives a character/text string ‘The year is 2010’.

Great! Now you only have to do that five more times … Repeat that code for the year 2011.

print(paste('The year is', 2011))
## [1] "The year is 2011"

Great! Now you only have to do that four more times … Repeat that code for the year 2012.

print(paste('The year is', 2012))
## [1] "The year is 2012"

Great! Now you only have … just joking. But, you can see that it can get pretty tedious writing out the same thing/s over and over again, and just making one of two small changes.

This is where the idea of the loop comes in. It allows us to automate this process, which makes our lives a little less tedious, and also removes another step where human error can creep in.

One of the principles of programming is D.R.Y.: Don’t Repeat Yourself. If we can avoid repetition, all is well. So, how do we construct a loop?

There are at minimum three steps to creating a loop: an object to loop through, a counter, and the code that is executed at each step or loop. First, we need a vector which we can cycle over or move through at each step. This vector is the part of the code that is changing every time.

In this example, we are changing the year at every step. So, we need a vector of years, that goes from 2010 to 2015. Create that now, but do not assign it to anything.

2010:2015
## [1] 2010 2011 2012 2013 2014 2015

Great, so you have a vectors of 6 years. How do we put that into our statement ‘The year is’? We need to use the function for(). This function contains the counter that loops through the vector of years.

Let’s try that. To ensure the code runs, we also need to add a pair of curly brackets. Type: for(i in 2010:2015){}

for(i in 2010:2015){}

What did we do there? Well, we have the vector we created earlier, of years. We also added the extra object i, which acts as an index. We use this to link the counter, vector, and code we will run inside the loop. Because we are trying to automate the printing of years, we need some term to stand in for each year. We therefore use the object ‘i’. The ‘in’ part of the code is the bit that steps through each element of the vector of years.

The function for() also allows you to add code after it. This code will be the code that is run for each step of the loop. To ensure that R knows that this code is contained in the loop, it should be nested within curly brackets: {}.

Modify the code we used previously to print one year to the screen, but instead of using an integer, put the letter ‘i’ (without quotes). This will link with the previous use of ‘i’ inside the function for(). (And for now, do not put this code inside the curly brackets)

print(paste('The year is', i))
## [1] "The year is 2015"

Great! So, now we can build the whole loop. Put all those pieces together, and see what happens!

for(i in 2010:2015) {print(paste('The year is', i))}
## [1] "The year is 2010"
## [1] "The year is 2011"
## [1] "The year is 2012"
## [1] "The year is 2013"
## [1] "The year is 2014"
## [1] "The year is 2015"

Let’s explain that again in real words. For each ‘year’ that is in the sequence 2010 to 2015, run the code chunk print(paste('The year is', i)). Once the for loop has executed the code chunk for every year (i) in the vector, the loop stops and goes to the first instruction after the loop block.

Ok, so now try this on your own. Create a simple loop that prints the integers 1 to 10 to the screen. First, make a vector of integers 1 through 10.

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10

Now, put that in a call to for(), cycle through it, and print each element. In other words, make the loop.

for(i in 1:10){print(i)}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Now that you can write simple loops, we can start expanding on what they can do.

The first thing to note is that, when you write your own loops, the curly brackets can enclose many lines of code.

Second, loops can be nested in loops can be nested in loops can be .. Much like lists, you can nest loops within loops.

Third, a very useful aspect of loops is that you can use them to fill results into an empty vector or matrix.

Let’s try out your new skills on a real dataset. The Michigan tree plot data are loaded with this lesson, called trees. Look at the head of the dataset to remind yourself.

head(trees)
##       x     y  species  dbh status
## 1 0.078 0.091 blackoak 0.85      1
## 2 0.076 0.266 blackoak 0.90      1
## 3 0.051 0.225 blackoak 1.11      1
## 4 0.015 0.366 blackoak 0.18      0
## 5 0.030 0.426 blackoak 0.32      1
## 6 0.102 0.474 blackoak 0.11      1

Your task is to calculate the mean DBH for each species. Remember that we could do this easily with the function tapply(). We are essentially going to try and replicate that function here, with a loop.

First, we need to set our vector to loop over. This will be the names of the six different species in the data. Use levels() to obtain these, and assign this vector to the object spp.

spp <- levels(trees$species)

Check that you got all the species. Type: spp.

spp
## [1] "blackoak" "hickory"  "maple"    "misc"     "redoak"   "whiteoak"

Now we need to work out the code we will run to calculate the mean dbh. We will build this up in steps. First, write code to calculate the mean dbh of White Oak, subsetting the trees data within the call to mean().

mean(trees$dbh[trees$species == 'whiteoak'])
## [1] 3.837098

Now, we want to replace with actual name of whiteoak with its’ index. First, use the correct index number on the spp vector to return ‘whiteoak’.

spp[6]
## [1] "whiteoak"

Now replace the hard-coded species name with that call to the index in spp in the previous call to mean().

mean(trees$dbh[trees$species == spp[6]])
## [1] 3.837098

Ok, so now we can reference the spp vector to pull out any particular species by index and calculate its’ mean dbh. The next step is to store this value somewhere. Make an empty vector called x.

x <- vector()

Now, assign the mean dbh of white oak to the first element of this new vector, using the previous code.

x[1] <- mean(trees$dbh[trees$species == spp[6]])

Hopefully you can now see how we can pull all these parts together to create the loop that will give us a vector (x) of the mean dbh values for all six tree species. For this code to run across all species, we can index both sides by the counter index (i) rather than the specific species index.

To acheive this, you will also need to know how to count along the spp vector. The best way is to use the function length(). How might you use this function to count from 1 to the number of species?

1:length(spp)
## [1] 1 2 3 4 5 6

Now we have our vector to loop over (spp), the counter (1:length(spp)) and the code (x[i] <- mean(trees$dbh[trees$species == spp[i]])). Put these parts together to loop over all size tree species and return the vector x that contains their mean dbh values.

for(i in 1:length(spp)){x[i] <- mean(trees$dbh[trees$species == spp[i]])}

Finally, check the result of all your work, and look at x.

x
## [1] 1.258519 2.124154 3.025350 3.394286 9.991156 3.837098

Great work! You can now write loops that access real data, and assign the output to positions in empty objects such as vectors.

A quick note on using loops to fill empty objects (like vectors). One of the great things about using loops in this way is that we don’t need to know the final dimensions of our object - we can keep adding to it so long as our looping conditions are met.

However, it is always much faster computationally to predefine the size of our object if we know the final size needed, i.e. create the initial vector with all elements in place rather than starting with an empty vector. Otherwise, R has to recreate the object in every iteration of the loop as it becomes larger, which can take a lot of time if you need to loop 1,000, 10,000 or even 1,000,000+ times.

Just something to keep in mind if you find your computer hanging on loops.

You probably also appreciate the fact that tapply() is a much more useful and easier to use function than maybe you thought!

In summary, if you need to perform an action (say) three times or more, then a loop may serve you better for several reasons: It makes the code more (i) compact, (ii) readable, and (iii) maintainable and you may save some typing.

However, once you spend enough time with and reading about R, you will frequently hear advice to avoid the use of loops.

This is because R can do something that few programming languages can do: vectorization.

Vectorization is the conversion of repeated operations on simple numbers (“scalars”) into single operations on vectors or matrices.

We have encountered this many times before, but not made note of it. For example, if you add to vectors of equal length (e.g., columns in a data frame) you are vectorizing because R sums the two elements in each row. Similarly, the function colSums() adds up all the elements in the first column before ‘looping’ to the next.

Many instances where a loop is required in another programming language, R will vectorize instead, or the problem can be vectorized.

This benefit is partly because R passes all these calculations to a lower-level language (C), which is much faster than the higher-level R; partly because R treats even single integers as single-element vectors.

Check this website for more details on vectorization: http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html

Either way, you will almost certainly encounter situations where you will need a loop.

Please submit the log of this lesson to Google Forms so that Simon may evaluate your progress.

  1. Sure thing!

Sure thing!