Vectors and Sequences

Introducing vectors and elements

In this lesson, you will learn how to create your first data objects in R: vectors. Vectors are one-dimensional ‘structures’ that store data. For example, if we measured the height of 10 people, we might store the results in R as a vector of 10 heights, with one entry (called an ‘element’) for each person. Vectors are the most common data object you will work with in R.

The easiest way to create a vector is with the c() function, which stands for ‘concatenate’ or ‘combine’. To create a vector containing the numbers 1.1, 9, and 3.14, type c(1.1, 9, 3.14). Each number must be separated by a comma. Try it now and store the result in a variable called z.

z <- c(1.1, 9, 3.14)

Anytime you have questions about a particular function, you can access R’s built-in help files via the ? command. For example, if you want more information on the c() function, type ?c without the parentheses that normally follow a function name. Give it a try.

?c

Type z to view its contents. Notice that there are no commas separating the values in the output.

z
## [1] 1.10 9.00 3.14

You can combine vectors to make a new vector. Create a new vector that contains z, 555, then z again in that order. Don’t assign this vector to a new variable, so that we can just see the result immediately.

c(z, 555, z)
## [1]   1.10   9.00   3.14 555.00   1.10   9.00   3.14
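Because c() flattens its inputs, the result above is one ordinary numeric vector of length 7, not a vector that contains other vectors. A small check, assuming z is defined as above (the name combined is just for illustration):

```r
z <- c(1.1, 9, 3.14)

# c() flattens its arguments into a single plain vector
combined <- c(z, 555, z)

length(combined)
## [1] 7

# Typing the same values out by hand gives an identical vector
identical(combined, c(1.1, 9, 3.14, 555, 1.1, 9, 3.14))
## [1] TRUE
```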

Working with vectors

Numeric vectors can be used in arithmetic expressions. Type the following to see what happens: z * 2 + 100.

z * 2 + 100
## [1] 102.20 118.00 106.28

First, R multiplied each of the three elements in z by 2. Then it added 100 to each element to get the result you see above. Applying an operation to every element at once like this is one of R’s most useful features, and it lets you work with very large data sets easily and efficiently.

Other common arithmetic operators are +, -, /, and ^ (where x^2 means ‘x squared’). To take the square root, use the sqrt() function and to take the absolute value, use the abs() function.
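Each of these operators and functions also works element by element. Here is a quick sketch using a hypothetical vector x; note that abs() is applied before sqrt(), because taking the square root of a negative number returns NaN with a warning.

```r
x <- c(-4, 9, 16)   # hypothetical example vector

x / 2          # divide every element by 2
## [1] -2.0  4.5  8.0
x^2            # square every element
## [1]  16  81 256
abs(x)         # absolute value of every element
## [1]  4  9 16
sqrt(abs(x))   # square root of every absolute value
## [1] 2 3 4
```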

Take the square root of z - 1 and assign it to a new variable called my_sqrt.

my_sqrt <- sqrt(z - 1)

Before we view the contents of the my_sqrt variable, what do you think it contains? (Enter the number of the option you think is correct and press Enter).

  1. a vector of length 3
  2. a single number (i.e. a vector of length 1)
  3. a vector of length 0 (i.e. an empty vector)

a vector of length 3

Print the contents of my_sqrt.

my_sqrt
## [1] 0.3162278 2.8284271 1.4628739

As you may have guessed, R first subtracted 1 from each element of z, then took the square root of each element. This leaves you with a vector of the same length as the original vector z.

Now, create a new variable called my_div that gets the value of z divided by my_sqrt.

my_div <- z / my_sqrt

Which statement do you think is true?

  1. The first element of my_div is equal to the first element of z divided by the first element of my_sqrt, and so on…
  2. my_div is a single number (i.e. a vector of length 1)
  3. my_div is undefined

The first element of my_div is equal to the first element of z divided by the first element of my_sqrt, and so on…

Go ahead and print the contents of my_div.

my_div
## [1] 3.478505 3.181981 2.146460

When given two vectors of the same length, R simply performs the specified arithmetic operation (+, -, *, etc.) element-by-element. If the vectors are of different lengths, R ‘recycles’ the shorter vector until it is the same length as the longer vector.

When we did z * 2 + 100 in our earlier example, z was a vector of length 3, but technically 2 and 100 are each vectors of length 1.

Behind the scenes, R is ‘recycling’ the 2 to make a vector of 2s and the 100 to make a vector of 100s. In other words, when you ask R to compute z * 2 + 100, what it really computes is this: z * c(2, 2, 2) + c(100, 100, 100).
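You can confirm that the short and long forms really are the same computation by comparing them with identical(). The names short_form and long_form are only for illustration.

```r
z <- c(1.1, 9, 3.14)

# The length-1 vectors 2 and 100 are recycled to match the length of z
short_form <- z * 2 + 100
long_form  <- z * c(2, 2, 2) + c(100, 100, 100)

identical(short_form, long_form)
## [1] TRUE
```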

To see another example of how this vector ‘recycling’ works, try adding c(1, 2, 3, 4) and c(0, 10). Don’t worry about saving the result in a new variable.

c(1, 2, 3, 4) + c(0, 10)
## [1]  1 12  3 14

If the length of the shorter vector does not divide evenly into the length of the longer vector, R will still apply the ‘recycling’ method, but will throw a warning to let you know something fishy might be going on.

Try c(1, 2, 3, 4) + c(0, 10, 100) for an example.

c(1, 2, 3, 4) + c(0, 10, 100)
## Warning in c(1, 2, 3, 4) + c(0, 10, 100): longer object length is not a
## multiple of shorter object length
## [1]   1  12 103   4

This warning only appears if the length of the longer vector is not a multiple of the length of the shorter one. In the previous example, c(1, 2, 3, 4) has length 4, twice the length of c(0, 10), which has length 2, so there was no warning.

However, when we add a vector of length 4 and a vector of length 3, the shorter one is only partially recycled (on its second pass, only its first element is reused); hence the warning.
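To make the recycling visible, you can stretch the shorter vector yourself with rep_len(), a base R function that repeats a vector until it reaches a given length. This is an illustration of the result, not a claim about how R implements recycling internally; the names long, short, and stretched are hypothetical.

```r
long  <- c(1, 2, 3, 4)
short <- c(0, 10, 100)

# Repeat `short` until it has as many elements as `long`
stretched <- rep_len(short, length(long))
stretched
## [1]   0  10 100   0

# Adding two equal-length vectors gives the same answer, with no warning
long + stretched
## [1]   1  12 103   4
```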

Time-saving tricks for working with the command-line

Before continuing this lesson, I’d like to show you a couple of time-saving tricks.

Earlier in the lesson, you computed z * 2 + 100. Let’s pretend that you made a mistake and that you meant to add 1000 instead of 100. You could either re-type the expression, or…

In many programming environments, the up arrow will cycle through previous commands. Try hitting the up arrow on your keyboard until you get to this command z * 2 + 100, then change 100 to 1000 and hit Enter. If the up arrow doesn’t work for you, just type the corrected command.

z * 2 + 1000
## [1] 1002.20 1018.00 1006.28

Finally, let’s pretend you’d like to view the contents of a variable that you created earlier, but you can’t seem to remember if you named it my_div or myDiv. You could try both and see what works, or…

… You can type the first two letters of the variable name, then hit the Tab key (possibly more than once). Most programming environments will then show a list of the variables you’ve created that begin with ‘my’. This is called ‘auto-completion’ and can be quite handy when you have many variables in your workspace. It works particularly well in RStudio, which also suggests the arguments of functions (which we will get to later). Give it a try. (If auto-completion doesn’t work for you, just type my_div and press Enter.)

my_div
## [1] 3.478505 3.181981 2.146460

You now know how to create and work with vectors in R! Awesome!

How to create sequences of numbers

We will now create sequences of numbers or characters in R. Sequences are used in many different tasks, from plotting the axes of graphs to generating simulated data. Note that the sequences we create are themselves vectors.

Make sequences using :

The simplest way to create a sequence of numbers in R is by using the : operator. Type 1:20 to see how it works.

1:20
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

That gave us every integer from 1 to 20, inclusive (an integer is a whole number: positive, negative, or zero).

We can also use it to create a sequence of real numbers (a real number is any positive or negative number, or zero, and may have a finite or infinite sequence of digits after the decimal point). For example, try typing pi:10.

pi:10
## [1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593

The result is a vector of real numbers starting with pi (3.142…) and increasing in increments of 1. The upper limit of 10 is never reached, since the next number in our sequence would be greater than 10.
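A short sketch that checks these claims: class() reports how R stores each sequence, and pi:10 contains the same values as pi plus the whole-number offsets 0 to 6. The name pi_seq is only for illustration, and all.equal() is used so that tiny rounding differences would not matter.

```r
# 1:20 starts at a whole number, so R stores the result as integers
class(1:20)
## [1] "integer"

# pi:10 starts at a non-integer, so the result is stored as real (double) numbers
pi_seq <- pi:10
class(pi_seq)
## [1] "numeric"

# Seven values fit below 10, and 10 itself is never reached
length(pi_seq)
## [1] 7
max(pi_seq) < 10
## [1] TRUE

# The same values, written as pi plus the offsets 0 to 6
all.equal(pi_seq, pi + 0:6)
## [1] TRUE
```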

Note also that pi is one of the few constants built into R. Type ?pi to see the others.

?pi

What happens if we do 15:1? Give it a try to find out.

15:1
##  [1] 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

It counted backwards in increments of 1! This is sometimes useful for plotting coefficients from models in reverse order.

Remember that if you have questions about a particular R function, you can access its documentation with a question mark followed by the function name: ?function_name_here. However, in the case of an operator like the colon used above, you must enclose the symbol in backticks like this: ?`:`. (NOTE: The backtick (`) key is generally located in the top left corner of a keyboard, above the Tab key. If you don’t have a backtick key, you can use regular quotes instead, e.g. ?":".)

Pull up the documentation for : now.

?`:`

Make sequences using seq()

Often, we’ll want more control over a sequence we’re creating than the : operator gives us. The seq() (‘sequence’) function serves this purpose.

The most basic use of seq() does exactly the same thing as the : operator. Try seq(1, 20) to see this.

seq(1, 20)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

This gives us the same output as 1:20. Check the help file for seq().

?seq

The help file lists the arguments of the seq() function. The first two arguments are “from =” and “to =”. In R, you do not have to specify arguments by name if you give their values in the same order as they appear in the function definition. However, for complex functions it is often best practice to name them, because it makes your code much clearer.

For example, seq(from = 1, to = 20) will give the same output as seq(1, 20). Try it!

seq(from = 1, to = 20)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
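Naming arguments also means their order no longer matters, while unnamed values are matched strictly by position. A quick sketch of both cases:

```r
# Named arguments are matched by name, so their order does not matter
identical(seq(to = 20, from = 1), seq(from = 1, to = 20))
## [1] TRUE

# Unnamed arguments are matched by position: 5 becomes `from` and 1 becomes `to`
seq(5, 1)
## [1] 5 4 3 2 1
```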

OK, let’s say that instead of 1 to 20, we want a vector of numbers ranging from 0 to 10, incremented by 0.5. seq(0, 10, by = 0.5) does just that. Try it out.

seq(0, 10, by = 0.5)
##  [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5
## [15]  7.0  7.5  8.0  8.5  9.0  9.5 10.0

Or maybe we don’t care what the increment is and we just want a sequence of 30 numbers between 5 and 10. seq(5, 10, length = 30) does the trick. Give it a shot now and store the result in a new variable called my_seq.

my_seq <- seq(5, 10, length = 30)
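Even though we didn’t ask for a particular increment, R still chooses one: 30 evenly spaced points split the interval from 5 to 10 into 29 equal steps. A sketch that checks this with diff(), which returns the gaps between neighbouring elements; step_size is a hypothetical name, and all.equal() is used to tolerate floating-point rounding.

```r
my_seq <- seq(5, 10, length = 30)

# 30 points means 29 gaps between 5 and 10
step_size <- (10 - 5) / (30 - 1)

# Every gap should equal step_size, up to rounding error
all.equal(diff(my_seq), rep(step_size, 29))
## [1] TRUE
```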

If you look closely at the help file (?seq) again, you will not see an argument “length =”, only “length.out =”. R lets you abbreviate an argument name, as long as the abbreviation matches only one of the function’s arguments (this is called partial matching). You could even use just “l =”!
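A minimal check that the full name and its abbreviations all reach the same length.out argument (the variable names are only for illustration). Abbreviations save typing at the console, but the full name length.out is easier to read in saved code.

```r
by_full   <- seq(5, 10, length.out = 30)  # full argument name
by_length <- seq(5, 10, length = 30)      # partial match
by_l      <- seq(5, 10, l = 30)           # shortest unambiguous abbreviation

identical(by_full, by_length) && identical(by_length, by_l)
## [1] TRUE
```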

To confirm that my_seq has length 30, we can use the length() function. Try it now. To do this, pass the object my_seq as the value of the argument ‘x’ of length().

length(my_seq)
## [1] 30

Let’s pretend we don’t know the length of my_seq, but we want to generate a sequence of integers from 1 to N, where N represents the length of the my_seq vector. In other words, we want a new vector (1, 2, 3, …) that is the same length as my_seq.

There are several ways we could do this. One possibility is to combine the : operator and the length() function like this: 1:length(my_seq). Give that a try.

1:length(my_seq)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30

Another option is to use seq(along.with = my_seq). Give that a try.

seq(along.with = my_seq)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30

However, as is the case with many common tasks, R has a separate built-in function for this purpose called seq_along(). Type seq_along(my_seq) to see it in action.

seq_along(my_seq)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30
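One concrete reason to prefer seq_along() over 1:length() shows up with an empty vector: length() returns 0, and 1:0 counts down to give c(1, 0) instead of an empty sequence. The vector name empty is just for illustration.

```r
empty <- numeric(0)   # a hypothetical vector with no elements

1:length(empty)       # length is 0, so this counts down from 1 to 0
## [1] 1 0

seq_along(empty)      # correctly returns an empty sequence
## integer(0)
```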

There are often several approaches to solving the same problem, particularly in R. Simple approaches that involve less typing are generally best. It’s also important for your code to be readable, so that you and others can figure out what’s going on without too much hassle.

If R has a built-in function for a particular task, it’s likely that function is highly optimized for that purpose and is your best option. One of the philosophies of R (and Unix more generally) is to have tools (or functions) that do specific things very well and then link these together, rather than a single multi-purpose tool that does many things poorly.

This approach is like having a separate knife, fork, and spoon rather than a Spork. In most situations, cutlery (US: ‘silverware’) is superior to the Spork.

As you become a more advanced R programmer, you will learn how to link and nest these apparently simple functions to do incredibly powerful tasks. You will also design your own functions to perform tasks when there are no better options. We’ll explore writing your own functions in future lessons.

Make sequences using rep()

OK, back to the show. One more function related to creating sequences of numbers is rep(), which stands for ‘replicate’. Let’s look at a few uses.

If we’re interested in creating a vector that contains 40 zeros, we can use rep(0, times = 40). Try it out.

rep(0, times = 40)
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 0 0

If instead we want our vector to contain 10 repetitions of the vector (0, 1, 2), we can do rep(c(0, 1, 2), times = 10). Go ahead.

rep(c(0, 1, 2), times = 10)
##  [1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2

Finally, let’s say that rather than repeating the vector (0, 1, 2) over and over again, we want our vector to contain 10 zeros, then 10 ones, then 10 twos. We can do this with the each argument. Try rep(c(0, 1, 2), each = 10), and store it as a new variable called my_rep.

my_rep <- rep(c(0, 1, 2), each = 10)
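The each argument repeats every element in place, so my_rep should match three blocks built separately with times. A quick sketch of that equivalence:

```r
my_rep <- rep(c(0, 1, 2), each = 10)

# `each` keeps each value together: ten 0s, then ten 1s, then ten 2s
identical(my_rep, c(rep(0, times = 10), rep(1, times = 10), rep(2, times = 10)))
## [1] TRUE

length(my_rep)
## [1] 30
```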

First, let’s make sure that our object is in fact a vector. Occasionally in R we’ll want to check whether an object is what we think it is. There are many functions that do this, and they all follow the is.type naming pattern. Go ahead and enter is.vector(my_rep) now.

is.vector(my_rep)
## [1] TRUE
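Other members of the same family include is.numeric() and is.character(), both in base R. A short sketch applying them to my_rep:

```r
my_rep <- rep(c(0, 1, 2), each = 10)

is.numeric(my_rep)     # my_rep holds numbers
## [1] TRUE
is.character(my_rep)   # ...not text
## [1] FALSE
```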

my_rep is a vector!

Let’s see what this vector looks like. We’ll use the plot() function, something you will become very familiar with in time. It accepts many types of data, as well as a whole host of arguments to customize the output, but for now just go ahead and type plot(my_rep).

plot(my_rep)

You can see that the y-axis is labeled as my_rep, with values spanning the numeric range of your vector. Because we’re only plotting one source of data, the x-axis is just an index telling you where along your vector those values occur. We will learn later how to plot two sources of data to examine their relationship, making the resulting plots much nicer in the process.

The rep() function also works with other data types, such as characters and factors (characters or numbers that represent categories of data, like brown or blue eyes). In this way, rep() is often useful for creating category labels for your data. Try repeating the vector c('a','b','c') 5 times.

rep(c("a","b","c"), 5)
##  [1] "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c"
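To turn repeated labels into an actual factor, you can wrap the result of rep() in factor(), a base R function that records the distinct categories (its levels), sorted alphabetically by default. The eye data below is hypothetical and only illustrates the brown or blue eyes example above.

```r
# Hypothetical labels: three brown-eyed people followed by three blue-eyed people
eyes <- factor(rep(c("brown", "blue"), each = 3))

length(eyes)
## [1] 6
levels(eyes)   # the categories, in alphabetical order
## [1] "blue"  "brown"
```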

Congratulations! Now you have several powerful tools that you can use to generate sequences of values. You also learnt to use the length() function and the : operator, and made your first plot. Your R skills are building!

Please submit the log of this lesson to Google Forms so that Simon may evaluate your progress.

  1. Go ahead, make my day!

Go ahead, make my day!