lapply and sapply

In this lesson, you’ll learn how to use lapply() and sapply(), the two most important members of R’s *apply family of functions, also known as loop functions.

These powerful functions, along with their close relatives (vapply() and tapply(), among others), offer a concise and convenient means of implementing the Split-Apply-Combine strategy for data analysis.

Each of the *apply functions will SPLIT up some data into smaller pieces, APPLY a function to each piece, then COMBINE the results. A more detailed discussion of this strategy is found in Hadley Wickham’s Journal of Statistical Software paper titled ‘The Split-Apply-Combine Strategy for Data Analysis’.
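
To make the three steps concrete, here is a minimal sketch that performs them by hand with base R. The vectors area and zone are made-up toy data, not columns taken from the Flags dataset.

```r
# Hypothetical toy data: six areas, each tagged with a made-up zone label
area <- c(10, 20, 30, 40, 50, 60)
zone <- c("a", "b", "a", "b", "a", "b")

pieces <- split(area, zone)     # SPLIT: a list with one numeric vector per zone
means  <- lapply(pieces, mean)  # APPLY: mean() on each piece, giving a list
result <- unlist(means)         # COMBINE: flatten the list into a named vector

print(result)
```

The *apply functions bundle some or all of these steps into a single call, which is why they are so compact.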

Throughout this lesson, we’ll use the Flags dataset from the UCI Machine Learning Repository. This dataset contains details of various nations and their flags. More information may be found here: http://archive.ics.uci.edu/ml/datasets/Flags

Let’s jump right in so you can get a feel for how these special functions work!

I’ve stored the dataset in a variable called flags. Type head(flags) to preview the first six lines (i.e. the ‘head’) of the dataset.

head(flags)
##             name landmass zone area population language religion bars
## 1    Afghanistan        5    1  648         16       10        2    0
## 2        Albania        3    1   29          3        6        6    0
## 3        Algeria        4    1 2388         20        8        2    2
## 4 American-Samoa        6    3    0          0        1        1    0
## 5        Andorra        3    1    0          0        6        0    3
## 6         Angola        4    2 1247          7       10        5    0
##   stripes colours red green blue gold white black orange mainhue circles
## 1       3       5   1     1    0    1     1     1      0   green       0
## 2       0       3   1     0    0    1     0     1      0     red       0
## 3       0       3   1     1    0    0     1     0      0   green       0
## 4       0       5   1     0    1    1     1     0      1    blue       0
## 5       0       3   1     0    1    1     0     0      0    gold       0
## 6       2       3   1     0    0    1     0     1      0     red       0
##   crosses saltires quarters sunstars crescent triangle icon animate text
## 1       0        0        0        1        0        0    1       0    0
## 2       0        0        0        1        0        0    0       1    0
## 3       0        0        0        1        1        0    0       0    0
## 4       0        0        0        0        0        1    1       1    0
## 5       0        0        0        0        0        0    0       0    0
## 6       0        0        0        1        0        0    1       0    0
##   topleft botright
## 1   black    green
## 2     red      red
## 3   green    white
## 4    blue      red
## 5    blue      red
## 6     red    black

You may need to scroll up to see all of the output. Now, let’s check out the dimensions of the dataset using dim(flags).

dim(flags)
## [1] 194  30

This tells us that there are 194 rows, or observations, and 30 columns, or variables. Each observation is a country and each variable describes some characteristic of that country or its flag. To open a more complete description of the dataset in a separate text file, type viewinfo() when you are back at the prompt (>).

As with any dataset, we’d like to know in what format the variables have been stored. In other words, what is the ‘class’ of each variable? What happens if we do class(flags)? Try it out.

class(flags)
## [1] "data.frame"

That just tells us that the entire dataset is stored as a ‘data.frame’, which doesn’t answer our question. What we really need is to call the class() function on each individual column. While we could do this manually (i.e. one column at a time), it’s much faster if we can automate the process. Sounds like a loop!

The lapply() function takes a list as input, applies a function to each element of the list, then returns a list of the same length as the original one. Since a data frame is really just a list of vectors (you can see this with as.list(flags)), we can use lapply() to apply the class() function to each column of the flags dataset. Let’s see it in action!
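
As a rough illustration of why lapply() counts as a loop function, the sketch below builds the same result twice: once with an explicit for loop and once with lapply(). The data frame df is a hypothetical toy example, not the flags data.

```r
# Hypothetical toy data frame (not the flags data)
df <- data.frame(n = 1:3, x = c(0.5, 1.5, 2.5))

is.list(df)  # a data frame is a list with one element per column

# An explicit loop that collects the class of every column ...
out <- vector("list", length(df))
names(out) <- names(df)
for (col in names(df)) {
  out[[col]] <- class(df[[col]])
}

# ... and the same result from a single lapply() call
out2 <- lapply(df, class)
identical(out, out2)
```

Both approaches visit each column in turn; lapply() simply hides the bookkeeping.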

Type cls_list <- lapply(flags, class) to apply the class() function to each column of the flags dataset and store the result in a variable called cls_list. Note that you just supply the name of the function you want to apply (i.e. class), without the usual parentheses after it.

cls_list <- lapply(flags, class)

Type cls_list to view the result.

cls_list
## $name
## [1] "factor"
##
## $landmass
## [1] "integer"
##
## $zone
## [1] "integer"
##
## $area
## [1] "integer"
##
## $population
## [1] "integer"
##
## $language
## [1] "integer"
##
## $religion
## [1] "integer"
##
## $bars
## [1] "integer"
##
## $stripes
## [1] "integer"
##
## $colours
## [1] "integer"
##
## $red
## [1] "integer"
##
## $green
## [1] "integer"
##
## $blue
## [1] "integer"
##
## $gold
## [1] "integer"
##
## $white
## [1] "integer"
##
## $black
## [1] "integer"
##
## $orange
## [1] "integer"
##
## $mainhue
## [1] "factor"
##
## $circles
## [1] "integer"
##
## $crosses
## [1] "integer"
##
## $saltires
## [1] "integer"
##
## $quarters
## [1] "integer"
##
## $sunstars
## [1] "integer"
##
## $crescent
## [1] "integer"
##
## $triangle
## [1] "integer"
##
## $icon
## [1] "integer"
##
## $animate
## [1] "integer"
##
## $text
## [1] "integer"
##
## $topleft
## [1] "factor"
##
## $botright
## [1] "factor"

The ‘l’ in ‘lapply’ stands for ‘list’. Type class(cls_list) to confirm that lapply() returned a list.

class(cls_list)
## [1] "list"

As expected, we got a list of length 30 – one element for each variable/column. The output would be considerably more compact if we could represent it as a vector instead of a list.

You may remember from a previous lesson that lists are most helpful for storing multiple classes of data. In this case, since every element of the list returned by lapply() is a character vector of length one (i.e. “integer” or “factor”), cls_list can be simplified to a character vector. To do this manually, type as.character(cls_list).

as.character(cls_list)
##  [1] "factor"  "integer" "integer" "integer" "integer" "integer" "integer"
##  [8] "integer" "integer" "integer" "integer" "integer" "integer" "integer"
## [15] "integer" "integer" "integer" "factor"  "integer" "integer" "integer"
## [22] "integer" "integer" "integer" "integer" "integer" "integer" "integer"
## [29] "factor"  "factor"
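
Notice that the element names disappeared in the output above. The sketch below, which uses a hypothetical toy list rather than cls_list, contrasts as.character() with unlist(), a base R alternative that keeps the names.

```r
# Hypothetical toy list whose elements are all length-one character vectors
toy <- list(a = "integer", b = "factor", c = "integer")

plain <- as.character(toy)  # plain character vector; the names are dropped
named <- unlist(toy)        # character vector that keeps the names a, b, c

print(plain)
print(named)
```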

sapply() allows you to automate this process by calling lapply() behind the scenes, but then attempting to simplify (hence the ‘s’ in ‘sapply’) the result for you. Use sapply() the same way you used lapply() to get the class of each column of the flags dataset and store the result in cls_vect. If you need help, type ?sapply to bring up the documentation.

cls_vect <- sapply(flags, class)

Use class(cls_vect) to confirm that sapply() simplified the result to a character vector.

class(cls_vect)
## [1] "character"

In general, if the result is a list where every element is of length one, then sapply() returns a vector. If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix. If sapply() can’t figure things out, then it just returns a list, no different from what lapply() would give you.
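
Here is a compact sketch of all three outcomes side by side. The list toy is made up for illustration; the functions applied to it (length(), range() and rev()) are ordinary base R functions chosen only because their results have different lengths.

```r
# Hypothetical toy list of numeric vectors
toy <- list(a = c(1, 5, 3), b = c(2, 8), c = c(9, 4, 7, 6))

v <- sapply(toy, length)  # every result has length 1 -> named integer vector
m <- sapply(toy, range)   # every result has length 2 -> 2 x 3 matrix
l <- sapply(toy, rev)     # results differ in length  -> plain list

class(v)
dim(m)
class(l)
```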

Let’s practice using lapply() and sapply() some more!

Columns 11 through 17 of our dataset are indicator variables, each representing a different color. The value of the indicator variable is 1 if the color is present in a country’s flag and 0 otherwise.

Therefore, if we want to know the total number of countries (in our dataset) with, for example, the color orange on their flag, we can just add up all of the 1s and 0s in the ‘orange’ column. Try sum(flags$orange) to see this.

sum(flags$orange)
## [1] 26
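
If the counting trick feels too quick, this small sketch shows it on a hypothetical indicator vector (has_color is a made-up name, not a column of flags).

```r
# Hypothetical indicator vector: 1 = color present, 0 = color absent
has_color <- c(1, 0, 0, 1, 1, 0, 1)

n_with  <- sum(has_color)              # adding the 1s and 0s counts the 1s
n_total <- length(has_color)           # total number of observations
n_check <- length(which(has_color == 1))  # the same count, found the long way

c(n_with, n_check, n_total)
```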

Now we want to repeat this operation for each of the colors recorded in the dataset.

First, use flag_colors <- flags[, 11:17] to extract the columns containing the color data and store them in a new data frame called flag_colors. (Note the comma before 11:17. This subsetting command tells R that we want all rows, but only columns 11 through 17.)

flag_colors <- flags[, 11:17]
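
The same [rows, columns] pattern works on any data frame. As a small sketch on hypothetical toy data, columns can be picked by position or by name, and both routes give the same result.

```r
# Hypothetical toy data frame
df <- data.frame(id = 1:3, red = c(1, 0, 1), green = c(0, 0, 1), blue = c(1, 1, 0))

by_index <- df[, 2:4]                        # all rows, columns 2 through 4
by_name  <- df[, c("red", "green", "blue")]  # the same columns selected by name

identical(by_index, by_name)
```

Selecting by name is a little more verbose, but it keeps working if the column order ever changes.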

Use the head() function to look at the first 6 lines of flag_colors.

head(flag_colors)
##   red green blue gold white black orange
## 1   1     1    0    1     1     1      0
## 2   1     0    0    1     0     1      0
## 3   1     1    0    0     1     0      0
## 4   1     0    1    1     1     0      1
## 5   1     0    1    1     0     0      0
## 6   1     0    0    1     0     1      0

To get a list containing the sum of each column of flag_colors, call the lapply() function with two arguments. The first argument is the object over which we are looping (i.e. flag_colors) and the second argument is the name of the function we wish to apply to each column (i.e. sum). Remember that the second argument is just the name of the function with no parentheses, etc.

lapply(flag_colors, sum)
## $red
## [1] 153
##
## $green
## [1] 91
##
## $blue
## [1] 99
##
## $gold
## [1] 91
##
## $white
## [1] 146
##
## $black
## [1] 52
##
## $orange
## [1] 26

This tells us that of the 194 flags in our dataset, 153 contain the color red, 91 contain green, 99 contain blue, and so on.

The result is a list, since lapply() always returns a list. Each element of this list is of length one, so the result can be simplified to a vector by calling sapply() instead of lapply(). Try it now.

sapply(flag_colors, sum)
##    red  green   blue   gold  white  black orange
##    153     91     99     91    146     52     26

Perhaps it’s more informative to find the proportion of flags (out of 194) containing each color. Since each column is just a bunch of 1s and 0s, the arithmetic mean of each column will give us the proportion of 1s. (If it’s not clear why, think of a simpler situation where you have three 1s and two 0s: (1 + 1 + 1 + 0 + 0)/5 = 3/5 = 0.6.)

Use sapply() to apply the mean() function to each column of flag_colors. Remember that the second argument to sapply() should just specify the name of the function (i.e. mean) that you want to apply.

sapply(flag_colors, mean)
##       red     green      blue      gold     white     black    orange
## 0.7886598 0.4690722 0.5103093 0.4690722 0.7525773 0.2680412 0.1340206

In the examples we’ve looked at so far, sapply() has been able to simplify the result to a vector. That’s because each element of the list returned by lapply() was a vector of length one. Recall that sapply() instead returns a matrix when each element of the list returned by lapply() is a vector of the same length (> 1).

To illustrate this, let’s extract columns 19 through 23 from the flags dataset and store the result in a new data frame called flag_shapes. flag_shapes <- flags[, 19:23] will do it.

flag_shapes <- flags[, 19:23]

Each of these columns (i.e. variables) represents the number of times a particular shape or design appears on a country’s flag. We are interested in the minimum and maximum number of times each shape or design appears.

The range() function returns the minimum and maximum of its first argument, which should be a numeric vector.

Use lapply() to apply the range function to each column of flag_shapes. Don’t worry about storing the result in a new variable. By now, we know that lapply() always returns a list.

lapply(flag_shapes, range)
## $circles
## [1] 0 4
##
## $crosses
## [1] 0 2
##
## $saltires
## [1] 0 1
##
## $quarters
## [1] 0 4
##
## $sunstars
## [1]  0 50

Do the same operation, but using sapply() and store the result in a variable called shape_mat.

shape_mat <- sapply(flag_shapes, range)

View the contents of shape_mat.

shape_mat
##      circles crosses saltires quarters sunstars
## [1,]       0       0        0        0        0
## [2,]       4       2        1        4       50

Each column of shape_mat gives the minimum (row 1) and maximum (row 2) number of times its respective shape appears in different flags.
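
Because the rows of such a matrix carry no labels, it is easy to forget which row is which. One possible remedy, sketched here on a hypothetical toy data frame called shapes, is to add row names and then index by name.

```r
# Hypothetical toy data frame of shape counts
shapes <- data.frame(circles = c(0, 1, 4), crosses = c(0, 2, 1))

rng <- sapply(shapes, range)      # 2 x 2 matrix: row 1 = min, row 2 = max
rownames(rng) <- c("min", "max")  # optional labels so the rows explain themselves

rng["max", "circles"]             # largest number of circles in the toy data
```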

Use the class() function to confirm that shape_mat is a matrix.

class(shape_mat)
## [1] "matrix"

As we’ve seen, sapply() always attempts to simplify the result given by lapply(). It has been successful in doing so for each of the examples we’ve looked at so far. Let’s look at an example where sapply() can’t figure out how to simplify the result and thus returns a list, no different from lapply().

When given a vector, the unique() function returns a vector with all duplicate elements removed. In other words, unique() returns a vector of only the ‘unique’ elements. To see how it works, try unique(c(3, 4, 5, 5, 5, 6, 6)).

unique(c(3, 4, 5, 5, 5, 6, 6))
## [1] 3 4 5 6

We want to know the unique values for each variable in the flags dataset. To accomplish this, use lapply() to apply the unique() function to each column in the flags dataset, storing the result in a variable called unique_vals.

unique_vals <- lapply(flags, unique)

Print the value of unique_vals to the console.

unique_vals
## $name
##   [1] Afghanistan              Albania
##   [3] Algeria                  American-Samoa
##   [5] Andorra                  Angola
##   [7] Anguilla                 Antigua-Barbuda
##   [9] Argentina                Argentine
##  [11] Australia                Austria
##  [13] Bahamas                  Bahrain
##  [15] Bangladesh               Barbados
##  [17] Belgium                  Belize
##  [19] Benin                    Bermuda
##  [21] Bhutan                   Bolivia
##  [23] Botswana                 Brazil
##  [25] British-Virgin-Isles     Brunei
##  [27] Bulgaria                 Burkina
##  [29] Burma                    Burundi
##  [31] Cameroon                 Canada
##  [33] Cape-Verde-Islands       Cayman-Islands
##  [35] Central-African-Republic Chad
##  [37] Chile                    China
##  [39] Colombia                 Comorro-Islands
##  [41] Congo                    Cook-Islands
##  [43] Costa-Rica               Cuba
##  [45] Cyprus                   Czechoslovakia
##  [47] Denmark                  Djibouti
##  [49] Dominica                 Dominican-Republic
##  [51] Ecuador                  Egypt
##  [53] El-Salvador              Equatorial-Guinea
##  [55] Ethiopia                 Faeroes
##  [57] Falklands-Malvinas       Fiji
##  [59] Finland                  France
##  [61] French-Guiana            French-Polynesia
##  [63] Gabon                    Gambia
##  [65] Germany-DDR              Germany-FRG
##  [67] Ghana                    Gibraltar
##  [69] Greece                   Greenland
##  [71] Grenada                  Guam
##  [73] Guatemala                Guinea
##  [75] Guinea-Bissau            Guyana
##  [77] Haiti                    Honduras
##  [79] Hong-Kong                Hungary
##  [81] Iceland                  India
##  [83] Indonesia                Iran
##  [85] Iraq                     Ireland
##  [87] Israel                   Italy
##  [89] Ivory-Coast              Jamaica
##  [91] Japan                    Jordan
##  [93] Kampuchea                Kenya
##  [95] Kiribati                 Kuwait
##  [97] Laos                     Lebanon
##  [99] Lesotho                  Liberia
## [101] Libya                    Liechtenstein
## [103] Luxembourg               Malagasy
## [105] Malawi                   Malaysia
## [107] Maldive-Islands          Mali
## [109] Malta                    Marianas
## [111] Mauritania               Mauritius
## [113] Mexico                   Micronesia
## [115] Monaco                   Mongolia
## [117] Montserrat               Morocco
## [119] Mozambique               Nauru
## [121] Nepal                    Netherlands
## [123] Netherlands-Antilles     New-Zealand
## [125] Nicaragua                Niger
## [127] Nigeria                  Niue
## [129] North-Korea              North-Yemen
## [131] Norway                   Oman
## [133] Pakistan                 Panama
## [135] Papua-New-Guinea         Parguay
## [137] Peru                     Philippines
## [139] Poland                   Portugal
## [141] Puerto-Rico              Qatar
## [143] Romania                  Rwanda
## [145] San-Marino               Sao-Tome
## [147] Saudi-Arabia             Senegal
## [149] Seychelles               Sierra-Leone
## [151] Singapore                Soloman-Islands
## [153] Somalia                  South-Africa
## [155] South-Korea              South-Yemen
## [157] Spain                    Sri-Lanka
## [159] St-Helena                St-Kitts-Nevis
## [161] St-Lucia                 St-Vincent
## [163] Sudan                    Surinam
## [165] Swaziland                Sweden
## [167] Switzerland              Syria
## [169] Taiwan                   Tanzania
## [171] Thailand                 Togo
## [173] Tonga                    Trinidad-Tobago
## [175] Tunisia                  Turkey
## [177] Turks-Cocos-Islands      Tuvalu
## [179] UAE                      Uganda
## [181] UK                       Uruguay
## [183] US-Virgin-Isles          USA
## [185] USSR                     Vanuatu
## [187] Vatican-City             Venezuela
## [189] Vietnam                  Western-Samoa
## [191] Yugoslavia               Zaire
## [193] Zambia                   Zimbabwe
## 194 Levels: Afghanistan Albania Algeria American-Samoa Andorra ... Zimbabwe
##
## $landmass
## [1] 5 3 4 6 1 2
##
## $zone
## [1] 1 3 2 4
##
## $area
##   [1]   648    29  2388     0  1247  2777  7690    84    19     1   143
##  [12]    31    23   113    47  1099   600  8512     6   111   274   678
##  [23]    28   474  9976     4   623  1284   757  9561  1139     2   342
##  [34]    51   115     9   128    43    22    49   284  1001    21  1222
##  [45]    12    18   337   547    91   268    10   108   249   239   132
##  [56]  2176   109   246    36   215   112    93   103  3268  1904  1648
##  [67]   435    70   301   323    11   372    98   181   583   236    30
##  [78]  1760     3   587   118   333  1240  1031  1973  1566   447   783
##  [89]   140    41  1267   925   121   195   324   212   804    76   463
## [100]   407  1285   300   313    92   237    26  2150   196    72   637
## [111]  1221    99   288   505    66  2506    63    17   450   185   945
## [122]   514    57     5   164   781   245   178  9363 22402    15   912
## [133]   256   905   753   391
##
## $population
##  [1]   16    3   20    0    7   28   15    8   90   10    1    6  119    9
## [15]   35    4   24    2   11 1008    5   47   31   54   17   61   14  684
## [29]  157   39   57  118   13   77   12   56   18   84   48   36   22   29
## [43]   38   49   45  231  274   60
##
## $language
##  [1] 10  6  8  1  2  4  3  5  7  9
##
## $religion
## [1] 2 6 1 0 5 3 4 7
##
## $bars
## [1] 0 2 3 1 5
##
## $stripes
##  [1]  3  0  2  1  5  9 11 14  4  6 13  7
##
## $colours
## [1] 5 3 2 8 6 4 7 1
##
## $red
## [1] 1 0
##
## $green
## [1] 1 0
##
## $blue
## [1] 0 1
##
## $gold
## [1] 1 0
##
## $white
## [1] 1 0
##
## $black
## [1] 1 0
##
## $orange
## [1] 0 1
##
## $mainhue
## [1] green  red    blue   gold   white  orange black  brown
## Levels: black blue brown gold green orange red white
##
## $circles
## [1] 0 1 4 2
##
## $crosses
## [1] 0 1 2
##
## $saltires
## [1] 0 1
##
## $quarters
## [1] 0 1 4
##
## $sunstars
##  [1]  1  0  6 22 14  3  4  5 15 10  7  2  9 50
##
## $crescent
## [1] 0 1
##
## $triangle
## [1] 0 1
##
## $icon
## [1] 1 0
##
## $animate
## [1] 0 1
##
## $text
## [1] 0 1
##
## $topleft
## [1] black  red    green  blue   white  orange gold
## Levels: black blue gold green orange red white
##
## $botright
## [1] green  red    white  black  blue   gold   orange brown
## Levels: black blue brown gold green orange red white

Since unique_vals is a list, you can use what you’ve learned to determine the length of each element of unique_vals (i.e. the number of unique values for each variable). Simplify the result, if possible. Hint: Apply the length() function to each element of unique_vals.

sapply(unique_vals, length)
##       name   landmass       zone       area population   language
##        194          6          4        136         48         10
##   religion       bars    stripes    colours        red      green
##          8          5         12          8          2          2
##       blue       gold      white      black     orange    mainhue
##          2          2          2          2          2          8
##    circles    crosses   saltires   quarters   sunstars   crescent
##          4          3          2          3         14          2
##   triangle       icon    animate       text    topleft   botright
##          2          2          2          2          7          8

The fact that the elements of the unique_vals list are all vectors of different length poses a problem for sapply(), since there’s no obvious way of simplifying the result.
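
You can convince yourself of this on a much smaller scale. In the sketch below, toy is a hypothetical two-column data frame whose columns have different numbers of unique values, so sapply() has nothing to simplify and hands back exactly what lapply() would.

```r
# Hypothetical toy data frame: x has 2 unique values, y has 3
toy <- data.frame(x = c(1, 1, 2), y = c(3, 4, 5))

u_l <- lapply(toy, unique)  # always a list
u_s <- sapply(toy, unique)  # lengths differ, so no simplification happens

is.list(u_s)
identical(u_l, u_s)
```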

Use sapply() to apply the unique() function to each column of the flags dataset to see that you get the same unsimplified list that you got from lapply().

sapply(flags, unique)
## $name
##   [1] Afghanistan              Albania
##   [3] Algeria                  American-Samoa
##   [5] Andorra                  Angola
##   [7] Anguilla                 Antigua-Barbuda
##   [9] Argentina                Argentine
##  [11] Australia                Austria
##  [13] Bahamas                  Bahrain
##  [15] Bangladesh               Barbados
##  [17] Belgium                  Belize
##  [19] Benin                    Bermuda
##  [21] Bhutan                   Bolivia
##  [23] Botswana                 Brazil
##  [25] British-Virgin-Isles     Brunei
##  [27] Bulgaria                 Burkina
##  [29] Burma                    Burundi
##  [31] Cameroon                 Canada
##  [33] Cape-Verde-Islands       Cayman-Islands
##  [35] Central-African-Republic Chad
##  [37] Chile                    China
##  [39] Colombia                 Comorro-Islands
##  [41] Congo                    Cook-Islands
##  [43] Costa-Rica               Cuba
##  [45] Cyprus                   Czechoslovakia
##  [47] Denmark                  Djibouti
##  [49] Dominica                 Dominican-Republic
##  [51] Ecuador                  Egypt
##  [53] El-Salvador              Equatorial-Guinea
##  [55] Ethiopia                 Faeroes
##  [57] Falklands-Malvinas       Fiji
##  [59] Finland                  France
##  [61] French-Guiana            French-Polynesia
##  [63] Gabon                    Gambia
##  [65] Germany-DDR              Germany-FRG
##  [67] Ghana                    Gibraltar
##  [69] Greece                   Greenland
##  [71] Grenada                  Guam
##  [73] Guatemala                Guinea
##  [75] Guinea-Bissau            Guyana
##  [77] Haiti                    Honduras
##  [79] Hong-Kong                Hungary
##  [81] Iceland                  India
##  [83] Indonesia                Iran
##  [85] Iraq                     Ireland
##  [87] Israel                   Italy
##  [89] Ivory-Coast              Jamaica
##  [91] Japan                    Jordan
##  [93] Kampuchea                Kenya
##  [95] Kiribati                 Kuwait
##  [97] Laos                     Lebanon
##  [99] Lesotho                  Liberia
## [101] Libya                    Liechtenstein
## [103] Luxembourg               Malagasy
## [105] Malawi                   Malaysia
## [107] Maldive-Islands          Mali
## [109] Malta                    Marianas
## [111] Mauritania               Mauritius
## [113] Mexico                   Micronesia
## [115] Monaco                   Mongolia
## [117] Montserrat               Morocco
## [119] Mozambique               Nauru
## [121] Nepal                    Netherlands
## [123] Netherlands-Antilles     New-Zealand
## [125] Nicaragua                Niger
## [127] Nigeria                  Niue
## [129] North-Korea              North-Yemen
## [131] Norway                   Oman
## [133] Pakistan                 Panama
## [135] Papua-New-Guinea         Parguay
## [137] Peru                     Philippines
## [139] Poland                   Portugal
## [141] Puerto-Rico              Qatar
## [143] Romania                  Rwanda
## [145] San-Marino               Sao-Tome
## [147] Saudi-Arabia             Senegal
## [149] Seychelles               Sierra-Leone
## [151] Singapore                Soloman-Islands
## [153] Somalia                  South-Africa
## [155] South-Korea              South-Yemen
## [157] Spain                    Sri-Lanka
## [159] St-Helena                St-Kitts-Nevis
## [161] St-Lucia                 St-Vincent
## [163] Sudan                    Surinam
## [165] Swaziland                Sweden
## [167] Switzerland              Syria
## [169] Taiwan                   Tanzania
## [171] Thailand                 Togo
## [173] Tonga                    Trinidad-Tobago
## [175] Tunisia                  Turkey
## [177] Turks-Cocos-Islands      Tuvalu
## [179] UAE                      Uganda
## [181] UK                       Uruguay
## [183] US-Virgin-Isles          USA
## [185] USSR                     Vanuatu
## [187] Vatican-City             Venezuela
## [189] Vietnam                  Western-Samoa
## [191] Yugoslavia               Zaire
## [193] Zambia                   Zimbabwe
## 194 Levels: Afghanistan Albania Algeria American-Samoa Andorra ... Zimbabwe
##
## $landmass
## [1] 5 3 4 6 1 2
##
## $zone
## [1] 1 3 2 4
##
## $area
##   [1]   648    29  2388     0  1247  2777  7690    84    19     1   143
##  [12]    31    23   113    47  1099   600  8512     6   111   274   678
##  [23]    28   474  9976     4   623  1284   757  9561  1139     2   342
##  [34]    51   115     9   128    43    22    49   284  1001    21  1222
##  [45]    12    18   337   547    91   268    10   108   249   239   132
##  [56]  2176   109   246    36   215   112    93   103  3268  1904  1648
##  [67]   435    70   301   323    11   372    98   181   583   236    30
##  [78]  1760     3   587   118   333  1240  1031  1973  1566   447   783
##  [89]   140    41  1267   925   121   195   324   212   804    76   463
## [100]   407  1285   300   313    92   237    26  2150   196    72   637
## [111]  1221    99   288   505    66  2506    63    17   450   185   945
## [122]   514    57     5   164   781   245   178  9363 22402    15   912
## [133]   256   905   753   391
##
## $population
##  [1]   16    3   20    0    7   28   15    8   90   10    1    6  119    9
## [15]   35    4   24    2   11 1008    5   47   31   54   17   61   14  684
## [29]  157   39   57  118   13   77   12   56   18   84   48   36   22   29
## [43]   38   49   45  231  274   60
##
## $language
##  [1] 10  6  8  1  2  4  3  5  7  9
##
## $religion
## [1] 2 6 1 0 5 3 4 7
##
## $bars
## [1] 0 2 3 1 5
##
## $stripes
##  [1]  3  0  2  1  5  9 11 14  4  6 13  7
##
## $colours
## [1] 5 3 2 8 6 4 7 1
##
## $red
## [1] 1 0
##
## $green
## [1] 1 0
##
## $blue
## [1] 0 1
##
## $gold
## [1] 1 0
##
## $white
## [1] 1 0
##
## $black
## [1] 1 0
##
## $orange
## [1] 0 1
##
## $mainhue
## [1] green  red    blue   gold   white  orange black  brown
## Levels: black blue brown gold green orange red white
##
## $circles
## [1] 0 1 4 2
##
## $crosses
## [1] 0 1 2
##
## $saltires
## [1] 0 1
##
## $quarters
## [1] 0 1 4
##
## $sunstars
##  [1]  1  0  6 22 14  3  4  5 15 10  7  2  9 50
##
## $crescent
## [1] 0 1
##
## $triangle
## [1] 0 1
##
## $icon
## [1] 1 0
##
## $animate
## [1] 0 1
##
## $text
## [1] 0 1
##
## $topleft
## [1] black  red    green  blue   white  orange gold
## Levels: black blue gold green orange red white
##
## $botright
## [1] green  red    white  black  blue   gold   orange brown
## Levels: black blue brown gold green orange red white

Occasionally, you may need to apply a function that is not yet defined, thus requiring you to write your own. Writing functions in R is beyond the scope of this lesson, but let’s look at a quick example of how you might do so in the context of loop functions.

Pretend you are interested in only the second item from each element of the unique_vals list that you just created. Since each element of the unique_vals list is a vector and we’re not aware of any built-in function in R that returns the second element of a vector, we will construct our own function.

lapply(unique_vals, function(elem) elem[2]) will return a list containing the second item from each element of the unique_vals list. Note that our function takes one argument, elem, which is just a ‘dummy variable’ that takes on the value of each element of unique_vals, in turn.

lapply(unique_vals, function(elem) elem[2])
## $name
## [1] Albania
## 194 Levels: Afghanistan Albania Algeria American-Samoa Andorra ... Zimbabwe
##
## $landmass
## [1] 3
##
## $zone
## [1] 3
##
## $area
## [1] 29
##
## $population
## [1] 3
##
## $language
## [1] 6
##
## $religion
## [1] 6
##
## $bars
## [1] 2
##
## $stripes
## [1] 0
##
## $colours
## [1] 3
##
## $red
## [1] 0
##
## $green
## [1] 0
##
## $blue
## [1] 1
##
## $gold
## [1] 0
##
## $white
## [1] 0
##
## $black
## [1] 0
##
## $orange
## [1] 1
##
## $mainhue
## [1] red
## Levels: black blue brown gold green orange red white
##
## $circles
## [1] 1
##
## $crosses
## [1] 1
##
## $saltires
## [1] 1
##
## $quarters
## [1] 1
##
## $sunstars
## [1] 0
##
## $crescent
## [1] 1
##
## $triangle
## [1] 1
##
## $icon
## [1] 0
##
## $animate
## [1] 1
##
## $text
## [1] 1
##
## $topleft
## [1] red
## Levels: black blue gold green orange red white
##
## $botright
## [1] red
## Levels: black blue brown gold green orange red white

The only difference between previous examples and this one is that we are defining and using our own function right in the call to lapply(). Our function has no name and disappears as soon as lapply() is done using it. So-called ‘anonymous functions’ can be very useful when one of R’s built-in functions isn’t an option.
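
If you expect to reuse such a helper, you could give it a name instead. The sketch below shows both styles on a hypothetical toy list; the function name second is made up for this example and is not part of base R.

```r
# Hypothetical toy list of numeric vectors
toy <- list(a = c(10, 20, 30), b = c(5, 6))

# Anonymous function written inline, as in the lesson
second_anon <- lapply(toy, function(elem) elem[2])

# Equivalent named function (the name `second` is an assumption of this sketch)
second <- function(elem) elem[2]
second_named <- sapply(toy, second)  # length-one results simplify to a vector

print(second_anon)
print(second_named)
```

Either way, the function is passed to lapply() or sapply() without parentheses after its name, just like class or sum earlier in the lesson.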

In this lesson, you learned how to use the powerful lapply() and sapply() functions to apply an operation over the elements of a list. In the next lesson, we’ll take a look at some close relatives of lapply() and sapply().
