# Matrices and Dataframes

Both are two-dimensional data structures.

Matrices are atomic vectors with dimensions and can only have one mode (type) of data.

Dataframes can have columns of different data modes.

# Matrices

Matrices are an extension of numeric or character vectors.

They are not a separate type of object but simply an atomic vector with dimensions; the number of rows and columns.

Because they are an atomic vector, all the elements are the same type (integer, or numeric, or, etc).

Matrices are used frequently in mathematical models and statistics.

## How to make a matrix

We can create an empty matrix and fill it later, with `matrix()`.

``````matrix(ncol = 2, nrow = 2)
``````
``````     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
``````

or create one from vector (here we fill in the optional `data = ` argument within `matrix()`).

``````matrix(1:4, ncol = 2, nrow = 2)
``````
``````     [,1] [,2]
[1,]    1    3
[2,]    2    4
``````

By default, matrices are filled by columns. We can fill be rows instead.

``````matrix(1:4, ncol = 2, nrow = 2, byrow = TRUE)
``````
``````     [,1] [,2]
[1,]    1    2
[2,]    3    4
``````

## How to label a matrix

We can label the columns, with `colnames()`

``````m <- matrix(1:4, ncol = 2, nrow = 2)
colnames(m) <- c("A", "B")
m
``````
``````     A B
[1,] 1 3
[2,] 2 4
``````

… and the rows, with `rownames()`

``````rownames(m) <- c("a", "b")
m
``````
``````  A B
a 1 3
b 2 4
``````

## How to add to a matrix

We can add extra columns, with `cbind()`,

``````cbind(m, 5:6)
``````
``````  A B
a 1 3 5
b 2 4 6
``````

or rows, with `rbind()`.

``````rbind(m, 7:8)
``````
``````  A B
a 1 3
b 2 4
7 8
``````

## Working with matrices

We can calculate the sum of each row, with `rowSums()`.

``````rowSums(m)
a b
``````
``````4 6
``````

And the sum of each column, with `colSums()`.

``````colSums(m)
``````
``````A B
3 7
``````

We can transpose the entire matrix, with `t()`.

`````` t(m)
``````
``````  a b
A 1 2
B 3 4
``````

# Dataframes

Dataframes are also two-dimensional objects. But, unlike matrices, dataframes can include columns of any data type.

Techically, data frames are a type of list where every part (i.e., column) of the list has same length. A data frame is therefore a rectangular list.

Data frames area akin to the usual “observations and variables” model that most statistical software uses. They are also similar to database tables.

So, each row of a data frame should represent an observation and the elements in a given row represent information about that observation. Each column has all the information about a particular variable for the entire data set.

Dataframes are most likely what you will use to work with your own data.

Each column can only be one mode or data type.

Each column is the same length.

## How to make a dataframe

Can be created with `data.frame()`. This works ok for small datasets.

``````df <- data.frame(sample_id = c('i', 'ii', 'iii', 'iv'), x = c(1,2,3,4), y = c(5, 6, 7, 8))
df
``````
``````  sample_id x y
1         i 1 5
2        ii 2 6
3       iii 3 7
4        iv 4 8
``````

For larger dataset, more usually you will read in external data using `read.table()` or `read.csv()` (See Importing Data)

## How to label a dataframe

We can label (specific) columns, with `colnames()`

``````colnames(df)[2:3] <- c('x1', 'y2')
df
``````
``````  sample_id x1 y2
1         i  1  5
2        ii  2  6
3       iii  3  7
4        iv  4  8
``````

… and the rows with `rownames()`

``````rownames(df) <- c('row1', 'row2', 'row3', 'row4')
df
``````
``````     sample_id x1 y2
row1         i  1  5
row2        ii  2  6
row3       iii  3  7
row4        iv  4  8
``````

## How to add to a dataframe

We can add columns using `cbind()` and rows using `rbind()`, as for matrices.

``````z <- c(9, 10, 11, 12)
df <- cbind(df, z)
df
``````
``````     sample_id x1 y2  z
row1         i  1  5  9
row2        ii  2  6 10
row3       iii  3  7 11
row4        iv  4  8 12
``````

We can also create new columns using the ` \$ ` and new column name.

``````df\$col4 <- c(9, 10, 11, 12)
df
``````
``````     sample_id x1 y2  z col4
row1         i  1  5  9    9
row2        ii  2  6 10   10
row3       iii  3  7 11   11
row4        iv  4  8 12   12
``````

## Examining dataframes and matrices

We can check if it is a dataframe, with `is.data.frame()`

``````is.data.frame(df)
``````
`````` TRUE
``````

And confirm that is it, actually, a list.

``````is.list(df)
``````
`````` TRUE
``````

We can check the dimenions with `dim()`.

``````dim(df)
``````
``````4 5
``````

And ask how many columns with `ncol()`,

``````ncol(df)
``````
`````` 5
``````

and rows with `nrow()`.

``````nrow(df)
``````
`````` 4
``````

We can check the structure, with `str()`.

``````str(df)
``````
``````'data.frame':	4 obs. of  5 variables:
\$ sample_id: Factor w/ 4 levels "i","ii","iii",..: 1 2 3 4
\$ x1       : num  1 2 3 4
\$ y2       : num  5 6 7 8
\$ z        : num  9 10 11 12
\$ col4     : num  9 10 11 12
``````

We can examine the first few rows (default is 6) with `head()`.

``````head(df, n = 2)
``````
``````     sample_id x1 y2  z col4
row1         i  1  5  9    9
row2        ii  2  6 10   10
``````

And the last few rows with `tail()`.

``````tail(df, n = 3)
``````
``````     sample_id x1 y2  z col4
row2        ii  2  6 10   10
row3       iii  3  7 11   11
row4        iv  4  8 12   12
``````