7  Data Structure

In computer science, a data structure refers to the method which one uses to organize their data. Six basic data structures are commonly used in R:

7.1 Vectors

We can create a vector by using the “c” function to combine multiple values into a single vector. In the following example, we will combine four separate numbers into a single vector and the output the resulting vector to see what it looks like.

x <- c(1, 3, 3, 7)

print(x)
[1] 1 3 3 7

7.2 Lists

Lists are a collection of objects. This means that each element can be a different data type (unlike vectors). In the following example we’ll create a list containing two character objects and one vector with the “list” function.

first_name <- "John"
last_name <- "Smith"
favorite_numbers <- c(1, 3, 3, 7)

person <- list(first_name, last_name, favorite_numbers)

print(person)
[[1]]
[1] "John"

[[2]]
[1] "Smith"

[[3]]
[1] 1 3 3 7

7.3 Matrices

A matrix is a two-dimensional array where the data is all of the same type. In the following example, we’ll create a matrix with three rows and four columns.

x <- matrix(
        c(1,3,3,7,1,3,3,7,1,3,3,7)
        , nrow = 3
        , ncol = 4
        , byrow = TRUE)

print(x)
     [,1] [,2] [,3] [,4]
[1,]    1    3    3    7
[2,]    1    3    3    7
[3,]    1    3    3    7

7.4 Factors

Factors are used to designate levels within categorical data. In the following example, we’ll use the “factor” function on a vector of assorted color names to receive the “levels” which it contains.

x <- c("Red", "Blue", "Red", "Yellow", "Yellow")

colors <- factor(x)

print(colors)
[1] Red    Blue   Red    Yellow Yellow
Levels: Blue Red Yellow

7.5 Data Frames

A data frame contains two-dimensional data. Unlike the matrix data structure, each column of a data frame can contain data of a differing type (but within a column the data must be of the same type). The following example will create a data frame with two rows and two columns.

people <- c("John", "Jane")
id <- c(1, 2)
df <- data.frame(id = id, person = people)

print(df)
  id person
1  1   John
2  2   Jane

7.6 Arrays

Arrays are objects that can have more than two dimensions. This is sometimes referred to as being “n-dimensional”. The dimensions of the following example are 1 x 4 x 3. You’ll see that the data consist of one row and four columns spread out over a third dimension.

x <- array(
        c(1,3,3,7,1,3,3,7,1,3,3,7)
        , dim = c(1,4,3))

print(x)
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    3    3    7

, , 2

     [,1] [,2] [,3] [,4]
[1,]    1    3    3    7

, , 3

     [,1] [,2] [,3] [,4]
[1,]    1    3    3    7

7.7 Resources