Data Frames are arguably the most important data structure in R.
They are designed to store tabular data and behave much like an Excel spreadsheet or a SQL table.
Unlike a matrix, which holds a single data type, a Data Frame can hold a different data type in each column; all values within one column still share that column's type.
You construct them using the data.frame() function, passing in vectors that represent each column.
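All columns must end up with the same number of rows. R recycles a shorter vector when its length divides the longest column evenly (a single value is repeated for every row); otherwise data.frame() throws an error.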
# Creating three columns of mixed data types
users <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Active = c(TRUE, FALSE, TRUE)
)
print(users)
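To check that each column keeps its own type, and to see the length rule in action, a minimal sketch might look like the following. The names col_types and result are illustrative and not part of the original example. The mismatched call uses lengths 3 and 2, which do not divide evenly, so recycling cannot apply.

```r
# Rebuild the example Data Frame so this block runs on its own
users <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Active = c(TRUE, FALSE, TRUE)
)

# Report the class of each column (col_types is an illustrative name)
col_types <- sapply(users, class)
print(col_types)

# Column lengths 3 and 2 cannot be recycled to match, so data.frame() fails;
# tryCatch captures the error message instead of stopping the script
result <- tryCatch(
  data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob")),
  error = function(e) conditionMessage(e)
)
print(result)
```

Here ID is reported as numeric and Active as logical. The class shown for Name depends on the R version: character from R 4.0.0 onward, factor by default in earlier versions.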
Extracting specific columns from a large dataset is a very common task.
While you can use square brackets [row, column], the most convenient method is usually the dollar sign $.
Typing users$Name extracts the entire "Name" column as a vector.
# Re-creating the Data Frame for this example
users <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Active = c(TRUE, FALSE, TRUE)
)
# Extracting using the dollar sign
active_status <- users$Active
print(active_status)
# Extracting using brackets (all rows, column 2)
names_col <- users[, 2]
print(names_col)
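Brackets also accept column names, which tends to be safer than a position such as 2 if the column order ever changes. The sketch below assumes the same users Data Frame; the variable names names_vec, names_df, names_df2, and active_users are illustrative.

```r
# Rebuild the example Data Frame so this block runs on its own
users <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Active = c(TRUE, FALSE, TRUE)
)

# Double brackets with a name return the column as a vector, like $
names_vec <- users[["Name"]]

# Single brackets with a name return a one-column Data Frame
names_df <- users["Name"]

# drop = FALSE keeps the Data Frame shape when using [row, column]
names_df2 <- users[, "Name", drop = FALSE]

# A logical vector in the row position keeps only the matching rows
active_users <- users[users$Active, ]
print(active_users)
```

Whether to extract a vector or keep a one-column Data Frame depends on what the next step expects: vector functions such as length() work on the former, while column-wise tools generally expect the latter.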
Why are Data Frames considered a better fit than Matrices for importing real-world datasets such as CSV files or Excel spreadsheets?
Which operator lets you quickly extract a specific column from a Data Frame by its name?