R Factors

Factors are data structures used to represent categorical data.

A factor stores values drawn from a limited, fixed set of categories.


Creating a Factor

You construct a factor using the factor() function.

Classic examples of categorical data are gender, t-shirt sizes, and music genres.

When you print a factor, R shows the values followed by the unique categories, known as "levels".

Creating Factors Example:

# A character vector containing repeated text values
raw_sizes <- c("Small", "Medium", "Large", "Medium", "Small")
# Convert the character vector into a factor
size_factor <- factor(raw_sizes)
print(size_factor)
# Prints the values, then the unique levels in alphabetical order:
# Levels: Large Medium Small
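
By default, factor() sorts the levels alphabetically, which is why Large is listed first in the example above. The following is a minimal sketch, assuming you would rather keep the sizes in their natural order: the levels argument of factor() declares the categories explicitly. The names ordered_sizes and with_unknown are illustrative only.

```r
raw_sizes <- c("Small", "Medium", "Large", "Medium", "Small")

# Supply the levels explicitly to control which categories exist and their order
ordered_sizes <- factor(raw_sizes, levels = c("Small", "Medium", "Large"))
print(levels(ordered_sizes))

# A value that is not among the declared levels is stored as NA
with_unknown <- factor(c("Small", "Huge"), levels = c("Small", "Medium", "Large"))
print(is.na(with_unknown))
```

Declaring the levels up front is one way to enforce a fixed set of categories: anything outside that set shows up as a missing value rather than as a new category.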

The levels() Function

To check which unique categories exist inside a factor, use the levels() function.

If your data contains a typo (like "Smal" instead of "Small"), levels() will reveal it as an extra, unintended category.

This makes factors a useful tool for data cleaning and validation.

Checking Levels:

music_genres <- factor(c("Jazz", "Rock", "Pop", "Rock", "Jazz"))
# Returns exactly 3 levels: "Jazz", "Pop", "Rock"
print(levels(music_genres))
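
The typo scenario described earlier can be sketched as follows. This is an illustrative example, not a prescribed cleaning workflow: the sizes vector is hypothetical data, and renaming the level is only one of several ways to repair it.

```r
# Hypothetical data containing one misspelled entry ("Smal")
sizes <- factor(c("Small", "Medium", "Smal", "Large", "Small"))

# The typo shows up as an extra, fourth level
print(levels(sizes))

# table() counts how many values fall into each level
print(table(sizes))

# One possible repair: rename the bad level, which merges it into "Small"
levels(sizes)[levels(sizes) == "Smal"] <- "Small"
print(levels(sizes))
```

After the rename, the factor has three levels again and the misspelled value is counted as "Small".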

Exercise 1 of 2

What type of data are factors designed to store?

Exercise 2 of 2

Which built-in function lists the unique categories present inside a factor?