Factors are R's data structure for categorical data.
They store values drawn from a limited, fixed set of categories.
You construct a factor using the factor() function.
Classic examples of categorical data are gender, t-shirt sizes, and music genres.
When you print a factor, R shows the values followed by the unique categories, known as "levels".
# A raw vector containing repeating text data
raw_sizes <- c("Small", "Medium", "Large", "Medium", "Small")
# Converting the raw vector into a structured Factor
size_factor <- factor(raw_sizes)
print(size_factor)
# Output shows the values, then the unique levels in alphabetical order: Large Medium Small
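To make the idea of a fixed set of categories concrete, here is a minimal sketch that assumes you declare the allowed categories yourself through the levels argument of factor(). The variable names and the "XL" entry are made up for illustration.

```r
# Hypothetical data: "XL" is not one of the allowed categories
sizes_in <- c("Small", "Medium", "Large", "XL")

# Declare the permitted categories up front with the levels argument
sized <- factor(sizes_in, levels = c("Small", "Medium", "Large"))

# The levels appear in the order they were declared
print(levels(sized))

# Values outside the declared levels are stored as NA
print(is.na(sized))
```

Only the last element is NA here, because "XL" was not among the declared levels.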
To check which unique categories exist inside a factor, use the levels() function.
If your data contains a typo (like "Smal" instead of "Small"), levels() will reveal it as an extra, unintended category.
This makes factors a handy tool for data cleaning and validation.
music_genres <- factor(c("Jazz", "Rock", "Pop", "Rock", "Jazz"))
# Returns exactly 3 levels: "Jazz", "Pop", "Rock"
print(levels(music_genres))
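The typo check described above can be sketched as follows. The data and the expected vector are hypothetical, and using setdiff() to compare against the expected categories is just one possible approach.

```r
# Hypothetical survey responses; "Smal" is a deliberate typo
responses <- c("Small", "Medium", "Smal", "Large", "Small")
response_factor <- factor(responses)

# Four levels appear instead of the expected three
print(levels(response_factor))

# setdiff() isolates the levels that are not in the expected set
expected <- c("Small", "Medium", "Large")
unexpected <- setdiff(levels(response_factor), expected)
print(unexpected)
```

Here unexpected holds only "Smal", which points straight at the entry that needs fixing.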
What type of data are factors designed to store?
Which built-in function allows you to quickly list the unique categories present inside a factor?