Hash Sets

Hash Sets

Key Takeaways: A Hash Set is exactly like a Hash Table, but it only stores Keys without any Values. Its main superpower is instantly preventing and removing duplicate items from a dataset.

In the previous tutorial, we learned about Hash Tables (Dictionaries). Hash Tables use a Key to look up a Value.

But what if you don't care about a "Value"? What if you only want to keep track of a list of items and make sure that list has no duplicates? That is where the Hash Set comes in!


1. What is a Hash Set?

A Hash Set is an unordered collection of unique items.

Under the hood, a Hash Set uses the exact same Hash Function magic as a Hash Table. The difference is that it only stores the Keys.

Real-Life Analogy

Think of a VIP Guest List at an exclusive club.

If John tries to add his name to the guest list a second time, the bouncer says, "You are already on the list!" and ignores him.


2. How it Filters Duplicates

Because a Hash Set uses a Hash Function, it instantly knows if an item already exists.

  1. You try to add "Apple" to the Set.
  2. The Hash Function calculates Index 0. Since Index 0 is empty, it adds "Apple".
  3. Later, you try to add "Apple" again.
  4. The Hash Function calculates Index 0. It sees that "Apple" is already there!
  5. The Hash Set quietly ignores the second "Apple".
Visual representation of a Hash Set filtering out duplicate items

Hash Set: Duplicates are mathematically mapped to the same index and rejected.


3. When to Use a Hash Set

Hash Sets are the ultimate tool for specific algorithmic problems. You should use them when:


4. Time Complexity

Because Hash Sets use Hash Functions, their speeds are identical to Hash Tables!

Warning: Because Hash Sets are based on mathematical hashing, they are unordered. If you print a Set, the items will not be in the order you added them!


5. Code Example (Python Sets)

Python has a built-in set data structure that is incredibly easy to use. Let's write some code to see how it automatically destroys duplicate values!

Hash Sets in Python

# 1. Create an empty Set
vip_list = set()
# 2. ADD items (O(1) time)
vip_list.add("Alice")
vip_list.add("Bob")
print("Current List:", vip_list)
# 3. Try to add a duplicate!
vip_list.add("Alice")
# Alice is still only in there once!
print("After duplicate:", vip_list)
# 4. CHECK IF EXISTS (O(1) time)
if "Bob" in vip_list:
    print("Bob is allowed inside!")
else:
    print("Bob is not on the list.")
# 5. Magic Trick: Remove duplicates from a List instantly
messy_list = [1, 2, 2, 3, 3, 3, 4]
clean_list = list(set(messy_list))
print("Cleaned Array:", clean_list)

Exercise 1 of 1

?

What happens if you try to add a value to a Hash Set that already exists inside it?