Key Takeaways: Kruskal's Algorithm is a greedy algorithm used to find the Minimum Spanning Tree (MST) of a graph. It works by sorting all the edges from lightest to heaviest and adding them one by one, skipping any edges that would create a cycle.
In previous tutorials, we explored the concept of a Minimum Spanning Tree (MST) and how to build one using Prim's Algorithm. While Prim's algorithm grows a tree outward from a single starting point, Kruskal's Algorithm takes a completely different, global approach.
Kruskal's algorithm doesn't care about starting points or connected boundaries. It simply looks at the entire map, grabs the absolute shortest available road anywhere, and builds the network piece by piece.
If you are building a vast fiber-optic network across a country, Kruskal's ensures you are using the absolute least amount of cable possible.
Kruskal's algorithm is one of the most elegant Greedy Algorithms in computer science. At every single step, it makes the "greediest" (cheapest) possible choice, and for minimum spanning trees this local strategy provably produces the globally optimal solution.
The algorithm stops as soon as the MST contains exactly V - 1 edges (where V is the total number of vertices). A valid spanning tree connecting V nodes will always have exactly V - 1 edges.
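A quick sanity check of that rule (the particular tree here is just a hypothetical example, not the MST from this tutorial):

```python
# Any spanning tree over V nodes has exactly V - 1 edges:
# one fewer edge disconnects the graph, one more creates a cycle.
vertices = ["A", "B", "C", "D", "E"]

# One possible spanning tree over these 5 nodes
tree_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]

assert len(tree_edges) == len(vertices) - 1  # 4 edges connect 5 nodes
```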
Kruskal's Algorithm: Edges are added by weight. The red dashed line (weight 4) is rejected because it forms a cycle between A, B, C, and D.
The biggest challenge in Kruskal's algorithm is Step 3: How do we quickly know if an edge will create a cycle?
If we run a Depth-First Search (DFS) every single time we want to add an edge, our algorithm will become incredibly slow. Instead, we use a specialized data structure called a Disjoint Set (also known as a Union-Find data structure).
A Disjoint Set keeps track of which "group" or "component" a node belongs to.
It has two main operations:

- **Find(x)**: returns the representative (root) of the group that node x currently belongs to.
- **Union(x, y)**: merges the groups containing x and y into a single group.

How it prevents cycles:
Before Kruskal adds an edge between Node A and Node B, it calls Find(A) and Find(B). If both calls return the same root, A and B are already connected, so adding the edge would create a cycle and the edge is rejected. If the roots differ, the edge is safe: we add it to the MST and merge the two groups with Union(A, B).

Let's trace Kruskal's algorithm on a graph with nodes A, B, C, D, E to see how the sorting and cycle detection work together.
Our Unsorted Edges:
- A-B (2)
- A-C (3)
- B-C (5)
- B-D (4)
- C-D (1)
- C-E (6)
- D-E (7)

**Step 1: Sort the Edges**

We organize the edges from lowest to highest weight:

- C-D (1)
- A-B (2)
- A-C (3)
- B-D (4)
- B-C (5)
- C-E (6)
- D-E (7)

**Step 2: Iterate and Build**
1. **C-D (1):** Find(C) and Find(D) are different. Safe to add! We Union them. Our forest has [C-D].
2. **A-B (2):** Find(A) and Find(B) are different. Safe to add! We Union them. Our forest has [C-D] and [A-B]. (Notice we currently have two disconnected mini-trees.)
3. **A-C (3):** Find(A) and Find(C) are different. Safe to add! We Union them. This edge connects our two mini-trees together! All nodes A, B, C, D are now in one big group.
4. **B-D (4):** Find(B) and Find(D) return the same group (they are already connected via A and C). Reject this edge to avoid a cycle!
5. **B-C (5):** Find(B) and Find(C) return the same group. Reject this edge!
6. **C-E (6):** Find(C) and Find(E) are different. Safe to add! Node E is now connected. (We now have V - 1 = 4 edges, so we can stop early!)

**Result:** The final MST edges are C-D (1), A-B (2), A-C (3), and C-E (6). Total cost: 12.
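The trace above can be reproduced with a short, self-contained sketch. It deliberately uses a naive union-find (no path compression or union by rank, which the full implementation adds later) just to mirror the Find/Union calls in the walkthrough:

```python
# Edge list and weights taken from the worked example
edges = [("A", "B", 2), ("A", "C", 3), ("B", "C", 5), ("B", "D", 4),
         ("C", "D", 1), ("C", "E", 6), ("D", "E", 7)]

# Naive union-find: every node starts as its own root
parent = {v: v for v in "ABCDE"}

def find(x):
    # Walk up until we reach a root (a node that is its own parent)
    while parent[x] != x:
        x = parent[x]
    return x

mst, total = [], 0
for u, v, w in sorted(edges, key=lambda e: e[2]):
    ru, rv = find(u), find(v)
    if ru != rv:            # different groups -> adding this edge is safe
        parent[ru] = rv     # naive union: point one root at the other
        mst.append((u, v, w))
        total += w

print(mst)    # [('C', 'D', 1), ('A', 'B', 2), ('A', 'C', 3), ('C', 'E', 6)]
print(total)  # 12
```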
Below is the complete Python implementation. We will first build a highly optimized DisjointSet class that uses Path Compression and Union by Rank to keep the Find and Union operations running in near-constant amortized time (effectively O(1)).
```python
class DisjointSet:
    def __init__(self, vertices):
        # Initially, every node is its own parent (its own group)
        self.parent = {v: v for v in vertices}
        self.rank = {v: 0 for v in vertices}

    def find(self, item):
        # Path compression: attach nodes directly to the root
        if self.parent[item] == item:
            return item
        self.parent[item] = self.find(self.parent[item])
        return self.parent[item]

    def union(self, set1, set2):
        root1 = self.find(set1)
        root2 = self.find(set2)
        if root1 != root2:
            # Union by Rank: keep the tree shallow by attaching
            # the smaller tree under the taller tree
            if self.rank[root1] > self.rank[root2]:
                self.parent[root2] = root1
            elif self.rank[root1] < self.rank[root2]:
                self.parent[root1] = root2
            else:
                self.parent[root2] = root1
                self.rank[root1] += 1


def kruskals_algorithm(vertices, edges):
    mst = []
    total_cost = 0

    # 1. Sort all edges by their weight (index 2 of the tuple)
    edges.sort(key=lambda x: x[2])

    ds = DisjointSet(vertices)

    # 2. Iterate through the sorted edges
    for u, v, weight in edges:
        # 3. Check for cycles using Find
        if ds.find(u) != ds.find(v):
            # Safe to add!
            ds.union(u, v)
            mst.append((u, v, weight))
            total_cost += weight

            # Optimization: stop early if we have V-1 edges
            if len(mst) == len(vertices) - 1:
                break

    return mst, total_cost
```
```python
# Define the Graph
vertices = ['A', 'B', 'C', 'D']
edges = [
    ('A', 'D', 5),
    ('A', 'B', 1),
    ('C', 'D', 2),
    ('B', 'C', 4),
    ('A', 'C', 3)
]

mst_result, min_cost = kruskals_algorithm(vertices, edges)
print("Edges in MST:", mst_result)
print("Total Cost:", min_cost)
```
Kruskal's algorithm performance is heavily tied to the fact that it must sort the edges before it can do anything else.
**Time Complexity: O(E log E)** (equivalently O(E log V), since log E ≤ 2 log V). Sorting all the edges takes O(E log E) time, while the Disjoint Set operations take nearly O(1) amortized time thanks to path compression and union by rank. Thus, the sorting dominates the runtime.

**Space Complexity: O(V + E)**. We need O(E) space to store the sorted edges, and O(V) space to maintain the Disjoint Set parent and rank arrays.

| Algorithm | Best Use Case | Why? |
|---|---|---|
| Kruskal's Algorithm | Sparse Graphs (Very few edges) | Sorting fewer edges is incredibly fast. Kruskal jumps around the map effortlessly to build isolated forests before connecting them. |
| Prim's Algorithm | Dense Graphs (Massive numbers of edges) | Prim's is bounded by vertices. It avoids sorting millions of edges globally and just picks the best local edge organically using a Min-Heap. |
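To make the sparse-versus-dense distinction concrete, here is a rough back-of-the-envelope sketch (the vertex and edge counts are invented purely for illustration) of how many comparisons Kruskal's dominant sort step costs in each regime:

```python
import math

# Hypothetical graph with 10,000 vertices
V = 10_000
E_sparse = 3 * V            # sparse: edge count grows linearly with V
E_dense = V * (V - 1) // 2  # dense: complete graph, every pair connected

# Kruskal's dominant cost is sorting: roughly E * log2(E) comparisons
print(E_sparse * math.log2(E_sparse))  # sparse: a few hundred thousand
print(E_dense * math.log2(E_dense))    # dense: over a billion
```

On the sparse graph the sort is trivial; on the dense one it is thousands of times more work, which is why Prim's vertex-bounded approach tends to win there.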
Which specialized data structure allows Kruskal's Algorithm to check for cycles in near-constant O(1) time?
What is the very first step Kruskal's algorithm takes before it begins adding edges to the tree?