
Minimum Spanning Tree: Kruskal's Algorithm

Key Takeaways: Kruskal's Algorithm is a greedy algorithm used to find the Minimum Spanning Tree (MST) of a graph. It works by sorting all the edges from lightest to heaviest and adding them one by one, skipping any edges that would create a cycle.

In previous tutorials, we explored the concept of a Minimum Spanning Tree (MST) and how to build one using Prim's Algorithm. While Prim's algorithm grows a tree outward from a single starting point, Kruskal's Algorithm takes a completely different, global approach.

Kruskal's algorithm doesn't care about starting points or connected boundaries. It simply looks at the entire map, grabs the absolute shortest available road anywhere, and builds the network piece by piece.

If you are building a vast fiber-optic network across a country, Kruskal's ensures you are using the absolute least amount of cable possible.


1. How Does Kruskal's Algorithm Work?

Kruskal's algorithm is one of the most elegant Greedy Algorithms in computer science. At every step it makes the "greediest" (cheapest) available choice, and thanks to the cut property of minimum spanning trees, this locally greedy strategy is guaranteed to produce a globally optimal tree.

The Three-Step Process

  1. Sort: Take every single edge in the entire graph and sort them in ascending order based on their weight (from cheapest to most expensive).
  2. Iterate: Go down your sorted list of edges one by one.
  3. Check for Cycles: If adding the edge connects two isolated groups of nodes without creating a closed loop (a cycle), add it to your MST! If it creates a loop, throw it in the trash and move to the next edge.

The algorithm stops as soon as the MST contains exactly V - 1 edges (where V is the total number of vertices). Any spanning tree of a connected graph with V nodes has exactly V - 1 edges, so once we reach that count, the tree is complete.
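The three steps above can be sketched before we worry about efficiency. This hypothetical sketch tracks components with a plain dictionary and merges them by relabeling, which is slow (O(V) per merge) but makes the cycle test explicit; Section 2 replaces it with a proper Disjoint Set.

```python
def kruskal_sketch(vertices, edges):
    # edges: list of (u, v, weight) tuples
    # 1. Sort: cheapest edge first
    edges = sorted(edges, key=lambda e: e[2])
    # Naive component tracking: every node starts in its own group
    component = {v: i for i, v in enumerate(vertices)}
    mst = []
    # 2. Iterate over edges in ascending weight order
    for u, v, w in edges:
        # 3. An edge inside a single component would close a cycle
        if component[u] != component[v]:
            mst.append((u, v, w))
            # Merge by relabeling v's entire group (the slow part)
            old, new = component[v], component[u]
            for node in component:
                if component[node] == old:
                    component[node] = new
        if len(mst) == len(vertices) - 1:  # stop at V - 1 edges
            break
    return mst
```

Running this on the five-node example from Section 3 returns the four accepted edges in the order they were taken.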


Kruskal's Algorithm: Edges are added by weight. The red dashed line (weight 4) is rejected because it forms a cycle between A, B, C, and D.


2. The Secret Weapon: Disjoint Set (Union-Find)

The biggest challenge in Kruskal's algorithm is Step 3: How do we quickly know if an edge will create a cycle?

If we run a Depth-First Search (DFS) every single time we want to add an edge, our algorithm will become incredibly slow. Instead, we use a specialized data structure called a Disjoint Set (also known as a Union-Find data structure).

A Disjoint Set keeps track of which "group" or "component" a node belongs to.

It has two main operations:

  1. Find(x): Returns the "root" representative of the group that node x belongs to.
  2. Union(x, y): Merges the groups containing x and y into a single group.

How it prevents cycles: Before Kruskal's adds an edge between Node A and Node B, it calls Find(A) and Find(B). If both calls return the same root, A and B are already connected, so the edge would create a cycle and is rejected. If the roots differ, the edge is safe to add, and Union(A, B) merges the two groups.
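The cycle test can be seen in a few lines. This is a deliberately minimal sketch (hypothetical helper names, no path compression or rank) just to show how Find answers the question "are these two nodes already connected?":

```python
parent = {}

def find(x):
    # Follow parent pointers up to the root representative
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    # Attach one root under the other, merging the two groups
    parent[find(y)] = find(x)

for node in "ABCD":
    parent[node] = node  # every node starts as its own root

union("A", "B")
union("C", "D")
print(find("A") == find("B"))  # True  -> edge A-B would form a cycle
print(find("A") == find("C"))  # False -> edge A-C is safe to add
```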


3. A Manual Run-Through

Let's trace Kruskal's algorithm on a graph with nodes A, B, C, D, E to see how the sorting and cycle detection work together.

Our Unsorted Edges: A-B (2), A-C (3), B-C (5), B-D (4), C-D (1), C-E (6), D-E (7)

Step 1: Sort the Edges
We organize the edges from lowest to highest weight:

  1. C-D (1)
  2. A-B (2)
  3. A-C (3)
  4. B-D (4)
  5. B-C (5)
  6. C-E (6)
  7. D-E (7)

Step 2: Iterate and Build

  1. C-D (1): C and D are in different groups. Accept.
  2. A-B (2): A and B are in different groups. Accept.
  3. A-C (3): Connects group {A, B} with group {C, D}. Accept.
  4. B-D (4): B and D are already connected (through A and C). Reject: cycle.
  5. B-C (5): Already connected. Reject: cycle.
  6. C-E (6): E is still isolated. Accept. We now have 4 edges, which is V - 1, so we stop.

Result: The final MST edges are C-D (1), A-B (2), A-C (3), and C-E (6). Total cost: 1 + 2 + 3 + 6 = 12.
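The trace can be checked mechanically. This sketch (assumed helper names, simplest possible union-find) replays the sorted edge list and reports which edges survive:

```python
parent = {v: v for v in "ABCDE"}

def find(x):
    # Walk up to the root representative of x's group
    while parent[x] != x:
        x = parent[x]
    return x

sorted_edges = [("C", "D", 1), ("A", "B", 2), ("A", "C", 3),
                ("B", "D", 4), ("B", "C", 5), ("C", "E", 6), ("D", "E", 7)]
mst, cost = [], 0
for u, v, w in sorted_edges:
    ru, rv = find(u), find(v)
    if ru != rv:          # different groups: accept and merge
        parent[rv] = ru
        mst.append((u, v, w))
        cost += w
    # else: reject, the edge would close a cycle (B-D and B-C here)

print(mst)   # the four accepted edges: C-D, A-B, A-C, C-E
print(cost)  # 12
```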


4. Implementation of Kruskal's Algorithm

Below is the complete Python implementation. We will first build an optimized DisjointSet class that uses Path Compression and Union by Rank to keep the Find and Union operations running in near-constant amortized time, O(α(V)), where α is the extremely slow-growing inverse Ackermann function.

Kruskal's Algorithm (Python)

class DisjointSet:
    def __init__(self, vertices):
        # Initially, every node is its own parent (its own group)
        self.parent = {v: v for v in vertices}
        self.rank = {v: 0 for v in vertices}

    def find(self, item):
        # Path compression: Attach nodes directly to the root
        if self.parent[item] == item:
            return item
        self.parent[item] = self.find(self.parent[item])
        return self.parent[item]

    def union(self, set1, set2):
        root1 = self.find(set1)
        root2 = self.find(set2)
        if root1 != root2:
            # Union by Rank: Keep the tree shallow by attaching
            # the smaller tree under the taller tree
            if self.rank[root1] > self.rank[root2]:
                self.parent[root2] = root1
            elif self.rank[root1] < self.rank[root2]:
                self.parent[root1] = root2
            else:
                self.parent[root2] = root1
                self.rank[root1] += 1

def kruskals_algorithm(vertices, edges):
    mst = []
    total_cost = 0
    # 1. Sort all edges by their weight (index 2 of the tuple)
    edges.sort(key=lambda x: x[2])
    ds = DisjointSet(vertices)
    # 2. Iterate through the sorted edges
    for u, v, weight in edges:
        # 3. Check for cycles using Find
        if ds.find(u) != ds.find(v):
            # Safe to add!
            ds.union(u, v)
            mst.append((u, v, weight))
            total_cost += weight
            # Optimization: Stop early if we have V-1 edges
            if len(mst) == len(vertices) - 1:
                break
    return mst, total_cost

# Define the Graph
vertices = ['A', 'B', 'C', 'D']
edges = [
    ('A', 'D', 5), ('A', 'B', 1), ('C', 'D', 2),
    ('B', 'C', 4), ('A', 'C', 3)
]

mst_result, min_cost = kruskals_algorithm(vertices, edges)
print("Edges in MST:", mst_result)  # [('A', 'B', 1), ('C', 'D', 2), ('A', 'C', 3)]
print("Total Cost:", min_cost)      # 6


5. Complexity Analysis

Kruskal's algorithm's performance is dominated by the initial sort: sorting E edges costs O(E log E), which is equivalent to O(E log V) since E ≤ V². The E Find/Union operations that follow add only near-constant amortized cost each, so the overall running time is O(E log E).

When to Use Kruskal's vs. Prim's?

Algorithm           | Best Use Case                  | Why?
Kruskal's Algorithm | Sparse graphs (few edges)      | Sorting a small edge list is fast. Kruskal's freely jumps around the graph, building isolated forests before connecting them.
Prim's Algorithm    | Dense graphs (many edges)      | Prim's is bounded by vertices. Using a Min-Heap, it runs in O(E log V) without a global sort, growing the tree by repeatedly picking the cheapest local edge.

Exercise 1 of 2


Which specialized data structure allows Kruskal's Algorithm to check for cycles in near-constant O(1) time?

Exercise 2 of 2


What is the very first step Kruskal's algorithm takes before it begins adding edges to the tree?