Minimum Spanning Tree

Minimum Spanning Tree (MST): Kruskal's & Prim's Algorithms

Key Takeaways: A Minimum Spanning Tree (MST) is a subset of edges in a connected, weighted, undirected graph that connects all the vertices together without any cycles, while keeping the total edge weight as small as possible.

In graph theory, connecting multiple points together efficiently is one of the most common problems you will face as a software engineer.

Imagine you are a city planner tasked with laying down water pipes to connect 10 different neighborhoods. Laying pipes is expensive, so your goal is to connect every single neighborhood using the least amount of pipe possible.

This is exactly the problem that a Minimum Spanning Tree (MST) solves. In this tutorial, we will explore the core concepts of MSTs and learn the two most famous algorithms used to build them: Kruskal's Algorithm and Prim's Algorithm.


1. What is a Spanning Tree?

Before we find the Minimum Spanning Tree, we need to understand what a regular Spanning Tree is.

Given a connected, undirected graph, a Spanning Tree is a subgraph that satisfies two strict rules:

  1. It includes every single vertex (node) from the original graph.
  2. It contains no cycles (loops). If there are V vertices, a valid spanning tree will always have exactly V - 1 edges.

A single graph can have many different valid spanning trees. However, if the graph is weighted (edges have costs), these different spanning trees will have different total costs.

The Minimum Spanning Tree (MST) is simply the spanning tree that has the lowest possible total edge weight!

Real-World Applications of MST

Why do we care about finding the MST? It has massive implications for infrastructure and technology:

Visual representation of a Minimum Spanning Tree highlighting optimal paths

The thick red lines represent the Minimum Spanning Tree. All nodes are connected, no cycles are formed, and the total cost is minimized.


2. Kruskal's Algorithm

Kruskal's Algorithm is a beautiful, greedy algorithm that builds the MST edge by edge.

It takes a very logical approach: Sort all the edges from cheapest to most expensive. Keep picking the cheapest edge available, as long as it doesn't create a cycle.

The Step-by-Step Logic

  1. Take every edge in the graph and sort them in ascending order of their weight.
  2. Initialize an empty forest (a collection of separate, disconnected trees).
  3. Iterate through the sorted edges one by one:
    • If adding the edge connects two separate trees without forming a cycle, add it to the MST.
    • If adding the edge connects two nodes that are already part of the same tree (which would create a cycle), discard it.
  4. Stop when you have successfully added V - 1 edges!

The Secret Weapon: Disjoint Set (Union-Find)

How does Kruskal's algorithm efficiently know if an edge will create a cycle? It uses a clever data structure called a Disjoint Set (or Union-Find). This structure allows us to instantly check if two nodes belong to the same "group" before we connect them.

Kruskal's Algorithm (Python)

class DisjointSet:
    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}
        self.rank = {v: 0 for v in vertices}
    def find(self, item):
        if self.parent[item] == item:
            return item
        # Path compression for blazing fast lookups
        self.parent[item] = self.find(self.parent[item])
        return self.parent[item]
    def union(self, set1, set2):
        root1 = self.find(set1)
        root2 = self.find(set2)
        if root1 != root2:
            # Attach smaller rank tree under higher rank tree
            if self.rank[root1] > self.rank[root2]:
                self.parent[root2] = root1
            elif self.rank[root1] < self.rank[root2]:
                self.parent[root1] = root2
            else:
                self.parent[root2] = root1
                self.rank[root1] += 1

def kruskal(vertices, edges): mst = [] # 1. Sort edges by weight (Index 2 holds the weight) edges.sort(key=lambda x: x[2]) ds = DisjointSet(vertices) for u, v, weight in edges: # 2. If they are in different sets, it's safe to connect! if ds.find(u) != ds.find(v): ds.union(u, v) mst.append((u, v, weight)) if len(mst) == len(vertices) - 1: break return mst

# Define Graph vertices = ['A', 'B', 'C', 'D', 'E'] edges = [ ('A', 'B', 4), ('A', 'C', 2), ('B', 'C', 5), ('B', 'D', 10), ('C', 'D', 1), ('C', 'E', 8), ('D', 'E', 3) ]

mst = kruskal(vertices, edges) print("Kruskal's MST Edges:", mst) print("Total Cost:", sum(weight for _, _, weight in mst))


3. Prim's Algorithm

While Kruskal looks at all the edges globally, Prim's Algorithm takes a local, organic approach. It starts from an arbitrary node and "grows" the spanning tree outward, one node at a time.

Prim's algorithm behaves almost exactly like Dijkstra's Algorithm for Shortest Paths!

The Step-by-Step Logic

  1. Start by picking any random vertex. Mark it as part of the MST.
  2. Look at all the outgoing edges connected to the current MST.
  3. Pick the absolute cheapest edge that connects to an unvisited node.
  4. Add that edge and the new node to the MST.
  5. Repeat this process until every single node is part of the MST.

Because Prim's constantly needs to find the "cheapest available edge", we use a Priority Queue (Min-Heap) to pull the minimum weight efficiently.

Prim's Algorithm (Python)

import heapq

def prims(graph, start_node): mst = [] visited = set([start_node]) # Priority Queue holds tuples of (weight, from_node, to_node) pq = [(weight, start_node, neighbor) for neighbor, weight in graph[start_node]] heapq.heapify(pq) total_cost = 0 while pq and len(visited) < len(graph): weight, u, v = heapq.heappop(pq) # If the target node is not visited yet, it's a valid extension! if v not in visited: visited.add(v) mst.append((u, v, weight)) total_cost += weight # Add all outgoing edges from the newly visited node for next_neighbor, next_weight in graph[v]: if next_neighbor not in visited: heapq.heappush(pq, (next_weight, v, next_neighbor)) return mst, total_cost

# Graph represented as an Adjacency List graph = { 'A': [('B', 4), ('C', 2)], 'B': [('A', 4), ('C', 5), ('D', 10)], 'C': [('A', 2), ('B', 5), ('D', 1), ('E', 8)], 'D': [('B', 10), ('C', 1), ('E', 3)], 'E': [('C', 8), ('D', 3)] }

mst, cost = prims(graph, 'A') print("Prim's MST Edges:", mst) print("Total Cost:", cost)


4. Complexity & Which One to Choose?

Both algorithms effectively achieve the exact same goal, but their speeds scale differently based on the structure of the graph.

Both algorithms have a Space Complexity of O(V + E).


Exercise 1 of 2

?

What is a fundamental property of any valid Minimum Spanning Tree (MST)?

Exercise 2 of 2

?

Which data structure does Kruskal's Algorithm heavily rely on to detect cycles?