DSA Traveling Salesman

What you'll learn

Understand the main ideas in DSA Traveling Salesman.
See a working example and explanation to help you learn by doing.
Follow a clear path to the next lesson in this topic.

When to use this lesson

Use this lesson when you want to understand the key concepts behind DSA Traveling Salesman.

Traveling Salesman Problem (TSP) in Data Structures and Algorithms

The Traveling Salesman Problem (TSP) is one of the most famous and widely studied problems in computer science and operations research. It is a classic algorithmic problem focused on optimization and falls under the category of NP-Hard problems.

What is the Traveling Salesman Problem?

Imagine a salesman who needs to visit a set of cities to sell their goods. The rules are simple:

The salesman must visit every city exactly once.
The salesman must return to the starting city at the end of the journey.
The goal is to find the shortest possible route (or the route with the minimum total cost/distance) that satisfies the above conditions.

This sounds easy when there are only a few cities, but as the number of cities increases, the number of possible routes grows factorially, making it incredibly difficult to solve using a brute-force approach.
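
One way to see this factorial growth is to count the routes directly. The sketch below assumes the starting city is fixed, so N cities leave (N-1)! orderings of the remaining cities. `count_routes` is a hypothetical helper name used only for illustration.

```python
import math


def count_routes(n):
    """Number of tours through n cities when the start city is fixed: (n-1)!."""
    return math.factorial(n - 1)


# Print how quickly the number of candidate routes explodes
for n in (4, 10, 15, 20):
    print(f"{n} cities: {count_routes(n):,} routes")
```

Going from 4 cities to 20 takes the count from a handful of routes to more than 10^17.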

Why is TSP Important?

The Traveling Salesman Problem isn't just about salesmen traveling between cities. It has numerous real-world applications in various industries:

Logistics and Delivery: Optimizing delivery routes for companies like Amazon, FedEx, and UPS to save fuel and time.
Manufacturing: Planning the movements of a drill press to create holes in printed circuit boards efficiently.
DNA Sequencing: Reconstructing a DNA sequence from its fragments.
Microchip Fabrication: Minimizing the time taken by a laser to cut elements on a semiconductor chip.

Understanding the Problem with a Graph

In Data Structures and Algorithms, we represent the TSP using a Graph.

Vertices (Nodes): Represent the cities.
Edges (Links): Represent the paths between the cities.
Weights: Represent the distance, time, or cost to travel between two cities.

The objective is to find the shortest Hamiltonian Cycle in a complete weighted graph. A Hamiltonian cycle is a closed loop that visits every vertex exactly once and returns to the starting vertex.
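
To make the definition concrete, here is a minimal sketch that checks whether a route is a Hamiltonian cycle. It assumes cities are numbered 0 to n-1 and that the graph is complete (every pair of cities is connected), so only vertex coverage needs checking, not edge existence. `is_hamiltonian_cycle` is a hypothetical name.

```python
def is_hamiltonian_cycle(route, n):
    """Return True if route (a list of vertex indices 0..n-1) is a closed
    loop that visits every vertex exactly once before returning to its start."""
    # A cycle over n vertices lists n vertices plus the repeated start vertex
    if len(route) != n + 1 or route[0] != route[-1]:
        return False
    # Excluding the repeated start vertex, every vertex must appear exactly once
    return sorted(route[:-1]) == list(range(n))


print(is_hamiltonian_cycle([0, 1, 2, 3, 0], 4))  # True
print(is_hamiltonian_cycle([0, 1, 1, 3, 0], 4))  # False: vertex 2 is skipped
print(is_hamiltonian_cycle([0, 1, 2, 3], 4))     # False: does not return to start
```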

Visualizing the TSP

Figure: a simple Traveling Salesman Problem with 4 cities (A, B, C, D); the optimal route is highlighted in green.

The distances between the cities are A-B = 10, A-C = 15, A-D = 20, B-C = 25, B-D = 35, and C-D = 30. Starting from City A, there are only three distinct routes (each can also be traveled in reverse for the same total distance):

A -> B -> C -> D -> A = 10 + 25 + 30 + 20 = 85 (Optimal)
A -> C -> B -> D -> A = 15 + 25 + 35 + 20 = 95
A -> B -> D -> C -> A = 10 + 35 + 30 + 15 = 90
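
The arithmetic above can be checked with a few lines of code. This sketch stores the example's distances in a dictionary keyed by city pairs and assumes the distances are symmetric (A to B costs the same as B to A). `DIST`, `distance`, and `route_cost` are hypothetical names.

```python
# Distances between the four cities in the example (undirected graph)
DIST = {
    ("A", "B"): 10, ("A", "C"): 15, ("A", "D"): 20,
    ("B", "C"): 25, ("B", "D"): 35, ("C", "D"): 30,
}


def distance(u, v):
    # Look the pair up in either order, since the graph is undirected
    return DIST[(u, v)] if (u, v) in DIST else DIST[(v, u)]


def route_cost(route):
    """Sum the edge weights along a route written as a string such as "ABCDA"."""
    return sum(distance(u, v) for u, v in zip(route, route[1:]))


for route in ("ABCDA", "ACBDA", "ABDCA"):
    print(route, route_cost(route))
```

This prints 85, 95, and 90 for the three routes, matching the totals listed above.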

Approaches to Solving the Traveling Salesman Problem

Since TSP is NP-Hard, there is no known algorithm to solve it exactly in polynomial time. However, there are several approaches we can take depending on the exact requirements.

1. Brute Force (Naive Approach)

The most straightforward way to solve the TSP is to calculate the total distance for every possible route and then select the shortest one.

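How it works: Fix the starting city and generate all permutations of the remaining cities. For N cities, there are (N-1)! possible routes (when distances are symmetric, half of them are just reversals of the other half).
Time Complexity: O(N!), since each of the (N-1)! routes takes O(N) time to sum, and (N-1)! * N = N!.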
Pros: Guaranteed to find the exact optimal, shortest path.
Cons: Extremely slow and completely impractical for more than 15-20 cities.
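
A minimal brute-force sketch, assuming the distance matrix from this lesson's example (cities A, B, C, D as indices 0 to 3) and city 0 as the fixed start. It uses `itertools.permutations` from the standard library; `tsp_brute_force` is a hypothetical name.

```python
import math
from itertools import permutations


def tsp_brute_force(dist):
    """Try every ordering of cities 1..n-1 with city 0 fixed as the start."""
    n = len(dist)
    best_cost, best_route = math.inf, None
    for order in permutations(range(1, n)):
        route = (0, *order, 0)
        # A closed route over n cities has n edges
        cost = sum(dist[route[i]][route[i + 1]] for i in range(n))
        if cost < best_cost:
            best_cost, best_route = cost, route
    return best_cost, best_route


dist = [
    [0, 10, 15, 20],
    [10, 0, 25, 35],
    [15, 25, 0, 30],
    [20, 35, 30, 0],
]
print(tsp_brute_force(dist))  # (85, (0, 1, 2, 3, 0))
```

The loop body runs (N-1)! times, which is exactly why this approach stops being usable so quickly.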

2. Dynamic Programming (Held-Karp Algorithm)

We can optimize the brute force approach using Dynamic Programming. Instead of recalculating overlapping subproblems, we store the results of previously computed paths.

How it works: Uses a bitmask to represent the set of visited cities and a 2D array to memoize the cost of the shortest path that visits exactly that subset of cities and ends at a specific city.
Time Complexity: O(N^2 * 2^N)
Space Complexity: O(N * 2^N)
Pros: Much faster than brute force as N grows. For N = 20, N^2 * 2^N is about 4.2 × 10^8 steps, compared with 19! ≈ 1.2 × 10^17 routes for brute force. It can solve instances of up to roughly 20-25 cities exactly.
Cons: Exponential memory consumption makes it infeasible for larger sets of cities.
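
The state described above (a bitmask of visited cities plus the city where the path currently ends) can also be filled in bottom-up, in increasing order of mask. This is an iterative variant written for illustration; the lesson's own implementation further down is top-down with memoization. `held_karp` is a hypothetical name, and the sketch assumes at least 2 cities with city 0 as the start.

```python
import math


def held_karp(dist):
    n = len(dist)
    full = 1 << n
    # dp[mask][j]: cheapest path that starts at city 0, visits exactly the
    # cities in mask, and ends at city j
    dp = [[math.inf] * n for _ in range(full)]
    dp[1][0] = 0  # only city 0 visited, standing at city 0
    for mask in range(1, full):
        if (mask & 1) == 0:
            continue  # every path starts at city 0
        for j in range(n):
            if dp[mask][j] == math.inf:
                continue
            # Extend the path from j to every city k not yet in mask
            for k in range(n):
                if mask & (1 << k):
                    continue
                new_mask = mask | (1 << k)
                candidate = dp[mask][j] + dist[j][k]
                if candidate < dp[new_mask][k]:
                    dp[new_mask][k] = candidate
    # Close the tour by returning from the last city to city 0
    all_visited = full - 1
    return min(dp[all_visited][j] + dist[j][0] for j in range(1, n))


dist = [
    [0, 10, 15, 20],
    [10, 0, 25, 35],
    [15, 25, 0, 30],
    [20, 35, 30, 0],
]
print(held_karp(dist))  # 85
```

Processing masks in increasing order is safe because every transition goes from a mask to a strictly larger one, so each `dp[mask]` row is final before it is extended.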

3. Branch and Bound

Branch and Bound is an algorithm design paradigm for discrete and combinatorial optimization problems.

How it works: It systematically enumerates candidate solutions by organizing them in a tree structure (branching). It then calculates a lower bound on the cost for each branch and prunes branches that cannot yield a better solution than the best one found so far (bounding).
Pros: Often faster than Dynamic Programming in practice for average cases, though the worst-case time complexity remains exponential.
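
Here is a small depth-first Branch and Bound sketch. The lower bound it uses is an assumption chosen for simplicity, not the only option: every city that still has to be left (the current city plus each unvisited city) must pay at least its cheapest outgoing edge, so the cost so far plus those minimum edges can never exceed the true cost of any completion. `tsp_branch_and_bound` is a hypothetical name, and at least 2 cities are assumed.

```python
import math


def tsp_branch_and_bound(dist):
    n = len(dist)
    # Cheapest edge leaving each city, used to build the lower bound
    min_out = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    best_cost, best_route = math.inf, None

    def search(route, visited, cost):
        nonlocal best_cost, best_route
        current = route[-1]
        if len(route) == n:
            # All cities visited: close the tour back to city 0
            total = cost + dist[current][0]
            if total < best_cost:
                best_cost, best_route = total, route + [0]
            return
        # Bounding: cost so far plus the cheapest exit from every city still to be left
        bound = cost + min_out[current] + sum(
            min_out[c] for c in range(n) if c not in visited
        )
        if bound >= best_cost:
            return  # prune: this branch cannot beat the best tour found so far
        # Branching: try each unvisited city as the next stop
        for city in range(n):
            if city not in visited:
                visited.add(city)
                route.append(city)
                search(route, visited, cost + dist[current][city])
                route.pop()
                visited.remove(city)

    search([0], {0}, 0)
    return best_cost, best_route


dist = [
    [0, 10, 15, 20],
    [10, 0, 25, 35],
    [15, 25, 0, 30],
    [20, 35, 30, 0],
]
print(tsp_branch_and_bound(dist))  # (85, [0, 1, 2, 3, 0])
```

Tighter bounds (for example, ones based on minimum spanning trees or reduced cost matrices) prune more branches but cost more to compute at each node.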

4. Approximate Algorithms (Heuristics)

When we need to solve the TSP for hundreds or thousands of cities, finding a guaranteed optimal route within a reasonable time is often impractical. Instead, we use approximation algorithms and heuristics to find a "good enough" solution quickly.

Nearest Neighbor: Start at a random city and repeatedly visit the nearest unvisited city. Fast (O(N^2)) but often yields suboptimal routes.
Minimum Spanning Tree (MST) Heuristic: Build an MST of the graph, perform a pre-order traversal, and visit the cities in that order before returning to the start. When the distances satisfy the triangle inequality, this guarantees a route no longer than twice the optimal one (a 2-approximation).
Genetic Algorithms: Use evolutionary concepts like mutation, crossover, and selection to iteratively improve a population of routes.
Simulated Annealing: A probabilistic technique that occasionally accepts worse solutions to escape local minima, which often lets it converge on a near-optimal solution.
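
A minimal Nearest Neighbor sketch, assuming the lesson's distance matrix and a fixed start city (index 0) rather than a random one so the result is reproducible. Each step scans the remaining cities once, giving O(N^2) overall. `nearest_neighbor_tour` is a hypothetical name.

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy tour: from the current city, always move to the closest unvisited city."""
    n = len(dist)
    unvisited = set(range(n)) - {start}
    route = [start]
    cost = 0
    current = start
    while unvisited:
        # Pick the unvisited city with the shortest edge from the current city
        nxt = min(unvisited, key=lambda city: dist[current][city])
        cost += dist[current][nxt]
        unvisited.remove(nxt)
        route.append(nxt)
        current = nxt
    cost += dist[current][start]  # close the loop back to the start
    route.append(start)
    return cost, route


dist = [
    [0, 10, 15, 20],
    [10, 0, 25, 35],
    [15, 25, 0, 30],
    [20, 35, 30, 0],
]
print(nearest_neighbor_tour(dist))  # (85, [0, 1, 2, 3, 0])
```

On this small example the greedy choice happens to find the optimal tour; on larger inputs an early cheap edge can force expensive edges later, which is why the result is often suboptimal.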

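The MST heuristic can be sketched with Prim's algorithm followed by an iterative pre-order traversal. This is an illustrative version for a complete graph given as a distance matrix; `mst_preorder_tour` is a hypothetical name. Note that the lesson's example matrix does not satisfy the triangle inequality (B-D = 35 is longer than B-A + A-D = 30), so the 2-approximation guarantee does not formally apply to it, even though the heuristic returns the optimal tour of 85 there.

```python
import math


def mst_preorder_tour(dist, start=0):
    """Build a minimum spanning tree with Prim's algorithm, then visit the
    vertices in pre-order and return to the start."""
    n = len(dist)
    in_tree = [False] * n
    best_edge = [math.inf] * n  # cheapest known edge connecting each vertex to the tree
    parent = [-1] * n           # parent of each vertex in the MST
    best_edge[start] = 0
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best_edge[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best_edge[v]:
                best_edge[v] = dist[u][v]
                parent[v] = u

    # Children lists for the tree rooted at start
    children = [[] for _ in range(n)]
    for v in range(n):
        if parent[v] != -1:
            children[parent[v]].append(v)

    # Iterative pre-order traversal: a vertex is recorded before its children
    order, stack = [], [start]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))  # children come off the stack in index order

    tour = order + [start]
    cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
    return cost, tour


dist = [
    [0, 10, 15, 20],
    [10, 0, 25, 35],
    [15, 25, 0, 30],
    [20, 35, 30, 0],
]
print(mst_preorder_tour(dist))  # (85, [0, 1, 2, 3, 0])
```

Skipping already-visited vertices during the pre-order walk is what "shortcuts" the tree into a tour; the triangle inequality is what guarantees those shortcuts never make the route longer than walking around the tree twice.
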
Code Example: Dynamic Programming Implementation

Here is a Python implementation of the Traveling Salesman Problem using the Dynamic Programming (Memoization) approach with Bitmasking.

TSP using Dynamic Programming (Python)

import sys

# Number of cities
n = 4

# Distance matrix (adjacency matrix) for the 4-city example graph above
dist = [
    [0, 10, 15, 20],  # Distances from A to A, B, C, D
    [10, 0, 25, 35],  # Distances from B to A, B, C, D
    [15, 25, 0, 30],  # Distances from C to A, B, C, D
    [20, 35, 30, 0]   # Distances from D to A, B, C, D
]

# Bitmask with all n bits set, meaning every city has been visited
VISITED_ALL = (1 << n) - 1

# Memoization table of size (2^n) x n, initialized with -1 ("not computed yet")
memo = [[-1] * n for _ in range(1 << n)]


def tsp(mask, pos):
    # Base case: all cities have been visited
    if mask == VISITED_ALL:
        # Return the distance from the current city back to the starting city (0)
        return dist[pos][0]

    # If this subproblem has already been solved, return the memoized result
    if memo[mask][pos] != -1:
        return memo[mask][pos]

    ans = sys.maxsize
    # Try every city that has not been visited yet
    for city in range(n):
        # A city is unvisited if its bit in the mask is 0
        if (mask & (1 << city)) == 0:
            # Cost of moving to this city plus the best cost of finishing the tour from it
            new_ans = dist[pos][city] + tsp(mask | (1 << city), city)
            ans = min(ans, new_ans)

    # Store the result in the memoization table and return it
    memo[mask][pos] = ans
    return ans


# Start from city 0 (A), so mask is 1 (binary 0001) and pos is 0
shortest_path = tsp(1, 0)
print(f"The minimum cost of the tour is: {shortest_path}")  # The minimum cost of the tour is: 85

Breaking Down the Code:

Bitmasking: We use an integer mask to keep track of which cities have been visited. The i-th bit of mask is 1 if the i-th city has been visited, and 0 otherwise.
State Representation: The state is represented by (mask, pos), where mask indicates the visited cities and pos is the current city.
Memoization: memo[mask][pos] stores the minimum distance needed to visit the remaining unvisited cities starting from pos and then return to the starting city.
Base Case: When mask == VISITED_ALL (all bits are 1), we return the distance from the current city back to the starting city (dist[pos][0]).
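
The function above returns only the cost of the best tour. One way to recover the route itself, sketched here as an assumption rather than part of the lesson's code, is to start at (mask = 1, pos = 0) and repeatedly pick the next city whose choice achieves the optimal remaining cost. This version swaps the manual memo table for `functools.lru_cache` from the standard library; `reconstruct_route` is a hypothetical name.

```python
import math
from functools import lru_cache

dist = [
    [0, 10, 15, 20],
    [10, 0, 25, 35],
    [15, 25, 0, 30],
    [20, 35, 30, 0],
]
n = len(dist)
VISITED_ALL = (1 << n) - 1


@lru_cache(maxsize=None)
def tsp(mask, pos):
    # Same recurrence as above, memoized by lru_cache instead of a table
    if mask == VISITED_ALL:
        return dist[pos][0]
    best = math.inf
    for city in range(n):
        if (mask & (1 << city)) == 0:
            best = min(best, dist[pos][city] + tsp(mask | (1 << city), city))
    return best


def reconstruct_route():
    """Follow the optimal choices stored implicitly in tsp() to rebuild the tour."""
    mask, pos = 1, 0
    route = [0]
    while mask != VISITED_ALL:
        target = tsp(mask, pos)
        # The next city is one whose step cost plus remaining cost equals the optimum
        next_city = next(
            city for city in range(n)
            if (mask & (1 << city)) == 0
            and dist[pos][city] + tsp(mask | (1 << city), city) == target
        )
        route.append(next_city)
        mask |= 1 << next_city
        pos = next_city
    route.append(0)  # return to the starting city
    return route


print(tsp(1, 0), reconstruct_route())  # 85 [0, 1, 2, 3, 0]
```

The exact equality test is safe here because the distances are integers; with floating-point distances, comparing within a small tolerance (or storing the chosen next city alongside each memo entry) would be more robust.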

Summary and Key Takeaways

The Traveling Salesman Problem (TSP) asks for the shortest possible route that visits every node in a graph exactly once and returns to the origin.
It is an NP-Hard problem, meaning no polynomial-time algorithm is known to exist for solving it exactly.
Brute Force (O(N!)) is too slow for almost any practical use beyond a handful of cities.
Dynamic Programming (O(N^2 * 2^N)) is significantly better but is constrained by its exponential memory consumption.
For real-world applications with many nodes, we rely on Heuristics and Approximation Algorithms like the Nearest Neighbor heuristic, Genetic Algorithms, or Simulated Annealing to find near-optimal solutions in a fraction of the time.

Mastering the TSP is a rite of passage for computer science students. It offers profound insights into computational complexity, graph theory, and modern algorithmic optimization techniques!

Exercise

What is the time complexity of the naive Brute Force approach to solving the Traveling Salesman Problem?

O(N^2)
O(N!)
O(2^N)
O(N log N)