PostgreSQL SELECT DISTINCT

Tables in a database often contain the same value in many rows. Sometimes, you only want to retrieve a list of the distinct values.

The SELECT DISTINCT statement handles this case. It removes duplicate rows from the result set, so each distinct value is returned exactly once.

How DISTINCT Works

When you run a standard SELECT query, it grabs every matching row. If ten users live in "London", "London" appears ten times in the result.

When you add the DISTINCT keyword, PostgreSQL compares the rows of the result set and removes the duplicates, leaving only one instance of "London". The stored data is not changed; only the query output is affected.

Basic DISTINCT Syntax:

SELECT DISTINCT column_name
FROM table_name;
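
To make the difference concrete, here is a minimal sketch that runs both kinds of query against the same data. The temporary table users_demo and its columns are hypothetical names invented for this illustration, as are the sample rows.

```sql
-- Hypothetical sample data: three users in London, one in Paris
CREATE TEMP TABLE users_demo (
    name text,
    city text
);

INSERT INTO users_demo (name, city) VALUES
    ('Alice', 'London'),
    ('Bob',   'London'),
    ('Carol', 'London'),
    ('Dave',  'Paris');

-- Plain SELECT: one output row per table row, so London appears three times
SELECT city
FROM users_demo;

-- SELECT DISTINCT: the duplicates collapse, leaving London and Paris once each
SELECT DISTINCT city
FROM users_demo
ORDER BY city;
```

The ORDER BY clause is included because DISTINCT on its own does not guarantee any particular row order.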

Finding Unique Locations

A common use case is finding all unique locations in a customer table. You might want to know which countries your user base spans.

Using DISTINCT on the country column returns one row per country. This is far easier to read than thousands of rows containing the same countries over and over.

Distinct Country Example:

-- Retrieve all unique countries from the customers table
SELECT DISTINCT Country
FROM Customers;

DISTINCT on Multiple Columns

You are not limited to using DISTINCT on a single column. You can list several columns after the keyword.

When applied to multiple columns, DISTINCT compares the combination of values across all listed columns. Two rows count as duplicates only if they match in every listed column, and each distinct combination is returned once.

Multiple Column Example:

-- Find unique combinations of city and country
SELECT DISTINCT City, Country
FROM Customers;
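
As a rough illustration of how combinations are compared, the sketch below uses a hypothetical temporary table customers_demo with invented sample rows. Two rows share the same city and country, and a third shares only the city.

```sql
-- Hypothetical sample data; table name, column names, and rows are assumptions
CREATE TEMP TABLE customers_demo (
    customer_name text,
    city          text,
    country       text
);

INSERT INTO customers_demo (customer_name, city, country) VALUES
    ('Ana',    'London', 'UK'),
    ('Ben',    'London', 'UK'),      -- same (city, country) pair as Ana's row
    ('Chloe',  'London', 'Canada'),  -- same city, different country
    ('Dmitri', 'Paris',  'France');

SELECT DISTINCT city, country
FROM customers_demo
ORDER BY city, country;
-- Returns three rows: (London, Canada), (London, UK), (Paris, France)
```

The two London, UK rows collapse into one, while London, Canada is kept because the pair differs in the country column.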

Counting Distinct Values

You can pair DISTINCT with aggregate functions to gather statistics. For example, you can count how many distinct countries appear in a table.

COUNT(DISTINCT column_name) returns the number of distinct non-null values in that column. This will be covered in more depth when we explore aggregate functions.
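
The following sketch compares three forms of COUNT side by side. The temporary table customers_count_demo, its columns, and its rows are hypothetical and exist only for this illustration.

```sql
-- Hypothetical sample data for illustration
CREATE TEMP TABLE customers_count_demo (
    customer_name text,
    country       text
);

INSERT INTO customers_count_demo (customer_name, country) VALUES
    ('Ana',    'UK'),
    ('Ben',    'UK'),
    ('Dmitri', 'France'),
    ('Elif',   NULL);   -- country unknown

SELECT
    COUNT(*)                AS total_rows,         -- 4: every row
    COUNT(country)          AS rows_with_country,  -- 3: the NULL is skipped
    COUNT(DISTINCT country) AS distinct_countries  -- 2: UK and France
FROM customers_count_demo;
```

Note that COUNT(DISTINCT country) ignores the row where country is NULL, so it counts only known countries.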

Summary

The SELECT DISTINCT statement is useful for exploring data and for analytics. It keeps your output concise by returning each distinct row only once, without modifying the underlying table.

Use it whenever you need a categorical list of unique data entries.

Exercise

What does the DISTINCT keyword do when added to a SELECT query?