PostgreSQL GROUP BY Statement

The GROUP BY statement is fundamental for data analysis in SQL. It groups rows that have the same values in the specified columns into summary rows, one row per group.

Think of it as creating buckets for your data. All rows with the same value in a specified column go into the same bucket.

The Role of Aggregate Functions

The GROUP BY statement is almost always used with aggregate functions. Aggregate functions perform a calculation on a set of rows and return a single value.

Common aggregate functions include COUNT(), SUM(), AVG(), MIN(), and MAX().

GROUP BY tells these functions which groups of rows to calculate on.
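
As a rough sketch of how aggregate functions are evaluated once per group, the query below runs several of them over a small demo table. The table orders_demo, its columns, and its rows are assumptions made up for illustration; they are not part of the tutorial's data.

```sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TEMP TABLE orders_demo (
    country TEXT,
    amount  NUMERIC
);

INSERT INTO orders_demo (country, amount) VALUES
    ('Germany', 100),
    ('Germany', 300),
    ('Mexico',   50);

-- One summary row per country; each aggregate is computed within its group
SELECT country,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount,
       AVG(amount) AS average_amount,
       MIN(amount) AS smallest_order,
       MAX(amount) AS largest_order
FROM orders_demo
GROUP BY country
ORDER BY country;
```

With this sample data, the Germany group covers two orders totalling 400 and the Mexico group covers one order totalling 50.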

Basic Syntax

The GROUP BY clause is placed after the WHERE clause, if there is one. It comes before the ORDER BY clause if one is used.

You must list the columns you want to group by.

GROUP BY Syntax:

SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s);
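
To make the clause order concrete, here is a minimal sketch that uses WHERE, GROUP BY, and ORDER BY together. The table customers_demo is a hypothetical stand-in for a customers table, and its rows are invented for the example.

```sql
-- Hypothetical stand-in for a customers table; rows are illustrative only
CREATE TEMP TABLE customers_demo (
    CustomerID INT,
    Country    TEXT,
    City       TEXT
);

INSERT INTO customers_demo (CustomerID, Country, City) VALUES
    (1, 'Germany', 'Berlin'),
    (2, 'Germany', 'Munich'),
    (3, 'Mexico',  'Mexico City'),
    (4, 'Sweden',  'Stockholm');

-- WHERE filters individual rows, GROUP BY forms the groups,
-- and ORDER BY sorts the resulting summary rows
SELECT Country, COUNT(CustomerID) AS NumberOfCustomers
FROM customers_demo
WHERE Country <> 'Sweden'          -- applied before any grouping happens
GROUP BY Country
ORDER BY NumberOfCustomers DESC;
```

Because WHERE runs before grouping, the Sweden row never reaches a group, so the result contains only Germany and Mexico.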

A Practical Example

Imagine you want to count the number of customers in each country. You would group the data by the Country column.

Then, you would use COUNT() to tally the customers in each group.

Counting Customers by Country:

-- Count how many customers are in each country
SELECT Country, COUNT(CustomerID) AS NumberOfCustomers
FROM Customers
GROUP BY Country;

This query creates a group for each unique country. It then counts the entries within each of those country groups.
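
One detail worth seeing in action is that COUNT(column) counts only non-NULL values, while COUNT(*) counts every row in the group. The sketch below assumes a hypothetical table customers_count_demo with a nullable Email column; neither the table nor the column comes from the tutorial's data.

```sql
-- Hypothetical demo table with a nullable column; rows are illustrative only
CREATE TEMP TABLE customers_count_demo (
    CustomerID INT,
    Country    TEXT,
    Email      TEXT
);

INSERT INTO customers_count_demo (CustomerID, Country, Email) VALUES
    (1, 'Germany', 'a@example.com'),
    (2, 'Germany', NULL),
    (3, 'Mexico',  'c@example.com');

SELECT Country,
       COUNT(*)     AS NumberOfCustomers,   -- counts every row in the group
       COUNT(Email) AS CustomersWithEmail   -- skips rows where Email is NULL
FROM customers_count_demo
GROUP BY Country
ORDER BY Country;
```

For the Germany group, the first count is 2 and the second is 1, because one of the two German rows has no email address.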

Grouping by Multiple Columns

You can group by more than one column to create more granular groups. For example, you could group by both Country and City.

This would create a summary row for each unique combination of city and country.

Grouping by Multiple Columns:

-- Count customers in each unique City/Country combination
SELECT Country, City, COUNT(CustomerID) AS NumberOfCustomers
FROM Customers
GROUP BY Country, City
ORDER BY Country;

Important Rule

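When using GROUP BY, every column in your SELECT list must be one of two things:

  1. It must appear inside an aggregate function (like COUNT() or SUM()).
  2. It must be one of the columns listed in the GROUP BY clause.

PostgreSQL makes one exception: if you group by a table's primary key, you may also select other columns of that table, because their values are determined by the key.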

Forgetting this rule is a very common source of SQL errors.
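
The sketch below shows a query that breaks the rule, followed by two possible ways to repair it. The table customers_rule_demo and its rows are hypothetical and exist only for this illustration.

```sql
-- Hypothetical demo table; rows are illustrative only
CREATE TEMP TABLE customers_rule_demo (
    CustomerID INT,
    Country    TEXT,
    City       TEXT
);

INSERT INTO customers_rule_demo (CustomerID, Country, City) VALUES
    (1, 'Germany', 'Berlin'),
    (2, 'Germany', 'Munich'),
    (3, 'Mexico',  'Mexico City');

-- Invalid: City is neither aggregated nor listed in GROUP BY.
-- PostgreSQL rejects this query with an error along the lines of:
--   column "customers_rule_demo.city" must appear in the GROUP BY clause
--   or be used in an aggregate function
-- SELECT Country, City, COUNT(CustomerID)
-- FROM customers_rule_demo
-- GROUP BY Country;

-- Fix 1: add City to the GROUP BY clause
SELECT Country, City, COUNT(CustomerID) AS NumberOfCustomers
FROM customers_rule_demo
GROUP BY Country, City
ORDER BY Country, City;

-- Fix 2: keep one row per country and put City inside an aggregate
SELECT Country,
       COUNT(DISTINCT City) AS NumberOfCities,
       COUNT(CustomerID)    AS NumberOfCustomers
FROM customers_rule_demo
GROUP BY Country
ORDER BY Country;
```

Which fix is appropriate depends on the report you want: the first returns one row per city, the second one row per country.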

Combining with Joins

GROUP BY becomes even more powerful when combined with JOINs. You can join tables first and then group the resulting data.

This allows you to create summary reports across multiple related tables. For example, you could calculate total sales for each product category.
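
A report of total sales per product category might look roughly like the sketch below. The three tables (categories_demo, products_demo, order_items_demo), their columns, and their rows are all assumptions invented for this example; a real schema will differ.

```sql
-- Hypothetical schema; table names, columns, and rows are illustrative only
CREATE TEMP TABLE categories_demo (
    category_id   INT PRIMARY KEY,
    category_name TEXT
);

CREATE TEMP TABLE products_demo (
    product_id  INT PRIMARY KEY,
    category_id INT,
    price       NUMERIC
);

CREATE TEMP TABLE order_items_demo (
    product_id INT,
    quantity   INT
);

INSERT INTO categories_demo VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO products_demo   VALUES (10, 1, 2.50), (11, 1, 4.00), (20, 2, 1.50);
INSERT INTO order_items_demo VALUES (10, 4), (11, 1), (20, 6);

-- Join first, then group the joined rows by category
SELECT c.category_name,
       SUM(p.price * oi.quantity) AS total_sales
FROM order_items_demo AS oi
JOIN products_demo    AS p ON p.product_id  = oi.product_id
JOIN categories_demo  AS c ON c.category_id = p.category_id
GROUP BY c.category_name
ORDER BY total_sales DESC;
```

With this sample data, Beverages totals 14.00 (2.50 × 4 plus 4.00 × 1) and Snacks totals 9.00 (1.50 × 6).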

Summary

The GROUP BY statement is essential for creating summary reports. It categorizes rows into groups based on column values. It is almost always paired with an aggregate function.

Exercise

Which type of function is almost always used with a GROUP BY clause?