The GROUP BY clause is fundamental for data analysis in SQL.
It groups rows that have the same values in the specified columns into summary rows.
Think of it as creating buckets for your data. All rows with the same value in a specified column go into the same bucket.
The GROUP BY clause is almost always used with aggregate functions.
Aggregate functions perform a calculation on a set of rows and return a single value.
Common aggregate functions include:
- COUNT(): Counts the number of rows.
- SUM(): Calculates the sum of a numeric column.
- AVG(): Calculates the average of a numeric column.
- MIN(): Finds the minimum value.
- MAX(): Finds the maximum value.
GROUP BY tells these functions which groups of rows to calculate on.
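As a minimal sketch of these five functions working per group, the snippet below runs one query against an in-memory SQLite database through Python's built-in sqlite3 module. The Orders table, its columns, and its rows are hypothetical and invented purely for illustration.

```python
import sqlite3

# Hypothetical Orders table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, Country TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, "Germany", 10.0),
        (2, "Germany", 30.0),
        (3, "Mexico", 5.0),
        (4, "Mexico", 15.0),
        (5, "Mexico", 40.0),
    ],
)

# Each aggregate is computed once per Country group, not over the whole table.
summary = conn.execute(
    """
    SELECT Country,
           COUNT(*)    AS NumOrders,
           SUM(Amount) AS Total,
           AVG(Amount) AS Average,
           MIN(Amount) AS Smallest,
           MAX(Amount) AS Largest
    FROM Orders
    GROUP BY Country
    ORDER BY Country
    """
).fetchall()

for row in summary:
    print(row)
# ('Germany', 2, 40.0, 20.0, 10.0, 30.0)
# ('Mexico', 3, 60.0, 20.0, 5.0, 40.0)
```

Five input rows collapse into two summary rows, one per country.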
The GROUP BY clause is placed after the FROM clause and after the WHERE clause, if one is present.
It comes before the ORDER BY clause if one is used.
You must list the columns you want to group by.
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s);
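To see the clause order in action, here is a small sketch that uses WHERE, GROUP BY, and ORDER BY together. It assumes a hypothetical Orders table (not part of this lesson) and uses Python's built-in sqlite3 module only as a convenient way to run the SQL.

```python
import sqlite3

# Hypothetical Orders table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, Country TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, "Germany", 10.0),
        (2, "Germany", 30.0),
        (3, "Mexico", 5.0),
        (4, "Mexico", 15.0),
        (5, "Mexico", 40.0),
        (6, "Sweden", 3.0),
    ],
)

# WHERE filters individual rows first, GROUP BY then buckets the survivors,
# and ORDER BY sorts the resulting summary rows.
large_orders = conn.execute(
    """
    SELECT Country, COUNT(*) AS NumLargeOrders
    FROM Orders
    WHERE Amount >= 10
    GROUP BY Country
    ORDER BY Country
    """
).fetchall()

print(large_orders)
# [('Germany', 2), ('Mexico', 2)]
```

Sweden does not appear in the result at all: its only order was removed by WHERE before any groups were formed.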
Imagine you want to count the number of customers in each country.
You would group the data by the Country column.
Then, you would use COUNT() to tally the customers in each group.
-- Count how many customers are in each country
SELECT Country, COUNT(CustomerID) AS NumberOfCustomers
FROM Customers
GROUP BY Country;
This query creates a group for each unique country. It then counts the entries within each of those country groups.
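The query above can be tried out with a few sample rows. This is only a sketch: the Customers rows below are made up, and Python's built-in sqlite3 module stands in for whatever database you actually use.

```python
import sqlite3

# Sample Customers data; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Germany", "Berlin"),
        (2, "Germany", "Munich"),
        (3, "Germany", "Berlin"),
        (4, "Mexico", "Mexico City"),
        (5, "Sweden", "Lulea"),
    ],
)

rows = conn.execute(
    """
    SELECT Country, COUNT(CustomerID) AS NumberOfCustomers
    FROM Customers
    GROUP BY Country
    """
).fetchall()

# Without ORDER BY the row order is not guaranteed, so collect into a dict.
counts = dict(rows)
print(counts["Germany"], counts["Mexico"], counts["Sweden"])
# 3 1 1
```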
You can group by more than one column to create more granular groups.
For example, you could group by both Country and City.
This would create a summary row for each unique combination of city and country.
-- Count customers in each unique City/Country combination
SELECT Country, City, COUNT(CustomerID) AS NumberOfCustomers
FROM Customers
GROUP BY Country, City
ORDER BY Country;
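Here is a runnable sketch of grouping by two columns, again with made-up Customers rows and Python's built-in sqlite3 module. One deliberate change from the query above: City is added to ORDER BY so that the output order is fully deterministic.

```python
import sqlite3

# Sample Customers data; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Germany", "Berlin"),
        (2, "Germany", "Munich"),
        (3, "Germany", "Berlin"),
        (4, "Mexico", "Mexico City"),
        (5, "Sweden", "Lulea"),
    ],
)

# One summary row per distinct (Country, City) pair.
by_city = conn.execute(
    """
    SELECT Country, City, COUNT(CustomerID) AS NumberOfCustomers
    FROM Customers
    GROUP BY Country, City
    ORDER BY Country, City
    """
).fetchall()

for row in by_city:
    print(row)
# ('Germany', 'Berlin', 2)
# ('Germany', 'Munich', 1)
# ('Mexico', 'Mexico City', 1)
# ('Sweden', 'Lulea', 1)
```

Germany, which was a single group when grouping by Country alone, now splits into separate Berlin and Munich groups.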
When using GROUP BY, every column in your SELECT list must be one of two things:
- Wrapped in an aggregate function (such as COUNT() or SUM()).
- Listed in the GROUP BY clause.
Forgetting this rule is a very common source of SQL errors.
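As an illustration of the rule, consider selecting Country, City, and COUNT(CustomerID) while grouping only by Country: City is neither aggregated nor grouped. Stricter engines such as PostgreSQL reject that query, while SQLite accepts it and fills City from an arbitrary row in each group. The sketch below, using made-up rows and Python's built-in sqlite3 module, shows the two compliant ways to repair it.

```python
import sqlite3

# Sample Customers data; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Germany", "Berlin"),
        (2, "Germany", "Munich"),
        (3, "Germany", "Berlin"),
        (4, "Mexico", "Mexico City"),
    ],
)

# Fix 1: list City in the GROUP BY clause (one row per Country/City pair).
fix_group = conn.execute(
    """
    SELECT Country, City, COUNT(CustomerID) AS NumberOfCustomers
    FROM Customers
    GROUP BY Country, City
    ORDER BY Country, City
    """
).fetchall()

# Fix 2: wrap City in an aggregate function (still one row per Country).
fix_aggregate = conn.execute(
    """
    SELECT Country,
           COUNT(DISTINCT City) AS NumberOfCities,
           COUNT(CustomerID)    AS NumberOfCustomers
    FROM Customers
    GROUP BY Country
    ORDER BY Country
    """
).fetchall()

print(fix_group)
# [('Germany', 'Berlin', 2), ('Germany', 'Munich', 1), ('Mexico', 'Mexico City', 1)]
print(fix_aggregate)
# [('Germany', 2, 3), ('Mexico', 1, 1)]
```

Which fix is right depends on the question being asked: the first reports customers per city, the second keeps one row per country and summarizes its cities.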
GROUP BY becomes even more powerful when combined with JOINs.
You can join tables first and then group the resulting data.
This allows you to create summary reports across multiple related tables. For example, you could calculate total sales for each product category.
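A sketch of the total-sales-per-category idea follows. The Products and Sales tables, their columns, and their rows are all hypothetical and exist only for this example; Python's built-in sqlite3 module is used to run the SQL.

```python
import sqlite3

# Hypothetical Products and Sales tables; names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER, Category TEXT)")
conn.execute("CREATE TABLE Sales (SaleID INTEGER, ProductID INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(1, "Books"), (2, "Books"), (3, "Toys")],
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, 1, 12.5), (2, 2, 7.5), (3, 3, 20.0), (4, 3, 5.0), (5, 1, 10.0)],
)

# The JOIN attaches a Category to every sale; GROUP BY then sums per category.
totals = conn.execute(
    """
    SELECT p.Category, SUM(s.Amount) AS TotalSales
    FROM Sales AS s
    JOIN Products AS p ON p.ProductID = s.ProductID
    GROUP BY p.Category
    ORDER BY p.Category
    """
).fetchall()

print(totals)
# [('Books', 30.0), ('Toys', 25.0)]
```

The grouping column (Category) lives in one table and the summed column (Amount) in another, which is only possible because the join runs before the grouping.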
The GROUP BY clause is essential for creating summary reports.
It categorizes rows into groups based on column values.
It is almost always paired with an aggregate function.
Which type of function is almost always used with a GROUP BY clause?