PostgreSQL GROUP BY Clause

The GROUP BY clause in PostgreSQL is used to group rows that have the same values in one or more columns. It is commonly used with aggregate functions likeCOUNT, SUM, AVG, MAX, and MINto calculate summarized information.

1. Basic Syntax

SELECT column1, aggregate_function(column2)
            FROM table_name
            WHERE condition
            GROUP BY column1;

- column1 → the column(s) to group by - aggregate_function → SUM, AVG, COUNT, MAX, MIN, etc. - WHERE is optional to filter rows before grouping

2. Example: Count Rows in Groups

-- Count number of employees in each department
            SELECT department, COUNT(*) AS total_employees
            FROM employees
            GROUP BY department;

3. Using Multiple Columns in GROUP BY

You can group by more than one column to get finer-grained summaries.

-- Count employees by department and salary
            SELECT department, salary, COUNT(*) AS total
            FROM employees
            GROUP BY department, salary;

4. GROUP BY with Aggregate Functions

-- Average salary by department
            SELECT department, AVG(salary) AS avg_salary
            FROM employees
            GROUP BY department;

            -- Maximum and minimum salary per department
            SELECT department, MAX(salary) AS max_salary, MIN(salary) AS min_salary
            FROM employees
            GROUP BY department;

5. Filtering Groups with HAVING

The HAVING clause is used to filter groups after aggregation, unlike WHEREwhich filters rows before grouping.

-- Departments with more than 5 employees
            SELECT department, COUNT(*) AS total_employees
            FROM employees
            GROUP BY department
            HAVING COUNT(*) > 5;

6. GROUP BY with ORDER BY

You can sort grouped results using ORDER BY.

-- Departments by highest average salary
            SELECT department, AVG(salary) AS avg_salary
            FROM employees
            GROUP BY department
            ORDER BY avg_salary DESC;

7. GROUP BY with JOIN

You can group results after joining multiple tables.

-- Total bonuses per department
            SELECT e.department, SUM(b.bonus) AS total_bonus
            FROM employees e
            JOIN bonuses b ON e.id = b.emp_id
            GROUP BY e.department;

8. GROUPING SETS, ROLLUP, and CUBE

PostgreSQL supports advanced grouping for multiple levels of aggregation:

-- Total salary by department and overall total
            SELECT department, SUM(salary) AS total_salary
            FROM employees
            GROUP BY GROUPING SETS ((department), ()); 

            -- ROLLUP example: subtotal and grand total
            SELECT department, SUM(salary) AS total_salary
            FROM employees
            GROUP BY ROLLUP(department);

            -- CUBE example: subtotal by department and another column
            SELECT department, job_title, SUM(salary) AS total_salary
            FROM employees
            GROUP BY CUBE(department, job_title);

These features are useful for reporting and analytics, providing subtotals and totals automatically.

9. Best Practices

  • Always include in GROUP BY all non-aggregated columns in the SELECT list.
  • Use HAVING to filter aggregated results, not WHERE.
  • For large datasets, create indexes on grouping columns to improve performance.
  • Consider using ROLLUP and CUBE for advanced reporting needs.

Conclusion

The GROUP BY clause is essential for aggregating data in PostgreSQL. Combined with aggregate functions, HAVING, ORDER BY, and advanced grouping techniques like ROLLUP and CUBE, you can perform complex analytical queries efficiently. In the next tutorial, we will explore the HAVING clause in detail, which filters groups based on aggregate values.