PostgreSQL DISTINCT Keyword

The DISTINCT keyword in PostgreSQL is used to remove duplicate rows from query results. It is commonly used to retrieve unique values in a column or a combination of columns.

1. Basic Syntax

SELECT DISTINCT column1, column2
FROM table_name;

- Returns only unique rows based on the columns listed.
- Useful when you want to eliminate duplicates in query results.

2. Example: DISTINCT on Single Column

-- Get unique department names
SELECT DISTINCT department
FROM employees;
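
To make the behavior concrete, here is a minimal self-contained sketch. The inline VALUES list stands in for the employees table, and the names and departments are made up for illustration. It also shows a detail that is easy to miss: DISTINCT treats NULLs as equal to each other.

```sql
-- Hypothetical sample rows standing in for the employees table.
WITH employees (name, department) AS (
    VALUES
        ('Ana',  'Sales'),
        ('Ben',  'Sales'),
        ('Chen', 'IT'),
        ('Dara', NULL),
        ('Eli',  NULL)
)
SELECT DISTINCT department
FROM employees;
-- Three rows come back: Sales, IT, and a single NULL.
-- The two NULL departments collapse into one row.
-- Row order is not guaranteed without ORDER BY.
```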

3. DISTINCT on Multiple Columns

You can apply DISTINCT on multiple columns to get unique combinations.

-- Get unique combinations of department and job title
SELECT DISTINCT department, job_title
FROM employees;
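
With several columns, uniqueness applies to the combination as a whole, not to each column separately. The sketch below uses invented sample rows in place of the employees table to show this.

```sql
-- Hypothetical sample rows standing in for the employees table.
WITH employees (name, department, job_title) AS (
    VALUES
        ('Ana',  'Sales', 'Manager'),
        ('Ben',  'Sales', 'Analyst'),
        ('Chen', 'Sales', 'Analyst'),
        ('Dara', 'IT',    'Analyst')
)
SELECT DISTINCT department, job_title
FROM employees;
-- Returns 3 rows: (Sales, Manager), (Sales, Analyst), (IT, Analyst).
-- 'Sales' and 'Analyst' each still appear more than once,
-- because only the pair (department, job_title) has to be unique.
```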

4. Using DISTINCT with ORDER BY

You can combine DISTINCT with ORDER BY to sort unique results.

-- Unique department names sorted alphabetically
SELECT DISTINCT department
FROM employees
ORDER BY department ASC;
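
One restriction is worth knowing: with SELECT DISTINCT, every ORDER BY expression must also appear in the select list. Sorting the unique departments by a column that is not selected, such as salary, is rejected. The sketch below shows one possible workaround using GROUP BY and an aggregate. The sample rows and the salary column are assumptions for illustration.

```sql
-- Rejected by PostgreSQL:
--   SELECT DISTINCT department FROM employees ORDER BY salary;
--   ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

-- Workaround sketch: group instead, and sort by an aggregate.
-- Hypothetical sample rows standing in for the employees table.
WITH employees (name, department, salary) AS (
    VALUES
        ('Ana',  'Sales', 5000),
        ('Ben',  'Sales', 4200),
        ('Chen', 'IT',    6100)
)
SELECT department
FROM employees
GROUP BY department
ORDER BY max(salary) DESC;
-- IT (top salary 6100) is listed before Sales (top salary 5000).
```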

5. Using DISTINCT with COUNT

You can count unique values using COUNT(DISTINCT column).

-- Count number of unique departments
SELECT COUNT(DISTINCT department) AS unique_departments
FROM employees;
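
COUNT(DISTINCT column) ignores NULLs, so its result can differ from the number of rows SELECT DISTINCT would return for the same column. The following sketch compares the three common COUNT forms side by side, using invented sample rows in place of the employees table.

```sql
-- Hypothetical sample rows standing in for the employees table.
WITH employees (name, department) AS (
    VALUES
        ('Ana',  'Sales'),
        ('Ben',  'Sales'),
        ('Chen', 'IT'),
        ('Dara', NULL)
)
SELECT
    count(*)                   AS all_rows,              -- 4: every row
    count(department)          AS non_null_departments,  -- 3: NULL skipped
    count(DISTINCT department) AS unique_departments     -- 2: Sales, IT
FROM employees;
```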

6. DISTINCT with JOINs

A join returns one row per match, so an employee with three projects appears three times in the result. DISTINCT collapses those repeated rows into one.

-- Unique employee names who have projects
SELECT DISTINCT e.name
FROM employees e
INNER JOIN projects p ON e.id = p.emp_id;
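
Note that DISTINCT e.name also merges two different employees who happen to share a name. If the goal is "employees that have at least one project", an EXISTS subquery is a common alternative that never produces the duplicates in the first place. The sketch below uses made-up rows, and the title column on projects is an assumption.

```sql
-- Hypothetical sample rows standing in for the two tables.
WITH employees (id, name) AS (
    VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Ana')
),
projects (emp_id, title) AS (
    VALUES (1, 'Website'), (1, 'Billing'), (3, 'Audit')
)
SELECT e.id, e.name
FROM employees e
WHERE EXISTS (
    SELECT 1
    FROM projects p
    WHERE p.emp_id = e.id
)
ORDER BY e.id;
-- Returns (1, Ana) and (3, Ana): one row per employee with a project,
-- even though employee 1 has two projects.
-- SELECT DISTINCT e.name over the join would return a single 'Ana'.
```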

7. DISTINCT ON

PostgreSQL also supports DISTINCT ON (expression), which keeps only the first row of each group of rows that share the same value for the expression. Which row counts as "first" is decided by ORDER BY.

-- Get first employee (alphabetically) from each department
SELECT DISTINCT ON (department) department, name, salary
FROM employees
ORDER BY department, name ASC;

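⚠️ DISTINCT ON is a PostgreSQL extension, not standard SQL. ORDER BY is not syntactically required, but without it the row kept for each group is unpredictable. When ORDER BY is present, it must begin with the DISTINCT ON expression(s).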
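
Because the kept row is simply the first one in sort order, changing the ORDER BY changes which row survives. As an illustration, the sketch below picks the highest-paid employee per department rather than the alphabetically first one. The sample rows are invented.

```sql
-- Hypothetical sample rows standing in for the employees table.
WITH employees (name, department, salary) AS (
    VALUES
        ('Ana',  'Sales', 5000),
        ('Ben',  'Sales', 4200),
        ('Chen', 'IT',    6100),
        ('Dara', 'IT',    6800)
)
SELECT DISTINCT ON (department) department, name, salary
FROM employees
-- department must come first to match DISTINCT ON;
-- salary DESC puts the top earner first within each department;
-- name breaks ties so the result is deterministic.
ORDER BY department, salary DESC, name;
-- Returns (IT, Dara, 6800) and (Sales, Ana, 5000).
```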

8. Best Practices

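  • Use DISTINCT only when necessary. Removing duplicates means sorting or hashing the result set, which can slow queries on large tables.
  • Consider GROUP BY as an alternative, for example when you also need an aggregate such as a count per group.
  • An index on the DISTINCT columns may let the planner avoid a separate sort step. Check the actual plan with EXPLAIN.
  • If DISTINCT is only hiding duplicates introduced by a join, check the join condition first or consider rewriting the query with EXISTS.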
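
The GROUP BY and index suggestions can be tried out with a small sketch like the one below. The table name employees_demo, the index name, and the sample rows are all hypothetical. Whether the planner actually uses the index depends on table size and statistics, so treat the EXPLAIN step as a way to check rather than a guarantee.

```sql
-- Hypothetical demo table; a temporary table disappears at session end.
CREATE TEMP TABLE employees_demo (
    id         integer PRIMARY KEY,
    name       text,
    department text
);

INSERT INTO employees_demo (id, name, department) VALUES
    (1, 'Ana',  'Sales'),
    (2, 'Ben',  'Sales'),
    (3, 'Chen', 'IT');

-- GROUP BY yields the same groups as SELECT DISTINCT department,
-- and can carry an aggregate alongside them.
SELECT department, count(*) AS employee_count
FROM employees_demo
GROUP BY department;

-- An index on the deduplicated column gives the planner the option
-- of reading values in already-sorted order.
CREATE INDEX employees_demo_department_idx
    ON employees_demo (department);

-- Inspect the plan the planner actually chooses.
EXPLAIN
SELECT DISTINCT department
FROM employees_demo;
```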

Conclusion

The DISTINCT keyword removes duplicate rows from PostgreSQL query results. It can be applied to a single column or to a combination of columns, and combined with sorting, counting, and joins. PostgreSQL also provides DISTINCT ON for keeping the first row of each group, as defined by ORDER BY. In the next tutorial, we will explore LIMIT / TOP to control the number of rows returned by queries.