PostgreSQL DISTINCT Keyword
The DISTINCT keyword in PostgreSQL is used to remove duplicate rows from query results. It is commonly used to retrieve unique values in a column or a combination of columns.
1. Basic Syntax
SELECT DISTINCT column1, column2
FROM table_name;
- Returns only unique rows based on the columns listed.
- Useful when you want to eliminate duplicates in query results.
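To see what "unique rows" means in practice, here is a minimal sketch that runs without any existing table. The inline data set `demo` and its values are assumptions made up for illustration. It also shows that DISTINCT treats NULL values as equal to each other, so they collapse into a single row.

```sql
-- Hypothetical inline data; "demo" and its rows are assumptions for illustration.
WITH demo(name, department) AS (
    VALUES
        ('Ana',  'Sales'),
        ('Ben',  'Sales'),
        ('Cara', 'IT'),
        ('Dev',  NULL),
        ('Eli',  NULL)
)
SELECT DISTINCT department
FROM demo;
-- Returns three rows: Sales, IT, and a single NULL.
-- The row order is not guaranteed without ORDER BY.
```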
2. Example: DISTINCT on Single Column
-- Get unique department names
SELECT DISTINCT department
FROM employees;
3. DISTINCT on Multiple Columns
You can apply DISTINCT on multiple columns to get unique combinations.
-- Get unique combinations of department and job title
SELECT DISTINCT department, job_title
FROM employees;
4. Using DISTINCT with ORDER BY
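You can combine DISTINCT with ORDER BY to sort the unique results. With SELECT DISTINCT, PostgreSQL requires every ORDER BY expression to appear in the select list.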
-- Unique department names sorted alphabetically
SELECT DISTINCT department
FROM employees
ORDER BY department ASC;
5. Using DISTINCT with COUNT
You can count unique values using COUNT(DISTINCT column). NULL values are not counted.
-- Count number of unique departments
SELECT COUNT(DISTINCT department) AS unique_departments
FROM employees;
6. DISTINCT with JOINs
When joining tables, DISTINCT helps eliminate duplicate rows caused by multiple matches.
-- Unique employee names who have projects
SELECT DISTINCT e.name
FROM employees e
INNER JOIN projects p ON e.id = p.emp_id;
7. DISTINCT ON
PostgreSQL also supports DISTINCT ON (column), which keeps only the first row for each distinct value of that column. It is normally paired with ORDER BY, which determines which row counts as the first.
-- Get first employee (alphabetically) from each department
SELECT DISTINCT ON (department) department, name, salary
FROM employees
ORDER BY department, name ASC;
⚠️ DISTINCT ON is PostgreSQL-specific. Without ORDER BY, the row kept for each value is unpredictable, and the DISTINCT ON expressions must match the leftmost ORDER BY expressions.
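A common use of DISTINCT ON is picking the "top" row per group. The sketch below selects the highest-paid employee in each department by sorting on salary within the department. The inline data set `demo` and its values are assumptions for illustration, not part of the employees table used above.

```sql
-- Hypothetical inline data; "demo" and its rows are assumptions for illustration.
WITH demo(name, department, salary) AS (
    VALUES
        ('Ana',  'Sales', 5200),
        ('Ben',  'Sales', 6100),
        ('Cara', 'IT',    7000),
        ('Dev',  'IT',    6800)
)
-- department comes first in ORDER BY because it is the DISTINCT ON expression;
-- salary DESC then decides which row is kept, and name breaks ties.
SELECT DISTINCT ON (department) department, name, salary
FROM demo
ORDER BY department, salary DESC, name;
-- IT    | Cara | 7000
-- Sales | Ben  | 6100
```

Adding a tie-breaker such as `name` at the end of ORDER BY keeps the result deterministic when two rows share the same salary.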
8. Best Practices
- Use DISTINCT only when necessary, as the extra sort or hash step may slow queries on large tables.
- Consider using GROUP BY as an alternative in some scenarios.
- Index the columns used with DISTINCT, which can speed up retrieval of unique values.
- Be careful with joins that may introduce duplicates.
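The practices above can be sketched as follows. The scratch tables `demo_employees` and `demo_projects`, their columns, and the index name are all assumptions chosen for illustration. Whether the index is actually used depends on the planner and the data, so check with EXPLAIN rather than assuming a speedup.

```sql
-- Hypothetical scratch tables; names and columns are assumptions for illustration.
CREATE TEMP TABLE demo_employees (id int PRIMARY KEY, name text, department text);
CREATE TEMP TABLE demo_projects  (id int PRIMARY KEY, emp_id int);

INSERT INTO demo_employees VALUES (1, 'Ana', 'Sales'), (2, 'Ben', 'Sales'), (3, 'Cara', 'IT');
INSERT INTO demo_projects  VALUES (10, 1), (11, 1), (12, 3);

-- GROUP BY form of: SELECT DISTINCT department FROM demo_employees;
SELECT department
FROM demo_employees
GROUP BY department;

-- An index on the column may help the planner find unique values faster.
CREATE INDEX demo_employees_department_idx ON demo_employees (department);

-- EXISTS avoids producing join duplicates in the first place,
-- so no DISTINCT is needed to remove them afterwards.
SELECT e.name
FROM demo_employees e
WHERE EXISTS (
    SELECT 1
    FROM demo_projects p
    WHERE p.emp_id = e.id
);
-- Returns Ana and Cara, one row each, although Ana has two projects.
```

The EXISTS form is not identical to SELECT DISTINCT e.name: if two different employees share a name, EXISTS returns both rows, while DISTINCT collapses them into one.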
Conclusion
The DISTINCT keyword in PostgreSQL is essential for removing duplicate data and retrieving unique values efficiently. It can be applied on single or multiple columns, combined with sorting, counting, and joins. PostgreSQL also provides the powerful DISTINCT ON feature for selecting the first row of each unique value. In the next tutorial, we will explore LIMIT / TOP to control the number of rows returned by queries.