
Pandas Data Sorting

Data sorting is a fundamental operation in data analysis that organizes data in a specific order. Pandas provides powerful and flexible sorting capabilities through the sort_values() and sort_index() methods.

1. Basic Sorting with `sort_values()`

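sort_values() is the primary method for sorting DataFrame rows by column values. It returns a new, sorted DataFrame and leaves the original unchanged unless inplace=True is passed.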

Single column: df.sort_values('column_name')
Descending order: ascending=False
Multiple columns: Pass a list of column names

Parameter	Description	Default
by	Column name or list of columns to sort by	Required
ascending	Sort ascending vs. descending	True
inplace	Modify DataFrame in-place	False
na_position	Position of NaN values ('first' or 'last')	'last'
kind	Sorting algorithm ('quicksort', 'mergesort', 'heapsort', 'stable'); only applied when sorting on a single column or label	'quicksort'
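
As a minimal sketch of the basic calls listed above, the snippet below sorts a small DataFrame by one column in both directions. The sample names and ages are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 35, 28],
})

# Single column, ascending (the default)
by_age = df.sort_values("Age")

# Single column, descending
by_age_desc = df.sort_values("Age", ascending=False)

print(by_age["Name"].tolist())       # ['Alice', 'Diana', 'Bob', 'Charlie']
print(by_age_desc["Name"].tolist())  # ['Charlie', 'Bob', 'Diana', 'Alice']

# sort_values returned new DataFrames; df keeps its original row order
print(df["Name"].tolist())           # ['Alice', 'Bob', 'Charlie', 'Diana']
```

Because inplace defaults to False, the result has to be assigned to a variable to be kept.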

2. Multi-Column Sorting

Sort by multiple columns with different sort orders for each.

# Sort by department ascending, then salary descending
df.sort_values(['Department', 'Salary'], ascending=[True, False])

# Complex multi-level sorting
df.sort_values(['Dept', 'Age', 'Score'], ascending=[True, False, False])

3. Handling Missing Values

Control the position of NaN values during sorting.

Default: NaN values are placed at the end, for both ascending and descending sorts
na_position='first': NaN values at the beginning
Important: na_position is a single setting that applies to every column in a multi-column sort
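
The following sketch illustrates the NaN placement rules above on a small, made-up salary column. Note that the missing value stays at the end even when the sort is descending.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing salary
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Salary": [50000.0, np.nan, 70000.0, 55000.0],
})

# Default: NaN goes last, whatever the sort direction
last_asc = df.sort_values("Salary")
last_desc = df.sort_values("Salary", ascending=False)

# na_position='first' moves NaN to the top instead
first = df.sort_values("Salary", na_position="first")

print(last_asc["Name"].tolist())   # ['Alice', 'Diana', 'Charlie', 'Bob']
print(last_desc["Name"].tolist())  # ['Charlie', 'Diana', 'Alice', 'Bob']
print(first["Name"].tolist())      # ['Bob', 'Alice', 'Diana', 'Charlie']
```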

4. Index Sorting with `sort_index()`

Sort DataFrame by index labels.

Row index: df.sort_index()
Column index: df.sort_index(axis=1)
Use cases: Putting time series data in chronological order, restoring a sorted index before label-based slicing
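
A short sketch of both axes, using a made-up DataFrame whose row labels and column labels both start out unsorted:

```python
import pandas as pd

# Hypothetical data: row labels and column labels are both out of order
df = pd.DataFrame(
    {"Salary": [60000, 50000, 70000], "Age": [30, 25, 35]},
    index=["Bob", "Alice", "Charlie"],
)

# Sort rows by their index labels
rows_sorted = df.sort_index()

# Sort columns by their labels
cols_sorted = df.sort_index(axis=1)

print(rows_sorted.index.tolist())    # ['Alice', 'Bob', 'Charlie']
print(cols_sorted.columns.tolist())  # ['Age', 'Salary']
```

In both cases the data moves together with its labels; only the order of rows or columns changes.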

5. Custom Sorting

Advanced sorting techniques for specific requirements.

Custom order: Using pd.Categorical
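Key functions: the key parameter (pandas 1.1+) takes a vectorized function that receives a whole column as a Series and returns the values to sort by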
Case-insensitive: String sorting with str.lower()
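
The three techniques above can be sketched as follows. The Priority column and its High/Medium/Low order are illustrative assumptions, and kind='mergesort' is used only so that rows with the same priority keep their original order.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["alice", "Bob", "charlie", "Diana"],
    "Priority": ["Low", "High", "Medium", "High"],
})

# Custom order: an ordered Categorical sorts by category position,
# not alphabetically
order = ["High", "Medium", "Low"]
df["Priority"] = pd.Categorical(df["Priority"], categories=order, ordered=True)
by_priority = df.sort_values("Priority", kind="mergesort")

# Default string sort is case-sensitive: uppercase letters come first
case_sensitive = df.sort_values("Name")

# key receives the whole column as a Series and must return
# a same-length result that is used for the comparison
case_insensitive = df.sort_values("Name", key=lambda s: s.str.lower())

print(by_priority["Name"].tolist())       # ['Bob', 'Diana', 'charlie', 'alice']
print(case_sensitive["Name"].tolist())    # ['Bob', 'Diana', 'alice', 'charlie']
print(case_insensitive["Name"].tolist())  # ['alice', 'Bob', 'charlie', 'Diana']
```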

Sorting Algorithms Comparison

🚀 Quicksort (Default)

Speed: Fastest average case
Stability: Not stable
Memory: O(log n)
Best for: General purpose

🔄 Mergesort

Speed: Consistent O(n log n)
Stability: Stable
Memory: O(n)
Best for: Sorts where rows with equal keys must keep their original order

📊 Heapsort

Speed: O(n log n) worst case
Stability: Not stable
Memory: O(1)
Best for: Memory-constrained environments
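
To make the stability property concrete, here is a small sketch with made-up data. It requests a stable algorithm and checks that rows sharing the same Grade keep their original relative order. With an unstable algorithm the order of tied rows is simply not guaranteed, even if it happens to look the same on a small input.

```python
import pandas as pd

# Hypothetical sample data with repeated sort keys
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "Grade": [2, 1, 2, 1, 2],
})

# A stable sort keeps tied rows in their original relative order
stable = df.sort_values("Grade", kind="mergesort")

print(stable["Name"].tolist())
# Grade 1 rows first (Bob, Diana), then Grade 2 rows (Alice, Charlie, Eve),
# each group in the same order as in df
```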

Practical Sorting Scenarios

📈 Business Reports

Sales by region and product
Employee performance rankings
Financial statement ordering

🔬 Data Analysis

Time series chronological order
Statistical ranking and percentiles
Data preprocessing for ML

Example: Comprehensive Data Sorting

Data Sorting Examples

import pandas as pd
import numpy as np

# Create a sample dataset for sorting examples
np.random.seed(42)
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 28, 32, 45, 29, 31],
    'Salary': [50000, 60000, 70000, 55000, 65000, 80000, 58000, 62000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'Marketing', 'Finance', 'HR', 'IT'],
    'Join_Date': pd.to_datetime(['2020-01-15', '2019-03-20', '2018-07-10', '2021-02-28', 
                                '2019-11-15', '2017-05-05', '2022-01-10', '2020-09-20']),
    'Performance_Score': [85, 92, 78, 88, 95, 82, 90, 87],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'London', 'Paris', 'New York']
}

df = pd.DataFrame(data)

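# Add some missing values for demonstration
# (cast to float first: NaN cannot be stored in an integer column)
df = df.astype({'Salary': 'float64', 'Performance_Score': 'float64'})
df.loc[2, 'Salary'] = np.nan
df.loc[5, 'Performance_Score'] = np.nan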

print("Original DataFrame:")
print(df)
print("\nDataFrame Info:")
df.info()  # info() prints its report itself and returns None

# 1. BASIC SORTING
print("\n" + "="*50)
print("1. BASIC SORTING")
print("="*50)

# Sort by a single column (ascending)
df_sorted_age = df.sort_values('Age')
print("Sorted by Age (ascending):")
print(df_sorted_age[['Name', 'Age', 'Department']])

# Sort by a single column (descending)
df_sorted_salary_desc = df.sort_values('Salary', ascending=False)
print("\nSorted by Salary (descending):")
print(df_sorted_salary_desc[['Name', 'Salary', 'Department']])

# 2. MULTI-COLUMN SORTING
print("\n" + "="*50)
print("2. MULTI-COLUMN SORTING")
print("="*50)

# Sort by multiple columns
df_multi_sort = df.sort_values(['Department', 'Salary'], ascending=[True, False])
print("Sorted by Department (asc) and Salary (desc):")
print(df_multi_sort[['Name', 'Department', 'Salary']])

# Complex multi-column sorting
df_complex_sort = df.sort_values(['Department', 'Age', 'Performance_Score'], 
                                ascending=[True, False, False])
print("\nSorted by Department (asc), Age (desc), Performance_Score (desc):")
print(df_complex_sort[['Name', 'Department', 'Age', 'Performance_Score']])

# 3. SORTING WITH MISSING VALUES
print("\n" + "="*50)
print("3. HANDLING MISSING VALUES IN SORTING")
print("="*50)

# Default behavior (NaN placed at the end)
print("Default sorting with NaN (end):")
print(df.sort_values('Salary')[['Name', 'Salary']])

# Place NaN values at the beginning
df_na_first = df.sort_values('Salary', na_position='first')
print("\nNaN values first:")
print(df_na_first[['Name', 'Salary']])

# Sorting with multiple columns having NaN
df_na_multi = df.sort_values(['Salary', 'Performance_Score'], 
                            na_position='first')
print("\nMulti-column sorting with NaN first:")
print(df_na_multi[['Name', 'Salary', 'Performance_Score']])

# 4. SORTING BY INDEX
print("\n" + "="*50)
print("4. INDEX SORTING")
print("="*50)

# Create a DataFrame with custom index
df_index = df.set_index('Name')
print("DataFrame with Name as index:")
print(df_index[['Age', 'Salary']])

# Sort by index
df_sorted_index = df_index.sort_index()
print("\nSorted by index (ascending):")
print(df_sorted_index[['Age', 'Salary']])

# Sort index in descending order
df_sorted_index_desc = df_index.sort_index(ascending=False)
print("\nSorted by index (descending):")
print(df_sorted_index_desc[['Age', 'Salary']])

# Reset index after sorting
df_reset = df_sorted_index_desc.reset_index()
print("\nAfter resetting index:")
print(df_reset[['Name', 'Age', 'Salary']])

# 5. SORTING WITH CUSTOM FUNCTIONS
print("\n" + "="*50)
print("5. CUSTOM SORTING")
print("="*50)

# Custom sorting for Department (custom order)
department_order = ['IT', 'Finance', 'HR', 'Marketing']
df_custom_dept = df.copy()
df_custom_dept['Department'] = pd.Categorical(df_custom_dept['Department'], 
                                            categories=department_order, 
                                            ordered=True)
df_sorted_custom = df_custom_dept.sort_values('Department')
print("Custom department order sorting:")
print(df_sorted_custom[['Name', 'Department']])

# Sorting by string length
df_sorted_name_len = df.sort_values('Name', key=lambda x: x.str.len())
print("\nSorted by name length:")
print(df_sorted_name_len[['Name', 'City']])

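# Sorting by multiple custom criteria: key is applied to each sort column
# separately, so x.name tells us which column is being transformed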
df_sorted_custom_multi = df.sort_values(['City', 'Name'], 
                                      key=lambda x: x.str.len() if x.name == 'Name' else x)
print("\nSorted by City and Name length:")
print(df_sorted_custom_multi[['Name', 'City']])

# 6. SORTING DATETIME DATA
print("\n" + "="*50)
print("6. DATETIME SORTING")
print("="*50)

print("Sorted by Join Date (ascending):")
df_sorted_date = df.sort_values('Join_Date')
print(df_sorted_date[['Name', 'Join_Date', 'Department']])

# Sort by date descending
df_sorted_date_desc = df.sort_values('Join_Date', ascending=False)
print("\nSorted by Join Date (descending):")
print(df_sorted_date_desc[['Name', 'Join_Date', 'Department']])

# 7. IN-PLACE SORTING
print("\n" + "="*50)
print("7. IN-PLACE SORTING")
print("="*50)

# Create a copy for in-place demonstration
df_inplace = df.copy()
print("Before in-place sorting:")
print(df_inplace[['Name', 'Age']].head(3))

# In-place sorting (modifies the original DataFrame)
df_inplace.sort_values('Age', inplace=True)
print("\nAfter in-place sorting:")
print(df_inplace[['Name', 'Age']].head(3))

# 8. SORTING WITH DIFFERENT DATA TYPES
print("\n" + "="*50)
print("8. DATA TYPE-SPECIFIC SORTING")
print("="*50)

# String sorting (case sensitivity)
df_case = pd.DataFrame({
    'Name': ['alice', 'Bob', 'charlie', 'Diana'],
    'Value': [1, 2, 3, 4]
})

print("Case-sensitive string sorting:")
df_case_sensitive = df_case.sort_values('Name')
print(df_case_sensitive)

print("\nCase-insensitive string sorting:")
df_case_insensitive = df_case.sort_values('Name', key=lambda x: x.str.lower())
print(df_case_insensitive)

# 9. PERFORMANCE CONSIDERATIONS
print("\n" + "="*50)
print("9. PERFORMANCE TIPS")
print("="*50)

# Sorting large datasets - using kind parameter
large_data = pd.DataFrame({
    'A': np.random.randint(0, 1000, 10000),
    'B': np.random.randn(10000)
})

# Different sorting algorithms
# (single runs are noisy; treat these timings as rough indications only)
import time

# Quicksort (default)
start = time.perf_counter()
large_data.sort_values('A', kind='quicksort')
quicksort_time = time.perf_counter() - start

# Mergesort (stable)
start = time.perf_counter()
large_data.sort_values('A', kind='mergesort')
mergesort_time = time.perf_counter() - start

# Heapsort
start = time.perf_counter()
large_data.sort_values('A', kind='heapsort')
heapsort_time = time.perf_counter() - start

print(f"Quicksort time: {quicksort_time:.4f} seconds")
print(f"Mergesort time: {mergesort_time:.4f} seconds")
print(f"Heapsort time: {heapsort_time:.4f} seconds")

# 10. PRACTICAL USE CASES
print("\n" + "="*50)
print("10. PRACTICAL APPLICATIONS")
print("="*50)

# Top N performers by department
top_performers = df.sort_values(['Department', 'Performance_Score'], 
                               ascending=[True, False])
top_2_per_dept = top_performers.groupby('Department').head(2)
print("Top 2 performers by department:")
print(top_2_per_dept[['Name', 'Department', 'Performance_Score']])

# Sorting for reporting
report_df = df.sort_values(['Department', 'Salary'], ascending=[True, False])
print("\nDepartment salary report:")
print(report_df[['Name', 'Department', 'Salary', 'Join_Date']])

# Chronological analysis
chronological_df = df.sort_values('Join_Date')
print("\nEmployee chronological order:")
print(chronological_df[['Name', 'Join_Date', 'Department']])

print("\n" + "="*50)
print("SORTING SUMMARY")
print("="*50)
print(f"Original DataFrame shape: {df.shape}")
print("Number of sorting operations demonstrated: 15+")
print("Data types handled: Numeric, String, DateTime, Categorical")
print("Special cases covered: Missing values, Custom orders, Performance")

Sorting Methods

sort_values() - Sort by column values
sort_index() - Sort by index
nlargest() - Get top N values
nsmallest() - Get bottom N values
rank() - Assign ranks to values
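
A brief sketch of the last three methods on made-up salary data. nlargest() and nsmallest() return only the requested rows, while rank() returns one rank per row without reordering anything.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Salary": [50000, 60000, 70000, 55000],
})

# Top and bottom N rows by a column
top_2 = df.nlargest(2, "Salary")
bottom_2 = df.nsmallest(2, "Salary")

# Rank 1 = highest salary; rows stay in their original order.
# rank() returns floats; casting to int is safe here because there are no ties.
df["Salary_Rank"] = df["Salary"].rank(ascending=False).astype(int)

print(top_2["Name"].tolist())     # ['Charlie', 'Bob']
print(bottom_2["Name"].tolist())  # ['Alice', 'Diana']
print(df["Salary_Rank"].tolist()) # [4, 2, 1, 3]
```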

Best Practices

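Prefer assigning the result (df = df.sort_values(...)); inplace=True still builds a sorted copy internally, so do not rely on it to save memory
Keep the default algorithm unless you need a stable sort
Handle missing values consistently
Use a stable algorithm (kind='mergesort' or kind='stable') when tied rows must keep their original order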
Reset index after sorting if needed
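
On the last point, a sorted DataFrame keeps its original index labels. The sketch below, on made-up data, shows this and uses the ignore_index parameter of sort_values() (available since pandas 1.0) as a shortcut for calling reset_index(drop=True) afterwards.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [35, 25, 30]})

# Original labels travel with the rows
kept = df.sort_values("Age")

# ignore_index=True relabels the result 0..n-1
renumbered = df.sort_values("Age", ignore_index=True)

print(kept.index.tolist())         # [1, 2, 0]
print(renumbered.index.tolist())   # [0, 1, 2]
print(renumbered["Name"].tolist()) # ['Bob', 'Charlie', 'Alice']
```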

Important Considerations:

Sorting changes the order of rows but keeps their original index labels - always verify results
For large datasets, choose sorting algorithm carefully
Multi-column sorting order matters - specify ascending for each column
NaN handling can affect analysis results

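Performance Tip: The kind parameter only takes effect when sorting on a single column or label. The default 'quicksort' is usually a good choice. Switch to kind='mergesort' or kind='stable' when tied rows must keep their original order, keeping in mind that mergesort needs O(n) extra memory, so it is not the more memory-efficient option.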

Common Sorting Patterns

# Top N pattern
top_5 = df.sort_values('Salary', ascending=False).head(5)

# Group-wise sorting
df_sorted = df.sort_values(['Group', 'Value'])

# Chronological sorting  
df_dates = df.sort_values('Date')

# Custom order sorting
custom_order = ['High', 'Medium', 'Low']
df['Priority'] = pd.Categorical(df['Priority'], categories=custom_order, ordered=True)
df_sorted = df.sort_values('Priority')
