
Pandas GroupBy Operations

The GroupBy operation is one of the most powerful features in Pandas, enabling you to split data into groups, apply functions to each group, and combine the results. It's essential for aggregation, transformation, and filtering operations.

1. The GroupBy Process: Split-Apply-Combine

GroupBy follows a three-step process:

  • Split: Divide the data into groups based on specified criteria
  • Apply: Apply a function to each group independently
  • Combine: Combine the results into a new data structure
Visualization: DataFrame → Split by Group → Apply Function → Combine Results
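
The three steps above can be written out by hand to show what groupby() does in a single call. This is a minimal sketch: the DataFrame, its column names (team, points), and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "team": ["A", "B", "A", "B", "A"],
    "points": [10, 20, 30, 40, 50],
})

# Split: one sub-DataFrame per distinct value of "team"
pieces = {key: group for key, group in df.groupby("team")}

# Apply: compute a result for each piece independently
totals = {key: group["points"].sum() for key, group in pieces.items()}

# Combine: assemble the per-group results into one Series
manual = pd.Series(totals, name="points").rename_axis("team")

# The same three steps expressed as a single chained call
direct = df.groupby("team")["points"].sum()

print(manual)
print(direct)
```

Both Series hold the same per-team totals. The manual version exists only to make the steps visible; in practice the single chained call is the usual form.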

2. Basic GroupBy Syntax and Methods

Method      Description             Example
groupby()   Create GroupBy object   df.groupby('column')
sum()       Sum of each group       grouped.sum()
mean()      Average of each group   grouped.mean()
count()     Count of elements       grouped.count()
agg()       Multiple aggregations   grouped.agg(['sum', 'mean'])

3. Common Aggregation Functions

  • sum() - Sum of values
  • mean() - Arithmetic mean
  • median() - Median value
  • std() - Standard deviation
  • var() - Variance
  • min() - Minimum value
  • max() - Maximum value
  • count() - Count of non-NA values
  • size() - Size of group
  • first()/last() - First/last value
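
The difference between count() and size() only shows up when a group contains missing values. The sketch below uses made-up data (columns group and score) with one NaN to illustrate this, along with how mean(), first(), and last() treat the gap.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score in group "x"
df = pd.DataFrame({
    "group": ["x", "x", "x", "y", "y"],
    "score": [1.0, np.nan, 3.0, 4.0, 6.0],
})

grouped = df.groupby("group")["score"]

counts = grouped.count()  # non-NA values only: x has 2, y has 2
sizes = grouped.size()    # every row, NA included: x has 3, y has 2
means = grouped.mean()    # NA values are skipped: x is 2.0, y is 5.0
firsts = grouped.first()  # first non-NA value per group
lasts = grouped.last()    # last non-NA value per group

print(pd.DataFrame({"count": counts, "size": sizes, "mean": means}))
```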

Basic GroupBy Examples

Basic GroupBy Operations
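
As one possible illustration of the methods listed in section 2, the sketch below groups a small, invented sales table by region. The column names (region, units, revenue) and all values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "units": [3, 5, 2, 4, 1],
    "revenue": [30.0, 50.0, 20.0, 40.0, 10.0],
})

# Create the GroupBy object once and reuse it
grouped = df.groupby("region")

total = grouped.sum()                              # sum of every numeric column per region
average = grouped["revenue"].mean()                # mean revenue per region
row_counts = grouped["units"].count()              # number of non-NA unit entries per region
summary = grouped["revenue"].agg(["sum", "mean"])  # several aggregations at once

print(total)
print(summary)
```

Selecting a column with grouped["revenue"] before aggregating returns a Series; aggregating the whole GroupBy object returns a DataFrame with one column per aggregated input column.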

4. Advanced GroupBy Techniques

Technique          Description                                                         Use Case
transform()        Return a result aligned to the input, with group values broadcast   Adding group-wise statistics to original data
filter()           Keep or drop whole groups based on a condition                      Selecting groups that meet specific criteria
apply()            Apply a custom function to each group                               Complex group-wise operations
Multiple Columns   Group by several columns at once                                    Hierarchical grouping analysis

Advanced GroupBy Operations

Advanced GroupBy Techniques
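
The four techniques in the table above can be sketched on a single small dataset. The employee table below (columns dept, level, salary) is hypothetical and exists only to make each call concrete.

```python
import pandas as pd

# Hypothetical employee data, invented for illustration
df = pd.DataFrame({
    "dept": ["eng", "eng", "ops", "ops", "ops", "hr"],
    "level": ["jr", "sr", "jr", "jr", "sr", "sr"],
    "salary": [100, 140, 80, 90, 130, 95],
})

# transform(): the group mean is broadcast back onto every original row
df["dept_mean"] = df.groupby("dept")["salary"].transform("mean")

# filter(): keep only rows that belong to groups with at least two members
multi = df.groupby("dept").filter(lambda g: len(g) >= 2)

# apply(): a custom per-group function, here the salary range
spread = df.groupby("dept")["salary"].apply(lambda s: s.max() - s.min())

# Multiple columns: the result is indexed by (dept, level) pairs
by_dept_level = df.groupby(["dept", "level"])["salary"].mean()

print(df)
print(multi)
print(spread)
print(by_dept_level)
```

Note that transform() keeps the original number of rows, filter() returns a subset of the original rows, and apply() with a scalar-returning function yields one value per group.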

Real-World E-commerce Example

Real-World GroupBy Application
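
A sketch of what an e-commerce analysis might look like follows. The orders table, its column names, and every value in it are invented for this example; a real dataset would have its own schema.

```python
import pandas as pd

# Hypothetical order data, invented for illustration
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "customer": ["ann", "bob", "ann", "cat", "bob", "ann"],
    "category": ["books", "toys", "toys", "books", "books", "books"],
    "quantity": [2, 1, 3, 1, 4, 1],
    "unit_price": [10.0, 25.0, 25.0, 12.0, 10.0, 30.0],
})
orders["revenue"] = orders["quantity"] * orders["unit_price"]

# Named aggregation: each output column names its source column and function
by_category = orders.groupby("category").agg(
    total_revenue=("revenue", "sum"),
    order_count=("order_id", "count"),
    avg_quantity=("quantity", "mean"),
)

# Revenue per customer, highest first, with customer kept as a regular column
by_customer = (
    orders.groupby("customer", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
)
top_customer = by_customer.iloc[0]["customer"]

# Each order's share of its category's total revenue
category_total = orders.groupby("category")["revenue"].transform("sum")
orders["share_of_category"] = orders["revenue"] / category_total

print(by_category)
print(by_customer)
```

Named aggregation keeps the output column names flat and descriptive, which avoids the two-level column index that agg() produces when given a dict of lists.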
GroupBy Best Practices
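  • Select only the columns you need before aggregating instead of aggregating the entire DataFrame
  • Chain operations for readability; chaining avoids naming intermediate results but does not by itself make the computation faster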
  • Use agg() for multiple aggregations
  • Consider using pd.pivot_table() for simple cases
  • Reset index after grouping for cleaner DataFrames
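
Several of these practices can be seen side by side in a short sketch. The data below (columns region, product, sales, note) is hypothetical.

```python
import pandas as pd

# Hypothetical data; "note" is a column the analysis does not need
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["pen", "ink", "pen", "ink"],
    "sales": [10, 20, 30, 40],
    "note": ["a", "b", "c", "d"],
})

# Select the needed column before aggregating
sales_by_region = df.groupby("region")["sales"].sum()

# One agg() call instead of several separate aggregation calls
stats = df.groupby("region")["sales"].agg(["sum", "mean", "max"])

# pivot_table() and groupby().unstack() give the same region-by-product grid
pivot = pd.pivot_table(
    df, values="sales", index="region", columns="product", aggfunc="sum"
)
unstacked = df.groupby(["region", "product"])["sales"].sum().unstack()

print(stats)
print(pivot)
```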
Performance Tips
  • Avoid using apply() when built-in methods exist
  • Use categorical data for grouping columns
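  • Sort data before grouping only when the operation depends on row order (for example first() or last()); pass sort=False to groupby() when the order of the group keys does not matter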
  • Use as_index=False to keep grouping columns as regular columns
  • Consider Dask for very large datasets
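
The following sketch illustrates three of the tips above on a tiny, invented weather table (columns city, temp). It shows the calls involved rather than measuring speed; any actual gain depends on data size and pandas version.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "temp": [5.0, 18.0, 7.0, 20.0],
})

# Prefer the built-in aggregation over an equivalent apply()
fast = df.groupby("city")["temp"].mean()
slow = df.groupby("city")["temp"].apply(lambda s: s.mean())  # same result, more overhead

# Categorical grouping column; observed=True limits output to categories present
df["city"] = df["city"].astype("category")
by_cat = df.groupby("city", observed=True)["temp"].mean()

# as_index=False keeps the grouping column as a regular column
flat = df.groupby("city", observed=True, as_index=False)["temp"].mean()

print(fast)
print(flat)
```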
Important: Calling groupby() is lazy. It returns a GroupBy object and performs no aggregation until you call a method such as sum(), agg(), transform(), or filter() on it, so you can build the object once and reuse it for several operations.
Pro Tip: Use .reset_index() after a GroupBy aggregation to move the group keys out of the index and into regular columns, which gives a flat DataFrame that is easier to merge, export, or plot.
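
Both notes can be illustrated briefly. The store and sales columns below are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "store": ["n", "s", "n", "s"],
    "sales": [100, 200, 300, 400],
})

# Creating the GroupBy object does not aggregate anything yet
grouped = df.groupby("store")
print(type(grouped).__name__)

# The same object can be reused for several computations
totals = grouped["sales"].sum()  # Series indexed by store
peaks = grouped["sales"].max()

# reset_index() moves the group key from the index into a regular column
tidy = totals.reset_index()
print(tidy)
```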