
Pandas Data Filtering

Data filtering is one of the most common and essential operations in data analysis with Pandas. It allows you to extract specific subsets of data based on conditions, making it easier to focus on relevant information.

1. Basic Boolean Filtering

The most fundamental way to filter data in Pandas is using boolean conditions.

  • Syntax: df[condition]
  • Mechanism: Create a boolean mask (True/False values) and apply it to the DataFrame
  • Example: df[df['Age'] > 30] returns rows where Age is greater than 30

| Operator | Description | Example |
|----------|-------------|---------|
| > | Greater than | df['Salary'] > 50000 |
| < | Less than | df['Age'] < 40 |
| == | Equal to | df['Department'] == 'IT' |
| != | Not equal to | df['City'] != 'London' |
| >= | Greater than or equal | df['Rating'] >= 4.0 |
| <= | Less than or equal | df['Stock'] <= 50 |
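
The two-step mechanism described above (build a mask, then apply it) can be sketched as follows. The DataFrame and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev"],
    "Age": [28, 35, 42, 30],
})

# Step 1: the comparison yields a boolean Series (the mask), one value per row
mask = df["Age"] > 30

# Step 2: indexing with the mask keeps only the rows where it is True
over_30 = df[mask]

print(mask)
print(over_30)
```

Writing df[df["Age"] > 30] does the same thing in one expression; naming the mask is mainly useful when you want to inspect or reuse it.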

2. Multiple Conditions Filtering

Combine multiple conditions using logical operators:

  • & (AND): Both conditions must be True
  • | (OR): At least one condition must be True
  • ~ (NOT): Invert the condition
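  • Important: Always wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==
Note: Use &, |, and ~ instead of and, or, and not. The Python keywords do not operate element-wise; applied to a Series they raise a ValueError because the truth value of a Series is ambiguous.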
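
A minimal sketch of the three logical operators, assuming a small hypothetical employee table. The last part shows what happens if the Python keyword and is used by mistake.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev"],
    "Age": [28, 35, 42, 30],
    "Department": ["IT", "HR", "IT", "Sales"],
    "Salary": [52000, 48000, 75000, 61000],
})

# AND: both conditions must be True
it_high_paid = df[(df["Department"] == "IT") & (df["Salary"] > 60000)]

# OR: at least one condition must be True
young_or_hr = df[(df["Age"] < 30) | (df["Department"] == "HR")]

# NOT: invert a condition
not_it = df[~(df["Department"] == "IT")]

# The keyword `and` asks a whole Series for a single truth value,
# which pandas refuses to guess, so it raises ValueError
try:
    df[(df["Age"] < 30) and (df["Department"] == "HR")]
except ValueError as err:
    and_error = err
    print("ValueError:", err)
```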

3. Special Filtering Methods

Pandas provides several convenient methods for common filtering scenarios:

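| Method | Description | Example |
|--------|-------------|---------|
| isin() | Check whether each value is in a list | df['Dept'].isin(['HR', 'IT']) |
| str.contains() | Match a string pattern (a regular expression by default) | df['Name'].str.contains('John') |
| between() | Check whether each value lies within a range (inclusive by default) | df['Age'].between(25, 35) |
| isna() / notna() | Detect missing or non-missing values | df[df['Salary'].notna()] |
| query() | Filter using a query string | df.query('Salary > 50000 and Age < 40') |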
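
The methods in the table can be tried on a small DataFrame. This is a sketch under assumptions: the names, departments, and salaries are made up, and one salary is deliberately missing so that notna() has something to exclude.

```python
import pandas as pd

# Hypothetical sample data; one Salary is missing on purpose
df = pd.DataFrame({
    "Name": ["John Smith", "Maria Lopez", "Johnny Tran", "Li Wei"],
    "Dept": ["HR", "IT", "Sales", "IT"],
    "Age": [25, 31, 38, 35],
    "Salary": [48000.0, float("nan"), 61000.0, 55000.0],
})

# isin(): membership in a list of allowed values
in_hr_or_it = df[df["Dept"].isin(["HR", "IT"])]

# str.contains(): substring match, so "Johnny" also matches "John"
named_john = df[df["Name"].str.contains("John")]

# between(): both endpoints are included by default
aged_25_to_35 = df[df["Age"].between(25, 35)]

# notna(): keep rows where Salary is present
has_salary = df[df["Salary"].notna()]

# query(): the same kind of filter written as a string expression
well_paid_under_40 = df.query("Salary > 50000 and Age < 40")
```

Each of the first four methods returns a boolean mask, so they can be combined with &, |, and ~ like any other condition.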

Example: Basic Data Filtering

Data Filtering Examples
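
As a small worked example, the sketch below applies a few comparison operators to a hypothetical product inventory. The column names follow the operator examples used earlier on this page (Rating, Stock, City); the data itself is invented.

```python
import pandas as pd

# Hypothetical product inventory, used only for illustration
products = pd.DataFrame({
    "Product": ["Lamp", "Desk", "Chair", "Shelf"],
    "City": ["London", "Paris", "London", "Berlin"],
    "Rating": [4.5, 3.8, 4.0, 4.7],
    "Stock": [120, 50, 8, 65],
})

# Products rated 4.0 or higher
highly_rated = products[products["Rating"] >= 4.0]

# Products with 50 or fewer units in stock
low_stock = products[products["Stock"] <= 50]

# Products stored anywhere except London
outside_london = products[products["City"] != "London"]

print(highly_rated)
print(low_stock)
print(outside_london)
```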

Advanced Filtering Techniques

Advanced Filtering Examples
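
The sketch below shows a few techniques that go one step beyond single comparisons. Which techniques count as "advanced" is an assumption made for this example, as are the variable names (min_salary, target_depts) and the sample data.

```python
import pandas as pd

# Hypothetical sample data; the last Name is missing on purpose
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev", None],
    "Department": ["IT", "HR", "IT", "Sales", "HR"],
    "Salary": [52000, 48000, 75000, 61000, 58000],
})

# Reference Python variables inside query() with the @ prefix
min_salary = 50000
target_depts = ["IT", "Sales"]
by_query = df.query("Salary > @min_salary and Department in @target_depts")

# Negate isin() to exclude a set of values
not_hr = df[~df["Department"].isin(["HR"])]

# Case-insensitive text match; na=False treats missing names as non-matches
name_has_a = df[df["Name"].str.contains("a", case=False, na=False)]

# Name intermediate masks to keep a complex filter readable
is_it = df["Department"] == "IT"
is_high_paid = df["Salary"] > 60000
it_or_high_paid = df[is_it | is_high_paid]
```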
Filtering Best Practices
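  • Use parentheses around each condition when combining conditions
  • Build complex filters from named intermediate masks, or chain simple filters step by step
  • Use query() for readable complex filters
  • On large datasets, remember that every filtering step builds a full-length boolean mask and a new DataFrame
  • Always test filters on small subsets first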
Common Use Cases
  • Data cleaning (filtering outliers)
  • Segment analysis (by category)
  • Time-based filtering
  • Quality control (filtering valid records)
  • Performance analysis (top/bottom performers)
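
Several of these use cases can be sketched on one hypothetical sales table. The 1.5 × IQR rule used for outliers is one common convention chosen here for illustration, not the only option, and all names and figures are invented.

```python
import pandas as pd

# Hypothetical sales records, used only for illustration
sales = pd.DataFrame({
    "Rep": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "Region": ["North", "South", "North", "East", "South"],
    "Date": pd.to_datetime(
        ["2024-01-15", "2024-02-03", "2024-02-20", "2024-03-11", "2024-03-28"]
    ),
    "Revenue": [1200.0, 950.0, 15000.0, 1100.0, 1300.0],
})

# Segment analysis: one category
north = sales[sales["Region"] == "North"]

# Time-based filtering: compare a datetime column against a timestamp
from_february = sales[sales["Date"] >= pd.Timestamp("2024-02-01")]

# Data cleaning: keep rows within 1.5 * IQR of the quartiles (assumed rule)
q1, q3 = sales["Revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
typical = sales[sales["Revenue"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Performance analysis: the two rows with the highest Revenue
top_two = sales.nlargest(2, "Revenue")
```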
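Pro Tip: Prefer a single .loc[mask, columns] call over chained indexing such as df[mask]['col']. It selects rows and columns in one step, avoids creating an intermediate copy, and makes assignments reliable, whereas chained assignment may modify a temporary copy and trigger a SettingWithCopyWarning.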
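
A minimal sketch of .loc with a boolean mask, assuming a hypothetical employee table. It selects rows and columns together, then updates the matching rows in place.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev"],
    "Age": [28, 35, 42, 30],
    "Salary": [52000, 48000, 75000, 61000],
})

mask = df["Age"] > 30

# Rows and columns selected in a single indexing step
names_and_salaries = df.loc[mask, ["Name", "Salary"]]

# Assignment through .loc updates df itself, not a temporary copy
df.loc[mask, "Salary"] = df.loc[mask, "Salary"] + 1000

print(names_and_salaries)
print(df)
```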