Pandas Data Filtering
Data filtering is one of the most common and essential operations in data analysis with Pandas. It allows you to extract specific subsets of data based on conditions, making it easier to focus on relevant information.
1. Basic Boolean Filtering
The most fundamental way to filter data in Pandas is using boolean conditions.
- Syntax: df[condition]
- Mechanism: Create a boolean mask (True/False values) and apply it to the DataFrame
- Example: df[df['Age'] > 30] returns rows where Age is greater than 30
| Operator | Description | Example |
|---|---|---|
| > | Greater than | df['Salary'] > 50000 |
| < | Less than | df['Age'] < 40 |
| == | Equal to | df['Department'] == 'IT' |
| != | Not equal to | df['City'] != 'London' |
| >= | Greater than or equal | df['Rating'] >= 4.0 |
| <= | Less than or equal | df['Stock'] <= 50 |
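To make the mask mechanism concrete, here is a minimal sketch. The sample DataFrame and its values are made up for illustration; only the Age and Salary column names come from the examples above.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [28, 35, 42, 30],
    "Salary": [48000, 62000, 75000, 50000],
})

# Step 1: the comparison produces a boolean Series aligned with df's index.
mask = df["Age"] > 30
print(mask.tolist())             # [False, True, True, False]

# Step 2: indexing with the mask keeps only the rows where it is True.
over_30 = df[mask]
print(over_30["Name"].tolist())  # ['Bob', 'Carol']
```

Writing df[df['Age'] > 30] simply performs both steps in one expression.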
2. Multiple Conditions Filtering
Combine multiple conditions using logical operators:
- & (AND): Both conditions must be True
- | (OR): At least one condition must be True
- ~ (NOT): Invert the condition
- Important: Always use parentheses around each condition
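The operators above can be sketched as follows, using a made-up DataFrame (all values are illustrative). The parentheses matter because & and | bind more tightly than comparison operators in Python, so an unparenthesized condition is grouped differently from what was intended.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [28, 35, 42, 30],
    "Department": ["IT", "HR", "IT", "Sales"],
    "Salary": [48000, 62000, 75000, 50000],
})

# AND: both conditions must hold for a row to be kept.
it_high_paid = df[(df["Department"] == "IT") & (df["Salary"] > 50000)]

# OR: at least one condition must hold.
hr_or_young = df[(df["Department"] == "HR") | (df["Age"] < 30)]

# NOT: invert a condition.
not_it = df[~(df["Department"] == "IT")]

# Python's `and` asks for a single truth value of a whole Series,
# which pandas refuses to guess, so this raises ValueError.
and_error = None
try:
    df[(df["Age"] > 30) and (df["Salary"] > 50000)]
except ValueError as exc:
    and_error = exc

print(it_high_paid["Name"].tolist())  # ['Carol']
print(hr_or_young["Name"].tolist())   # ['Alice', 'Bob']
print(not_it["Name"].tolist())        # ['Bob', 'Dan']
```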
Note: Use &, |, ~ instead of and, or, not for element-wise operations.
3. Special Filtering Methods
Pandas provides several convenient methods for common filtering scenarios:
| Method | Description | Example |
|---|---|---|
| isin() | Check if value is in a list | df['Dept'].isin(['HR', 'IT']) |
| str.contains() | String pattern matching | df['Name'].str.contains('John') |
| between() | Check if value is between range | df['Age'].between(25, 35) |
| isna() / notna() | Filter missing values | df[df['Salary'].notna()] |
| query() | Filter using query string | df.query('Salary > 50000 and Age < 40') |
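The methods in the table can be tried on a small, made-up DataFrame like the one below (names and numbers are illustrative only). Two details worth knowing: between() includes both endpoints by default, and str.contains() treats its pattern as a regular expression by default.

```python
import pandas as pd

# Hypothetical sample data; one Salary is missing on purpose.
df = pd.DataFrame({
    "Name": ["John Smith", "Alice Jones", "Johnny Lee", "Maria Diaz"],
    "Dept": ["IT", "HR", "Sales", "IT"],
    "Age": [25, 38, 31, 45],
    "Salary": [52000.0, None, 47000.0, 81000.0],
})

# isin(): keep rows whose Dept is one of the listed values.
hr_or_it = df[df["Dept"].isin(["HR", "IT"])]

# str.contains(): keep rows whose Name contains the substring "John".
johns = df[df["Name"].str.contains("John")]

# between(): keep rows with 25 <= Age <= 35 (both ends inclusive).
aged_25_35 = df[df["Age"].between(25, 35)]

# notna(): keep rows where Salary is present.
has_salary = df[df["Salary"].notna()]

# query(): the same kind of condition written as a string expression.
well_paid_under_40 = df.query("Salary > 50000 and Age < 40")

print(hr_or_it["Name"].tolist())            # ['John Smith', 'Alice Jones', 'Maria Diaz']
print(johns["Name"].tolist())               # ['John Smith', 'Johnny Lee']
print(aged_25_35["Name"].tolist())          # ['John Smith', 'Johnny Lee']
print(has_salary["Name"].tolist())          # ['John Smith', 'Johnny Lee', 'Maria Diaz']
print(well_paid_under_40["Name"].tolist())  # ['John Smith']
```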
Example: Basic Data Filtering
Data Filtering Examples
Advanced Filtering Techniques
Advanced Filtering Examples
Filtering Best Practices
- Use parentheses for multiple conditions
- Chain filters for complex conditions
- Use query() for readable complex filters
- Consider performance for large datasets
- Always test filters on small subsets first
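One way to apply these practices is sketched below. It reads "chaining" as building a complex filter from named masks, which is an interpretation rather than something the list spells out; the data and the names is_it and is_senior are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Sales", "IT"],
    "Age": [28, 35, 42, 30, 51],
    "Salary": [48000, 62000, 75000, 50000, 90000],
})

# Name each condition, then combine them; easier to read and to debug.
is_it = df["Department"] == "IT"
is_senior = df["Age"] >= 40
senior_it = df[is_it & is_senior]

# The same filter written with query() for readability.
senior_it_q = df.query("Department == 'IT' and Age >= 40")

# Try the filter on a small subset before running it on everything.
sample = df.head(3)
preview = sample[(sample["Department"] == "IT") & (sample["Age"] >= 40)]

print(senior_it.index.tolist())  # [2, 4]
print(preview.index.tolist())    # [2]
```

Both forms select the same rows here, so the choice between them is mostly about which reads better for a given filter.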
Common Use Cases
- Data cleaning (filtering outliers)
- Segment analysis (by category)
- Time-based filtering
- Quality control (filtering valid records)
- Performance analysis (top/bottom performers)
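A few of these use cases are sketched below on made-up data. The 1.5 × IQR rule used for outliers is one common convention chosen here as an assumption, not the only option, and all column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; employee D has an implausibly high Sales value.
df = pd.DataFrame({
    "Employee": ["A", "B", "C", "D", "E", "F"],
    "Department": ["IT", "HR", "IT", "Sales", "HR", "IT"],
    "Hired": pd.to_datetime([
        "2019-03-01", "2021-07-15", "2020-01-10",
        "2022-11-30", "2018-05-20", "2023-02-01",
    ]),
    "Sales": [120, 95, 110, 900, 105, 130],
})

# Data cleaning: drop values outside 1.5 * IQR of the quartiles.
q1, q3 = df["Sales"].quantile([0.25, 0.75])
iqr = q3 - q1
within_fences = df["Sales"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
no_outliers = df[within_fences]

# Segment analysis: keep a single category.
it_only = df[df["Department"] == "IT"]

# Time-based filtering: compare a datetime column against a Timestamp.
hired_2020_on = df[df["Hired"] >= pd.Timestamp("2020-01-01")]

# Performance analysis: top 2 by Sales after removing outliers.
top_2 = no_outliers.nlargest(2, "Sales")

print(no_outliers["Employee"].tolist())    # ['A', 'B', 'C', 'E', 'F']
print(it_only["Employee"].tolist())        # ['A', 'C', 'F']
print(hired_2020_on["Employee"].tolist())  # ['B', 'C', 'D', 'F']
print(top_2["Employee"].tolist())          # ['F', 'A']
```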
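Pro Tip: When working with large datasets, prefer a single .loc[rows, columns] call over chained indexing such as df[mask]['col']. It selects rows and columns in one step without building an intermediate DataFrame, and it is the reliable way to assign values to the filtered rows.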
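A minimal sketch of the .loc pattern, on made-up data: the mask selects the rows and the second argument selects the columns. Assigning through a chained expression may not update the original DataFrame, which is why the update below goes through .loc.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [28, 35, 42, 30],
    "Salary": [48000, 62000, 75000, 50000],
})

mask = df["Age"] > 30

# Select filtered rows and specific columns in one step.
names_and_pay = df.loc[mask, ["Name", "Salary"]]

# Assign to the filtered rows; this modifies df itself.
df.loc[mask, "Salary"] += 5000

print(names_and_pay["Name"].tolist())  # ['Bob', 'Carol']
print(df["Salary"].tolist())           # [48000, 67000, 80000, 50000]
```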