Pandas Data Manipulation
Data manipulation is at the core of data analysis with Pandas. This tutorial covers essential techniques for filtering, sorting, transforming, and aggregating data using Pandas DataFrames.
1. Filtering Data
Filtering allows you to select subsets of data based on conditions.
- Boolean Indexing: Use boolean conditions to filter rows
- Multiple Conditions: Combine conditions using `&` (and), `|` (or), and `~` (not), wrapping each condition in parentheses
- Query Method: Use `df.query()` to express more complex filters as a string
| Operation | Syntax | Description |
|---|---|---|
| Single Condition | `df[df['column'] > value]` | Filter rows where the column values meet the condition |
| Multiple Conditions | `df[(cond1) & (cond2)]` | Filter rows where both conditions hold (AND logic) |
| String Contains | `df[df['col'].str.contains('text')]` | Filter rows whose column contains the given text |
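A minimal sketch of the filtering techniques above, using a small hypothetical DataFrame (the `name`, `age`, and `city` columns are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 32, 47, 51],
    "city": ["New York", "Boston", "New York", "Chicago"],
})

# Single condition: rows where age is greater than 30
older = df[df["age"] > 30]

# Multiple conditions: each comparison needs its own parentheses,
# because & and | bind more tightly than > and == in Python
ny_over_30 = df[(df["age"] > 30) & (df["city"] == "New York")]

# Negation with ~: everyone not living in New York
not_ny = df[~(df["city"] == "New York")]

# String contains: names containing a lowercase "a" (case-sensitive by default)
has_a = df[df["name"].str.contains("a")]

# query() expresses the same AND filter as a string
ny_over_30_q = df.query("age > 30 and city == 'New York'")

print(ny_over_30)
```

Because `str.contains()` is case-sensitive by default, "Alice" is not matched by the pattern `"a"`.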
2. Sorting Data
Sorting helps in organizing data for better analysis and visualization.
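- Ascending/Descending: Control the sort order with the `ascending` parameter (`True` by default)
- Multiple Columns: Pass a list of columns to `sort_values()`; earlier columns take priority
- In-place Sorting: Use `inplace=True` to modify the original DataFrame instead of returning a sorted copy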
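The sorting options above can be sketched as follows; the `department` and `salary` data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "department": ["Sales", "IT", "Sales", "IT"],
    "salary": [50000, 70000, 65000, 60000],
})

# Descending sort on one column; returns a new DataFrame, df is unchanged
by_salary = df.sort_values("salary", ascending=False)

# Multiple columns: department first (A to Z), then salary (high to low)
# within each department; ascending takes one flag per column
by_dept = df.sort_values(["department", "salary"], ascending=[True, False])

# In-place sort: modifies df itself and returns None
result = df.sort_values("salary", inplace=True)

print(by_dept)
```

Because `inplace=True` returns `None`, it cannot be used in a method chain; reassigning (`df = df.sort_values(...)`) is often the clearer choice.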
3. Adding and Modifying Columns
Create new columns or modify existing ones based on calculations or transformations.
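- Direct Assignment: `df['new_col'] = values`
- Apply Function: Use `apply()` to run a custom function on each element of a Series (or on each row or column of a DataFrame)
- Vectorized Operations: Perform operations on entire columns at once, which is typically much faster than `apply()`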
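A short sketch of the three approaches above, assuming a hypothetical product table with `price` and `quantity` columns:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "product": ["apple", "banana", "cherry"],
    "price": [1.25, 0.50, 3.00],
    "quantity": [10, 24, 5],
})

# Vectorized operation: multiplies the two columns element by element in one step
df["total"] = df["price"] * df["quantity"]

# Direct assignment of a scalar broadcasts it to every row
df["currency"] = "USD"

# apply() on a Series calls the Python function once per element
df["size"] = df["quantity"].apply(lambda q: "bulk" if q >= 10 else "single")

# Modifying an existing column: overwrite it with a vectorized expression
# (a hypothetical 10% discount)
df["price"] = df["price"] * 0.9

print(df)
```

The `apply()` line could also be written with vectorized tools such as `numpy.where`; `apply()` is convenient for arbitrary Python logic but runs one function call per element.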
4. Grouping and Aggregation
Group data by categories and compute aggregate statistics.
- GroupBy: `df.groupby('column')`
- Aggregate Functions: `mean()`, `sum()`, `count()`, `min()`, `max()`
- Multiple Aggregations: Use `agg()` for different functions per column
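A minimal grouping sketch over hypothetical employee data; the last call uses pandas' named aggregation syntax (`new_name=(column, function)`) to get flat column names:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "department": ["Sales", "IT", "Sales", "IT", "HR"],
    "salary": [50000, 70000, 60000, 80000, 45000],
    "years": [2, 5, 4, 7, 1],
})

# One aggregate per group: mean salary for each department
# (group keys are sorted by default)
mean_salary = df.groupby("department")["salary"].mean()

# agg() with a dict: different functions per column
# (produces MultiIndex columns such as ("salary", "max"))
summary = df.groupby("department").agg({"salary": ["mean", "max"], "years": "sum"})

# Named aggregation: output column name = (input column, function)
named = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    headcount=("salary", "count"),
)

print(named)
```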
5. Handling Missing Data
Deal with NaN values in your dataset.
- Detection: `isna()`, `notna()`
- Filling: `fillna()` with mean, median, or specific value
- Dropping: `dropna()` to remove rows/columns with missing values
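A sketch of detecting, filling, and dropping missing values; the `score` and `grade` columns are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps: NaN in a numeric column, None in a text column
df = pd.DataFrame({
    "score": [88.0, np.nan, 75.0, np.nan],
    "grade": ["B", "A", None, "C"],
})

# Detection: a boolean mask of missing cells, and a count per column
missing_mask = df.isna()
missing_per_column = df.isna().sum()

# Filling with the column mean (mean() skips NaN, so it is (88 + 75) / 2);
# the dict limits the fill to the "score" column
filled = df.fillna({"score": df["score"].mean()})

# Filling with a specific value
grade_filled = df["grade"].fillna("Unknown")

# Dropping: remove every row that contains at least one missing value
complete_rows = df.dropna()

print(missing_per_column)
```

`dropna()` drops rows by default; `dropna(axis=1)` drops columns instead, and `dropna(subset=[...])` only looks at the listed columns.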
6. String Operations
Perform text manipulation on string columns using vectorized string methods.
- Case Conversion: `str.upper()`, `str.lower()`
- Searching: `str.contains()`, `str.startswith()`
- Extraction: `str.extract()` with regular expressions
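The `.str` accessor methods listed above can be combined as in this sketch, assuming a hypothetical column of email addresses:

```python
import pandas as pd

# Hypothetical sample data with inconsistent capitalization
df = pd.DataFrame({
    "email": ["alice@example.com", "BOB@Example.org", "carol@test.com"],
})

# Case conversion: normalize before searching
df["email_lower"] = df["email"].str.lower()

# Searching: both return boolean Series that can be used as row masks
is_example = df["email_lower"].str.contains("example")
starts_with_b = df["email_lower"].str.startswith("b")

# Extraction: each regex capture group becomes one column of the result
parts = df["email_lower"].str.extract(r"^([^@]+)@(.+)$")
parts.columns = ["user", "domain"]

print(parts)
```

`str.contains()` interprets its pattern as a regular expression by default; pass `regex=False` to match a literal string containing characters such as `.` or `+`.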
Data Manipulation Workflow
Typical data manipulation follows this pattern:
Load Data → Clean Data → Filter/Sort → Transform → Aggregate → Analyze

Example: Comprehensive Data Manipulation
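The workflow above can be sketched end to end as a single method chain. This is a minimal illustration assuming a small hypothetical sales dataset (the `region`, `product`, `units`, and `unit_price` columns are made up); the CSV is read from an in-memory string so the snippet runs without a file:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file
raw_csv = io.StringIO(
    "region,product,units,unit_price\n"
    "north,widget,10,2.5\n"
    "south,widget,,2.5\n"
    "north,gadget,4,10.0\n"
    "south,gadget,7,10.0\n"
    "north,widget,6,2.5\n"
)

# 1. Load
sales = pd.read_csv(raw_csv)

report = (
    sales
    # 2. Clean: drop rows with no units recorded
    .dropna(subset=["units"])
    # 3. Filter: keep only orders of at least 5 units
    .query("units >= 5")
    # 4. Transform: add a revenue column
    .assign(revenue=lambda d: d["units"] * d["unit_price"])
    # 5. Aggregate: total revenue per region
    .groupby("region", as_index=False)["revenue"]
    .sum()
    # Sort so the highest-revenue region comes first
    .sort_values("revenue", ascending=False)
)

# 6. Analyze: inspect the result
print(report)
```

Writing the steps as one chain keeps each stage on its own line and avoids intermediate variables; each method returns a new DataFrame, so `sales` itself is left unchanged.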
Common Manipulation Methods
- `filter()`: Select columns (or rows) by label, not by cell values
- `sort_values()`: Sort data
- `groupby()`: Group data
- `pivot_table()`: Create pivot tables
- `merge()`: Join DataFrames
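`filter()`, `merge()`, and `pivot_table()` are not illustrated elsewhere in this tutorial, so here is a minimal sketch using two hypothetical tables, `orders` and `customers`:

```python
import pandas as pd

# Hypothetical sample tables sharing a customer_id key
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [101, 102, 101, 103],
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "region": ["East", "West", "East"],
})

# filter() selects columns whose labels match; it does not look at values
id_cols = orders.filter(like="_id")

# merge() joins on a shared key; the default is an inner join, while
# how="left" keeps every order even if its customer were missing
merged = orders.merge(customers, on="customer_id", how="left")

# pivot_table(): regions as rows, months as columns, summed amounts as cells;
# fill_value=0 replaces combinations that have no orders
pivot = merged.pivot_table(
    index="region", columns="month", values="amount",
    aggfunc="sum", fill_value=0,
)

print(pivot)
```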
Key Concepts
- Boolean indexing
- Vectorized operations
- Method chaining
- In-place vs. copy operations
- Performance optimization