
Pandas Data Manipulation

Data manipulation is at the core of data analysis with Pandas. This tutorial covers essential techniques for filtering, sorting, transforming, and aggregating data using Pandas DataFrames.

1. Filtering Data

Filtering allows you to select subsets of data based on conditions.

  • Boolean Indexing: Use boolean conditions to filter rows
  • Multiple Conditions: Combine conditions using & (and), | (or), ~ (not)
  • Query Method: Use df.query() to write filter conditions as a string expression

Operation            | Syntax                              | Description
Single Condition     | df[df['column'] > value]            | Filter rows where column values meet condition
Multiple Conditions  | df[(cond1) & (cond2)]               | Filter using AND logic
String Contains      | df[df['col'].str.contains('text')]  | Filter rows containing specific text
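
The filtering techniques above can be sketched as follows. This is a minimal illustration; the DataFrame and its column names (name, age, city) are hypothetical sample data, not part of the original tutorial.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "age": [25, 32, 47, 19],
    "city": ["New York", "Boston", "New Haven", "Chicago"],
})

# Single condition: rows where age is greater than 30.
over_30 = df[df["age"] > 30]

# Multiple conditions: wrap each condition in parentheses,
# because & and | bind more tightly than comparison operators.
young_or_old = df[(df["age"] < 20) | (df["age"] > 40)]
adults_in_new = df[(df["age"] > 20) & (df["city"].str.contains("New"))]

# ~ negates a boolean mask.
not_over_30 = df[~(df["age"] > 30)]

# query() expresses the same filter as a string expression.
over_30_query = df.query("age > 30")

# String contains: rows whose city contains the text "New".
new_cities = df[df["city"].str.contains("New")]

print(over_30)
```

Each expression returns a new DataFrame; df itself is not changed by any of these filters.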

2. Sorting Data

Sorting helps in organizing data for better analysis and visualization.

  • Ascending/Descending: Control sort order with ascending parameter
  • Multiple Columns: Sort by multiple columns with priority order
  • In-place Sorting: Use inplace=True to modify the original DataFrame instead of returning a sorted copy
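
A short sketch of the sorting options listed above, using sort_values(). The data and column names (dept, name, salary) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "dept": ["Sales", "IT", "Sales", "IT"],
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "salary": [70000, 65000, 58000, 61000],
})

# Descending sort on one column; returns a new DataFrame.
by_salary = df.sort_values("salary", ascending=False)

# Multiple columns: dept (A to Z) has priority, then salary
# (high to low) breaks ties within each dept.
by_dept_salary = df.sort_values(["dept", "salary"], ascending=[True, False])

# inplace=True reorders the DataFrame itself and returns None.
df_sorted = df.copy()
returned = df_sorted.sort_values("salary", inplace=True)

print(by_dept_salary)
```

Because the in-place call returns None, assigning its result back to the variable would discard the data; without inplace=True, the sorted copy has to be assigned instead.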

3. Adding and Modifying Columns

Create new columns or modify existing ones based on calculations or transformations.

  • Direct Assignment: df['new_col'] = values
  • Apply Function: Use apply() to run a function on each element of a column (or on each row or column of a DataFrame)
  • Vectorized Operations: Perform operations on entire columns
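
The three approaches above might look like this in practice. The product data, the column names, and the threshold of 15 are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "product": ["pen", "notebook", "stapler"],
    "price": [1.5, 4.0, 12.0],
    "quantity": [10, 3, 1],
})

# Vectorized operation: multiplies the two columns row by row.
df["total"] = df["price"] * df["quantity"]

# Direct assignment: a scalar is broadcast to every row.
df["currency"] = "USD"

# apply(): calls a Python function once per element of the Series.
df["size"] = df["total"].apply(lambda t: "large" if t >= 15 else "small")

# Modifying an existing column: assign to the same column name.
# "total" was computed earlier and is not recalculated.
df["quantity"] = df["quantity"] + 5

print(df)
```

Vectorized operations are generally faster than apply() because they avoid a Python-level function call per element, so apply() is best kept for logic that has no vectorized equivalent.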

4. Grouping and Aggregation

Group data by categories and compute aggregate statistics.

  • GroupBy: df.groupby('column')
  • Aggregate Functions: mean(), sum(), count(), min(), max()
  • Multiple Aggregations: Use agg() for different functions per column
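
As an illustration of the grouping methods above, the sketch below groups a hypothetical employee table by department. The column names and the output names passed to agg() (avg_salary, headcount, total_years) are invented for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "dept": ["Sales", "IT", "Sales", "IT", "HR"],
    "salary": [50000, 65000, 58000, 61000, 48000],
    "years": [2, 5, 7, 3, 1],
})

# One aggregate per group: mean salary by department.
mean_salary = df.groupby("dept")["salary"].mean()

# Different functions per column via agg() with a dict.
# The result has two-level column labels such as ("salary", "max").
summary = df.groupby("dept").agg({"salary": ["mean", "max"], "years": "sum"})

# Named aggregation: output_name=(column, function) gives flat column names.
named = df.groupby("dept").agg(
    avg_salary=("salary", "mean"),
    headcount=("salary", "count"),
    total_years=("years", "sum"),
)

print(named)
```

The group keys become the index of the result; passing as_index=False to groupby() keeps them as a regular column instead.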

5. Handling Missing Data

Deal with NaN values in your dataset.

  • Detection: isna(), notna()
  • Filling: fillna() with mean, median, or specific value
  • Dropping: dropna() to remove rows/columns with missing values
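
A minimal sketch of detecting, filling, and dropping missing values. The temp and humidity columns are hypothetical sample data; filling with the mean is just one of the options named above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with some missing values.
df = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0, 22.0],
    "humidity": [0.5, 0.6, np.nan, np.nan],
})

# Detection: isna() gives a boolean mask; sum() counts NaN per column.
missing_counts = df.isna().sum()

# Filling: replace NaN in temp with the column mean.
# mean() skips NaN by default, so it averages 20, 24, and 22.
filled = df.copy()
filled["temp"] = filled["temp"].fillna(filled["temp"].mean())

# Dropping: remove every row that has at least one NaN.
complete_rows = df.dropna()

# Dropping based on selected columns only.
has_temp = df.dropna(subset=["temp"])

# axis=1 drops columns instead of rows.
complete_cols = df.dropna(axis=1)

print(missing_counts)
```

All of these calls return new objects, so df still contains its original NaN values afterwards.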

6. String Operations

Perform text manipulation on string columns using vectorized string methods.

  • Case Conversion: str.upper(), str.lower()
  • Searching: str.contains(), str.startswith()
  • Extraction: str.extract() with regular expressions
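
The string methods above are all reached through the .str accessor. The sketch below applies them to a hypothetical email column; the addresses and the regular expression are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "email": ["Alice@Example.com", "bob@test.org", "carol@example.com"],
})

# Case conversion.
lower = df["email"].str.lower()
upper = df["email"].str.upper()

# Searching: regex=False treats the pattern as literal text.
is_example = lower.str.contains("example", regex=False)
starts_with_a = lower.str.startswith("a")

# Extraction: the capture group (.+) grabs everything after "@".
# With a single group, expand=False returns a Series rather than a DataFrame.
domain = lower.str.extract(r"@(.+)$", expand=False)

print(domain)
```

Note that str.contains() interprets its pattern as a regular expression by default, so characters such as "." or "(" need escaping unless regex=False is passed.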

Data Manipulation Workflow

Typical data manipulation follows this pattern:

Load Data → Clean Data → Filter/Sort → Transform → Aggregate → Analyze

Example: Comprehensive Data Manipulation
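
One possible end-to-end sketch of the workflow, written as a single method chain. The sales data, the column names, and the threshold of 4 units are hypothetical; an in-memory DataFrame stands in for the load step, where a real project would typically call something like pd.read_csv().

```python
import numpy as np
import pandas as pd

# Load: an in-memory DataFrame stands in for reading a file.
raw = pd.DataFrame({
    "region": ["north", "South", "north", "East", "South", "East"],
    "units": [10, 4, np.nan, 7, 12, 3],
    "unit_price": [2.0, 5.0, 2.0, 3.0, 5.0, 3.0],
})

result = (
    raw
    # Clean: drop rows with missing units, normalise region capitalisation.
    .dropna(subset=["units"])
    .assign(region=lambda d: d["region"].str.title())
    # Filter: keep orders of at least 4 units.
    .query("units >= 4")
    # Transform: add a revenue column.
    .assign(revenue=lambda d: d["units"] * d["unit_price"])
    # Aggregate: total revenue per region.
    .groupby("region", as_index=False)["revenue"].sum()
    # Sort the summary for analysis, highest revenue first.
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)

print(result)
```

Every step in the chain returns a new DataFrame, so raw is left untouched and the pipeline can be rerun or adjusted one step at a time.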

Common Manipulation Methods
  • filter() - Select rows or columns by label (not by value)
  • sort_values() - Sort data
  • groupby() - Group data
  • pivot_table() - Create pivot tables
  • merge() - Join DataFrames
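
The methods in this list that the numbered sections do not cover (merge(), pivot_table(), and filter()) can be sketched as below. The sales and stores tables and their columns are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 80, 90],
})
stores = pd.DataFrame({
    "store_id": [1, 2],
    "city": ["Boston", "Chicago"],
})

# merge(): SQL-style join on the shared key column.
merged = sales.merge(stores, on="store_id", how="left")

# pivot_table(): one row per city, one column per quarter,
# with revenue summed in each cell.
pivot = merged.pivot_table(
    index="city", columns="quarter", values="revenue", aggfunc="sum"
)

# filter(): selects columns by label; it does not filter rows by value.
city_revenue = merged.filter(items=["city", "revenue"])

print(pivot)
```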

Key Concepts
  • Boolean indexing
  • Vectorized operations
  • Method chaining
  • In-place vs. copy operations
  • Performance optimization

Tip: Work on a copy of your data (for example, df.copy()) when experimenting with manipulations, so the original dataset is not modified unintentionally.
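
A small sketch of the tip in practice, assuming a hypothetical one-column DataFrame of scores:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
original = pd.DataFrame({"score": [70, 85, 90]})

# copy() creates an independent DataFrame (a deep copy by default).
experiment = original.copy()

# Changes to the copy do not affect the original.
experiment["score"] = experiment["score"] + 5
experiment.loc[0, "score"] = 0

print(original)
print(experiment)
```

By contrast, a plain assignment such as experiment = original only creates a second name for the same DataFrame, so changes made through either name would be visible through both.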