
Pandas Data Structures

The two primary data structures in Pandas are the Series and the DataFrame. These are the fundamental building blocks for efficient data manipulation and analysis in Python.

1. The Pandas Series

The Series is the most basic Pandas data structure.

  • Definition: A one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).
  • Analogy: You can think of a Series as a single column in a spreadsheet or a SQL table.
  • Key Feature (Index): A Series has an index (row labels) associated with its data, which allows fast label-based lookups and automatic data alignment during operations.

| Feature | Description |
| --- | --- |
| Dimensionality | 1-Dimensional |
| Mutability | Data is mutable (can be changed), but the size is immutable (cannot easily add/remove elements). |
| Data Type | Homogeneous (all elements typically hold the same data type). |
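
The Series properties above can be sketched in a few lines. This is a minimal illustration, not a complete tour of the API; the fruit labels and prices are made-up sample data.

```python
import pandas as pd

# A Series with explicit string labels as its index (sample data)
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

# Label-based lookup through the index
banana_price = prices["banana"]

# Values are mutable: assign to an existing label
prices["banana"] = 2.5

# Automatic alignment: arithmetic matches values by label, not by position
discounts = pd.Series([0.5, 0.25], index=["cherry", "apple"])
discounted = prices - discounts

print(prices.dtype)   # a single dtype for the whole Series
print(discounted)     # "banana" has no matching discount, so its result is NaN
```

Note that the two Series are listed in different orders and have different lengths, yet the subtraction still pairs `"apple"` with `"apple"` and `"cherry"` with `"cherry"`.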

2. The Pandas DataFrame

The DataFrame is the most commonly used and powerful Pandas data structure.

  • Definition: A two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
  • Analogy: A DataFrame resembles a spreadsheet: a collection of Series objects, where each Series is a column and all of them share the same index (the row labels).
  • Structure: It has both a **row index** and distinct **column labels**.

| Feature | Description |
| --- | --- |
| Dimensionality | 2-Dimensional |
| Mutability | Both data and size are mutable (you can add and remove columns). |
| Data Type | Heterogeneous (each column/Series can hold a different data type). |
| Core Components | Composed of: **Data** (the values), **Index** (row labels), and **Columns** (column labels). |
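
As a rough sketch of these DataFrame properties, the snippet below builds a small table, inspects its three core components, and then changes its size by adding and removing a column. The names, ages, and scores are invented sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],   # strings
        "age": [36, 45, 28],                 # integers
        "score": [91.5, 88.0, 79.25],        # floats
    },
    index=["a", "b", "c"],
)

# Heterogeneous: each column carries its own dtype
print(df.dtypes)

# The three core components: data, index, columns
values = df.to_numpy()
print(df.index)
print(df.columns)

# Size-mutable: add a column, then remove one
df["passed"] = df["score"] >= 80
del df["age"]

print(df)
```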

Data Structure Hierarchy

The relationship between the two structures can be visualized as:

DataFrame  →  Collection of aligned Series
Series     →  Collection of labeled Scalar Values
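
One way to see this hierarchy in practice is to drill down from a DataFrame to a column and then to a single value. The column names `x` and `y` and the row labels are arbitrary placeholders chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

# Selecting one column of a DataFrame yields a Series
col = df["y"]
print(type(col).__name__)

# That Series shares the DataFrame's row index
print(col.index.equals(df.index))

# Selecting one label of a Series yields a scalar value
value = col["b"]
print(value)
```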

Example: Creating Series and DataFrames
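
A minimal sketch of a few common ways to construct both structures is shown here. It is not exhaustive, and the product names and quantities are sample data invented for illustration.

```python
import pandas as pd

# Series from a list: pandas assigns a default integer index (0, 1, 2, ...)
s_default = pd.Series([10, 20, 30])

# Series from a dict: the keys become the index labels
s_labeled = pd.Series({"a": 1.0, "b": 2.0, "c": 3.0})

# DataFrame from a dict of lists: the keys become the column labels
df_from_dict = pd.DataFrame({"product": ["pen", "book"], "qty": [3, 1]})

# DataFrame from a list of records: each dict becomes one row
df_from_records = pd.DataFrame(
    [{"product": "pen", "qty": 3}, {"product": "book", "qty": 1}]
)

# DataFrame from Series: each Series becomes a column, aligned on the shared index
low = pd.Series([1, 2], index=["x", "y"])
high = pd.Series([5, 6], index=["x", "y"])
df_from_series = pd.DataFrame({"low": low, "high": high})

print(s_default)
print(s_labeled)
print(df_from_dict)
print(df_from_series)
```

The dict-of-lists and list-of-records forms describe the same table here; which one is more convenient usually depends on whether the data arrives column by column or row by row.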

Series Features
  • One-dimensional
  • Homogeneous data
  • Size immutable
  • Values mutable
DataFrame Features
  • Two-dimensional
  • Potentially heterogeneous data
  • Size mutable
  • Data mutable