
Pandas Statistical Functions

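Statistical analysis helps reveal how data is distributed, how variables relate to each other, and what patterns they contain. Pandas provides built-in methods for descriptive statistics, correlation, and group-wise aggregation, and its objects are backed by NumPy arrays, so they can be passed to SciPy for more advanced analysis.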

1. Descriptive Statistics

Basic statistical summaries that describe the main features of datasets.

  • Central Tendency: Mean, median, mode
  • Dispersion: Variance, standard deviation, range
  • Shape: Skewness, kurtosis
  • Position: Quantiles, percentiles

Function   | Description                       | Usage
describe() | Comprehensive statistical summary | df.describe()
mean()     | Arithmetic average                | df['col'].mean()
median()   | Middle value                      | df['col'].median()
std()      | Standard deviation                | df['col'].std()
var()      | Variance                          | df['col'].var()
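
As a rough illustration of the functions listed above, the following sketch computes them on a small made-up column. The data and the column name `col` are assumptions chosen so the results are easy to verify by hand.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
df = pd.DataFrame({"col": [2.0, 4.0, 4.0, 5.0, 10.0]})

summary = df.describe()            # count, mean, std, min, quartiles, max
mean_value = df["col"].mean()      # (2 + 4 + 4 + 5 + 10) / 5 = 5.0
median_value = df["col"].median()  # middle value of the sorted data = 4.0
mode_values = df["col"].mode()     # a Series, since there may be several modes
std_value = df["col"].std()        # sample standard deviation (ddof=1) = 3.0
var_value = df["col"].var()        # sample variance (ddof=1) = 9.0
value_range = df["col"].max() - df["col"].min()
quartiles = df["col"].quantile([0.25, 0.5, 0.75])
skewness = df["col"].skew()        # positive here: the tail extends to the right
kurt = df["col"].kurt()            # excess kurtosis (normal distribution = 0)
```

Note that `std()` and `var()` in pandas default to the sample versions (`ddof=1`), whereas NumPy's `np.std` and `np.var` default to the population versions (`ddof=0`).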

2. Correlation Analysis

Measure relationships between variables.

# Pearson correlation (the default method); on pandas 2.0+, pass
# numeric_only=True if df contains non-numeric columns
df.corr(method='pearson')

# Spearman correlation
df.corr(method='spearman')

# Correlation with specific column
df.corrwith(df['target'])
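
To make the difference between the two methods concrete, here is a small self-contained sketch. The data is invented: `target` is a strictly increasing but nonlinear function of `x`, and `noise` is an arbitrary extra column.

```python
import pandas as pd

# Hypothetical data: 'target' is a monotonic but nonlinear function of 'x'
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]})
df["target"] = df["x"] ** 3
df["noise"] = [3, 1, 4, 1, 5, 9]

pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Spearman works on ranks, so a strictly increasing relationship scores 1.0
# even when it is not linear; Pearson stays below 1.0 here.
pearson_x_target = pearson.loc["x", "target"]
spearman_x_target = spearman.loc["x", "target"]

# Correlation of every other column with 'target'
target_corr = df.drop(columns="target").corrwith(df["target"])
```

Dropping `target` before calling `corrwith` simply avoids reporting the trivial correlation of the column with itself.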

3. Group-wise Statistics

Calculate statistics for different groups in data.

  • GroupBy: Split-apply-combine pattern
  • Aggregation: Multiple statistics at once
  • Transformation: Group-specific calculations
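
The aggregation and transformation steps above might look like the following sketch. The `category` and `value` columns and their contents are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data; column names are invented for illustration
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [10.0, 20.0, 1.0, 2.0, 6.0],
})

# Aggregation: several statistics per group in one call (one row per group)
group_stats = df.groupby("category")["value"].agg(["mean", "std", "count"])

# Transformation: the result keeps the original row order and length,
# so it can be combined with the original column row by row
group_mean = df.groupby("category")["value"].transform("mean")
df["deviation_from_group_mean"] = df["value"] - group_mean
```

Aggregation reduces each group to a single row, while transformation returns a value for every original row, which is what makes group-wise centering or scaling possible.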

Statistical Measures Overview

📊 Central Tendency
  • Mean: Average value
  • Median: Middle value
  • Mode: Most frequent value
  • Geometric Mean: For growth rates
  • Harmonic Mean: For rates and ratios
📈 Dispersion
  • Variance: Average squared deviation from the mean
  • Standard Deviation: Square root of variance
  • Range: Difference between max and min
  • IQR: Interquartile range
  • MAD: Mean absolute deviation
📐 Distribution Shape
  • Skewness: Asymmetry measure
  • Kurtosis: Tail heaviness
  • Moments: Statistical moments
  • Quantiles: Data division points
🔗 Relationship Measures
  • Correlation: Strength and direction of a relationship (linear for Pearson, monotonic for Spearman)
  • Covariance: Joint variability
  • R-squared: Goodness of fit
  • P-value: Statistical significance
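
Several of the measures above have no dedicated pandas method but can be derived in a line or two. The sketch below is one possible way to do so with pandas and NumPy only; the sample values are invented, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical positive-valued sample (geometric and harmonic means need values > 0)
s = pd.Series([1.0, 2.0, 4.0, 8.0])

geometric_mean = np.exp(np.log(s).mean())   # nth root of the product
harmonic_mean = len(s) / (1.0 / s).sum()    # reciprocal of the mean reciprocal
iqr = s.quantile(0.75) - s.quantile(0.25)   # interquartile range
mean_abs_dev = (s - s.mean()).abs().mean()  # mean absolute deviation

# Relationship measures between two columns
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})
covariance = df["x"].cov(df["y"])    # sample covariance (ddof=1)
correlation = df["x"].corr(df["y"])  # Pearson by default
# The squared Pearson correlation equals R-squared only for a
# simple linear regression with one predictor and an intercept
r_squared = correlation ** 2
```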

Hypothesis Testing Methods

T-Tests
  • One-sample t-test
  • Independent samples t-test
  • Paired t-test
  • Compare group means
ANOVA
  • One-way ANOVA
  • Two-way ANOVA
  • Compare multiple groups
  • F-statistic analysis
Normality Tests
  • Shapiro-Wilk test
  • Kolmogorov-Smirnov test
  • D'Agostino's test
  • QQ-plot analysis
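
Most of the tests listed above are available in `scipy.stats`. The following is a minimal sketch on synthetic data, not a recommended analysis workflow: the three samples are randomly generated, and the group means and sizes are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic samples drawn from normal distributions (invented for illustration)
rng = np.random.default_rng(0)
a = rng.normal(loc=50.0, scale=5.0, size=40)
b = rng.normal(loc=53.0, scale=5.0, size=40)
c = rng.normal(loc=50.0, scale=5.0, size=40)

# One-sample t-test: is the mean of `a` different from 50?
t_one, p_one = stats.ttest_1samp(a, popmean=50.0)

# Independent samples t-test (Welch's variant, which does not assume equal variances)
t_ind, p_ind = stats.ttest_ind(a, b, equal_var=False)

# Paired t-test: meant for two measurements on the same units.
# `a` and `c` are paired here only to show the call signature.
t_rel, p_rel = stats.ttest_rel(a, c)

# One-way ANOVA across three groups
f_stat, p_anova = stats.f_oneway(a, b, c)

# Normality tests; a small p-value suggests departure from normality
w_stat, p_shapiro = stats.shapiro(a)      # Shapiro-Wilk
k2_stat, p_dagostino = stats.normaltest(a)  # D'Agostino's K-squared
```

Two-way ANOVA is not part of `scipy.stats`; it is usually done with a regression library such as statsmodels.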

Example: Comprehensive Statistical Analysis

Key Statistical Functions
  • describe() - Summary statistics
  • corr() - Correlation matrix
  • cov() - Covariance matrix
  • quantile() - Quantile values
  • groupby().agg() - Group statistics
Advanced Analysis
  • Rolling statistics
  • Expanding windows
  • Hypothesis testing
  • Confidence intervals
  • Outlier detection
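
The sketch below illustrates a few of these techniques on a short made-up series. The 1.5 × IQR rule and the normal-approximation confidence interval are common conventions chosen for the example, not the only options.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one obvious outlier
s = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 50.0, 12.0])

# Rolling statistics: mean over a sliding window of 3 observations
rolling_mean = s.rolling(window=3).mean()  # first two entries are NaN

# Expanding window: mean of all observations seen so far
expanding_mean = s.expanding().mean()

# Outlier detection with the 1.5 * IQR rule
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

# Approximate 95% confidence interval for the mean using a normal
# approximation (z = 1.96); a t-based interval suits small samples better
sem = s.std() / np.sqrt(len(s))
ci_low, ci_high = s.mean() - 1.96 * sem, s.mean() + 1.96 * sem
```

With this data, only the value 50.0 falls outside the IQR fences.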
Statistical Best Practices:
  • Always check for missing values before analysis
  • Understand the data distribution before choosing tests
  • Consider the assumptions of statistical tests
  • Use appropriate correlation methods for data types
  • Interpret p-values in context of effect size
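
The first practice matters because pandas reductions skip missing values by default, which can hide them. A minimal sketch with an invented `value` column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({"value": [1.0, 2.0, np.nan, 4.0]})

missing_per_column = df.isna().sum()            # number of NaN values per column
mean_skipping_nan = df["value"].mean()          # NaN is skipped by default
mean_with_nan = df["value"].mean(skipna=False)  # NaN propagates when not skipped
count_non_null = df["value"].count()            # count() excludes NaN
```

Here the default mean is computed from three values rather than four, so comparing `count()` against `len(df)` is a quick way to see how many observations each statistic actually used.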
Integration Tip: Pandas Series and DataFrames can be passed directly to most SciPy and statsmodels functions. Use scipy.stats for hypothesis testing and statsmodels for regression analysis.

Common Statistical Patterns

# Descriptive statistics pattern
summary = df.describe(include='all')

# Correlation analysis pattern
correlation_matrix = df.select_dtypes(include='number').corr()

# Group-wise analysis pattern
group_stats = df.groupby('category').agg({
    'value': ['mean', 'std', 'count'],
    'score': ['min', 'max', 'median']
})

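# Hypothesis testing pattern
from scipy.stats import ttest_ind
# Drop missing values first; by default, ttest_ind returns NaN if either sample contains NaN
group1 = df.loc[df['group'] == 'A', 'value'].dropna()
group2 = df.loc[df['group'] == 'B', 'value'].dropna()
t_stat, p_value = ttest_ind(group1, group2)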