Pandas Statistical Functions
Statistical analysis is crucial for understanding data distributions, relationships, and patterns. Pandas provides built-in statistical methods on Series and DataFrame objects, and its data structures can be passed directly to NumPy and SciPy functions for more advanced analysis.
1. Descriptive Statistics
Basic statistical summaries that describe the main features of datasets.
- Central Tendency: Mean, median, mode
- Dispersion: Variance, standard deviation, range
- Shape: Skewness, kurtosis
- Position: Quantiles, percentiles
| Function | Description | Usage |
|---|---|---|
| describe() | Comprehensive statistical summary | df.describe() |
| mean() | Arithmetic average | df['col'].mean() |
| median() | Middle value | df['col'].median() |
| std() | Standard deviation | df['col'].std() |
| var() | Variance | df['col'].var() |
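
As a minimal sketch of the functions in the table, the snippet below runs them on a small made-up DataFrame. The column name `col` follows the table; the values are hypothetical and chosen so the results are easy to check by hand. Note that pandas computes `std()` and `var()` as sample statistics (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"col": [2.0, 4.0, 4.0, 5.0, 10.0]})

summary = df.describe()             # count, mean, std, min, quartiles, max
mean_value = df["col"].mean()       # (2 + 4 + 4 + 5 + 10) / 5 = 5.0
median_value = df["col"].median()   # middle of the sorted values = 4.0
sample_var = df["col"].var()        # squared deviations sum to 36; 36 / (5 - 1) = 9.0
sample_std = df["col"].std()        # sqrt(9.0) = 3.0
population_std = df["col"].std(ddof=0)  # divides by n instead of n - 1

print(summary)
print(mean_value, median_value, sample_var, sample_std, population_std)
```

Passing `ddof=0` is only needed when the data is the whole population rather than a sample from it.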
2. Correlation Analysis
Measure relationships between variables.
# Pearson correlation
df.corr(method='pearson')
# Spearman correlation
df.corr(method='spearman')
# Correlation with specific column
df.corrwith(df['target'])
3. Group-wise Statistics
Calculate statistics for different groups in data.
- GroupBy: Split-apply-combine pattern
- Aggregation: Multiple statistics at once
- Transformation: Group-specific calculations
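The difference between aggregation and transformation in the list above can be sketched as follows. The column names `category` and `value` and the data are assumptions made for illustration. Aggregation returns one row per group, while transformation returns a result aligned with the original rows.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0, 30.0],
})

# Aggregation: several statistics per group in one call (one row per group).
group_stats = df.groupby("category")["value"].agg(["mean", "min", "max", "count"])

# Transformation: the group mean is broadcast back to every row of its group,
# so the result has the same length and index as df.
group_mean = df.groupby("category")["value"].transform("mean")
df["centered"] = df["value"] - group_mean

print(group_stats)
print(df)
```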
Statistical Measures Overview
📊 Central Tendency
- Mean: Average value
- Median: Middle value
- Mode: Most frequent value
- Geometric Mean: For growth rates
- Harmonic Mean: For rates and ratios
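The central tendency measures above can be sketched on a small Series. The data is made up, and the geometric and harmonic means are computed directly from their definitions with NumPy here, which is one possible approach and assumes all values are positive.

```python
import numpy as np
import pandas as pd

# Hypothetical positive values, for illustration only.
values = pd.Series([1.0, 2.0, 2.0, 4.0, 8.0])

mean_value = values.mean()       # 17 / 5 = 3.4
median_value = values.median()   # 2.0
modes = values.mode()            # a Series, because several values can tie

# Geometric mean: exponential of the mean of the logs (requires values > 0).
geometric_mean = np.exp(np.log(values).mean())

# Harmonic mean: n divided by the sum of reciprocals (requires values > 0).
harmonic_mean = len(values) / (1.0 / values).sum()

print(mean_value, median_value, modes.tolist(), geometric_mean, harmonic_mean)
```

For positive data the three means are always ordered as harmonic ≤ geometric ≤ arithmetic, which makes a quick sanity check.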
📈 Dispersion
- Variance: Average squared deviation from the mean
- Standard Deviation: Square root of variance
- Range: Difference between max and min
- IQR: Interquartile range
- MAD: Mean absolute deviation
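A minimal sketch of the dispersion measures above, using made-up values. Range, IQR, and mean absolute deviation are computed explicitly from their definitions so the example does not depend on a dedicated method being available in a given pandas version.

```python
import pandas as pd

# Hypothetical data, for illustration only.
values = pd.Series([1.0, 3.0, 5.0, 7.0, 9.0])

variance = values.var()                      # sample variance: 40 / 4 = 10.0
std_dev = values.std()                       # square root of the variance
value_range = values.max() - values.min()    # 9 - 1 = 8.0
iqr = values.quantile(0.75) - values.quantile(0.25)   # 7 - 3 = 4.0
mean_abs_dev = (values - values.mean()).abs().mean()  # (4 + 2 + 0 + 2 + 4) / 5 = 2.4

print(variance, std_dev, value_range, iqr, mean_abs_dev)
```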
📐 Distribution Shape
- Skewness: Asymmetry measure
- Kurtosis: Tail heaviness
- Moments: Statistical moments
- Quantiles: Data division points
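The shape measures above can be illustrated with two small made-up Series, one symmetric and one with a long right tail. In pandas, `kurt()` reports excess kurtosis (Fisher's definition), so a normal distribution scores about 0.

```python
import pandas as pd

# Hypothetical data, for illustration only.
symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
right_skewed = pd.Series([1.0, 1.0, 2.0, 2.0, 3.0, 20.0])

sym_skew = symmetric.skew()              # 0 for perfectly symmetric data
right_skew = right_skewed.skew()         # positive: the long tail is on the right
excess_kurtosis = right_skewed.kurt()    # excess kurtosis, relative to a normal distribution

# Quantiles: the points that split the sorted data into equal-probability parts.
quartiles = symmetric.quantile([0.25, 0.5, 0.75])

print(sym_skew, right_skew, excess_kurtosis)
print(quartiles)
```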
🔗 Relationship Measures
- Correlation: Strength and direction of a linear (Pearson) or monotonic (Spearman) relationship
- Covariance: Joint variability
- R-squared: Goodness of fit
- P-value: Statistical significance
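The relationship measures above can be sketched on two made-up columns, `x` and `y` (both names are assumptions). The p-value comes from `scipy.stats.pearsonr`, and R-squared is taken as the squared Pearson correlation, which holds only for a simple linear regression with a single predictor.

```python
import pandas as pd
from scipy import stats

# Hypothetical, nearly linear data, for illustration only.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1],
})

r = df["x"].corr(df["y"])        # Pearson correlation coefficient
cov_xy = df["x"].cov(df["y"])    # sample covariance: 19.9 / 4 = 4.975
r_squared = r ** 2               # equals R-squared for one-predictor linear regression only

# pearsonr returns the same coefficient together with a two-sided p-value.
r_scipy, p_value = stats.pearsonr(df["x"], df["y"])

print(r, cov_xy, r_squared, p_value)
```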
Hypothesis Testing Methods
T-Tests
- One-sample t-test
- Independent samples t-test
- Paired t-test
- Compare group means
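The three t-test variants listed above map onto three `scipy.stats` functions. The sketch below uses randomly generated samples with a fixed seed; the variable names and distribution parameters are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Simulated measurements, for illustration only.
rng = np.random.default_rng(0)
before = rng.normal(loc=50.0, scale=5.0, size=30)
after = before + rng.normal(loc=2.0, scale=1.0, size=30)   # same subjects, measured again
other_group = rng.normal(loc=55.0, scale=5.0, size=30)     # unrelated subjects

# One-sample: is the mean of `before` different from a reference value of 50?
t_one, p_one = stats.ttest_1samp(before, popmean=50.0)

# Independent samples: do two unrelated groups have different means?
# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_ind, p_ind = stats.ttest_ind(before, other_group, equal_var=False)

# Paired: do matched before/after measurements differ on average?
t_rel, p_rel = stats.ttest_rel(before, after)

print(p_one, p_ind, p_rel)
```

The paired test is the right choice only when each value in one sample has a natural partner in the other, as with repeated measurements on the same subjects.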
ANOVA
- One-way ANOVA
- Two-way ANOVA
- Compare multiple groups
- F-statistic analysis
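A one-way ANOVA on grouped DataFrame data can be sketched with `scipy.stats.f_oneway`. The column names `group` and `value` and the numbers are made up for illustration. Two-way ANOVA is not covered by this function and is not shown here.

```python
import pandas as pd
from scipy import stats

# Hypothetical data with three groups, for illustration only.
df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "value": [5.1, 4.9, 5.3, 5.0, 6.8, 7.1, 7.0, 6.9, 5.2, 5.0, 4.8, 5.1],
})

# Split the values into one array per group, then compare all group means at once.
samples = [g["value"].to_numpy() for _, g in df.groupby("group")]
f_stat, p_value = stats.f_oneway(*samples)

print(f_stat, p_value)
```

A small p-value indicates that at least one group mean differs, but not which one; a post-hoc comparison is needed for that.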
Normality Tests
- Shapiro-Wilk test
- Kolmogorov-Smirnov test
- D'Agostino's test
- QQ-plot analysis
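The normality checks listed above are all available in `scipy.stats`. The sketch below applies them to simulated data with a fixed seed; the sample names and sizes are assumptions. `probplot` is used without a plotting backend, so it only returns the numbers behind a QQ-plot.

```python
import numpy as np
from scipy import stats

# Simulated samples, for illustration only.
rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

# Shapiro-Wilk: a small p-value suggests the data is not normally distributed.
w_normal, p_shapiro_normal = stats.shapiro(normal_sample)
w_skewed, p_shapiro_skewed = stats.shapiro(skewed_sample)

# Kolmogorov-Smirnov against a standard normal whose parameters are fixed in advance.
ks_stat, p_ks = stats.kstest(normal_sample, "norm")

# D'Agostino's K-squared test, based on skewness and kurtosis.
k2_stat, p_dagostino = stats.normaltest(skewed_sample)

# QQ-plot data: theoretical quantiles, ordered sample values, and a least-squares fit.
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(normal_sample, dist="norm")

print(p_shapiro_normal, p_shapiro_skewed, p_ks, p_dagostino, r)
```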
Example: Comprehensive Statistical Analysis
Statistical Functions Examples
Key Statistical Functions
- describe(): Summary statistics
- corr(): Correlation matrix
- cov(): Covariance matrix
- quantile(): Quantile values
- groupby().agg(): Group statistics
Advanced Analysis
- Rolling statistics
- Expanding windows
- Hypothesis testing
- Confidence intervals
- Outlier detection
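Three of the items above (rolling statistics, expanding windows, and outlier detection) can be sketched with pandas alone. The data is made up, and the 1.5 × IQR rule used for outliers is one common convention, not the only option.

```python
import pandas as pd

# Hypothetical series with one obvious outlier, for illustration only.
values = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 50.0, 11.0, 12.0])

# Rolling mean over a fixed window of 3 observations (first two results are NaN).
rolling_mean = values.rolling(window=3).mean()

# Expanding mean: the mean of all observations seen so far.
expanding_mean = values.expanding().mean()

# Outlier detection with the 1.5 * IQR rule.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
outliers = values[is_outlier]

print(rolling_mean.tolist())
print(expanding_mean.tolist())
print(outliers.tolist())
```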
Statistical Best Practices:
- Always check for missing values before analysis
- Understand the data distribution before choosing tests
- Consider the assumptions of statistical tests
- Use appropriate correlation methods for data types
- Interpret p-values in context of effect size
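Two of these practices can be sketched quickly: checking for missing values, and choosing a correlation method that fits the data. The columns `x`, `y`, and `z` are hypothetical. Here `y` is a monotonic but non-linear function of `x`, so Spearman reports a perfect relationship while Pearson does not.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [1.0, 8.0, 27.0, 64.0, 125.0],     # y = x ** 3: monotonic, not linear
    "z": [0.5, np.nan, 1.5, 2.0, np.nan],   # contains missing values
})

# Count missing values per column before computing anything else.
missing_per_column = df.isna().sum()

# pandas skips NaN by default, which can silently shrink the sample.
mean_skipping_nan = df["z"].mean()              # mean of the 3 available values
mean_with_nan = df["z"].mean(skipna=False)      # NaN, because data is missing

# Pearson measures linear association; Spearman measures monotonic association.
corr_pearson = df[["x", "y"]].corr(method="pearson").loc["x", "y"]
corr_spearman = df[["x", "y"]].corr(method="spearman").loc["x", "y"]

print(missing_per_column)
print(mean_skipping_nan, mean_with_nan, corr_pearson, corr_spearman)
```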
Integration Tip: Pandas works seamlessly with SciPy and StatsModels for advanced statistical analysis. Use scipy.stats for hypothesis testing and statsmodels for regression analysis.
Common Statistical Patterns
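# These patterns assume df is an existing pandas DataFrame
import numpy as np
from scipy.stats import ttest_ind

# Descriptive statistics pattern (include='all' also summarizes non-numeric columns)
summary = df.describe(include='all')
# Correlation analysis pattern (numeric columns only)
correlation_matrix = df.select_dtypes(include=[np.number]).corr()
# Group-wise analysis pattern
group_stats = df.groupby('category').agg({
    'value': ['mean', 'std', 'count'],
    'score': ['min', 'max', 'median']
})
# Hypothesis testing pattern: independent two-sample t-test
group1 = df[df['group'] == 'A']['value']
group2 = df[df['group'] == 'B']['value']
# ttest_ind assumes equal variances by default; pass equal_var=False for Welch's t-test
t_stat, p_value = ttest_ind(group1, group2)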