Data Science Intro
Python is one of the most widely used languages for Data Science. This is largely due to two core libraries: NumPy and Pandas.
1. NumPy (Numerical Python)
NumPy provides the ndarray, an efficient container for large, multi-dimensional arrays and matrices, along with fast element-wise operations on them.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr * 2) # [ 2  4  6  8 10] (vectorized: every element is doubled without a Python loop)
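To make the "multi-dimensional" and "vectorized" ideas a bit more concrete, here is a minimal sketch. The sample values and variable names (matrix, doubled_loop, doubled_vec) are made up for illustration and are not part of NumPy itself.

```python
import numpy as np

# A 2-D array (2 rows, 3 columns); the values are arbitrary sample data.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)         # (2, 3)
print(matrix.sum(axis=0))   # column sums: [5 7 9]
print(matrix.mean(axis=1))  # row means: [2. 5.]

# The same doubling done two ways: a plain Python loop and a vectorized expression.
values = [1, 2, 3, 4, 5]
doubled_loop = [v * 2 for v in values]  # handles one element at a time in Python
doubled_vec = np.array(values) * 2      # one expression applied to the whole array

print(doubled_vec.tolist() == doubled_loop)  # True
```

Both approaches give the same numbers. The vectorized form is usually shorter and, on large arrays, typically much faster because the loop runs inside NumPy's compiled code.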
2. Pandas (Data Analysis)
Pandas introduces DataFrames, which are like super-powered Excel tables inside Python.
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}
df = pd.DataFrame(data)
print(df.describe()) # Statistical summary
print(df[df['Score'] > 80]) # Filtering data
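The filtering line works because the comparison first builds a column of True/False values, which then selects the matching rows. A small sketch of that idea, reusing the same sample data (the names mask, high_scorers, mean_score and top_name are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78],
})

# Step 1: the comparison yields a boolean Series, one True/False per row.
mask = df['Score'] > 80
print(mask.tolist())  # [True, True, False]

# Step 2: indexing with that mask keeps only the rows marked True.
high_scorers = df[mask]
print(high_scorers['Name'].tolist())  # ['Alice', 'Bob']

# A couple of other common one-liners on the same table.
mean_score = df['Score'].mean()                                      # 85.0
top_name = df.sort_values('Score', ascending=False).iloc[0]['Name']  # 'Bob'
print(mean_score, top_name)
```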
3. Data Science Workflow
- Collection: SQL, APIs, or CSV files.
- Cleaning: Handling missing values with Pandas.
- Exploration: Statistical analysis.
- Visualization: Using Matplotlib or Seaborn.
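The first three steps above can be sketched in a few lines of Pandas. This is only an illustration under assumptions: the CSV text is invented sample data held in memory instead of a real file, and filling the gap with the column mean is just one of several possible cleaning strategies. The plotting step is left as a comment because it needs Matplotlib installed.

```python
import io
import pandas as pd

# Collection: read CSV data (an in-memory string stands in for a real file here).
csv_text = """Name,Score
Alice,85
Bob,
Charlie,78
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Cleaning: count the missing scores, then fill them with the column mean.
missing_count = int(raw['Score'].isna().sum())
clean = raw.fillna({'Score': raw['Score'].mean()})

# Exploration: summary statistics for the cleaned column.
summary = clean['Score'].describe()

print(missing_count)            # 1
print(clean['Score'].tolist())  # [85.0, 81.5, 78.0]
print(summary['max'])           # 85.0

# Visualization (requires Matplotlib):
# clean.plot.bar(x='Name', y='Score')
```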
Note: Data Science is a massive field. This chapter is just the entry point to show you how Python handles large datasets efficiently.