Data Science Intro

Python is one of the most widely used languages for data science, largely because of two core libraries: NumPy and Pandas.

1. NumPy (Numerical Python)

NumPy provides support for large, multi-dimensional arrays and matrices, along with vectorized operations that act on every element at once without an explicit loop.

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr * 2) # [ 2  4  6  8 10] (vectorized operation: every element is doubled)
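
The example above uses a one-dimensional array. As a minimal sketch of the multi-dimensional case, the block below builds a small 2-D array and applies a few common operations. The variable names and values are made up for illustration.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns; the values are illustrative.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)        # (2, 3)
print(matrix.sum(axis=0))  # column sums: [5 7 9]
print(matrix.sum(axis=1))  # row sums: [ 6 15]

# Matrix product of the (2, 3) array with its (3, 2) transpose gives a (2, 2) result.
product = matrix @ matrix.T
print(product)
```

The axis argument picks the dimension to collapse: axis=0 sums down each column, axis=1 sums across each row.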

2. Pandas (Data Analysis)

Pandas introduces DataFrames, which are like super-powered Excel tables inside Python.

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}
df = pd.DataFrame(data)

print(df.describe()) # Statistical summary
print(df[df['Score'] > 80]) # Filtering data
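
To make the spreadsheet comparison concrete, here is a small sketch that adds a derived column and sorts the rows, much as you would with a formula and a sort in Excel. The 'Passed' column and the pass mark of 80 are assumptions chosen for illustration, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
})

# Add a derived column, like a spreadsheet formula applied to a whole column.
# The pass mark of 80 is an arbitrary value chosen for illustration.
df['Passed'] = df['Score'] >= 80

# Sort rows by score, highest first. sort_values returns a new DataFrame.
ranked = df.sort_values('Score', ascending=False)
print(ranked)

print(df['Score'].mean())  # 85.0
```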

3. Data Science Workflow

  1. Collection: SQL, APIs, or CSV files.
  2. Cleaning: Handling missing values with Pandas.
  3. Exploration: Statistical analysis.
  4. Visualization: Using Matplotlib or Seaborn.
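
The first three steps above can be sketched in a few lines of Pandas. This is a minimal illustration, not a recommended pipeline: the in-memory CSV stands in for a real file, SQL query, or API response, and filling the missing score with the mean is just one of several possible cleaning choices. Visualization is left out so the sketch depends only on Pandas.

```python
import io
import pandas as pd

# 1. Collection: an in-memory CSV stands in for a real data source (illustrative data).
raw_csv = io.StringIO(
    "Name,Score\n"
    "Alice,85\n"
    "Bob,\n"
    "Charlie,78\n"
)
df = pd.read_csv(raw_csv)

# 2. Cleaning: Bob's score is missing (NaN). Fill it with the mean of the known scores.
#    This is one possible strategy; dropping the row with dropna() is another.
mean_score = df["Score"].mean()  # mean() skips NaN by default
df["Score"] = df["Score"].fillna(mean_score)

# 3. Exploration: summary statistics for the cleaned column.
print(df["Score"].describe())
```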

Note: Data science is a massive field. This chapter is only an entry point, meant to show how Python handles large datasets efficiently.