📘 Midterm Review (Chapters 1–4)

Chapter 1: The Machine Learning Landscape

Types of ML:
- Supervised (e.g., spam detection)
- Unsupervised (e.g., clustering)
- Reinforcement (e.g., training a robot)
Supervised vs Unsupervised:
- Supervised: Training data has labels.
- Unsupervised: No labels, only input features.
Reinforcement Learning:
- An agent learns by interacting with an environment and receiving rewards (or penalties) for its actions.
Batch vs Online Learning:
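- Batch: Train offline on the full dataset; incorporating new data means retraining from scratch.
- Online: Update the model incrementally as new data arrives, one instance or mini-batch at a time.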
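
A minimal sketch of the batch/online difference, using a plain mean as a stand-in for a "model". The names `batch_mean` and `OnlineMean` and the data are made up for illustration; they are not from the course material.

```python
def batch_mean(values):
    """Batch learning: fit once on the full dataset."""
    return sum(values) / len(values)


class OnlineMean:
    """Online learning: update the estimate one instance at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def learn_one(self, x):
        # Incremental update: past instances do not need to stay in memory.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


data = [2.0, 4.0, 6.0, 8.0]  # made-up values

model = OnlineMean()
for x in data:  # pretend the data arrives as a stream
    model.learn_one(x)

print(batch_mean(data), model.mean)
```

Both approaches reach the same estimate here; the online version simply gets there without ever holding the full dataset at once.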
Concept Drift:
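- When the data distribution (or the relationship between inputs and target) changes over time, so a model trained on older data gradually loses accuracy.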
Instance-based vs Model-based Learning:
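- Instance-based: Memorizes the training examples and predicts new cases by similarity to them (e.g., k-nearest neighbors).
- Model-based: Fits a model with parameters to the training data and uses it to predict (e.g., linear regression).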
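
The contrast can be sketched on a tiny made-up dataset: a 1-nearest-neighbor lookup stands in for instance-based learning and a least-squares line for model-based learning. The function names and the data points are assumptions for illustration only.

```python
def nearest_neighbor_predict(train, x_new):
    """Instance-based: keep the examples, answer from the most similar one."""
    closest_x, closest_y = min(train, key=lambda pair: abs(pair[0] - x_new))
    return closest_y


def fit_line(train):
    """Model-based: learn a slope and intercept by ordinary least squares."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up points on y = 2x

slope, intercept = fit_line(train)
knn_pred = nearest_neighbor_predict(train, 2.6)   # copies the label of x = 3.0
line_pred = slope * 2.6 + intercept               # evaluates the learned line

print(knn_pred, line_pred)
```

The instance-based prediction can only return a label it has already seen, while the fitted line generalizes between the training points.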
Regression vs Classification:
- Regression predicts numbers (e.g., house prices).
- Classification predicts categories (e.g., cat vs dog).
Challenges in ML:
- Insufficient data
- Poor quality data
- Overfitting / Underfitting

Define the Objective Early:
- Aligns the solution with business goals and guides the choice of algorithm and performance measure.
ML Project Steps:
- Frame the problem → Get data → Explore → Prepare → Model → Evaluate → Deploy.
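Importance of Setting Aside a Test Set Early:
- Prevents data leakage (data snooping): if you explore the test data before choosing a model, your performance estimate will be too optimistic.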
Stratified Sampling:
- Split so that the train and test sets keep the same proportions of important classes (strata) as the full dataset.
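
A rough sketch of a stratified train/test split using only the standard library. The helper `stratified_split`, the 20% test ratio, the seed, and the "low"/"high" categories are all assumptions made up for this example.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, key, test_ratio=0.2, seed=42):
    """Split rows so each stratum keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    train, test = [], []
    for members in strata.values():
        members = members[:]  # copy so the caller's data is left untouched
        rng.shuffle(members)
        n_test = round(len(members) * test_ratio)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Made-up dataset: 80 "low" and 20 "high" instances of a hypothetical category.
rows = [("low", i) for i in range(80)] + [("high", i) for i in range(20)]
train, test = stratified_split(rows, key=lambda r: r[0])

print(Counter(label for label, _ in train))
print(Counter(label for label, _ in test))
```

With an 80/20 class mix and a 20% test ratio, the test set ends up with 16 "low" and 4 "high" rows, so both sets keep the original 4:1 proportion. Doing this split before any exploration is what keeps the test set untouched.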
Data Exploration Checklist:
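- Name
- Type (categorical, numerical, text, etc.)
- Percentage of missing values
- Noisiness and type of noise (outliers, rounding errors, etc.)
- Usefulness for the task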
Handle Missing Values:
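- Drop the rows that contain missing values
- Drop the columns that contain missing values
- Impute (fill) the missing values, e.g., with the median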
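
The three options can be sketched on a small made-up table, with `None` marking a missing value. The column names and numbers are illustrative assumptions.

```python
from statistics import median

# Made-up rows; None marks a missing value.
rows = [
    {"rooms": 3, "age": 10, "price": 100},
    {"rooms": None, "age": 25, "price": 150},
    {"rooms": 5, "age": None, "price": 200},
    {"rooms": 4, "age": 40, "price": 180},
]

# Option 1: drop rows that contain any missing value.
dropped_rows = [r for r in rows if None not in r.values()]

# Option 2: drop every column that has at least one missing value.
complete_cols = [c for c in rows[0] if all(r[c] is not None for r in rows)]
dropped_cols = [{c: r[c] for c in complete_cols} for r in rows]

# Option 3: impute each missing value with its column median.
medians = {
    c: median(r[c] for r in rows if r[c] is not None) for c in rows[0]
}
imputed = [
    {c: (r[c] if r[c] is not None else medians[c]) for c in r} for r in rows
]

print(len(dropped_rows), complete_cols, imputed[1]["rooms"], imputed[2]["age"])
```

In a real project the medians would be computed on the training set only and then reused to fill the test set, so that no information from the test data leaks into training.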
Feature Engineering:
- Create new features from existing ones (e.g., ratios or combinations of attributes) to improve model performance.
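
As a small illustration, ratio features can be derived from raw counts. The housing-style field names and values below are made-up assumptions, not data from the course.

```python
# Made-up district records; the field names are illustrative assumptions.
districts = [
    {"total_rooms": 1200, "households": 300, "population": 900},
    {"total_rooms": 500, "households": 250, "population": 1000},
]


def add_ratio_features(record):
    """Return a copy of the record with two derived ratio features."""
    enriched = dict(record)
    enriched["rooms_per_household"] = record["total_rooms"] / record["households"]
    enriched["people_per_household"] = record["population"] / record["households"]
    return enriched


engineered = [add_ratio_features(d) for d in districts]

for row in engineered:
    print(row["rooms_per_household"], row["people_per_household"])
```

A ratio such as rooms per household can carry more signal than the raw totals, which mostly reflect how large a district is.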