📘 Midterm Review (Chapters 1–4)
Chapter 1: The Machine Learning Landscape
Important Concepts
- Types of ML:
- Supervised (e.g., spam detection)
- Unsupervised (e.g., clustering)
- Reinforcement (e.g., training a robot)
- Supervised vs Unsupervised:
- Supervised: Training data has labels.
- Unsupervised: No labels, only input features.
- Reinforcement Learning:
- An agent learns by interacting with an environment and receiving rewards (or penalties).
- Batch vs Online Learning:
- Batch: Train offline on all available data; to use new data, retrain from scratch.
- Online: Train incrementally as data arrives, one instance or mini-batch at a time.
- Concept Drift:
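- When the data distribution (in particular, the relationship between features and target) changes over time, so a model trained on older data gets worse.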
- Instance-based vs Model-based Learning:
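- Instance-based: Memorizes the training examples and predicts for new cases by similarity to them (e.g., k-nearest neighbors).
- Model-based: Fits a general model to the training examples and uses it to predict (e.g., linear regression).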
- Regression vs Classification:
- Regression predicts numbers (e.g., house prices).
- Classification predicts categories (e.g., cat vs dog).
- Challenges in ML:
- Insufficient data
- Poor quality data
- Overfitting (model too complex for the data) / Underfitting (model too simple)
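
Quick sketch of the instance-based vs model-based distinction above, in plain Python. The toy data and the function names (`fit_line`, `predict_line`, `predict_knn`) are made up for illustration and are not from the chapters; a real project would more likely use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Model-based: least-squares fit of y = a*x + b, returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_line(model, x):
    """Predict with the fitted parameters only; the training data is no longer needed."""
    a, b = model
    return a * x + b


def predict_knn(xs, ys, x, k=1):
    """Instance-based: keep all examples and average the targets of the k closest ones."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical toy data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)                   # learns slope and intercept
line_pred = predict_line(model, 10.0)      # extrapolates along the line
knn_pred = predict_knn(xs, ys, 10.0, k=1)  # copies the closest stored example
```

On this data the fitted line extrapolates to 21.0 at x = 10, while 1-nearest-neighbor returns 9.0, the target of the closest stored point (x = 4). The model-based learner generalizes through its parameters; the instance-based one can only reuse what it has memorized.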
Chapter 2: End-to-End Machine Learning Project
Key Topics
- Define objective early:
- To align the solution with business goals.
- ML Project Steps:
- Frame the problem → Get data → Explore → Prepare → Model → Evaluate → Deploy.
- Importance of Test Set Early:
- Set the test set aside before exploring the data, to avoid data snooping bias.
- Stratified Sampling:
- Keep the same proportions of important classes (strata) in the train and test sets as in the full dataset.
- Data Exploration Checklist (for each attribute):
- Name
- Type
- % of missing values
- Noisiness
- Usefulness
- Handle Missing Values:
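- Drop the rows that have missing values
- Drop the columns that have missing values
- Impute (fill with a value such as zero, the mean, or the median)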
- Feature Engineering:
- Create new features from existing ones (e.g., rooms per household) to improve model performance.
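
Two of the preparation steps above, stratified sampling and the three missing-value options, sketched in plain Python. Everything here is an assumption for illustration: the column names (`income_cat`, `bedrooms`), the synthetic rows, and the helpers `stratified_split` and `impute_median` are hypothetical, not from the chapters. In practice, pandas and scikit-learn offer ready-made tools for both steps.

```python
import random
from collections import defaultdict
from statistics import median


def stratified_split(rows, key, test_ratio=0.2, seed=42):
    """Split rows so each stratum (value of key(row)) keeps about the same share in both sets."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    train, test = [], []
    for group in strata.values():
        group = group[:]  # copy so the caller's rows are not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def impute_median(rows, column):
    """Return new rows where missing (None) values in `column` are filled with the observed median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Synthetic data: 20 "high" and 80 "low" rows; 10 rows have a missing bedroom count.
rows = [
    {
        "id": i,
        "income_cat": "high" if i % 5 == 0 else "low",
        "bedrooms": None if i % 10 == 3 else 2 + i % 3,
    }
    for i in range(100)
]

# Stratified sampling: the 20/80 class mix is preserved in the test set.
train_set, test_set = stratified_split(rows, key=lambda r: r["income_cat"])

# The three options for missing values.
dropped_rows = [r for r in rows if r["bedrooms"] is not None]                       # drop rows
dropped_col = [{k: v for k, v in r.items() if k != "bedrooms"} for r in rows]      # drop column
imputed = impute_median(rows, "bedrooms")                                           # impute
```

Here the median is computed over all rows to keep the sketch short. In a real project, compute it on the training set only and reuse that value for the test set and for new data.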