What is Machine Learning?
Machine Learning (ML) is a subset of artificial intelligence (AI) that enables systems to learn from data and improve with experience without being explicitly programmed for each case. Instead of writing the rules by hand, we feed data to algorithms that learn the patterns from it.
Traditional Programming vs Machine Learning
Traditional Programming
Input: Data + Rules
Output: Answers
Example: If temperature > 30°C, display "Hot"
Machine Learning
Input: Data + Answers
Output: Rules (Model)
Example: Learn when to display "Hot" from thousands of temperature/label pairs
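To make the contrast concrete, here is a minimal sketch in plain Python. The eight temperature/label pairs and the helper names (label_by_rule, learn_threshold, label_by_model) are invented for illustration, and the "learning" is just a search for the cut-off that misclassifies the fewest examples, not a production algorithm.

```python
# Traditional programming: a person writes the rule.
def label_by_rule(temp_c):
    return "Hot" if temp_c > 30 else "Not Hot"

# Machine learning: the rule is derived from (temperature, label) examples.
# Hypothetical toy data, invented for this sketch.
examples = [(18, "Not Hot"), (22, "Not Hot"), (27, "Not Hot"), (29, "Not Hot"),
            (31, "Hot"), (33, "Hot"), (36, "Hot"), (40, "Hot")]

def learn_threshold(pairs):
    """Return the cut-off that misclassifies the fewest training examples."""
    temps = sorted(t for t, _ in pairs)
    # Candidate cut-offs: midpoints between neighbouring temperatures.
    candidates = [(a + b) / 2 for a, b in zip(temps, temps[1:])]

    def errors(threshold):
        return sum((t > threshold) != (label == "Hot") for t, label in pairs)

    return min(candidates, key=errors)

threshold = learn_threshold(examples)  # the learned "model" is one number

def label_by_model(temp_c):
    return "Hot" if temp_c > threshold else "Not Hot"

print(threshold)           # 30.0
print(label_by_model(35))  # Hot
```

Nobody typed the number 30 into the second half: it came out of the data. With different examples, the learned cut-off would differ.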
Key Components of ML
- 📊 Data: The raw information used for training
- 🎯 Features: Input variables or attributes
- 🏷️ Labels: The output or target variable
- 🧮 Algorithm: The learning method
- 📦 Model: The learned patterns
- 📈 Training: The learning process
Real-World Example: Email Spam Detection
Problem: Automatically identify spam emails
- Data: Thousands of emails labeled as spam or not spam
- Features: Words in email, sender info, subject line
- Label: Spam or Not Spam
- Training: Algorithm learns patterns from labeled emails
- Model: Can now classify new emails automatically
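The pieces above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how a real spam filter is built: the six emails are invented, only the words are used as features (sender info and subject line are left out), and the algorithm swapped in is a tiny naive Bayes classifier with add-one smoothing. The names train and classify are hypothetical helpers.

```python
import math
from collections import Counter

# Data + labels: hypothetical emails, invented for this sketch.
emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("limited offer win cash", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday", "not spam"),
    ("project report attached", "not spam"),
]

def train(labeled_emails):
    """Training: count how often each word (feature) appears under each label."""
    word_counts = {"spam": Counter(), "not spam": Counter()}
    label_counts = Counter()
    for text, label in labeled_emails:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["not spam"])
    return word_counts, label_counts, vocabulary

def classify(text, model):
    """Model: return the label with the higher naive Bayes log-score."""
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_emails)
        for word in text.split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps the probability above zero.
            score += math.log((word_counts[label][word] + 1)
                              / (n_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(emails)
print(classify("claim your free cash", model))    # spam
print(classify("monday project meeting", model))  # not spam
```

Neither test email appears in the training data; the model labels them from word frequencies it counted during training.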
Simple Example: Predicting House Prices
# Simple ML example with scikit-learn
from sklearn.linear_model import LinearRegression
import numpy as np
# Data: House sizes (sq ft) and prices ($1000s)
house_sizes = np.array([[600], [800], [1000], [1200], [1400]])
prices = np.array([150, 200, 250, 300, 350])
# Create and train the model
model = LinearRegression()
model.fit(house_sizes, prices)
# Predict price for a 1100 sq ft house
new_house = np.array([[1100]])
predicted_price = model.predict(new_house)
print(f"Predicted price: ${predicted_price[0]:.2f}k")
# Output: Predicted price: $275.00k
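💡 The model recovered the relationship between size and price from just 5 examples, but only because this toy data is perfectly linear. Real data is noisy and usually needs far more examples.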
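To see what the model actually learned, the fit can be reproduced by hand. The sketch below assumes ordinary least squares with a single feature, which is the method scikit-learn's LinearRegression is documented to use; fit_line is a hypothetical helper written for this illustration, and the data is the same toy set as above.

```python
# Same toy data as the example above: sizes in sq ft, prices in $1000s.
house_sizes = [600, 800, 1000, 1200, 1400]
prices = [150, 200, 250, 300, 350]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(house_sizes, prices)
print(f"price = {slope:.2f} * size + {intercept:.2f}")
# Output: price = 0.25 * size + 0.00
print(f"Predicted price: ${slope * 1100 + intercept:.2f}k")
# Output: Predicted price: $275.00k
```

The "rule" the model produced is a slope and an intercept: each extra square foot adds $250 to the predicted price in this toy data.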
The ML Workflow
- Collect Data: Gather relevant information
- Clean Data: Remove errors and inconsistencies
- Split Data: Training set and testing set
- Choose Algorithm: Select appropriate ML method
- Train Model: Feed training data to algorithm
- Evaluate Model: Test on unseen data
- Deploy Model: Use in real applications
- Monitor & Update: Improve over time
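The first six steps can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the data is synthetic (a known line plus random noise), the cleaning rule is a placeholder, and fit_line is a hypothetical least-squares helper. In practice a library such as scikit-learn would handle the split and the fit; deployment and monitoring are not shown.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# 1. Collect: hypothetical synthetic data, size (sq ft) -> price ($1000s).
sizes = [rng.uniform(500, 2000) for _ in range(50)]
data = [(s, 0.25 * s + rng.uniform(-10, 10)) for s in sizes]

# 2. Clean: drop rows with impossible values (none here; shown as a placeholder).
data = [(s, p) for s, p in data if s > 0 and p > 0]

# 3. Split: shuffle, then hold out 20% as the test set.
rng.shuffle(data)
split = int(0.8 * len(data))
train_set, test_set = data[:split], data[split:]

# 4. Choose an algorithm: a least-squares line, since the target is continuous.
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# 5. Train: only the training set is shown to the algorithm.
slope, intercept = fit_line(train_set)

# 6. Evaluate: mean absolute error on data the model has never seen.
mae = sum(abs(slope * s + intercept - p) for s, p in test_set) / len(test_set)
print(f"Learned: price = {slope:.3f} * size + {intercept:.1f}")
print(f"Test MAE: ${mae:.2f}k")
```

Evaluating on the held-out test set matters because the error on the training data tends to look better than the error on new data.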
Common ML Algorithms
- Linear Regression: Predicting continuous values
- Logistic Regression: Binary classification
- Decision Trees: Rule-based decisions
- Random Forests: Multiple decision trees combined
- K-Nearest Neighbors: Classification based on similarity
- Neural Networks: Complex pattern recognition
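As one example from the list, K-Nearest Neighbors is simple enough to sketch from scratch. The six labeled points and the function name knn_predict are invented for illustration; the sketch assumes Euclidean distance and a majority vote among the k closest training points.

```python
import math
from collections import Counter

# Hypothetical training data: (feature_1, feature_2) -> label.
training_points = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.5), "B"), ((7.0, 6.0), "B"), ((6.5, 7.5), "B"),
]

def knn_predict(points, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query, keep the closest k.
    nearest = sorted(points, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(training_points, (2.0, 2.0)))  # A
print(knn_predict(training_points, (6.0, 6.0)))  # B
```

There is no separate training step here: the "model" is the stored data itself, and all the work happens at prediction time.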