Machine Learning Basics

What is Machine Learning?

Machine Learning (ML) is a subset of artificial intelligence (AI) that enables systems to learn and improve from experience without being explicitly programmed. Instead of writing rules by hand, we feed data to algorithms that learn patterns from it automatically.

Traditional Programming vs Machine Learning

Traditional Programming

Input: Data + Rules
Output: Answers

Example: If temperature > 30°C, display "Hot"

Machine Learning

Input: Data + Answers
Output: Rules (Model)

Example: Learn where "Hot" begins from thousands of temperature/label pairs
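
The contrast above can be sketched in a few lines of plain Python. This is a toy illustration, not a real learning algorithm: the temperature readings are made up, and `learn_threshold` is a hypothetical helper that simply picks the cut-off with the fewest mistakes on the labeled examples.

```python
# Traditional programming: the rule (30 degrees C) is written by hand.
def label_by_rule(temp_c):
    return "Hot" if temp_c > 30 else "Not Hot"

# Machine learning (toy version): the rule is derived from data + answers.
# These temperature/label pairs are made-up illustration data.
examples = [
    (12, "Not Hot"), (18, "Not Hot"), (24, "Not Hot"), (29, "Not Hot"),
    (31, "Hot"), (33, "Hot"), (36, "Hot"), (40, "Hot"),
]

def learn_threshold(pairs):
    """Return the cut-off that misclassifies the fewest training examples."""
    temps = sorted(t for t, _ in pairs)
    # Candidate cut-offs: midpoints between neighbouring temperatures.
    candidates = [(a + b) / 2 for a, b in zip(temps, temps[1:])]

    def errors(threshold):
        # Count examples where "temp > threshold" disagrees with the label.
        return sum((t > threshold) != (label == "Hot") for t, label in pairs)

    return min(candidates, key=errors)

learned_threshold = learn_threshold(examples)

def label_by_model(temp_c):
    return "Hot" if temp_c > learned_threshold else "Not Hot"

print(learned_threshold)
print(label_by_rule(35), label_by_model(35))
```

Nobody typed the cut-off into `label_by_model`; it came from the labeled examples, which is the "Data + Answers in, Rules out" idea in miniature.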

Key Components of ML

📊 Data

The raw information used for training

🎯 Features

The input variables or attributes the model uses to make a prediction

🏷️ Labels

The output or target variable the model learns to predict

🧮 Algorithm

The learning method that fits a model to the data

📦 Model

The learned patterns, stored in a form that can make predictions on new inputs

📈 Training

The process of running the algorithm on the data to produce the model

Real-World Example: Email Spam Detection

Problem: Automatically identify spam emails

Data: Thousands of emails labeled as spam or not spam
Features: Words in email, sender info, subject line
Label: Spam or Not Spam
Training: Algorithm learns patterns from labeled emails
Model: The trained result, which can now classify new emails automatically
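
The spam example above could look roughly like this in scikit-learn. It is a minimal sketch under several assumptions: the six emails are invented, only word counts are used as features (no sender info or subject line), and Naive Bayes is just one reasonable choice of algorithm, not one the example prescribes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Data: a tiny, made-up set of emails (a real filter needs thousands)
emails = [
    "Win a free prize now",
    "Free money, claim your prize today",
    "Limited offer, win cash now",
    "Meeting moved to 3pm tomorrow",
    "Please review the attached project report",
    "Lunch tomorrow with the project team",
]
# Labels: the answer for each email
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

# Features: CountVectorizer turns each email into word counts
# Algorithm: Multinomial Naive Bayes (an assumption chosen for this sketch)
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())

# Training: learn which words tend to appear in which class
spam_filter.fit(emails, labels)

# Model: classify emails it has not seen before
new_emails = ["Claim your free cash prize", "Project meeting tomorrow"]
predictions = spam_filter.predict(new_emails)

for text, label in zip(new_emails, predictions):
    print(f"{label}: {text}")
```

With so little training data the classifier is only picking up on a handful of words, so it should be read as a demonstration of the five pieces fitting together rather than a usable filter.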

Simple Example: Predicting House Prices

# Simple ML example with scikit-learn
from sklearn.linear_model import LinearRegression
import numpy as np

# Data: House sizes (sq ft) and prices ($1000s)
# scikit-learn expects features as a 2D array: one row per house, one column per feature
house_sizes = np.array([[600], [800], [1000], [1200], [1400]])
prices = np.array([150, 200, 250, 300, 350])

# Create and train the model
model = LinearRegression()
model.fit(house_sizes, prices)

# Predict price for a 1100 sq ft house
new_house = np.array([[1100]])
predicted_price = model.predict(new_house)

print(f"Predicted price: ${predicted_price[0]:.2f}k")
# Output: Predicted price: $275.00k

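💡 This model learned the relationship between size and price from just 5 examples! That works here because the toy data lies exactly on a straight line ($0.25k per sq ft); real data is noisy and needs many more examples.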
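
To see what the model actually learned, the fitted parameters can be inspected. The sketch below refits the same toy data and reads `coef_` and `intercept_`, which are standard attributes of a fitted `LinearRegression`; the variable names `slope`, `intercept`, and `manual_prediction` are just illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as the house-price example above
house_sizes = np.array([[600], [800], [1000], [1200], [1400]])
prices = np.array([150, 200, 250, 300, 350])

model = LinearRegression().fit(house_sizes, prices)

# coef_ holds one slope per feature; intercept_ is the constant term
slope = model.coef_[0]
intercept = model.intercept_
print(f"price = {slope:.4f} * size + {intercept:.4f}")

# Reproduce the 1100 sq ft prediction by hand from the learned parameters
manual_prediction = slope * 1100 + intercept
print(f"Manual prediction: ${manual_prediction:.2f}k")
```

The slope comes out at 0.25 (in $1000s per sq ft) with an intercept of essentially zero, so the "model" here is nothing more than a line through the five points.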

The ML Workflow

Collect Data: Gather relevant information
Clean Data: Remove errors and inconsistencies
Split Data: Divide into a training set and a testing set
Choose Algorithm: Select appropriate ML method
Train Model: Feed training data to algorithm
Evaluate Model: Test on unseen data
Deploy Model: Use in real applications
Monitor & Update: Track performance and retrain as new data arrives
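
The first six steps of the workflow above can be sketched end to end. Everything about the data here is an assumption for illustration: the house sizes and prices are synthetic, generated from a known line plus random noise, so the "Clean" step has nothing to remove and is shown only for its shape. Deployment and monitoring are left out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# 1. Collect Data: synthetic sizes (sq ft) and prices ($1000s), made up here
rng = np.random.default_rng(seed=0)
sizes = rng.uniform(500, 2500, size=200)
prices = 0.25 * sizes + rng.normal(0, 20, size=200)

# 2. Clean Data: keep only finite, positive-size rows
#    (a no-op on this synthetic data, shown for illustration)
mask = np.isfinite(sizes) & np.isfinite(prices) & (sizes > 0)
X = sizes[mask].reshape(-1, 1)  # 2D: one row per house, one feature column
y = prices[mask]

# 3. Split Data: hold out 20% that the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 4. Choose Algorithm and 5. Train Model
model = LinearRegression()
model.fit(X_train, y_train)

# 6. Evaluate Model: measure error on the held-out test set only
mae = mean_absolute_error(y_test, model.predict(X_test))
r2 = model.score(X_test, y_test)
print(f"Test MAE: ${mae:.1f}k, R^2: {r2:.3f}")
```

Evaluating on `X_test` rather than `X_train` is the point of the split: a score measured on data the model was trained on says little about how it will do on new houses.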

Common ML Algorithms

Linear Regression: Predicting continuous values
Logistic Regression: Binary classification
Decision Trees: Predictions from a learned tree of if/else rules
Random Forests: Many decision trees combined by voting or averaging
K-Nearest Neighbors: Classification based on similarity to the closest training examples
Neural Networks: Complex pattern recognition
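
Because scikit-learn gives these algorithms the same `fit` / `score` interface, several of the classifiers listed above can be tried on the same data with one loop. This is a rough sketch on an assumed synthetic dataset (two well-separated clusters from `make_blobs`), so every classifier should do well; it is not a benchmark. Linear regression is omitted because it predicts continuous values, and a neural network is omitted for brevity.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data: two clusters of points (an assumption for the demo)
X, y = make_blobs(
    n_samples=200, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

classifiers = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Train each classifier on the same split and record its test accuracy
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.2f}")
```

On harder, real-world data the scores would differ more, which is why the "Choose Algorithm" step usually involves trying a few candidates like this.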