Introduction to Large Language Models (LLMs)

What are Large Language Models?

Large Language Models (LLMs) are AI systems trained on massive amounts of text data that can understand and generate human-like text. They power tools like ChatGPT, Claude, and Google Gemini.

The "Large" in LLM

  • GPT-3: 175 billion parameters (learned weights)
  • GPT-4: estimated at ~1.76 trillion parameters
  • Training data: hundreds of billions of words from books, websites, and code

For comparison, the human brain has ~86 billion neurons!

What Can LLMs Do?

💬 Conversation

Natural dialogue, context awareness, follow-up questions

✍️ Writing

Essays, emails, stories, scripts, marketing copy

💻 Coding

Write, debug, explain, and optimize code

📊 Analysis

Summarize, extract insights, answer questions

🌍 Translation

Translate between 100+ languages

🎭 Creativity

Brainstorm ideas, write poetry, create characters

How LLMs Work (Simplified)

  • Training: Model reads billions of text examples
  • Learning Patterns: Discovers language patterns, facts, reasoning
  • Tokenization: Breaks text into pieces (tokens)
  • Prediction: Predicts next most likely token
  • Generation: Repeats to create full responses
A minimal pseudocode sketch of this loop (tokenize, model, and detokenize are placeholders, and "<|endoftext|>" stands in for the model's end-of-sequence token):

    # Simplified LLM prediction process (pseudocode)
    def generate_response(prompt):
        """
        LLMs predict one token (word piece) at a time
        """
        tokens = tokenize(prompt)  # "Hello world" → ["Hello", " world"]

        # Start with the user's prompt
        generated_tokens = tokens

        # Generate up to 50 new tokens
        for _ in range(50):
            # Predict the next token based on the full context so far
            next_token = model.predict_next_token(generated_tokens)
            generated_tokens.append(next_token)

            # Stop if the model emits its end-of-sequence token
            if next_token == "<|endoftext|>":
                break

        return detokenize(generated_tokens)

    # Example
    prompt = "The capital of France is"
    response = generate_response(prompt)
    print(response)
    # Output: "The capital of France is Paris, a beautiful city
    #          known for the Eiffel Tower..."

Popular LLMs

OpenAI GPT Series

  • GPT-3.5: Powers the free tier of ChatGPT
  • GPT-4: Most capable, multimodal
  • GPT-4 Turbo: Faster, cheaper

Best for: General tasks, coding, creative writing

Anthropic Claude

  • Claude 3 Opus: Most powerful
  • Claude 3 Sonnet: Balanced
  • Claude 3 Haiku: Fast & cheap

Best for: Long documents, analysis, safety

Google Gemini

  • Gemini Ultra: Top-tier
  • Gemini Pro: Standard
  • Gemini Nano: On-device

Best for: Google integration, multimodal

Open Source

  • Llama 2/3: Meta's models
  • Mistral: Efficient European model
  • Falcon: Strong open model

Best for: Self-hosting, fine-tuning, privacy (see the local-inference sketch below)
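
Curious what self-hosting looks like in practice? A minimal sketch using Hugging Face transformers to run an open-weight model locally (the model ID is just one example; larger models need a capable GPU):

    # pip install transformers torch
    from transformers import pipeline

    # Downloads the weights on first use; a 7B model wants a GPU with roughly 16 GB of memory
    generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

    result = generator("Explain tokenization in one sentence.", max_new_tokens=60)
    print(result[0]["generated_text"])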

Using LLMs in Python

1. OpenAI API

    # Install: pip install openai
    from openai import OpenAI
    
    client = OpenAI(api_key="your-api-key")  # better: load the key from an environment variable
    
    # Simple completion
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful coding assistant."},
            {"role": "user", "content": "Write a Python function to calculate fibonacci"}
        ],
        temperature=0.7,  # Creativity (0-2)
        max_tokens=500    # Response length limit
    )
    
    print(response.choices[0].message.content)
    
    # Streaming response (like ChatGPT typing effect)
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True
    )
    
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
    

2. Hugging Face Transformers

    # Install: pip install transformers torch
    from transformers import pipeline
    
    # Load a pre-trained model
    generator = pipeline('text-generation', model='gpt2')
    
    # Generate text
    result = generator(
        "Once upon a time",
        max_length=100,
        num_return_sequences=1
    )
    
    print(result[0]['generated_text'])
    
    # Different tasks
    summarizer = pipeline("summarization")
    translator = pipeline("translation_en_to_fr")
    sentiment = pipeline("sentiment-analysis")
    
    # Use them
    summary = summarizer("Long article text here...", max_length=50)
    french = translator("Hello, how are you?")
    feeling = sentiment("I love this product!")

3. LangChain (Advanced)

    # Install: pip install langchain openai
    # (these are the classic LangChain 0.0.x imports; newer releases move
    #  ChatOpenAI into the separate langchain-openai package)
    from langchain.chat_models import ChatOpenAI
    from langchain.prompts import ChatPromptTemplate
    from langchain.chains import LLMChain
    
    # Create chat model
    llm = ChatOpenAI(model="gpt-4", temperature=0.7)
    
    # Create prompt template
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a {profession}. Be helpful and professional."),
        ("user", "{question}")
    ])
    
    # Create chain
    chain = LLMChain(llm=llm, prompt=prompt)
    
    # Use it
    response = chain.run(
        profession="Python expert",
        question="How do I read a CSV file?"
    )
    
    print(response)

Key Concepts

Tokens

LLMs process text in chunks called tokens: roughly 1 token ≈ 4 characters ≈ 0.75 English words.

    # Example tokenization (illustrative splits; exact tokens vary by tokenizer)
    "Hello world!" → ["Hello", " world", "!"]  # 3 tokens
    "Artificial Intelligence" → ["Art", "ificial", " Int", "elligence"]  # 4 tokens

    # Pricing is per token
    # GPT-4: ~$0.03 per 1K input tokens, $0.06 per 1K output tokens
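
To count tokens exactly, OpenAI's tiktoken library exposes the same tokenizers the GPT models use; a minimal sketch (pip install tiktoken):

    import tiktoken

    # Load the tokenizer that matches a given model
    enc = tiktoken.encoding_for_model("gpt-4")

    tokens = enc.encode("Artificial Intelligence")
    print(len(tokens))                        # number of tokens
    print([enc.decode([t]) for t in tokens])  # the individual token strings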

Context Window

The maximum number of tokens the model can process at once (input + output).

  • GPT-3.5: 4K or 16K tokens
  • GPT-4: 8K, 32K, or 128K tokens
  • Claude 3: 200K tokens (~150K words!)

A longer context window means the model can handle bigger documents and maintain longer conversations; when a chat outgrows it, older turns have to be trimmed, as sketched below.
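
A common pattern is dropping the oldest chat turns until the conversation fits a token budget; a rough sketch using tiktoken (the 4,000-token budget is an arbitrary example, and the count ignores per-message formatting overhead):

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

    def count_tokens(messages):
        # Approximate: sums content tokens only
        return sum(len(enc.encode(m["content"])) for m in messages)

    def trim_history(messages, budget=4000):
        # Keep the system prompt; drop the oldest turns until we fit the budget
        system, rest = messages[:1], messages[1:]
        while rest and count_tokens(system + rest) > budget:
            rest.pop(0)
        return system + rest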

Temperature

Controls randomness/creativity (0-2 in the OpenAI API); a quick comparison follows the list.

  • 0: Near-deterministic, (almost) the same output each time (good for factual tasks)
  • 0.7: Balanced creativity (a common default)
  • 1.5+: Very creative, unpredictable (good for brainstorming)
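
A quick way to feel the difference: send the same prompt at two temperatures and compare (uses the OpenAI client shown earlier):

    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    for temp in (0.0, 1.5):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Name one color."}],
            temperature=temp,
        )
        print(f"temperature={temp}: {response.choices[0].message.content}")
    # temperature=0.0 tends to repeat the same answer across runs;
    # temperature=1.5 varies far more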

Top-p (Nucleus Sampling)

An alternative to temperature: the model samples only from the smallest set of tokens whose cumulative probability reaches p. A short example follows.

  • 0.1: Very focused, only the most likely tokens
  • 0.9: Balanced (a common default)
  • 1.0: Consider all tokens
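
top_p is passed the same way as temperature (the usual advice is to tune one or the other, not both):

    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
        top_p=0.9,  # sample only from the top 90% of probability mass
    )
    print(response.choices[0].message.content)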

Complete Example: Simple Chatbot

    # simple_chatbot.py
    from openai import OpenAI
    import os
    
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    def chat():
        """Simple chatbot with conversation history"""
        messages = [
            {"role": "system", "content": "You are a friendly AI assistant."}
        ]
        
        print("Chatbot ready! (Type 'quit' to exit)")
        print("-" * 50)
        
        while True:
            # Get user input
            user_input = input("You: ")
            
            if user_input.lower() == 'quit':
                print("Bot: Goodbye!")
                break
            
            # Add user message to history
            messages.append({"role": "user", "content": user_input})
            
            # Get AI response
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages,
                temperature=0.7,
                max_tokens=500
            )
            
            # Extract and display response
            bot_message = response.choices[0].message.content
            print(f"Bot: {bot_message}\n")
            
            # Add bot response to history (for context)
            messages.append({"role": "assistant", "content": bot_message})
            
            # Show token usage
            print(f"(Tokens used: {response.usage.total_tokens})")
    
    if __name__ == "__main__":
        chat()

LLM Limitations

⚠️ What LLMs Can't Do (Yet)

  • No Real-Time Knowledge: Training data has a cutoff date
  • Hallucinations: Can confidently state false information
  • No True Understanding: Pattern matching, not conscious thought
  • Math Struggles: Complex calculations are often wrong
  • Context Limits: Can't process unlimited text
  • Inconsistency: The same prompt can give different results

Mitigations: techniques like RAG (Retrieval-Augmented Generation), function calling, and output verification help; a minimal RAG sketch follows.
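
For hallucinations in particular, the RAG idea is: retrieve relevant text first, then instruct the model to answer only from it. A minimal sketch, where search_documents is a hypothetical stand-in for your own retriever (vector database, keyword search, etc.):

    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def answer_with_rag(question):
        context = search_documents(question)  # hypothetical retriever, not a real library call
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "Answer using ONLY the provided context. "
                            "If the answer isn't there, say you don't know."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
            temperature=0,  # factual task: minimize randomness
        )
        return response.choices[0].message.content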

Next Steps