What are Large Language Models?
Large Language Models (LLMs) are AI systems, trained on massive amounts of text, that can understand and generate human-like language. They power tools like ChatGPT, Claude, and Google Gemini.
The "Large" in LLM
GPT-3: 175 billion parameters (connections)
GPT-4: Estimated 1.76 trillion parameters
Training data: Hundreds of billions of words from books, websites, code
For comparison, the human brain has ~86 billion neurons!
What Can LLMs Do?
💬 Conversation
Natural dialogue, context awareness, follow-up questions
✍️ Writing
Essays, emails, stories, scripts, marketing copy
💻 Coding
Write, debug, explain, and optimize code
📊 Analysis
Summarize, extract insights, answer questions
🌍 Translation
Translate between 100+ languages
🎭 Creativity
Brainstorm ideas, write poetry, create characters
How LLMs Work (Simplified)
# Simplified LLM prediction process
def generate_response(prompt):
    """
    LLMs predict one token (word piece) at a time
    """
    tokens = tokenize(prompt)  # "Hello world" → ["Hello", " world"]

    # Start with the user's prompt
    generated_tokens = tokens

    # Generate up to 50 new tokens
    for _ in range(50):
        # Predict the next token based on everything so far
        next_token = model.predict_next_token(generated_tokens)
        generated_tokens.append(next_token)

        # Stop if we generate the end-of-sequence token
        if next_token == "<|endoftext|>":
            break

    return detokenize(generated_tokens)

# Example
prompt = "The capital of France is"
response = generate_response(prompt)
print(response)
# Output: "The capital of France is Paris, a beautiful city
#          known for the Eiffel Tower..."
Popular LLMs
OpenAI GPT Series
- GPT-3.5: Powers free ChatGPT
- GPT-4: Most capable, multimodal
- GPT-4 Turbo: Faster, cheaper
Best for: General tasks, coding, creative writing
Anthropic Claude
- Claude 3 Opus: Most powerful
- Claude 3 Sonnet: Balanced
- Claude 3 Haiku: Fast & cheap
Best for: Long documents, analysis, safety
Google Gemini
- Gemini Ultra: Top-tier
- Gemini Pro: Standard
- Gemini Nano: On-device
Best for: Google integration, multimodal
Open Source
- Llama 2/3: Meta's models
- Mistral: Efficient European model
- Falcon: Strong open model
Best for: Self-hosting, fine-tuning, privacy
Using LLMs in Python
1. OpenAI API
# Install: pip install openai
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Simple completion
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to calculate fibonacci"}
    ],
    temperature=0.7,  # Creativity (0-2)
    max_tokens=500    # Response length limit
)

print(response.choices[0].message.content)

# Streaming response (like ChatGPT's typing effect)
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
2. Hugging Face Transformers
# Install: pip install transformers torch
from transformers import pipeline

# Load a pre-trained model
generator = pipeline('text-generation', model='gpt2')

# Generate text
result = generator(
    "Once upon a time",
    max_length=100,
    num_return_sequences=1
)
print(result[0]['generated_text'])

# Different tasks
summarizer = pipeline("summarization")
translator = pipeline("translation_en_to_fr")
sentiment = pipeline("sentiment-analysis")

# Use them
summary = summarizer("Long article text here...", max_length=50)
french = translator("Hello, how are you?")
feeling = sentiment("I love this product!")
3. LangChain (Advanced)
# Install: pip install langchain openai
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain

# Create chat model
llm = ChatOpenAI(model="gpt-4", temperature=0.7)

# Create prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {profession}. Be helpful and professional."),
    ("user", "{question}")
])

# Create chain
chain = LLMChain(llm=llm, prompt=prompt)

# Use it
response = chain.run(
    profession="Python expert",
    question="How do I read a CSV file?"
)
print(response)
Key Concepts
Tokens
LLMs process text in chunks called tokens. ~1 token ≈ 4 characters ≈ 0.75 words
# Example tokenization
"Hello world!" → ["Hello", " world", "!"] # 3 tokens
"Artificial Intelligence" → ["Art", "ificial", " Int", "elligence"] # 4 tokens
# Pricing is per token
# GPT-4: ~$0.03 per 1K input tokens, $0.06 per 1K output tokens
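Exact splits depend on the tokenizer, so the examples above are illustrative. If you want to count tokens yourself, a minimal sketch using OpenAI's tiktoken library (pip install tiktoken) looks like this; the model name is just an example:
# Count tokens with tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")    # encoding used by a given model

text = "Artificial Intelligence"
token_ids = enc.encode(text)

print(len(token_ids))                          # number of tokens
print([enc.decode([t]) for t in token_ids])    # the individual token strings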
Context Window
Maximum tokens the model can process at once (input + output)
- GPT-3.5: 4K or 16K tokens
- GPT-4: 8K, 32K, or 128K tokens
- Claude 3: 200K tokens (~150K words!)
Longer context = can handle bigger documents, maintain longer conversations
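A practical consequence: before sending a long document, check that the prompt plus the reply you expect will fit in the window. A small sketch, again assuming tiktoken is installed and using 8,192 tokens as an example limit:
import tiktoken

CONTEXT_WINDOW = 8192       # example limit (GPT-4 8K)
RESERVED_FOR_OUTPUT = 500   # room we want to keep for the reply

enc = tiktoken.encoding_for_model("gpt-4")
prompt = "Summarize the following report: ..."  # your long document here

prompt_tokens = len(enc.encode(prompt))
if prompt_tokens + RESERVED_FOR_OUTPUT > CONTEXT_WINDOW:
    print(f"Too long: {prompt_tokens} prompt tokens; trim or split the document")
else:
    print(f"OK: {prompt_tokens} prompt tokens, {CONTEXT_WINDOW - prompt_tokens} left for output")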
Temperature
Controls randomness/creativity (0-2)
- 0: Most predictable, picks the most likely token each time (good for factual tasks)
- 0.7: Balanced creativity (default)
- 1.5+: Very creative, unpredictable (good for brainstorming)
Top-p (Nucleus Sampling)
Alternative to temperature. Considers tokens with cumulative probability up to p
- 0.1: Very focused, only most likely tokens
- 0.9: Balanced (common default)
- 1.0: Consider all tokens
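Both knobs are set per request. Here is a small sketch using the OpenAI client from earlier; OpenAI generally suggests adjusting temperature or top_p but not both, so each call below changes only one:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Factual question: keep the output as predictable as possible
factual = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What year was Python first released?"}],
    temperature=0
)

# Brainstorming: sample only from the top 90% of probability mass
creative = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest five names for a coffee shop"}],
    top_p=0.9
)

print(factual.choices[0].message.content)
print(creative.choices[0].message.content)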
Complete Example: Simple Chatbot
# simple_chatbot.py
from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def chat():
    """Simple chatbot with conversation history"""
    messages = [
        {"role": "system", "content": "You are a friendly AI assistant."}
    ]

    print("Chatbot ready! (Type 'quit' to exit)")
    print("-" * 50)

    while True:
        # Get user input
        user_input = input("You: ")
        if user_input.lower() == 'quit':
            print("Bot: Goodbye!")
            break

        # Add user message to history
        messages.append({"role": "user", "content": user_input})

        # Get AI response
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0.7,
            max_tokens=500
        )

        # Extract and display response
        bot_message = response.choices[0].message.content
        print(f"Bot: {bot_message}\n")

        # Add bot response to history (for context)
        messages.append({"role": "assistant", "content": bot_message})

        # Show token usage
        print(f"(Tokens used: {response.usage.total_tokens})")

if __name__ == "__main__":
    chat()
Best Practices
- Be Specific: Clear prompts get better results
- Provide Context: Give the model background information
- Use System Messages: Set the model's behavior/role
- Iterate: Refine prompts based on outputs
- Handle Errors: API calls can fail; implement retries (see the sketch after this list)
- Monitor Costs: Track token usage
- Cache Results: Don't regenerate identical responses
- Rate Limit: Respect API limits
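For the error-handling point above, here is a minimal retry sketch with exponential backoff. It assumes the openai v1 SDK, whose RateLimitError and APIConnectionError exceptions are caught here:
# Retry transient API failures with exponential backoff
import time
from openai import OpenAI, RateLimitError, APIConnectionError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat_with_retries(messages, max_retries=3):
    """Call the chat API, retrying transient errors before giving up."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages
            )
            return response.choices[0].message.content
        except (RateLimitError, APIConnectionError):
            # Wait 1s, 2s, 4s... before trying again
            time.sleep(2 ** attempt)
    raise RuntimeError("API call failed after retries")

print(chat_with_retries([{"role": "user", "content": "Hello!"}]))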
LLM Limitations
⚠️ What LLMs Can't Do (Yet)
- No Real-Time Knowledge: Training data has a cutoff date
- Hallucinations: Can confidently state false information
- No True Understanding: Pattern matching, not conscious thought
- Math Struggles: Complex calculations often wrong
- Context Limits: Can't process infinite text
- Inconsistency: Same prompt can give different results
Solution: Use techniques like RAG (Retrieval-Augmented Generation), function calling, and verification of outputs; a minimal RAG sketch follows below.
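To make the RAG idea concrete, here is a small, self-contained sketch. The documents list and the keyword-overlap retriever are toy stand-ins for a real document store and vector search, and the model name is only an example:
# Minimal RAG sketch: retrieve relevant text first, then hand it to the
# model so the answer is grounded in real data instead of memory.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm CET.",
    "Premium plans include priority support and a 99.9% uptime SLA.",
]

def retrieve(question, top_k=2):
    """Toy retrieval: rank documents by word overlap with the question."""
    words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def answer_with_rag(question):
    context = "\n".join(retrieve(question))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
        ]
    )
    return response.choices[0].message.content

print(answer_with_rag("How long do I have to return a product?"))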
Next Steps
- Learn prompt engineering techniques
- Understand transformer architecture
- Build a ChatGPT clone project
- Explore fine-tuning for custom models
- Learn LangChain for complex applications