🗄️ Vector Databases for AI

Semantic search at scale

What are Vector Databases?

Vector databases store and search embeddings (high-dimensional vectors). Unlike traditional databases that search for exact matches, vector DBs find semantically similar content.

Traditional Database vs. Vector Database:
  • Exact keyword matching vs. semantic similarity
  • "dog" ≠ "puppy" vs. "dog" ≈ "puppy" (similar meaning)
  • SQL queries vs. vector similarity (cosine, Euclidean)
  • Fast for structured data vs. fast for unstructured data (text, images)
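
To make "similar meaning" concrete, here is a minimal sketch that embeds three words and compares them with cosine similarity (sentence-transformers and the all-MiniLM-L6-v2 model are assumed choices, matching the examples below):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Embed two related words and one unrelated word
vectors = model.encode(["dog", "puppy", "pizza"])

# Cosine similarity: closer to 1 means closer in meaning
print(util.cos_sim(vectors[0], vectors[1]))  # dog vs. puppy -> relatively high
print(util.cos_sim(vectors[0], vectors[2]))  # dog vs. pizza -> relatively low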

🎯 Use Cases for RAG & AI

📚 Document Q&A

Store document embeddings, retrieve relevant chunks for LLM context

🔍 Semantic Search

Find similar products, articles, or content by meaning

💬 Chatbot Memory

Store conversation history, retrieve relevant context

🎨 Image Search

Search images by description or visual similarity

🛒 Recommendations

Find similar items based on embeddings

🔬 Anomaly Detection

Find outliers by distance in vector space

🏆 Popular Vector Databases

Pinecone (Managed, Cloud-native)

import pinecone
from sentence_transformers import SentenceTransformer

# Initialize (legacy pinecone-client v2 API; pinecone-client v3+ uses the Pinecone class instead)
pinecone.init(api_key="your-key", environment="us-west1-gcp")

# Create index (dimension must match the embedding model: 384 for all-MiniLM-L6-v2)
pinecone.create_index("my-index", dimension=384, metric="cosine")
index = pinecone.Index("my-index")

# Embed and insert
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ["AI is amazing", "Machine learning rocks", "Pizza is delicious"]
embeddings = model.encode(texts)

# Upsert vectors
vectors = [
    (f"id-{i}", embedding.tolist(), {"text": text})
    for i, (embedding, text) in enumerate(zip(embeddings, texts))
]
index.upsert(vectors=vectors)

# Query
query_embedding = model.encode(["What is AI?"])[0]
results = index.query(vector=query_embedding.tolist(), top_k=2, include_metadata=True)

for match in results["matches"]:
    print(f"Score: {match['score']:.4f} - {match['metadata']['text']}")
  • ✅ Fully managed, no infrastructure
  • ✅ Auto-scaling, high performance
  • ✅ Free tier: 1 index, 100K vectors
  • ⚠️ Paid for production use

Chroma (Open-source, Local/Cloud)

import chromadb
from chromadb.utils import embedding_functions

# Initialize
client = chromadb.Client()  # In-memory
# Or persistent:
# client = chromadb.PersistentClient(path="./chroma_db")

# Create collection
collection = client.create_collection(
    name="my_collection",
    embedding_function=embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="all-MiniLM-L6-v2"
    )
)

# Add documents (embeddings created automatically)
collection.add(
    documents=["AI is amazing", "Machine learning rocks", "Pizza is delicious"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}, {"source": "doc3"}],
    ids=["id1", "id2", "id3"]
)

# Query
results = collection.query(
    query_texts=["What is AI?"],
    n_results=2
)

print(results["documents"])  # Most similar documents
print(results["distances"])  # Similarity scores
  • ✅ Open-source, free
  • ✅ Easy to get started
  • ✅ Great for prototyping
  • ⚠️ Basic scalability

Weaviate (Open-source, GraphQL)

import weaviate

# Connect (weaviate-client v3 API; the v4 client uses a different connection interface)
client = weaviate.Client("http://localhost:8080")

# Create schema
schema = {
    "classes": [{
        "class": "Document",
        "vectorizer": "text2vec-openai",
        "properties": [
            {"name": "content", "dataType": ["text"]},
            {"name": "source", "dataType": ["string"]}
        ]
    }]
}
client.schema.create(schema)

# Insert data
client.data_object.create(
    data_object={"content": "AI is amazing", "source": "doc1"},
    class_name="Document"
)

# Query with GraphQL
result = client.query.get("Document", ["content", "source"]) \
    .with_near_text({"concepts": ["artificial intelligence"]}) \
    .with_limit(2) \
    .do()

print(result)
  • ✅ Open-source with managed cloud option
  • ✅ GraphQL API
  • ✅ Built-in vectorization
  • ✅ Hybrid search (vector + keyword)
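
The hybrid search point deserves a quick sketch: continuing the example above with the same v3 client, a hybrid query blends vector similarity with BM25 keyword scoring via an alpha weight (the alpha value is illustrative, and this assumes a Weaviate version that supports hybrid queries):

# Hybrid query: alpha=0 is pure keyword (BM25), alpha=1 is pure vector search
result = client.query.get("Document", ["content", "source"]) \
    .with_hybrid(query="artificial intelligence", alpha=0.5) \
    .with_limit(2) \
    .do()

print(result)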

Qdrant (Open-source, Rust-based)

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Initialize
client = QdrantClient(":memory:")  # In-memory
# Or: client = QdrantClient(path="./qdrant_db")

# Create collection
client.create_collection(
    collection_name="my_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Insert vectors
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

texts = ["AI is amazing", "Machine learning rocks"]
embeddings = model.encode(texts)

points = [
    PointStruct(
        id=i,
        vector=embedding.tolist(),
        payload={"text": text}
    )
    for i, (embedding, text) in enumerate(zip(embeddings, texts))
]

client.upsert(collection_name="my_collection", points=points)

# Search
query_embedding = model.encode(["What is AI?"])[0]
results = client.search(
    collection_name="my_collection",
    query_vector=query_embedding.tolist(),
    limit=2
)

for result in results:
    print(f"Score: {result.score:.4f} - {result.payload['text']}")
  • ✅ High performance (Rust)
  • ✅ Rich filtering capabilities (see the sketch after this list)
  • ✅ Open-source with cloud option
  • ✅ Easy Docker deployment
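
As a sketch of the filtering mentioned above, a payload filter can be combined with the vector search from the Qdrant example; the "source" payload field is hypothetical here (the example above only stores "text"):

from qdrant_client.models import Filter, FieldCondition, MatchValue

# Only consider points whose (hypothetical) payload field "source" equals "doc1"
results = client.search(
    collection_name="my_collection",
    query_vector=query_embedding.tolist(),
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="doc1"))]
    ),
    limit=2
)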

🔧 Choosing the Right Vector DB

  • Pinecone: best for production with no devops; pricing: free tier → $70/mo; deployment: managed only
  • Chroma: best for prototyping and RAG apps; pricing: free (open-source); deployment: local or self-hosted
  • Weaviate: best for hybrid search and GraphQL; pricing: free → enterprise; deployment: self-hosted or cloud
  • Qdrant: best for high performance and filtering; pricing: free → enterprise; deployment: self-hosted or cloud
  • Milvus: best for large scale (billions of vectors); pricing: free (open-source); deployment: self-hosted, more complex

🚀 Integration with LangChain

# Legacy LangChain import paths; newer releases move these into langchain_community / langchain_openai
from langchain.vectorstores import Chroma, Pinecone, Weaviate
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader

# Load documents
loader = TextLoader("document.txt")
documents = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

# Create embeddings
embeddings = OpenAIEmbeddings()

# Store in vector DB (choose one)
# Chroma
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")

# Pinecone
# import pinecone
# pinecone.init(api_key="...", environment="...")
# vectorstore = Pinecone.from_documents(chunks, embeddings, index_name="my-index")

# Weaviate
# vectorstore = Weaviate.from_documents(chunks, embeddings, weaviate_url="http://localhost:8080")

# Query
results = vectorstore.similarity_search("What is the main topic?", k=3)
for doc in results:
    print(doc.page_content)
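
From here, the vector store usually plugs into a RAG chain as a retriever. A minimal sketch using the same legacy LangChain import style (the chat model name is an assumed choice):

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Wrap the vector store as a retriever returning the top-3 chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Simple RAG chain: retrieve chunks, stuff them into the prompt, ask the LLM
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=retriever
)
print(qa.run("What is the main topic?"))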

⚡ Performance Optimization

1. Choose the Right Embedding Model

# Small & Fast (384 dimensions)
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')  # 80MB

# Better Quality (768 dimensions)
model = SentenceTransformer('all-mpnet-base-v2')  # 420MB

# Best Quality (3072 dimensions)
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")  # API call

2. Indexing Strategy
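
Most vector databases build an approximate nearest-neighbor (ANN) index, commonly HNSW, and expose parameters that trade recall against speed and memory. As one example, Qdrant lets you tune HNSW when creating a collection; the values below are illustrative starting points, not recommendations:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, HnswConfigDiff

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="tuned_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=16,              # graph connectivity: higher = better recall, more memory
        ef_construct=200   # build-time search depth: higher = better index, slower build
    )
)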

3. Metadata Filtering

# Narrow the search with metadata filters (filter syntax varies by vector store backend)
results = vectorstore.similarity_search(
    "AI safety",
    k=5,
    filter={"source": "research_papers", "year": {"$gte": 2023}}
)

4. Batch Operations

# Insert in batches for better performance
batch_size = 100
for i in range(0, len(documents), batch_size):
    batch = documents[i:i+batch_size]
    vectorstore.add_documents(batch)

🎯 Key Takeaways