Choosing a Vector Database

Reading time: About 10 minutes

Audience: People designing RAG infrastructure, and those undecided on which vector DB to choose

Prerequisites: Familiarity with What Is RAG? and the basics of Embeddings & Vector Representations

A vector database is a database designed to store text or images as numerical vectors and to quickly search for “semantically similar vectors.” In RAG, documents are vectorized by an embedding model, stored in a vector DB, and retrieved when a user submits a query.

Why a Vector Database Is Needed

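As explained in Embeddings, each document is converted to a vector with hundreds or thousands of dimensions. With 10,000 documents, there are 10,000 vectors (more if documents are split into chunks). An exact search computes the distance between the query vector and every stored vector, so its cost grows linearly with the size of the collection. That is tolerable for small collections but becomes impractical as the collection grows into the millions.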

Vector DBs use ANN (Approximate Nearest Neighbor) search to find vectors that are “close enough” — not strictly the nearest, but sufficiently accurate — in milliseconds. This remains practical even for millions of vectors. Algorithms like HNSW (Hierarchical Navigable Small World) are widely used.
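
To make the scaling concrete, the sketch below counts the multiply-add operations an exact scan needs for a single query. It is a rough illustration only: the 1536-dimension figure matches the embedding size used in the examples later in this article, the collection sizes are arbitrary, and memory bandwidth and hardware parallelism are ignored.

```python
def brute_force_multiply_adds(num_vectors: int, dimensions: int) -> int:
    """Multiply-add operations needed to score one query against every stored vector."""
    return num_vectors * dimensions


# Illustrative collection sizes; 1536 is the embedding dimension assumed here
for n in (10_000, 1_000_000, 100_000_000):
    ops = brute_force_multiply_adds(n, 1536)
    print(f"{n:>11,} vectors x 1536 dims = {ops:,} multiply-adds per query")
```

An ANN index avoids most of this work by visiting only a small part of the collection per query, at the price of occasionally missing a true nearest neighbor.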

graph LR
    A["Documents (text)"] --> B["Embedding model"]
    B --> C["Vectors"]
    C --> D["Vector DB (ANN index)"]
    E["User query"] --> F["Embedding model"]
    F --> G["Query vector"]
    G --> D
    D --> H["Top K similar results"]
    H --> I["Pass to LLM"]
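
The flow in the diagram can be sketched end to end in plain Python. This is a toy under stated assumptions: `toy_embed` is a hypothetical stand-in for an embedding model (a hashed bag of words, which captures word overlap rather than meaning), the "vector DB" is a list, and the search is an exact scan instead of an ANN index.

```python
import math
import re
import zlib

DIM = 1024  # toy dimension chosen for this sketch


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k(query: str, store: list[tuple[str, list[float]]], k: int) -> list[tuple[float, str]]:
    """Embed the query, score every stored vector, and keep the k best matches."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in store]
    return sorted(scored, reverse=True)[:k]


documents = [
    "RAG stands for Retrieval-Augmented Generation",
    "Embeddings convert text to vectors",
    "PostgreSQL is a relational database",
]
store = [(doc, toy_embed(doc)) for doc in documents]  # indexing step

# Query step: the top results would be passed to the LLM as context
for score, text in top_k("what does RAG stand for", store, k=2):
    print(f"{score:.3f}  {text}")
```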

Types of Vector Databases

Vector DBs can be broadly categorized into four types based on how they are deployed.

1. Managed Cloud

The service provider manages the infrastructure. Setup is simple, and scaling and backups are automated.

Examples: Pinecone

2. Open-Source Standalone

Open-source vector DBs that can be self-hosted. Managed cloud options are also available.

Examples: Weaviate, Qdrant

3. Embedded / Local (In-Process)

Runs inside the application process or locally, with no separate server required. Suitable for prototypes and small-scale development.

Examples: Chroma, Faiss

4. SQL Extension

Adds vector search capability to an existing relational database.

Examples: pgvector (PostgreSQL extension)

Vector Database Comparison

Name	Type	Hosting	Hybrid Search	Metadata Filtering	Scaling	Free Tier
Chroma	Embedded	Local / self-host	Limited	Yes	Small–medium	Fully free (OSS)
Pinecone	Managed	Cloud (AWS/GCP)	Yes	Yes	Large scale	Yes (Starter)
Weaviate	OSS / Managed	Self / Cloud	Built-in	Yes	Large scale	Yes (cloud)
Qdrant	OSS / Managed	Self / Cloud	Yes	Yes	Large scale	Yes (cloud)
pgvector	SQL extension	PostgreSQL server	Manual implementation	PostgreSQL SQL	Medium–large	Free (with PostgreSQL)
Faiss	Library	In-process (library)	No	No	Offline / research	Fully free (OSS)

Individual Vector DB Profiles

Chroma — Best for Prototyping

Chroma is a simple vector DB that stores data in a local file or in memory. Its documentation describes it as AI data infrastructure that includes embeddings, metadata storage, vector search, and full-text search.[1]

# pip install chromadb
import chromadb

# Persistent local storage
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("documents")

# Add documents
collection.add(
    documents=["RAG stands for Retrieval-Augmented Generation", "Embeddings convert text to vectors"],
    metadatas=[{"source": "intro.md"}, {"source": "embeddings.md"}],
    ids=["doc1", "doc2"]
)

# Search
results = collection.query(
    query_texts=["Explain how RAG works"],
    n_results=2
)
print(results["documents"])

Best for: Prototype development, local experiments, thousands to tens of thousands of vectors

Pinecone — Managed Production

Pinecone is a managed vector database that provides semantic search, full-text search, hybrid search, metadata filtering, and reranking capabilities.[2]

# pip install pinecone
import os

from pinecone import Pinecone, ServerlessSpec

# Read the API key from the PINECONE_API_KEY environment variable
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the index only if it does not exist yet
if "rag-documents" not in pc.list_indexes().names():
    pc.create_index(
        name="rag-documents",
        dimension=1536,  # Dimensions for text-embedding-3-small
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index("rag-documents")

# Add vectors (embedding_vector is a 1536-dimensional list of floats from the embedding model)
index.upsert(vectors=[
    {"id": "doc1", "values": embedding_vector, "metadata": {"source": "intro.md"}},
])

# Search
results = index.query(vector=query_vector, top_k=5, include_metadata=True)

Best for: Production environments, team development, reducing infrastructure management overhead

Weaviate — Strong Hybrid Search

Weaviate is an open-source vector DB that can handle BM25 + vector hybrid search.[3]

# pip install weaviate-client
import weaviate

client = weaviate.connect_to_local()  # Local Docker setup

collection = client.collections.get("Documents")

# Hybrid search (BM25 + vector)
results = collection.query.hybrid(
    query="How to read a file in Python",
    alpha=0.5,   # 0 = BM25 only, 1 = vector only
    limit=5
)

for obj in results.objects:
    print(obj.properties["content"][:100])

client.close()  # Release the connection when finished

Best for: Hybrid search requirements, large-scale self-hosted operations, Docker-based deployments
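
The role of `alpha` in the hybrid query above can be illustrated with a small score-fusion sketch. This is a simplified illustration of weighted fusion, not Weaviate's actual implementation: the min-max normalization, the function names, and the raw scores are all assumptions made for the example.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(keyword: dict[str, float], vector: dict[str, float], alpha: float) -> dict[str, float]:
    """alpha = 0 uses keyword scores only, alpha = 1 uses vector scores only."""
    kw, vec = min_max(keyword), min_max(vector)
    ids = kw.keys() | vec.keys()
    return {i: alpha * vec.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}


# Hypothetical raw scores from a BM25 search and a vector search
bm25 = {"doc1": 7.2, "doc2": 1.1, "doc3": 3.0}
cosine = {"doc1": 0.62, "doc2": 0.91, "doc4": 0.55}

ranked = sorted(hybrid_scores(bm25, cosine, alpha=0.5).items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Moving `alpha` toward 0 favors exact keyword matches such as identifiers and product names, while moving it toward 1 favors semantic similarity.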

Qdrant — Fast, Rust-Based

Qdrant is an open-source vector DB implemented in Rust. It provides search features that combine filtering with vector search.[4]

# pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")  # In-memory mode (development)
# client = QdrantClient(url="http://localhost:6333")  # Local server

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding_vector,
            payload={"source": "intro.md", "content": "RAG is..."}
        )
    ]
)

# query_points is the current search API (qdrant-client 1.10+); it supersedes search()
results = client.query_points(
    collection_name="documents",
    query=query_vector,
    limit=5
).points

Best for: High throughput requirements, preference for Rust-based reliability, self-hosted operations
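
Combining a payload filter with vector search means that only points whose metadata matches are ranked by similarity. The sketch below shows that idea with a linear scan in plain Python. It does not use the Qdrant client and does not reproduce how a real engine applies filters inside its index. The point data and the `filtered_search` helper are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical stored points: id, vector, and payload (metadata)
points = [
    {"id": 1, "vector": [0.9, 0.1, 0.0], "payload": {"source": "intro.md"}},
    {"id": 2, "vector": [0.8, 0.2, 0.1], "payload": {"source": "embeddings.md"}},
    {"id": 3, "vector": [0.1, 0.9, 0.2], "payload": {"source": "intro.md"}},
]


def filtered_search(query_vector: list[float], points: list[dict], must_match: dict, limit: int) -> list[dict]:
    """Keep points whose payload matches every key/value pair, then rank by similarity."""
    candidates = [
        p for p in points
        if all(p["payload"].get(key) == value for key, value in must_match.items())
    ]
    candidates.sort(key=lambda p: cosine_similarity(query_vector, p["vector"]), reverse=True)
    return candidates[:limit]


hits = filtered_search([1.0, 0.0, 0.0], points, {"source": "intro.md"}, limit=2)
print([p["id"] for p in hits])
```

Point 2 is closer to the query than point 3, but it is excluded because its `source` does not match the filter.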

pgvector — Best When Already Using PostgreSQL

pgvector is a PostgreSQL extension that adds vector search to an existing PostgreSQL database. It supports exact and approximate nearest-neighbor search plus distance functions such as L2 distance, inner product, and cosine distance.[5]

-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    source VARCHAR(255),
    embedding vector(1536)  -- Dimensions for text-embedding-3-small
);

-- Create an HNSW index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);

# pip install psycopg2-binary pgvector
from pgvector.psycopg2 import register_vector
import psycopg2

conn = psycopg2.connect("postgresql://user:password@localhost/dbname")
register_vector(conn)

cursor = conn.cursor()

# Vector search (top 5)
cursor.execute(
    "SELECT content, source, 1 - (embedding <=> %s::vector) AS similarity "
    "FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    (query_vector, query_vector)
)
results = cursor.fetchall()

Best for: Leveraging existing PostgreSQL infrastructure, complex SQL-based filtering, centralizing data management
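
The three distance functions mentioned above map to the pgvector operators `<->` (L2 distance), `<#>` (negative inner product), and `<=>` (cosine distance). The plain-Python sketch below computes the same quantities for two small example vectors, which also shows why the query above uses `1 - (embedding <=> ...)` to turn cosine distance into a similarity. The function names and sample vectors are chosen for illustration.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Same quantity as the pgvector operator <->."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: list[float], b: list[float]) -> float:
    """Same quantity as <#>, which returns the inner product multiplied by -1."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same quantity as <=>: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a, b = [3.0, 4.0], [4.0, 3.0]
print("L2 distance:           ", l2_distance(a, b))
print("negative inner product:", negative_inner_product(a, b))
print("cosine distance:       ", cosine_distance(a, b))
print("cosine similarity:     ", 1.0 - cosine_distance(a, b))
```

For all three operators a smaller value means a closer match, which is why the SQL orders results in ascending order.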

Faiss — Offline and Research Use

Faiss (Facebook AI Similarity Search), developed by Meta, is a library for similarity search and clustering of dense vectors.[6]

# pip install faiss-cpu
import faiss
import numpy as np

dimension = 1536
index = faiss.IndexFlatIP(dimension)  # Inner product based search

# Normalize before adding (for cosine similarity)
vectors = np.array([embedding_vector]).astype("float32")
faiss.normalize_L2(vectors)
index.add(vectors)

# Search
query = np.array([query_vector]).astype("float32")
faiss.normalize_L2(query)
distances, indices = index.search(query, k=5)

Best for: Offline environments, research and experimentation, cases with no metadata filtering requirements
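
The normalization step in the Faiss example matters because an inner-product index ranks by raw dot product, which depends on vector length. Once both vectors are scaled to unit length, the dot product equals the cosine similarity. A minimal check in plain Python, with helper names and sample vectors chosen for illustration:

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 length, as faiss.normalize_L2 does in place for each row."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [2.0, 0.0], [3.0, 3.0]
print("raw inner product:       ", dot(a, b))
print("inner product, unit-norm:", dot(normalize(a), normalize(b)))
print("cosine similarity:       ", cosine_similarity(a, b))
```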

Choosing by Use Case

Getting a Prototype Running Quickly

Chroma is the recommended starting point. No additional server setup is required, and integration with LangChain and LlamaIndex is straightforward.

Already Using PostgreSQL

pgvector is the most natural choice. It avoids introducing a new database and reuses existing backup, access control, and SQL query infrastructure.

Production Without Infrastructure Management

Pinecone’s managed service is a viable option. Setup requires only an API key, and scaling is automatic.

Hybrid Search (BM25 + Vector) Needed

Weaviate and Qdrant both offer hybrid search as a built-in feature. A combination of Elasticsearch with pgvector is also a strong option.

Large-Scale Self-Hosted Operations

Qdrant (fast, Rust-based) and Weaviate (feature-rich) are both strong candidates. Both support deployment via Docker/Kubernetes.
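
The recommendations in this section can be condensed into a small lookup. The function below is only a restatement of the guidance above as code. The function name and flags are hypothetical, and a real decision would also weigh cost, team experience, and data volume.

```python
def suggest_vector_db(
    *,
    prototype: bool = False,
    uses_postgres: bool = False,
    wants_managed: bool = False,
    needs_hybrid: bool = False,
    self_hosted_at_scale: bool = False,
) -> list[str]:
    """Return candidate vector DBs for the given needs, following this section's guidance."""
    suggestions: list[str] = []
    if prototype:
        suggestions.append("Chroma")
    if uses_postgres:
        suggestions.append("pgvector")
    if wants_managed:
        suggestions.append("Pinecone")
    if needs_hybrid or self_hosted_at_scale:
        suggestions.extend(["Weaviate", "Qdrant"])
    return suggestions


print(suggest_vector_db(prototype=True))
print(suggest_vector_db(uses_postgres=True, needs_hybrid=True))
```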

Operational Considerations

Updating and Deleting Vectors

When documents are updated or deleted, the corresponding vectors must be updated or deleted as well. Most vector DBs support upsert (update or insert) and delete.

# Delete in Chroma
collection.delete(ids=["doc1"])

# Update in Pinecone (upsert = update or insert)
index.upsert(vectors=[
    {"id": "doc1", "values": new_embedding_vector, "metadata": {"source": "intro_v2.md"}}
])
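
One way to decide which vectors need an upsert or a delete is to keep a content hash next to each indexed document and compare it with the current source. The sketch below assumes such hashes are stored somewhere (for example in metadata). The `plan_sync` helper and the sample data are hypothetical, and the actual upsert and delete calls are left to the vector DB client.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(current_docs: dict[str, str], indexed_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare current documents with what is indexed; return (ids to upsert, ids to delete)."""
    to_upsert = sorted(
        doc_id for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)  # new or changed
    )
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in current_docs)
    return to_upsert, to_delete


# Hypothetical state: what the index holds vs. what the source now contains
indexed = {
    "doc1": content_hash("old text"),
    "doc2": content_hash("unchanged text"),
    "doc3": content_hash("removed text"),
}
current = {"doc1": "new text", "doc2": "unchanged text", "doc4": "added text"}

to_upsert, to_delete = plan_sync(current, indexed)
print("upsert:", to_upsert)
print("delete:", to_delete)
```

Unchanged documents are skipped, which avoids paying for embeddings that would be identical.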

Backup and Re-indexing

Changing the embedding model or chunking strategy requires re-creating all vectors. In production, keep the original document text stored separately so it can be re-vectorized when needed.

Migration

When migrating to a different vector DB, export the original document text and metadata, then re-index in the new DB. The vectors themselves typically do not need to be exported — they are regenerated from the source text.
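
A migration along these lines can be sketched with a JSON Lines export of text and metadata, followed by a re-indexing pass. Everything here is an assumption for illustration: the record layout, the `embed` and `upsert` callables (stand-ins for an embedding model and the new DB's client), and the in-memory buffer used in place of a file.

```python
import io
import json
from typing import Callable, Iterable, TextIO


def export_records(records: Iterable[dict], fp: TextIO) -> None:
    """Write id, text, and metadata as JSON Lines; vectors are deliberately left out."""
    for r in records:
        fp.write(json.dumps({"id": r["id"], "text": r["text"], "metadata": r["metadata"]}) + "\n")


def reindex(fp: TextIO, embed: Callable[[str], list], upsert: Callable[[str, list, dict], None]) -> int:
    """Re-embed each exported record and write it to the new DB; return the record count."""
    count = 0
    for line in fp:
        r = json.loads(line)
        upsert(r["id"], embed(r["text"]), r["metadata"])
        count += 1
    return count


# Demo with placeholder functions and an in-memory buffer
records = [
    {"id": "doc1", "text": "RAG stands for Retrieval-Augmented Generation", "metadata": {"source": "intro.md"}},
    {"id": "doc2", "text": "Embeddings convert text to vectors", "metadata": {"source": "embeddings.md"}},
]
buffer = io.StringIO()
export_records(records, buffer)
buffer.seek(0)

new_index: dict = {}
migrated = reindex(
    buffer,
    embed=lambda text: [float(len(text))],  # placeholder, not a real embedding
    upsert=lambda doc_id, vector, metadata: new_index.update({doc_id: (vector, metadata)}),
)
print(f"migrated {migrated} records")
```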

Summary

A vector DB enables fast search for semantically similar vectors
Practical starting points are Chroma for prototypes, pgvector for teams already on PostgreSQL, and Pinecone for managed production
For hybrid search requirements, Weaviate or Qdrant are strong choices
Plan for document update, deletion, and re-indexing operations from the start

FAQ

Q: Can I use a regular database (MySQL, PostgreSQL) instead?

A: Regular databases are not optimized for nearest-neighbor search over high-dimensional vectors. Without a vector index, every query has to be compared against every row, so search slows to impractical speeds as the number of documents grows. For PostgreSQL specifically, the pgvector extension adds a vector column type and ANN indexes, enabling fast vector search while staying within the PostgreSQL ecosystem.

Q: Can I migrate to a different vector DB later?

A: Yes. When switching vector DBs, having the original document text and metadata stored separately allows re-vectorization and re-indexing in the new DB. If using a framework like LangChain or LlamaIndex, swapping the vector DB component is relatively straightforward.

Q: Are there free vector DBs?

A: Chroma (OSS) and Faiss (OSS) are fully free. The cloud offerings of Pinecone, Weaviate, and Qdrant all have free tiers. pgvector is available at no additional cost if PostgreSQL is already in use. To minimize cost, the main options are self-hosted Chroma or pgvector.

Q: Is data stored in vector DBs encrypted?

A: Managed services like Pinecone provide encryption in transit (TLS) and at rest. For self-hosted deployments, encryption must be configured at the infrastructure level. When handling confidential documents, select the hosting location according to the applicable security policy.

References
