Vector Database Complete Comparison - Chroma, Milvus, Pinecone, Qdrant, Weaviate, pgvector
A comprehensive comparison of major vector databases covering architecture, installation, performance, indexing, hybrid search, scalability, and selection guide for RAG and AI applications.
Vector databases have become essential infrastructure for modern AI applications. From RAG pipelines to recommendation systems and image search, any application that works with embeddings needs a reliable way to store and retrieve vectors at scale. This post provides a comprehensive, hands-on comparison of six major vector databases to help you make the right choice.
1. Vector Database Overview
What Is a Vector Database?
A vector database is a specialized storage system designed to index, store, and query high-dimensional vector data (embeddings). Unlike traditional databases that operate on exact matches or range queries over scalar values, vector databases find the most similar items based on distance metrics in high-dimensional space.
[Traditional Database]
Query: SELECT * FROM products WHERE category = 'shoes' AND price < 100
Result: Exact matches based on structured fields
[Vector Database]
Query: Find the 10 vectors most similar to this embedding [0.12, -0.45, 0.78, ...]
Result: Semantically similar items ranked by distance
Why Vector Databases Matter for RAG and AI
In a RAG (Retrieval-Augmented Generation) pipeline, vector databases serve as the knowledge retrieval layer:
- Document Ingestion -- Text is split into chunks and converted to embeddings via models like OpenAI text-embedding-3-small or sentence-transformers
- Storage -- Embeddings are stored alongside metadata in the vector database
- Retrieval -- At query time, the user question is embedded and the database returns the most relevant chunks
- Generation -- Retrieved chunks are passed as context to the LLM for answer generation
User Query
│
▼
[Embedding Model] ──→ Query Vector
│
▼
[Vector Database] ──→ Top-K Similar Documents
│
▼
[LLM + Context] ──→ Grounded Response
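The four steps above can be sketched end to end without any external service. This is a toy illustration only: `embed` is a hypothetical stand-in for a real embedding model (it just counts words from a tiny assumed vocabulary), and the "database" is a Python list.

```python
import math

# Hypothetical toy vocabulary; a real pipeline would call an embedding model instead.
VOCAB = ["vector", "database", "index", "llm", "retrieval", "postgres"]

def embed(text):
    """Toy stand-in for an embedding model: word counts over VOCAB."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

chunks = [
    "a vector database stores each vector behind an index",
    "postgres is a relational database",
    "an llm answers better with retrieval",
]

# Ingestion + storage: keep each chunk next to its embedding.
store = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, top_k=2):
    """Retrieval: embed the question and return the top_k most similar chunks."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(question):
    """Generation input: retrieved chunks become the context passed to the LLM."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("which index does a vector database use"))
```

Swapping `embed` for a real model and `store` for one of the databases below changes the scale, not the shape, of this flow.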
Key Concepts
Embedding: A fixed-length numerical vector (e.g., 768 or 1536 dimensions) that captures the semantic meaning of text, images, or other data. Similar content produces vectors that are close together in vector space.
Similarity Search: Finding the nearest neighbors to a query vector. Common distance metrics include:
| Metric | Formula | Best For |
|---|---|---|
| Cosine Similarity | cos(A, B) = A . B / (||A|| * ||B||) | Text similarity, normalized embeddings |
| Euclidean (L2) | sqrt(sum((a_i - b_i)^2)) | Image features, spatial data |
| Inner Product (IP) | sum(a_i * b_i) | Maximum inner product search, recommendation |
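As a minimal sketch, the three metrics in the table can be written in plain Python. The sample vectors are made up for illustration; production systems compute these with SIMD or GPU kernels.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # orthogonal to a

print(cosine_similarity(a, b))   # direction only: identical
print(euclidean_distance(a, b))  # sensitive to magnitude
print(inner_product(a, b))       # grows with magnitude
print(cosine_similarity(a, c))   # orthogonal vectors
```

Cosine similarity ignores vector length while L2 and inner product do not. For unit-normalized embeddings, inner product and cosine similarity give the same ranking.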
Approximate Nearest Neighbor (ANN): Exact nearest neighbor search is computationally prohibitive at scale (O(n) per query). ANN algorithms like HNSW and IVF trade a small amount of accuracy for dramatically faster retrieval, often achieving 95-99% recall at 100x+ speedup.
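Recall is usually measured by comparing an ANN result list against the exact (brute-force) result for the same query. A minimal sketch, with made-up ID lists standing in for real search output:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the ANN search also returned."""
    exact_top = set(exact_ids[:k])
    hits = sum(1 for doc_id in approx_ids[:k] if doc_id in exact_top)
    return hits / k

exact = [3, 7, 1, 9, 4]   # ground truth from an exhaustive scan (illustrative)
approx = [3, 1, 7, 4, 8]  # what an ANN index returned (illustrative)

print(recall_at_k(approx, exact, k=5))
```

Here four of the five true neighbors were found, so recall@5 is 0.8. Order within the top-k does not affect the score.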
2. Architecture Comparison
Chroma -- Embedded and Lightweight
┌─────────────────────────────┐
│ Application │
│ ┌───────────────────────┐ │
│ │ Chroma Client │ │
│ │ (Python / JS SDK) │ │
│ └──────────┬────────────┘ │
│ │ │
│ ┌──────────▼────────────┐ │
│ │ Chroma Core Engine │ │
│ │ ┌────────┐ ┌───────┐ │ │
│ │ │ HNSW │ │SQLite │ │ │
│ │ │ Index │ │Meta │ │ │
│ │ └────────┘ └───────┘ │ │
│ └───────────────────────┘ │
└─────────────────────────────┘
- Core Technology: Python-based, uses HNSW (hnswlib) for indexing, SQLite/DuckDB for metadata
- Deployment Model: Embedded (in-process), client/server, or Docker
- Strengths: Zero-config setup, ideal for prototyping, runs in Jupyter notebooks
- Limitations: Not designed for large-scale production, limited horizontal scaling
Pinecone -- Managed SaaS
┌──────────────┐ ┌──────────────────────────┐
│ Application │ │ Pinecone Cloud │
│ ┌────────┐ │ │ ┌────────────────────┐ │
│ │Pinecone│──┼─gRPC──▶ │ API Gateway │ │
│ │ Client │ │ │ └────────┬───────────┘ │
│ └────────┘ │ │ ┌───────▼──────────┐ │
└──────────────┘ │ │ Query Router │ │
│ └───┬─────────┬────┘ │
│ ┌───▼───┐ ┌───▼───┐ │
│ │Shard 1│ │Shard N│ │
│ │(Pod) │ │(Pod) │ │
│ └───────┘ └───────┘ │
└──────────────────────────┘
- Core Technology: Proprietary closed-source engine, serverless or pod-based architecture
- Deployment Model: Fully managed SaaS only (AWS, GCP, Azure regions)
- Strengths: Zero operational overhead, built-in replication, automatic scaling
- Limitations: Vendor lock-in, no self-hosted option, cost grows with scale
Milvus -- Distributed and Scalable
┌────────────────────────────────────────────┐
│ Milvus Cluster │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Proxy │ │ Proxy │ │ Proxy │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ └──────────┬───┴──────────┬──┘ │
│ ┌───────────────▼──────────────▼────────┐ │
│ │ Coordinator Layer │ │
│ │ (Root / Query / Data / Index Coord) │ │
│ └───────────────┬───────────────────────┘ │
│ ┌──────────┼──────────┐ │
│ ┌────▼───┐ ┌────▼───┐ ┌───▼────┐ │
│ │ Query │ │ Data │ │ Index │ │
│ │ Nodes │ │ Nodes │ │ Nodes │ │
│ └────────┘ └────────┘ └────────┘ │
│ │
│ [etcd] [MinIO/S3] [Pulsar/Kafka] │
└────────────────────────────────────────────┘
- Core Technology: Go + C++ core, disaggregated compute and storage, cloud-native architecture
- Deployment Model: Standalone (Docker), cluster (Kubernetes), or Zilliz Cloud (managed)
- Strengths: Handles billions of vectors, rich index types, GPU acceleration support
- Limitations: Complex cluster setup, heavier resource requirements
Qdrant -- Rust-Based Performance
┌─────────────────────────────────────┐
│ Qdrant Cluster │
│ │
│ ┌─────────┐ ┌─────────┐ │
│ │ Node 1 │ │ Node 2 │ ... │
│ │ ┌─────┐ │ │ ┌─────┐ │ │
│ │ │Shard│ │ │ │Shard│ │ │
│ │ │ A │ │ │ │ B │ │ │
│ │ └─────┘ │ │ └─────┘ │ │
│ │ ┌─────┐ │ │ ┌─────┐ │ │
│ │ │Shard│ │ │ │Shard│ │ │
│ │ │ B' │ │ │ │ A' │ │ │
│ │ │(rep)│ │ │ │(rep)│ │ │
│ │ └─────┘ │ │ └─────┘ │ │
│ └─────────┘ └─────────┘ │
│ │
│ [Raft Consensus for Coordination] │
└─────────────────────────────────────┘
- Core Technology: Written in Rust, custom HNSW implementation with on-disk support
- Deployment Model: Single node (binary/Docker), distributed cluster, Qdrant Cloud
- Strengths: Memory-efficient, fast filtering with payload indexes, on-disk vector support
- Limitations: Smaller ecosystem than Milvus, relatively newer project
Weaviate -- Hybrid Search Native
┌───────────────────────────────────────┐
│ Weaviate Instance │
│ │
│ ┌─────────────────────────────────┐ │
│ │ GraphQL / REST API │ │
│ └──────────────┬──────────────────┘ │
│ ┌──────────────▼──────────────────┐ │
│ │ Schema Manager │ │
│ └──────────────┬──────────────────┘ │
│ ┌─────────┼─────────┐ │
│ ┌────▼───┐ ┌───▼────┐ ┌─▼────────┐ │
│ │ Vector │ │Inverted│ │ Module │ │
│ │ Index │ │ Index │ │ System │ │
│ │ (HNSW) │ │ (BM25) │ │(OpenAI, │ │
│ │ │ │ │ │ Cohere) │ │
│ └────────┘ └────────┘ └──────────┘ │
└───────────────────────────────────────┘
- Core Technology: Written in Go, native BM25 + vector hybrid search, modular vectorizer system
- Deployment Model: Single node (Docker), Kubernetes cluster, Weaviate Cloud
- Strengths: Built-in hybrid search (BM25 + vector), integrated vectorization modules, GraphQL API
- Limitations: Higher memory consumption, fewer index types than Milvus (HNSW plus a flat index)
pgvector -- PostgreSQL Extension
┌──────────────────────────────────────┐
│ PostgreSQL Server │
│ │
│ ┌────────────────────────────────┐ │
│ │ pgvector Extension │ │
│ │ ┌──────────┐ ┌───────────┐ │ │
│ │ │ IVFFlat │ │ HNSW │ │ │
│ │ │ Index │ │ Index │ │ │
│ │ └──────────┘ └───────────┘ │ │
│ └────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Relational │ │ Standard │ │
│ │ Tables │ │ SQL Engine │ │
│ └──────────────┘ └──────────────┘ │
└──────────────────────────────────────┘
- Core Technology: C extension for PostgreSQL, adds vector column type and ANN indexes
- Deployment Model: Any PostgreSQL deployment (self-hosted, RDS, Cloud SQL, Supabase)
- Strengths: No new infrastructure, full SQL power, ACID transactions, joins with relational data
- Limitations: Single-node scaling, not optimized purely for vector workloads, slower at very large scale
Architecture Summary
| Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| Language | Python | Proprietary | Go/C++ | Rust | Go | C |
| Deployment | Embedded/Server | SaaS only | Standalone/Cluster | Single/Cluster | Single/Cluster | PostgreSQL |
| Open Source | Yes | No | Yes | Yes | Yes | Yes |
| License | Apache 2.0 | Proprietary | Apache 2.0 | Apache 2.0 | BSD-3 | PostgreSQL |
| Cloud Managed | - | Pinecone | Zilliz Cloud | Qdrant Cloud | Weaviate Cloud | RDS/Supabase |
3. Installation and Quick Start
Chroma
# Install via pip
pip install chromadb
# Or run as server with Docker
docker run -p 8000:8000 chromadb/chroma
import chromadb
# Embedded mode (no server needed)
client = chromadb.Client()
# Or connect to server
# client = chromadb.HttpClient(host="localhost", port=8000)
# Create a collection
collection = client.create_collection(
name="documents",
metadata={"hnsw:space": "cosine"}
)
# Insert vectors with metadata
collection.add(
ids=["doc1", "doc2", "doc3"],
embeddings=[
[0.1, 0.2, 0.3, 0.4],
[0.5, 0.6, 0.7, 0.8],
[0.9, 0.1, 0.2, 0.3]
],
metadatas=[
{"source": "wiki", "topic": "AI"},
{"source": "arxiv", "topic": "ML"},
{"source": "blog", "topic": "AI"}
],
documents=["doc about AI", "doc about ML", "another AI doc"]
)
# Search for similar vectors
results = collection.query(
query_embeddings=[[0.1, 0.2, 0.3, 0.4]],
n_results=2,
where={"topic": "AI"}
)
print(results)
Pinecone
pip install pinecone
from pinecone import Pinecone, ServerlessSpec
# Initialize client
pc = Pinecone(api_key="YOUR_API_KEY")
# Create index
pc.create_index(
name="documents",
dimension=4,
metric="cosine",
spec=ServerlessSpec(
cloud="aws",
region="us-east-1"
)
)
# Connect to index
index = pc.Index("documents")
# Insert vectors (upsert)
index.upsert(vectors=[
{"id": "doc1", "values": [0.1, 0.2, 0.3, 0.4],
"metadata": {"source": "wiki", "topic": "AI"}},
{"id": "doc2", "values": [0.5, 0.6, 0.7, 0.8],
"metadata": {"source": "arxiv", "topic": "ML"}},
{"id": "doc3", "values": [0.9, 0.1, 0.2, 0.3],
"metadata": {"source": "blog", "topic": "AI"}}
])
# Search with metadata filter
results = index.query(
vector=[0.1, 0.2, 0.3, 0.4],
top_k=2,
filter={"topic": {"$eq": "AI"}},
include_metadata=True
)
print(results)
Milvus
# Start Milvus with Docker Compose
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker compose up -d
# Install Python SDK
pip install pymilvus
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
# Connect to Milvus
connections.connect("default", host="localhost", port="19530")
# Define schema
fields = [
FieldSchema(name="id", dtype=DataType.VARCHAR, is_primary=True, max_length=64),
FieldSchema(name="topic", dtype=DataType.VARCHAR, max_length=64),
FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=4)
]
schema = CollectionSchema(fields, description="Document collection")
# Create collection
collection = Collection("documents", schema)
# Insert data
data = [
["doc1", "doc2", "doc3"],
["AI", "ML", "AI"],
[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 0.1, 0.2, 0.3]]
]
collection.insert(data)
# Build index
index_params = {
"index_type": "HNSW",
"metric_type": "COSINE",
"params": {"M": 16, "efConstruction": 256}
}
collection.create_index("embedding", index_params)
collection.load()
# Search
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}
results = collection.search(
data=[[0.1, 0.2, 0.3, 0.4]],
anns_field="embedding",
param=search_params,
limit=2,
expr='topic == "AI"',
output_fields=["topic"]
)
for hits in results:
    for hit in hits:
        print(f"ID: {hit.id}, Distance: {hit.distance}, Topic: {hit.entity.get('topic')}")
Qdrant
# Run with Docker
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
# Install Python SDK
pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct, Filter, FieldCondition, MatchValue
# Connect to Qdrant
client = QdrantClient(host="localhost", port=6333)
# Create collection
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(size=4, distance=Distance.COSINE)
)
# Insert vectors
client.upsert(
collection_name="documents",
points=[
PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4],
payload={"source": "wiki", "topic": "AI"}),
PointStruct(id=2, vector=[0.5, 0.6, 0.7, 0.8],
payload={"source": "arxiv", "topic": "ML"}),
PointStruct(id=3, vector=[0.9, 0.1, 0.2, 0.3],
payload={"source": "blog", "topic": "AI"})
]
)
# Search with filter
results = client.query_points(
collection_name="documents",
query=[0.1, 0.2, 0.3, 0.4],
limit=2,
query_filter=Filter(
must=[FieldCondition(key="topic", match=MatchValue(value="AI"))]
)
)
for point in results.points:
    print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")
Weaviate
# Run with Docker
docker run -p 8080:8080 -p 50051:50051 \
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
semitechnologies/weaviate
# Install Python SDK
pip install weaviate-client
import weaviate
import weaviate.classes as wvc
# Connect to Weaviate
client = weaviate.connect_to_local()
# Create collection (class)
documents = client.collections.create(
name="Document",
vectorizer_config=wvc.config.Configure.Vectorizer.none(),
properties=[
wvc.config.Property(name="source", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="topic", data_type=wvc.config.DataType.TEXT),
wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
]
)
# Insert vectors
documents.data.insert_many([
wvc.data.DataObject(
properties={"source": "wiki", "topic": "AI", "content": "doc about AI"},
vector=[0.1, 0.2, 0.3, 0.4]
),
wvc.data.DataObject(
properties={"source": "arxiv", "topic": "ML", "content": "doc about ML"},
vector=[0.5, 0.6, 0.7, 0.8]
),
wvc.data.DataObject(
properties={"source": "blog", "topic": "AI", "content": "another AI doc"},
vector=[0.9, 0.1, 0.2, 0.3]
)
])
# Search with filter
results = documents.query.near_vector(
near_vector=[0.1, 0.2, 0.3, 0.4],
limit=2,
filters=wvc.query.Filter.by_property("topic").equal("AI"),
return_metadata=wvc.query.MetadataQuery(distance=True)
)
for obj in results.objects:
    print(f"Topic: {obj.properties['topic']}, Distance: {obj.metadata.distance}")
client.close()
pgvector
# Install extension (PostgreSQL 13+)
# Ubuntu/Debian
sudo apt install postgresql-16-pgvector
# Or build from source
cd /tmp && git clone https://github.com/pgvector/pgvector.git
cd pgvector && make && sudo make install
# Or use Docker
docker run -p 5432:5432 -e POSTGRES_PASSWORD=postgres ankane/pgvector
-- Enable extension
CREATE EXTENSION vector;
-- Create table with vector column
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
source VARCHAR(64),
topic VARCHAR(64),
embedding vector(4)
);
-- Insert data
INSERT INTO documents (content, source, topic, embedding) VALUES
('doc about AI', 'wiki', 'AI', '[0.1, 0.2, 0.3, 0.4]'),
('doc about ML', 'arxiv', 'ML', '[0.5, 0.6, 0.7, 0.8]'),
('another AI doc', 'blog', 'AI', '[0.9, 0.1, 0.2, 0.3]');
-- Create HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);
-- Search with filter
SELECT id, content, topic,
1 - (embedding <=> '[0.1, 0.2, 0.3, 0.4]') AS similarity
FROM documents
WHERE topic = 'AI'
ORDER BY embedding <=> '[0.1, 0.2, 0.3, 0.4]'
LIMIT 2;
# Python with psycopg2
import psycopg2
from pgvector.psycopg2 import register_vector
import numpy as np
conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=postgres")
register_vector(conn)
cur = conn.cursor()
query_vec = np.array([0.1, 0.2, 0.3, 0.4])
cur.execute("""
SELECT id, content, topic, 1 - (embedding <=> %s) AS similarity
FROM documents
WHERE topic = 'AI'
ORDER BY embedding <=> %s
LIMIT 2
""", (query_vec, query_vec))
for row in cur.fetchall():
    print(f"ID: {row[0]}, Content: {row[1]}, Similarity: {row[3]:.4f}")
4. Indexing Algorithms
Vector databases rely on ANN (Approximate Nearest Neighbor) indexing algorithms to achieve fast search over millions or billions of vectors. Here are the main algorithms and their trade-offs.
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph where each node is connected to its nearest neighbors. The search starts from the top layer (sparse) and descends to the bottom layer (dense), efficiently navigating to the target region.
Layer 2: A ─────────────── D
│ │
Layer 1: A ─── B ─── C ─── D ─── E
│ │ │ │ │
Layer 0: A ─ B ─ C ─ D ─ E ─ F ─ G ─ H
- Key Parameters: M (max connections per node), efConstruction (build-time search width), ef (query-time search width)
- Strengths: Excellent query performance, high recall, incremental inserts
- Weaknesses: High memory usage (stores graph in RAM), slower build time
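The core move HNSW repeats on every layer is a greedy walk over a neighbor graph. The sketch below is a deliberate simplification, not the full algorithm: it uses a single layer, builds the graph by brute force, and follows only one candidate at a time, whereas real HNSW builds incrementally and keeps an ef-wide candidate list on the bottom layer. All function names are illustrative.

```python
import math

def build_knn_graph(vectors, M):
    """Link each node to its M nearest neighbors (brute force, for illustration)."""
    graph = {}
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: math.dist(v, vectors[j]))
        graph[i] = others[:M]
    return graph

def greedy_search(vectors, graph, query, entry):
    """Move to the neighbor closest to the query until no neighbor improves."""
    current = entry
    current_d = math.dist(vectors[current], query)
    while True:
        best = min(graph[current], key=lambda j: math.dist(vectors[j], query))
        best_d = math.dist(vectors[best], query)
        if best_d >= current_d:
            return current, current_d  # local minimum reached
        current, current_d = best, best_d

# Ten points on a line: 0.0, 1.0, ..., 9.0
vectors = [[float(i)] for i in range(10)]
graph = build_knn_graph(vectors, M=2)

node, distance = greedy_search(vectors, graph, query=[7.2], entry=0)
print(node, round(distance, 2))
```

Starting from node 0, the walk hops along the graph toward the query and stops at node 7. With only short-range links this takes many hops, which is the problem the sparse upper layers in HNSW are meant to solve.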
IVF (Inverted File Index)
IVF partitions the vector space into clusters using k-means. At query time, only the closest clusters are searched rather than the entire dataset.
┌─────────────────────────────────┐
│ Cluster 1 Cluster 2 │
│ ┌──────┐ ┌──────┐ │
│ │ • • │ │ • • │ │
│ │ •• • │ │ • • │ │
│ └──────┘ └──────┘ │
│ Cluster 3 │
│ ┌──────┐ │
│ │ •• • │ │
│ │ • • │ │
│ └──────┘ │
└─────────────────────────────────┘
Query: Search only nearest nprobe clusters
- Key Parameters: nlist (number of clusters), nprobe (clusters to search at query time)
- Strengths: Lower memory usage, fast build, works well with GPU
- Weaknesses: Requires training step, lower recall at low nprobe values
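A minimal IVF sketch in pure Python, assuming a naive k-means (first `nlist` vectors as initial centroids, fixed iteration count) and a tiny made-up dataset. Real implementations use better initialization and vectorized distance computation.

```python
import math

def nearest_centroid(v, centroids):
    return min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))

def kmeans(vectors, nlist, iters=10):
    """Training step: naive k-means seeded with the first nlist vectors."""
    centroids = [list(v) for v in vectors[:nlist]]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(vectors, nlist):
    """Assign every vector ID to the inverted list of its nearest centroid."""
    centroids = kmeans(vectors, nlist)
    lists = {c: [] for c in range(nlist)}
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return centroids, lists

def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in lists[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]

vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, lists = build_ivf(vectors, nlist=2)

print(lists)
print(ivf_search(vectors, centroids, lists, query=[9, 9], k=2, nprobe=1))
```

With `nprobe=1` the query scans three vectors instead of six. Raising `nprobe` scans more clusters and recovers neighbors that sit just across a cluster boundary, at the cost of more distance computations.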
PQ (Product Quantization)
PQ compresses vectors by dividing them into sub-vectors and quantizing each sub-vector independently. This dramatically reduces memory usage while maintaining reasonable accuracy.
Original Vector (128-dim):
[0.1, 0.2, ..., 0.5, 0.6, ..., 0.3, 0.4, ..., 0.7, 0.8, ...]
└──── Sub 1 ────┘ └──── Sub 2 ────┘ └──── Sub 3 ────┘ └──── Sub 4 ────┘
↓ ↓ ↓ ↓
Code: 42 Code: 17 Code: 89 Code: 5
(1 byte) (1 byte) (1 byte) (1 byte)
Compressed: [42, 17, 89, 5] = 4 bytes (vs 512 bytes original)
- Key Parameters: m (number of sub-quantizers), nbits (bits per code)
- Strengths: Very low memory (32x--64x compression), fast distance computation
- Weaknesses: Lower accuracy, requires training, best combined with IVF
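The encoding step and the compression arithmetic can be sketched as follows. The codebooks here are hand-written for illustration (an assumption); in practice each one is learned with k-means over the corresponding sub-vectors of the training data.

```python
def split(vec, m):
    """Cut a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vec, codebooks):
    """Replace each sub-vector with the index of its nearest codebook entry."""
    codes = []
    for sub, codebook in zip(split(vec, len(codebooks)), codebooks):
        codes.append(min(range(len(codebook)), key=lambda c: sq_dist(sub, codebook[c])))
    return codes

def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen codebook entries."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out

def compression_ratio(dim, m, nbits, bytes_per_float=4):
    """Raw float32 size divided by PQ code size."""
    return (dim * bytes_per_float) / (m * nbits / 8)

# Two sub-quantizers, four entries each (nbits = 2); values are illustrative.
corners = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
codebooks = [corners, corners]

codes = pq_encode([0.9, 0.1, 0.2, 0.8], codebooks)
print(codes, pq_decode(codes, codebooks))
print(compression_ratio(dim=128, m=4, nbits=8))   # the diagram's 512 bytes -> 4 bytes
print(compression_ratio(dim=128, m=16, nbits=8))  # more sub-quantizers, less compression
```

The ratio depends directly on `m` and `nbits`: the four-code diagram above works out to 128x, while `m=16` on the same vector gives 32x with a finer approximation.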
Flat (Brute Force)
Flat index stores raw vectors and computes exact distances against every vector. No approximation involved.
- Strengths: 100% recall (exact results), no training needed
- Weaknesses: O(n) query time, impractical for large datasets
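A flat index is small enough to sketch in full. The example below reuses the three 4-dimensional vectors from the quick-start sections and uses L2 distance; the function name is illustrative.

```python
import heapq
import math

def flat_search(vectors, query, k):
    """Exact search: compute the distance to every vector, keep the k smallest."""
    scored = ((math.dist(query, v), idx) for idx, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)

vectors = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 0.1, 0.2, 0.3],
]

for distance, idx in flat_search(vectors, [0.1, 0.2, 0.3, 0.4], k=2):
    print(idx, round(distance, 4))
```

Every query touches every stored vector, which is why this approach is typically reserved for small collections or for producing ground truth when measuring ANN recall.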
Algorithm Comparison
| Feature | HNSW | IVF | PQ | Flat |
|---|---|---|---|---|
| Query Speed | Very Fast | Fast | Fast | Slow |
| Memory Usage | High | Medium | Very Low | High |
| Build Time | Slow | Medium (requires training) | Slow (requires training) | None |
| Recall @ top-10 | 95--99% | 85--95% | 70--90% | 100% |
| Incremental Insert | Yes | Requires rebuilding | Requires rebuilding | Yes |
| Best Scale | 1M--100M | 10M--1B | 100M--10B | < 100K |
| GPU Acceleration | Limited | Yes | Yes | Yes |
Note: In practice, hybrid index strategies like IVF+PQ or IVF+HNSW are commonly used to balance memory, speed, and accuracy. Milvus supports the most index types; Chroma focuses on HNSW, and pgvector offers HNSW and IVFFlat.
Index Support by Database
| Index Type | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| HNSW | Yes | Internal | Yes | Yes | Yes | Yes |
| IVFFlat | - | Internal | Yes | - | - | Yes |
| IVF+PQ | - | Internal | Yes | - | - | - |
| Flat | Yes | - | Yes | - | Yes | Yes |
| DiskANN | - | - | Yes | - | - | - |
| On-disk vectors | - | - | Yes | Yes | - | Yes (PostgreSQL storage) |
5. Search Capabilities
Vector Similarity Search
All vector databases support basic similarity search with configurable distance metrics. Here is a unified comparison of search capabilities:
| Capability | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| Cosine | Yes | Yes | Yes | Yes | Yes | Yes |
| L2 (Euclidean) | Yes | Yes | Yes | Yes | Yes | Yes |
| Inner Product | Yes | Yes | Yes | Yes | Yes | Yes |
| Hamming | - | - | Yes | - | Yes | Yes |
Metadata Filtering
Filtering results by metadata (payload) fields is critical for production applications. Each database offers different filtering capabilities:
# Qdrant -- Advanced filtering example
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
results = client.query_points(
collection_name="documents",
query=[0.1, 0.2, 0.3, 0.4],
limit=5,
query_filter=Filter(
must=[
FieldCondition(key="topic", match=MatchValue(value="AI")),
FieldCondition(key="year", range=Range(gte=2023)),
],
must_not=[
FieldCondition(key="status", match=MatchValue(value="draft"))
]
)
)
# Milvus -- Boolean expression filtering
results = collection.search(
data=[[0.1, 0.2, 0.3, 0.4]],
anns_field="embedding",
param={"metric_type": "COSINE", "params": {"ef": 64}},
limit=5,
expr='topic == "AI" and year >= 2023 and status != "draft"'
)
-- pgvector -- Full SQL WHERE clause power
SELECT id, content, 1 - (embedding <=> query_vec) AS similarity
FROM documents
WHERE topic = 'AI'
AND year >= 2023
AND status != 'draft'
AND category IN ('research', 'tutorial')
ORDER BY embedding <=> query_vec
LIMIT 5;
Hybrid Search (Vector + Keyword)
Hybrid search combines dense vector similarity with sparse keyword matching (BM25) for improved retrieval quality. This is particularly effective when queries contain specific terms or acronyms.
# Weaviate -- Native hybrid search (BM25 + vector)
results = documents.query.hybrid(
query="transformer architecture attention mechanism",
alpha=0.5, # 0 = pure keyword, 1 = pure vector
limit=5,
return_metadata=wvc.query.MetadataQuery(score=True)
)
for obj in results.objects:
    print(f"Score: {obj.metadata.score:.4f}, Content: {obj.properties['content'][:80]}")
# Qdrant -- Sparse + Dense fusion
from qdrant_client.models import Fusion, FusionQuery, Prefetch, SparseVector
# Requires collection configured with both dense and sparse vectors
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense vector search
        Prefetch(query=[0.1, 0.2, 0.3, 0.4], using="dense", limit=20),
        # Sparse vector search (BM25 weights)
        Prefetch(
            query=SparseVector(indices=[1, 42, 100], values=[0.5, 0.8, 0.3]),
            using="sparse",
            limit=20,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # Reciprocal Rank Fusion
    limit=5
)
-- pgvector + tsvector -- Hybrid search with PostgreSQL full-text search
SELECT id, content,
ts_rank(to_tsvector('english', content), plainto_tsquery('english', 'transformer attention')) AS text_score,
1 - (embedding <=> query_vec) AS vector_score
FROM documents
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'transformer attention')
ORDER BY (0.5 * ts_rank(to_tsvector('english', content), plainto_tsquery('english', 'transformer attention'))
+ 0.5 * (1 - (embedding <=> query_vec))) DESC
LIMIT 5;
| Hybrid Search Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| Native BM25 | - | - | Yes (2.5+) | - | Yes | Via tsvector |
| Sparse Vectors | - | Yes | Yes | Yes | - | Yes (sparsevec, 0.7+) |
| Reciprocal Rank Fusion | - | - | Yes | Yes | Yes | Manual |
| Configurable Weighting | - | - | Yes | Yes | Yes (alpha) | Manual |
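Reciprocal Rank Fusion is simple enough to implement by hand when a database lacks it (the "Manual" entries above). A minimal sketch: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is a commonly used default, taken here as an assumption, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists into one list ordered by summed 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

dense_hits = ["a", "b", "c"]   # ranked by vector similarity (illustrative)
sparse_hits = ["b", "d", "a"]  # ranked by keyword score (illustrative)

print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Because RRF uses only ranks, it sidesteps the problem of putting BM25 scores and cosine similarities on a common scale, which a weighted sum such as Weaviate's alpha has to handle through normalization.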
Multi-Vector Search
Some databases support storing and searching multiple vectors per record, useful for multi-modal data (text + image) or late-interaction retrieval models like ColBERT.
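For late-interaction models, a document is scored with the MaxSim operator: each query token vector is matched to its best document token vector, and the per-token maxima are summed. A minimal sketch with made-up 2-dimensional token vectors:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vectors, doc_vectors):
    """Sum over query tokens of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]  # one token vector close to each query token
doc_b = [[0.5, 0.5]]              # a single averaged-looking vector

print(maxsim_score(query_tokens, doc_a))
print(maxsim_score(query_tokens, doc_b))
```

`doc_a` scores higher because each query token finds a close match. This is the reason such models need several vectors stored per record rather than one pooled embedding.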
# Qdrant -- Named vectors (multiple vectors per point)
from qdrant_client.models import VectorParams, Distance
client.create_collection(
collection_name="multimodal",
vectors_config={
"text": VectorParams(size=768, distance=Distance.COSINE),
"image": VectorParams(size=512, distance=Distance.COSINE),
}
)
# Search by text vector
results = client.query_points(
collection_name="multimodal",
query=[0.1] * 768,
using="text",
limit=5
)
# Milvus -- Multiple vector fields
fields = [
FieldSchema(name="id", dtype=DataType.VARCHAR, is_primary=True, max_length=64),
FieldSchema(name="text_vec", dtype=DataType.FLOAT_VECTOR, dim=768),
FieldSchema(name="image_vec", dtype=DataType.FLOAT_VECTOR, dim=512),
]
6. Scalability and Performance
Benchmark Comparison
The following table shows approximate performance characteristics based on published benchmarks and community testing. Actual results vary significantly with hardware, data distribution, dimensionality, and configuration.
Note: These numbers are rough guidelines, not exact benchmarks. Always run your own tests with representative data and queries on your target hardware.
| Metric | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| Insert Speed (vec/sec) | ~5K | ~10K | ~50K | ~30K | ~15K | ~8K |
| Query Latency (p99, 1M vectors) | ~15ms | ~10ms | ~5ms | ~5ms | ~10ms | ~20ms |
| Concurrent Queries (QPS) | ~500 | ~1,000+ | ~5,000+ | ~3,000+ | ~1,500 | ~800 |
| Max Vectors (practical) | 1M | 1B+ | 10B+ | 1B+ | 100M+ | 10M |
| Memory per 1M vectors (768d) | ~3.5 GB | Managed | ~3 GB | ~2.5 GB | ~4 GB | ~3 GB |
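The memory row can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes float32 components and, for the HNSW overhead, roughly 2 * M four-byte links per vector on the bottom layer with upper layers ignored. That assumption matches how hnswlib sizes layer 0, but other engines differ, so treat the result as an estimate.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_component=4):
    """Storage for the vectors themselves (float32 by default)."""
    return num_vectors * dim * bytes_per_component

def hnsw_link_bytes(num_vectors, M, bytes_per_link=4):
    """Assumed graph overhead: about 2 * M links per vector on layer 0."""
    return num_vectors * 2 * M * bytes_per_link

vectors_gb = raw_vector_bytes(1_000_000, 768) / 1e9
links_gb = hnsw_link_bytes(1_000_000, M=16) / 1e9

print(f"vectors: {vectors_gb:.3f} GB, links: {links_gb:.3f} GB, "
      f"total: {vectors_gb + links_gb:.3f} GB")
```

The raw vectors dominate at about 3.07 GB per million 768-dimensional embeddings, in line with the ~3 GB figures in the table. Differences between databases then come from metadata, graph overhead, and quantization.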
Horizontal Scaling
| Scaling Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| Sharding | - | Automatic | Yes | Yes | Yes | Manual (Citus) |
| Replication | - | Built-in | Yes | Yes | Yes | PostgreSQL streaming |
| Auto-scaling | - | Yes (serverless) | Via K8s | Via K8s | Via K8s | - |
| Multi-region | - | Yes | Manual | Manual | Yes | Manual |
| GPU Acceleration | - | - | Yes (Knowhere) | - | - | - |
| Disk-based Vectors | - | - | Yes (DiskANN) | Yes (mmap) | - | On disk by default |
Scaling Architecture Patterns
For small-scale projects (under 1 million vectors), a single-node deployment of any database will suffice. As you scale beyond that, the architectural choices diverge:
Small Scale (< 1M vectors):
Any database, single node
Medium Scale (1M - 100M vectors):
Qdrant single node with on-disk vectors
Milvus standalone
pgvector with proper indexing
Large Scale (100M - 1B vectors):
Milvus cluster (sharded)
Qdrant distributed mode
Weaviate cluster
Pinecone (managed)
Very Large Scale (1B+ vectors):
Milvus cluster with DiskANN
Pinecone Enterprise
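The tiers above can be encoded as a small lookup, which is handy in capacity-planning scripts. The boundary handling (each tier excludes its upper bound) is an assumption, since the ranges above do not say which tier an exact boundary value belongs to.

```python
# (exclusive upper bound, tier name, options), taken from the tiers listed above
TIERS = [
    (1_000_000, "small", ["Any database, single node"]),
    (100_000_000, "medium", [
        "Qdrant single node with on-disk vectors",
        "Milvus standalone",
        "pgvector with proper indexing",
    ]),
    (1_000_000_000, "large", [
        "Milvus cluster (sharded)",
        "Qdrant distributed mode",
        "Weaviate cluster",
        "Pinecone (managed)",
    ]),
]
VERY_LARGE = ("very large", ["Milvus cluster with DiskANN", "Pinecone Enterprise"])

def scale_tier(num_vectors):
    """Return (tier name, suggested deployments) for a given vector count."""
    for upper_bound, name, options in TIERS:
        if num_vectors < upper_bound:
            return name, options
    return VERY_LARGE

print(scale_tier(50_000_000))
```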
7. Enterprise Features
Authentication and Authorization
| Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| API Key Auth | Yes | Yes | Yes | Yes | Yes | N/A (PostgreSQL) |
| RBAC | - | Yes | Yes (2.3+) | - | Yes | PostgreSQL RBAC |
| TLS/SSL | Yes | Yes | Yes | Yes | Yes | PostgreSQL SSL |
| SSO/OIDC | - | Yes | - | - | Yes | Via proxy |
| Multi-tenancy | Collection-level | Namespace | Database/Partition | Collection/Payload | Tenant API | Schema/RLS |
Backup and Restore
| Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| Snapshot/Backup | File copy | Managed | milvus-backup tool | Snapshot API | Backup API | pg_dump |
| Point-in-time Recovery | - | - | - | - | - | PostgreSQL WAL |
| Cross-region Backup | - | Yes | Via S3 | Manual | Yes | Via streaming replication |
Monitoring and Observability
# Milvus -- Prometheus metrics endpoint
curl http://localhost:9091/metrics
# Qdrant -- Built-in metrics
curl http://localhost:6333/metrics
# Weaviate -- Prometheus metrics
curl http://localhost:2112/metrics
| Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|---|
| Prometheus Metrics | - | Dashboard | Yes | Yes | Yes | pg_stat extensions |
| Grafana Dashboards | - | - | Official | Community | Official | PostgreSQL dashboards |
| Distributed Tracing | - | - | Jaeger | - | - | - |
| Query Logging | Basic | Dashboard | Yes | Yes | Yes | PostgreSQL pg_stat_statements |
Cloud Deployment Options
| Database | Managed Service | Kubernetes Helm | Terraform | Major Cloud Marketplace |
|---|---|---|---|---|
| Chroma | - | Community | - | - |
| Pinecone | Pinecone.io | N/A (SaaS) | Yes | AWS, GCP, Azure |
| Milvus | Zilliz Cloud | Official | Yes | AWS, GCP, Azure |
| Qdrant | Qdrant Cloud | Official | Yes | AWS, GCP, Azure |
| Weaviate | Weaviate Cloud | Official | Yes | AWS, GCP |
| pgvector | RDS, Cloud SQL, Supabase | Via PostgreSQL charts | Yes | All major clouds |
8. Integration with AI Frameworks
LangChain Integration
LangChain provides unified interfaces for all major vector databases:
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
docs = [
Document(page_content="RAG improves LLM accuracy with retrieval", metadata={"topic": "RAG"}),
Document(page_content="Vector databases store embeddings efficiently", metadata={"topic": "VectorDB"}),
Document(page_content="HNSW enables fast approximate nearest neighbor search", metadata={"topic": "Indexing"}),
]
# Chroma
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(docs, embeddings, collection_name="langchain_demo")
results = vectorstore.similarity_search("How does RAG work?", k=2)
# Pinecone
from langchain_pinecone import PineconeVectorStore
vectorstore = PineconeVectorStore.from_documents(docs, embeddings, index_name="langchain-demo")
results = vectorstore.similarity_search("How does RAG work?", k=2)
# Milvus
from langchain_milvus import Milvus
vectorstore = Milvus.from_documents(docs, embeddings, collection_name="langchain_demo",
connection_args={"host": "localhost", "port": "19530"})
results = vectorstore.similarity_search("How does RAG work?", k=2)
# Qdrant
from langchain_qdrant import QdrantVectorStore
vectorstore = QdrantVectorStore.from_documents(docs, embeddings,
url="http://localhost:6333",
collection_name="langchain_demo")
results = vectorstore.similarity_search("How does RAG work?", k=2)
# Weaviate
from langchain_weaviate import WeaviateVectorStore
import weaviate
weaviate_client = weaviate.connect_to_local()
vectorstore = WeaviateVectorStore.from_documents(docs, embeddings, client=weaviate_client)
results = vectorstore.similarity_search("How does RAG work?", k=2)
# pgvector
from langchain_postgres import PGVector
vectorstore = PGVector.from_documents(docs, embeddings,
connection="postgresql://user:pass@localhost/db",
collection_name="langchain_demo")
results = vectorstore.similarity_search("How does RAG work?", k=2)
LlamaIndex Integration
from llama_index.core import VectorStoreIndex, StorageContext, Document
from llama_index.embeddings.openai import OpenAIEmbedding
documents = [Document(text="RAG combines retrieval with generation for better AI responses.")]
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Chroma
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("llama_demo")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# Attach the vector store through a StorageContext; otherwise the index uses the default in-memory store
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)
query_engine = index.as_query_engine()
response = query_engine.query("What is RAG?")
# Qdrant
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
qdrant_client = QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="llama_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)
query_engine = index.as_query_engine()
response = query_engine.query("What is RAG?")
# Milvus
from llama_index.vector_stores.milvus import MilvusVectorStore
vector_store = MilvusVectorStore(uri="http://localhost:19530", collection_name="llama_demo", dim=1536)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)
query_engine = index.as_query_engine()
response = query_engine.query("What is RAG?")
Haystack Integration
# Qdrant with Haystack
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack import Pipeline, Document
# Set up document store
document_store = QdrantDocumentStore(
    url="http://localhost:6333",
    index="haystack_demo",
    embedding_dim=1536,
    recreate_index=True
)
# Index documents
doc_embedder = OpenAIDocumentEmbedder(model="text-embedding-3-small")
docs = [Document(content="RAG improves LLM accuracy with external knowledge retrieval.")]
docs_with_embeddings = doc_embedder.run(documents=docs)
document_store.write_documents(docs_with_embeddings["documents"])
# Build retrieval pipeline
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OpenAITextEmbedder(model="text-embedding-3-small"))
query_pipeline.add_component("retriever", QdrantEmbeddingRetriever(document_store=document_store, top_k=3))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
results = query_pipeline.run({"text_embedder": {"text": "What is RAG?"}})
for doc in results["retriever"]["documents"]:
    print(f"Score: {doc.score:.4f}, Content: {doc.content[:80]}")
9. Selection Guide
Decision Flowchart
Use the following text-based decision tree to narrow down your choice:
START: What is your primary requirement?
│
├─ "Quick prototype / experimentation"
│   └─→ Chroma (zero config, pip install, runs in-process)
│
├─ "Already using PostgreSQL"
│   └─ How many vectors?
│       ├─ < 5M → pgvector (no new infra, full SQL power)
│       └─ ≥ 5M → Consider a dedicated vector DB
│
├─ "Zero operational overhead / managed service"
│   └─→ Pinecone (fully managed, auto-scaling)
│
├─ "Massive scale (billions of vectors)"
│   └─→ Milvus (distributed architecture, GPU support, DiskANN)
│
├─ "High-performance with complex filtering"
│   └─→ Qdrant (Rust performance, payload indexes, on-disk vectors)
│
└─ "Hybrid search (keyword + vector) is critical"
    └─→ Weaviate (native BM25 + vector, built-in vectorization modules)
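The decision tree can also be written down as a small lookup function, which is handy if you want to record the choice in a design doc or a script. This is only a sketch of the tree as stated: the requirement keys (`"prototype"`, `"postgres"`, and so on) and the function name are hypothetical labels, and treating exactly 5M vectors as the "dedicated vector DB" branch is an assumption.

```python
# Threshold taken from the decision tree; which side 5M itself falls on is an assumption.
PGVECTOR_COMFORT_LIMIT = 5_000_000

# Hypothetical short keys for the branches of the decision tree.
RECOMMENDATIONS = {
    "prototype": "Chroma",
    "managed": "Pinecone",
    "massive_scale": "Milvus",
    "filtering": "Qdrant",
    "hybrid_search": "Weaviate",
}


def recommend_vector_db(requirement: str, vector_count: int = 0) -> str:
    """Return the tree's recommendation for a primary requirement."""
    if requirement == "postgres":
        # The PostgreSQL branch is the only one that asks a follow-up question.
        if vector_count < PGVECTOR_COMFORT_LIMIT:
            return "pgvector"
        return "dedicated vector DB"
    if requirement not in RECOMMENDATIONS:
        raise ValueError(f"unknown requirement: {requirement!r}")
    return RECOMMENDATIONS[requirement]


if __name__ == "__main__":
    print(recommend_vector_db("postgres", vector_count=1_000_000))
    print(recommend_vector_db("hybrid_search"))
```

Real selection rarely hinges on a single requirement, so treat the output as a starting point for the scenario table rather than a verdict.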
Scenario-Based Recommendations
| Scenario | Recommended DB | Reason |
|---|---|---|
| Prototyping / hackathon | Chroma | Zero setup, pip install, runs in Jupyter |
| Existing PostgreSQL infrastructure | pgvector | No new infra, SQL joins with relational data, ACID |
| Managed service with minimal ops | Pinecone | Fully managed SaaS, auto-scaling, built-in monitoring |
| Large-scale production (100M+ vectors) | Milvus | Distributed architecture, GPU acceleration, rich index types |
| High-performance filtering workloads | Qdrant | Rust-based engine, efficient payload indexing, low memory |
| Keyword + vector hybrid search | Weaviate | Native BM25 + vector fusion, integrated vectorization |
| Multi-modal search (text + image) | Qdrant or Milvus | Named vectors / multi-vector field support |
| Cost-sensitive startup | pgvector or Chroma | Open source, no additional infrastructure costs |
| Enterprise with compliance requirements | Pinecone or Milvus | RBAC, SOC2, enterprise support contracts |
| Edge / mobile deployment | Chroma | Lightweight embedded mode, minimal dependencies |
Cost Comparison Overview
| Database | Self-hosted Cost | Managed Service Starting Price | Free Tier |
|---|---|---|---|
| Chroma | Infrastructure only | N/A | Open source |
| Pinecone | N/A (SaaS only) | Serverless: pay-per-use | 2GB storage |
| Milvus | Infrastructure only | Zilliz: from ~$65/month | Free trial |
| Qdrant | Infrastructure only | Qdrant Cloud: from ~$25/month | 1GB free cluster |
| Weaviate | Infrastructure only | Weaviate Cloud: from ~$25/month | Sandbox available |
| pgvector | PostgreSQL costs | RDS/Supabase pricing | Supabase free tier |
Migration Considerations
When choosing a vector database, consider these factors for long-term success:
- Data portability -- Can you export and import vectors easily? Open-source databases generally offer better portability than proprietary ones.
- API stability -- How mature is the SDK? Frequent breaking changes increase maintenance burden.
- Community and ecosystem -- Larger communities mean more tutorials, integrations, and faster bug fixes.
- Operational complexity -- How much DevOps effort does the database require in production?
- Lock-in risk -- Can you switch databases without rewriting your entire application? Using framework abstractions (LangChain, LlamaIndex) mitigates this.
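To make the lock-in point concrete, here is a minimal sketch of what a framework abstraction buys you: application code depends on a narrow interface, and each database sits behind an adapter. The names `VectorStore`, `InMemoryStore`, `Hit`, and `top_ids` are hypothetical and not taken from LangChain or LlamaIndex; the in-memory store stands in for a real adapter so the example runs without any database.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Protocol, Sequence


@dataclass
class Hit:
    id: str
    score: float


class VectorStore(Protocol):
    """The only surface application code is allowed to depend on."""

    def upsert(self, id: str, vector: Sequence[float]) -> None: ...

    def search(self, query: Sequence[float], k: int) -> list[Hit]: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Brute-force stand-in; a real adapter would wrap a database client."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def upsert(self, id: str, vector: Sequence[float]) -> None:
        self._vectors[id] = list(vector)

    def search(self, query: Sequence[float], k: int) -> list[Hit]:
        hits = [Hit(i, _cosine(query, v)) for i, v in self._vectors.items()]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


def top_ids(store: VectorStore, query: Sequence[float], k: int) -> list[str]:
    # Application logic sees only the protocol, never a specific database SDK.
    return [hit.id for hit in store.search(query, k)]


if __name__ == "__main__":
    store = InMemoryStore()
    store.upsert("doc-a", [1.0, 0.0])
    store.upsert("doc-b", [0.0, 1.0])
    print(top_ids(store, [0.9, 0.1], k=1))
```

Switching databases then means writing one new adapter class rather than touching every call site. Features outside the shared interface, such as database-specific filters or hybrid search options, still need per-database handling.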
References
- Chroma Documentation: https://docs.trychroma.com/
- Pinecone Documentation: https://docs.pinecone.io/
- Milvus Documentation: https://milvus.io/docs
- Qdrant Documentation: https://qdrant.tech/documentation/
- Weaviate Documentation: https://weaviate.io/developers/weaviate
- pgvector GitHub: https://github.com/pgvector/pgvector
- LangChain Vector Stores: https://python.langchain.com/docs/integrations/vectorstores/
- LlamaIndex Vector Stores: https://docs.llamaindex.ai/en/stable/module_guides/storing/vector_stores/
- ANN Benchmarks: https://ann-benchmarks.com/
- HNSW Paper: Malkov and Yashunin, "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs," IEEE TPAMI, 2018
-- Data Dynamics Engineering Team