
Vector Database Complete Comparison - Chroma, Milvus, Pinecone, Qdrant, Weaviate, pgvector

A comprehensive comparison of major vector databases covering architecture, installation, performance, indexing, hybrid search, scalability, and selection guide for RAG and AI applications.

Data Dynamics · April 16, 2026 · 27 min read

Vector databases have become essential infrastructure for modern AI applications. From RAG pipelines to recommendation systems and image search, any application that works with embeddings needs a reliable way to store and retrieve vectors at scale. This post provides a comprehensive, hands-on comparison of six major vector databases to help you make the right choice.


1. Vector Database Overview

What Is a Vector Database?

A vector database is a specialized storage system designed to index, store, and query high-dimensional vector data (embeddings). Unlike traditional databases that operate on exact matches or range queries over scalar values, vector databases find the most similar items based on distance metrics in high-dimensional space.

[Traditional Database]
Query: SELECT * FROM products WHERE category = 'shoes' AND price < 100
Result: Exact matches based on structured fields

[Vector Database]
Query: Find the 10 vectors most similar to this embedding [0.12, -0.45, 0.78, ...]
Result: Semantically similar items ranked by distance

Why Vector Databases Matter for RAG and AI

In a RAG (Retrieval-Augmented Generation) pipeline, vector databases serve as the knowledge retrieval layer:

  1. Document Ingestion -- Text is split into chunks and converted to embeddings via models like OpenAI text-embedding-3-small or sentence-transformers
  2. Storage -- Embeddings are stored alongside metadata in the vector database
  3. Retrieval -- At query time, the user question is embedded and the database returns the most relevant chunks
  4. Generation -- Retrieved chunks are passed as context to the LLM for answer generation
User Query
    │
    ▼
[Embedding Model] ──→ Query Vector
    │
    ▼
[Vector Database] ──→ Top-K Similar Documents
    │
    ▼
[LLM + Context] ──→ Grounded Response
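The four steps can be sketched end to end in plain Python. This is a toy built on loud assumptions, not a real pipeline:

- `embed` is a hypothetical hashed bag-of-words stand-in for a model such as text-embedding-3-small.
- The "vector database" is an in-memory list that is searched exhaustively.
- The LLM call in step 4 is left out.

All function names are illustrative.

```python
import math
import re
import zlib
from collections import Counter

DIM = 64  # hypothetical toy dimension; real embedding models use e.g. 768 or 1536

def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, unit length."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(text: str, size: int = 8) -> list[str]:
    """Step 1: split a document into fixed-size word chunks (no overlap, for brevity)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Step 2: the "vector database" is just an in-memory list of (embedding, chunk) pairs.
store: list[tuple[list[float], str]] = []

def ingest(document: str) -> None:
    for piece in chunk(document):
        store.append((embed(piece), piece))

def retrieve(question: str, k: int = 2) -> list[str]:
    """Step 3: embed the question and return the k chunks with the highest cosine
    similarity (a plain dot product here, because every vector has unit length)."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: sum(a * b for a, b in zip(q, item[0])), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question: str, k: int = 2) -> str:
    """Step 4: retrieved chunks become LLM context (the LLM call itself is omitted)."""
    context = "\n".join(f"- {c}" for c in retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

ingest("HNSW builds a layered graph for fast approximate nearest neighbor search. "
       "pgvector adds a vector column type and ANN indexes to PostgreSQL.")
print(build_prompt("Which extension adds vectors to PostgreSQL?", k=1))
```

To turn this into a real RAG pipeline, replace `embed` with a real embedding model and replace `store`/`retrieve` with one of the databases compared in this post. The control flow stays the same.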

Key Concepts

Embedding: A fixed-length numerical vector (e.g., 768 or 1536 dimensions) that captures the semantic meaning of text, images, or other data. Similar content produces vectors that are close together in vector space.

Similarity Search: Finding the nearest neighbors to a query vector. Common distance metrics include:

| Metric | Formula | Best For |
| --- | --- | --- |
| Cosine Similarity | cos(A, B) = (A · B) / (‖A‖ ‖B‖) | Text similarity, normalized embeddings |
| Euclidean (L2) | sqrt(sum((a_i - b_i)^2)) | Image features, spatial data |
| Inner Product (IP) | sum(a_i * b_i) | Maximum inner product search, recommendation |

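Approximate Nearest Neighbor (ANN): Exact nearest neighbor search compares the query against every stored vector. For n vectors of dimension d that costs O(n · d) per query, which becomes prohibitive at scale. ANN algorithms such as HNSW and IVF trade a small amount of accuracy for much faster retrieval. They often achieve 95-99% recall at a 100x or greater speedup over a full scan, though the exact figures depend on the dataset and index parameters.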
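The following minimal flat (brute-force) k-NN search in plain Python makes the three metrics and the cost of exact search concrete. It uses the three 4-dimensional vectors from the quick-start examples later in this post. The function names are illustrative and do not belong to any database's API.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def exact_knn(query, vectors, k, metric="cosine"):
    """Exact (flat) k-NN: scores every stored vector, so each query costs O(n * d)."""
    ids = range(len(vectors))
    if metric == "l2":
        # For a distance, smaller is better.
        return heapq.nsmallest(k, ids, key=lambda i: l2(query, vectors[i]))
    similarity = {"cosine": cosine, "ip": inner_product}[metric]  # KeyError if unsupported
    # For a similarity, larger is better.
    return heapq.nlargest(k, ids, key=lambda i: similarity(query, vectors[i]))

# The three 4-dimensional vectors used in the quick-start examples.
vectors = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 0.1, 0.2, 0.3]]
query = [0.1, 0.2, 0.3, 0.4]
for metric in ("cosine", "l2", "ip"):
    print(metric, exact_knn(query, vectors, k=3, metric=metric))
# cosine [0, 1, 2]
# l2 [0, 1, 2]
# ip [1, 0, 2]
```

Cosine and L2 agree here. Inner product instead ranks the longer vector [0.5, 0.6, 0.7, 0.8] first, because it is not length-normalized. This is one reason cosine (or inner product on normalized embeddings) is the usual choice for text.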


2. Architecture Comparison

Chroma -- Embedded and Lightweight

┌─────────────────────────────┐
│         Application         │
│  ┌───────────────────────┐  │
│  │     Chroma Client     │  │
│  │  (Python / JS SDK)    │  │
│  └──────────┬────────────┘  │
│             │               │
│  ┌──────────▼────────────┐  │
│  │   Chroma Core Engine  │  │
│  │  ┌────────┐ ┌───────┐ │  │
│  │  │ HNSW   │ │SQLite │ │  │
│  │  │ Index  │ │Meta   │ │  │
│  │  └────────┘ └───────┘ │  │
│  └───────────────────────┘  │
└─────────────────────────────┘
  • Core Technology: Python and JavaScript SDKs over a core engine rewritten in Rust for the 1.x releases; HNSW (hnswlib-based) for indexing, SQLite for metadata (DuckDB in releases before 0.4)
  • Deployment Model: Embedded (in-process), client/server, or Docker
  • Strengths: Zero-config setup, ideal for prototyping, runs in Jupyter notebooks
  • Limitations: Not designed for large-scale production, limited horizontal scaling

Pinecone -- Managed SaaS

┌──────────────┐       ┌──────────────────────────┐
│  Application │       │     Pinecone Cloud        │
│  ┌────────┐  │       │  ┌────────────────────┐   │
│  │Pinecone│──┼─gRPC──▶  │   API Gateway      │   │
│  │ Client │  │       │  └────────┬───────────┘   │
│  └────────┘  │       │  ┌───────▼──────────┐     │
└──────────────┘       │  │  Query Router     │     │
                       │  └───┬─────────┬────┘     │
                       │  ┌───▼───┐ ┌───▼───┐      │
                       │  │Shard 1│ │Shard N│      │
                       │  │(Pod)  │ │(Pod)  │      │
                       │  └───────┘ └───────┘      │
                       └──────────────────────────┘
  • Core Technology: Proprietary closed-source engine, serverless or pod-based architecture
  • Deployment Model: Fully managed SaaS only (AWS, GCP, Azure regions)
  • Strengths: Zero operational overhead, built-in replication, automatic scaling
  • Limitations: Vendor lock-in, no self-hosted option, cost grows with scale

Milvus -- Distributed and Scalable

┌────────────────────────────────────────────┐
│              Milvus Cluster                │
│                                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │  Proxy   │  │  Proxy   │  │  Proxy   │ │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘ │
│       └──────────┬───┴──────────┬──┘       │
│  ┌───────────────▼──────────────▼────────┐ │
│  │           Coordinator Layer           │ │
│  │  (Root / Query / Data / Index Coord)  │ │
│  └───────────────┬───────────────────────┘ │
│       ┌──────────┼──────────┐              │
│  ┌────▼───┐ ┌────▼───┐ ┌───▼────┐         │
│  │ Query  │ │ Data   │ │ Index  │         │
│  │ Nodes  │ │ Nodes  │ │ Nodes  │         │
│  └────────┘ └────────┘ └────────┘         │
│                                            │
│  [etcd]  [MinIO/S3]  [Pulsar/Kafka]       │
└────────────────────────────────────────────┘
  • Core Technology: Go + C++ core, disaggregated compute and storage, cloud-native architecture
  • Deployment Model: Standalone (Docker), cluster (Kubernetes), or Zilliz Cloud (managed)
  • Strengths: Handles billions of vectors, rich index types, GPU acceleration support
  • Limitations: Complex cluster setup, heavier resource requirements

Qdrant -- Rust-Based Performance

┌─────────────────────────────────────┐
│           Qdrant Cluster            │
│                                     │
│  ┌─────────┐  ┌─────────┐          │
│  │  Node 1 │  │  Node 2 │  ...     │
│  │ ┌─────┐ │  │ ┌─────┐ │          │
│  │ │Shard│ │  │ │Shard│ │          │
│  │ │  A  │ │  │ │  B  │ │          │
│  │ └─────┘ │  │ └─────┘ │          │
│  │ ┌─────┐ │  │ ┌─────┐ │          │
│  │ │Shard│ │  │ │Shard│ │          │
│  │ │  B' │ │  │ │  A' │ │          │
│  │ │(rep)│ │  │ │(rep)│ │          │
│  │ └─────┘ │  │ └─────┘ │          │
│  └─────────┘  └─────────┘          │
│                                     │
│  [Raft Consensus for Coordination]  │
└─────────────────────────────────────┘
  • Core Technology: Written in Rust, custom HNSW implementation with on-disk support
  • Deployment Model: Single node (binary/Docker), distributed cluster, Qdrant Cloud
  • Strengths: Memory-efficient, fast filtering with payload indexes, on-disk vector support
  • Limitations: Smaller ecosystem than Milvus, and a younger project

Weaviate -- Hybrid Search Native

┌───────────────────────────────────────┐
│           Weaviate Instance           │
│                                       │
│  ┌─────────────────────────────────┐  │
│  │         GraphQL / REST API      │  │
│  └──────────────┬──────────────────┘  │
│  ┌──────────────▼──────────────────┐  │
│  │         Schema Manager          │  │
│  └──────────────┬──────────────────┘  │
│       ┌─────────┼─────────┐           │
│  ┌────▼───┐ ┌───▼────┐ ┌─▼────────┐  │
│  │ Vector │ │Inverted│ │ Module   │  │
│  │ Index  │ │ Index  │ │ System   │  │
│  │ (HNSW) │ │ (BM25) │ │(OpenAI, │  │
│  │        │ │        │ │ Cohere)  │  │
│  └────────┘ └────────┘ └──────────┘  │
└───────────────────────────────────────┘
  • Core Technology: Written in Go, native BM25 + vector hybrid search, modular vectorizer system
  • Deployment Model: Single node (Docker), Kubernetes cluster, Weaviate Cloud
  • Strengths: Built-in hybrid search (BM25 + vector), integrated vectorization modules, GraphQL API
  • Limitations: Higher memory consumption, fewer vector index types than Milvus (HNSW, flat, and dynamic)

pgvector -- PostgreSQL Extension

┌──────────────────────────────────────┐
│           PostgreSQL Server          │
│                                      │
│  ┌────────────────────────────────┐  │
│  │        pgvector Extension      │  │
│  │  ┌──────────┐  ┌───────────┐  │  │
│  │  │  IVFFlat │  │   HNSW    │  │  │
│  │  │  Index   │  │   Index   │  │  │
│  │  └──────────┘  └───────────┘  │  │
│  └────────────────────────────────┘  │
│                                      │
│  ┌──────────────┐  ┌──────────────┐  │
│  │  Relational  │  │  Standard   │  │
│  │   Tables     │  │  SQL Engine │  │
│  └──────────────┘  └──────────────┘  │
└──────────────────────────────────────┘
  • Core Technology: C extension for PostgreSQL, adds vector column type and ANN indexes
  • Deployment Model: Any PostgreSQL deployment (self-hosted, RDS, Cloud SQL, Supabase)
  • Strengths: No new infrastructure, full SQL power, ACID transactions, joins with relational data
  • Limitations: Single-node scaling, not optimized purely for vector workloads, slower at very large scale

Architecture Summary

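| Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| Language | Python/Rust | Proprietary | Go/C++ | Rust | Go | C |
| Deployment | Embedded/Server | SaaS only | Standalone/Cluster | Single/Cluster | Single/Cluster | PostgreSQL |
| Open Source | Yes | No | Yes | Yes | Yes | Yes |
| License | Apache 2.0 | Proprietary | Apache 2.0 | Apache 2.0 | BSD-3 | PostgreSQL |
| Cloud Managed | - | Pinecone | Zilliz Cloud | Qdrant Cloud | Weaviate Cloud | RDS/Supabase |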

3. Installation and Quick Start

Chroma

# Install via pip
pip install chromadb
 
# Or run as server with Docker
docker run -p 8000:8000 chromadb/chroma
import chromadb
 
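# Embedded, in-memory mode (no server needed; data is discarded when the process exits).
# Use chromadb.PersistentClient(path="./chroma_data") to keep data on disk instead.
client = chromadb.Client()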
 
# Or connect to server
# client = chromadb.HttpClient(host="localhost", port=8000)
 
# Create a collection
collection = client.create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}
)
 
# Insert vectors with metadata
collection.add(
    ids=["doc1", "doc2", "doc3"],
    embeddings=[
        [0.1, 0.2, 0.3, 0.4],
        [0.5, 0.6, 0.7, 0.8],
        [0.9, 0.1, 0.2, 0.3]
    ],
    metadatas=[
        {"source": "wiki", "topic": "AI"},
        {"source": "arxiv", "topic": "ML"},
        {"source": "blog", "topic": "AI"}
    ],
    documents=["doc about AI", "doc about ML", "another AI doc"]
)
 
# Search for similar vectors
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3, 0.4]],
    n_results=2,
    where={"topic": "AI"}
)
print(results)

Pinecone

pip install pinecone
from pinecone import Pinecone, ServerlessSpec
 
# Initialize client
pc = Pinecone(api_key="YOUR_API_KEY")
 
# Create index
pc.create_index(
    name="documents",
    dimension=4,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)
 
# Connect to index
index = pc.Index("documents")
 
# Insert vectors (upsert)
index.upsert(vectors=[
    {"id": "doc1", "values": [0.1, 0.2, 0.3, 0.4],
     "metadata": {"source": "wiki", "topic": "AI"}},
    {"id": "doc2", "values": [0.5, 0.6, 0.7, 0.8],
     "metadata": {"source": "arxiv", "topic": "ML"}},
    {"id": "doc3", "values": [0.9, 0.1, 0.2, 0.3],
     "metadata": {"source": "blog", "topic": "AI"}}
])
 
# Search with metadata filter
results = index.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    top_k=2,
    filter={"topic": {"$eq": "AI"}},
    include_metadata=True
)
print(results)

Milvus

# Start Milvus with Docker Compose
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker compose up -d
 
# Install Python SDK
pip install pymilvus
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
 
# Connect to Milvus
connections.connect("default", host="localhost", port="19530")
 
# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.VARCHAR, is_primary=True, max_length=64),
    FieldSchema(name="topic", dtype=DataType.VARCHAR, max_length=64),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=4)
]
schema = CollectionSchema(fields, description="Document collection")
 
# Create collection
collection = Collection("documents", schema)
 
# Insert data
data = [
    ["doc1", "doc2", "doc3"],
    ["AI", "ML", "AI"],
    [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 0.1, 0.2, 0.3]]
]
collection.insert(data)
 
# Build index
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 256}
}
collection.create_index("embedding", index_params)
collection.load()
 
# Search
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}
results = collection.search(
    data=[[0.1, 0.2, 0.3, 0.4]],
    anns_field="embedding",
    param=search_params,
    limit=2,
    expr='topic == "AI"',
    output_fields=["topic"]
)
for hits in results:
    for hit in hits:
        print(f"ID: {hit.id}, Distance: {hit.distance}, Topic: {hit.entity.get('topic')}")

Qdrant

# Run with Docker
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
 
# Install Python SDK
pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct, Filter, FieldCondition, MatchValue
 
# Connect to Qdrant
client = QdrantClient(host="localhost", port=6333)
 
# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE)
)
 
# Insert vectors
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4],
                    payload={"source": "wiki", "topic": "AI"}),
        PointStruct(id=2, vector=[0.5, 0.6, 0.7, 0.8],
                    payload={"source": "arxiv", "topic": "ML"}),
        PointStruct(id=3, vector=[0.9, 0.1, 0.2, 0.3],
                    payload={"source": "blog", "topic": "AI"})
    ]
)
 
# Search with filter
results = client.query_points(
    collection_name="documents",
    query=[0.1, 0.2, 0.3, 0.4],
    limit=2,
    query_filter=Filter(
        must=[FieldCondition(key="topic", match=MatchValue(value="AI"))]
    )
)
for point in results.points:
    print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")

Weaviate

# Run with Docker
docker run -p 8080:8080 -p 50051:50051 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  semitechnologies/weaviate
 
# Install Python SDK
pip install weaviate-client
import weaviate
import weaviate.classes as wvc
 
# Connect to Weaviate
client = weaviate.connect_to_local()
 
# Create collection (class)
documents = client.collections.create(
    name="Document",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
    properties=[
        wvc.config.Property(name="source", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="topic", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
    ]
)
 
# Insert vectors
documents.data.insert_many([
    wvc.data.DataObject(
        properties={"source": "wiki", "topic": "AI", "content": "doc about AI"},
        vector=[0.1, 0.2, 0.3, 0.4]
    ),
    wvc.data.DataObject(
        properties={"source": "arxiv", "topic": "ML", "content": "doc about ML"},
        vector=[0.5, 0.6, 0.7, 0.8]
    ),
    wvc.data.DataObject(
        properties={"source": "blog", "topic": "AI", "content": "another AI doc"},
        vector=[0.9, 0.1, 0.2, 0.3]
    )
])
 
# Search with filter
results = documents.query.near_vector(
    near_vector=[0.1, 0.2, 0.3, 0.4],
    limit=2,
    filters=wvc.query.Filter.by_property("topic").equal("AI"),
    return_metadata=wvc.query.MetadataQuery(distance=True)
)
for obj in results.objects:
    print(f"Topic: {obj.properties['topic']}, Distance: {obj.metadata.distance}")
 
client.close()

pgvector

# Install extension (PostgreSQL 13+)
# Ubuntu/Debian
sudo apt install postgresql-16-pgvector
 
# Or build from source
cd /tmp && git clone https://github.com/pgvector/pgvector.git
cd pgvector && make && sudo make install
 
# Or use Docker
docker run -p 5432:5432 -e POSTGRES_PASSWORD=postgres pgvector/pgvector:pg16
-- Enable extension
CREATE EXTENSION vector;
 
-- Create table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    source VARCHAR(64),
    topic VARCHAR(64),
    embedding vector(4)
);
 
-- Insert data
INSERT INTO documents (content, source, topic, embedding) VALUES
    ('doc about AI', 'wiki', 'AI', '[0.1, 0.2, 0.3, 0.4]'),
    ('doc about ML', 'arxiv', 'ML', '[0.5, 0.6, 0.7, 0.8]'),
    ('another AI doc', 'blog', 'AI', '[0.9, 0.1, 0.2, 0.3]');
 
-- Create HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 256);
 
-- Search with filter
SELECT id, content, topic,
       1 - (embedding <=> '[0.1, 0.2, 0.3, 0.4]') AS similarity
FROM documents
WHERE topic = 'AI'
ORDER BY embedding <=> '[0.1, 0.2, 0.3, 0.4]'
LIMIT 2;
# Python with psycopg2
import psycopg2
from pgvector.psycopg2 import register_vector
import numpy as np
 
conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=postgres")
register_vector(conn)
 
cur = conn.cursor()
query_vec = np.array([0.1, 0.2, 0.3, 0.4])
cur.execute("""
    SELECT id, content, topic, 1 - (embedding <=> %s) AS similarity
    FROM documents
    WHERE topic = 'AI'
    ORDER BY embedding <=> %s
    LIMIT 2
""", (query_vec, query_vec))
 
for row in cur.fetchall():
    print(f"ID: {row[0]}, Content: {row[1]}, Similarity: {row[3]:.4f}")

4. Indexing Algorithms

Vector databases rely on ANN (Approximate Nearest Neighbor) indexing algorithms to achieve fast search over millions or billions of vectors. Here are the main algorithms and their trade-offs.

HNSW (Hierarchical Navigable Small World)

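HNSW builds a multi-layer graph in which each node is connected to its nearest neighbors, and each higher layer holds a sparser subset of the nodes. A search enters at the top layer and greedily moves to the node closest to the query. That node becomes the entry point for the next layer down, and the process repeats until it reaches the dense bottom layer (layer 0), where the final candidates are collected.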

Layer 2:    A ─────────────── D
            │                 │
Layer 1:    A ─── B ─── C ─── D ─── E
            │     │     │     │     │
Layer 0:    A ─ B ─ C ─ D ─ E ─ F ─ G ─ H
  • Key Parameters: M (max connections per node), efConstruction (build-time search width), ef (query-time search width)
  • Strengths: Excellent query performance, high recall, incremental inserts
  • Weaknesses: High memory usage (stores graph in RAM), slower build time
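The core of an HNSW query is a best-first search with a beam of size ef on each layer. Below is a minimal single-layer sketch in plain Python. It assumes Euclidean distance and builds the graph by brute force with a hypothetical `build_knn_graph` helper. HNSW's incremental construction, neighbor-pruning heuristic, and layer hierarchy are all omitted, so the graph is closer to a flat navigable small-world graph than to full HNSW.

```python
import heapq
import math
import random

def build_knn_graph(vectors, m):
    """Hypothetical helper: link each node to its m nearest neighbors by brute force,
    then add reverse edges. Real HNSW inserts nodes incrementally, prunes links with a
    heuristic, and assigns nodes to multiple layers; all of that is omitted here."""
    graph = {}
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: math.dist(v, vectors[j]))
        graph[i] = others[:m]
    for i in list(graph):
        for j in list(graph[i]):
            if i not in graph[j]:
                graph[j].append(i)
    return graph

def search_layer(graph, vectors, query, entry, ef):
    """Best-first search that keeps the ef closest nodes found so far; HNSW runs this
    on every layer (ef = 1 on the upper layers, the query-time ef on layer 0)."""
    d = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d, entry)]   # min-heap: closest unexpanded node first
    results = [(-d, entry)]     # max-heap via negated distances: farthest kept result first
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:  # closest candidate is farther than the worst result: stop
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the farthest result
    return [n for _, n in sorted((-nd, n) for nd, n in results)]

if __name__ == "__main__":
    random.seed(0)
    points = [[random.random(), random.random()] for _ in range(500)]
    graph = build_knn_graph(points, m=8)
    query = [0.5, 0.5]
    exact = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:10]
    for ef in (10, 50):
        found = search_layer(graph, points, query, entry=0, ef=ef)[:10]
        print(f"ef={ef}: recall@10 = {len(set(found) & set(exact)) / 10:.2f}")
```

A larger ef widens the beam. This typically improves recall at the cost of more distance computations, which is the trade-off the query-time `ef` setting in the Milvus example controls.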

IVF (Inverted File Index)

IVF partitions the vector space into clusters using k-means. At query time, only the closest clusters are searched rather than the entire dataset.

┌─────────────────────────────────┐
│   Cluster 1    Cluster 2        │
│   ┌──────┐     ┌──────┐        │
│   │ •  • │     │  • • │        │
│   │ •• • │     │ •  • │        │
│   └──────┘     └──────┘        │
│        Cluster 3                │
│        ┌──────┐                 │
│        │ •• • │                 │
│        │  • • │                 │
│        └──────┘                 │
└─────────────────────────────────┘
Query: Search only nearest nprobe clusters
  • Key Parameters: nlist (number of clusters), nprobe (clusters to search at query time)
  • Strengths: Lower memory usage, fast build, works well with GPU
  • Weaknesses: Requires training step, lower recall at low nprobe values
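Below is a minimal IVF sketch in plain Python, not a production implementation. It assumes Euclidean distance and uses a basic Lloyd's k-means as the training step; Faiss's IndexIVFFlat is one production implementation of the same idea. `nlist` and `nprobe` play the roles described above, and the class and function names are illustrative.

```python
import math
import random

def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means: IVF's training step (centroids = cluster centers)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[min(range(k), key=lambda c: math.dist(v, centroids[c]))].append(v)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class IVFIndex:
    def __init__(self, vectors, nlist, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist, seed=seed)
        # Inverted lists: one list of vector ids per cluster.
        self.lists = [[] for _ in range(nlist)]
        for i, v in enumerate(vectors):
            self.lists[self.nearest_clusters(v, 1)[0]].append(i)

    def nearest_clusters(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: math.dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, nprobe):
        """Exact distances, but only over the vectors in the nprobe nearest clusters."""
        ids = [i for c in self.nearest_clusters(query, nprobe) for i in self.lists[c]]
        return sorted(ids, key=lambda i: math.dist(query, self.vectors[i]))[:k]

if __name__ == "__main__":
    random.seed(1)
    points = [[random.random(), random.random()] for _ in range(1000)]
    index = IVFIndex(points, nlist=16)
    query = [0.3, 0.7]
    exact = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:10]
    for nprobe in (1, 4, 16):
        found = index.search(query, 10, nprobe)
        print(f"nprobe={nprobe}: recall@10 = {len(set(found) & set(exact)) / 10:.2f}")
```

When nprobe equals nlist, every inverted list is scanned and the result matches exact search. This makes it a convenient baseline for measuring the recall lost at smaller nprobe values.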

PQ (Product Quantization)

PQ compresses vectors by dividing them into sub-vectors and quantizing each sub-vector independently. This dramatically reduces memory usage while maintaining reasonable accuracy.

Original Vector (128-dim):
[0.1, 0.2, ..., 0.5, 0.6, ..., 0.3, 0.4, ..., 0.7, 0.8, ...]
 └──── Sub 1 ────┘ └──── Sub 2 ────┘ └──── Sub 3 ────┘ └──── Sub 4 ────┘
        ↓                 ↓                 ↓                 ↓
    Code: 42          Code: 17          Code: 89          Code: 5
    (1 byte)          (1 byte)          (1 byte)          (1 byte)

Compressed: [42, 17, 89, 5] = 4 bytes (vs 512 bytes original)
  • Key Parameters: m (number of sub-quantizers), nbits (bits per code)
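  • Strengths: Very low memory (with 8-bit codes each vector shrinks to m bytes; 32x--64x compression is typical, and the 4-code example above reaches 128x), fast distance computation via precomputed lookup tables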
  • Weaknesses: Lower accuracy, requires training, best combined with IVF
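Below is a minimal PQ sketch with asymmetric distance computation (ADC) in plain Python. It assumes Euclidean distance, a dimension divisible by m, and a basic k-means per sub-space; Faiss's IndexPQ is one production implementation of the same idea. The class and method names are illustrative.

```python
import math
import random

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means, used here to learn one codebook per sub-space."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: math.dist(p, centroids[c]))].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty group keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class ProductQuantizer:
    def __init__(self, m, nbits=8):
        self.m = m                 # number of sub-vectors (sub-quantizers)
        self.ksub = 2 ** nbits     # centroids per codebook: 256 for 8-bit codes

    def split(self, v):
        if len(v) % self.m:
            raise ValueError("dimension must be divisible by m")
        step = len(v) // self.m
        return [v[j * step:(j + 1) * step] for j in range(self.m)]

    def train(self, vectors):
        # Needs at least ksub training vectors; one independent codebook per sub-space.
        parts = [self.split(v) for v in vectors]
        self.codebooks = [kmeans([p[j] for p in parts], self.ksub, seed=j) for j in range(self.m)]

    def encode(self, v):
        """Replace each sub-vector with the id of its nearest centroid (m codes of nbits each)."""
        return [min(range(self.ksub), key=lambda c: math.dist(s, self.codebooks[j][c]))
                for j, s in enumerate(self.split(v))]

    def search(self, query, codes, k):
        """ADC: one table of squared distances from each query sub-vector to every
        centroid, then m table lookups per compressed vector (no decompression)."""
        tables = [[math.dist(s, c) ** 2 for c in self.codebooks[j]]
                  for j, s in enumerate(self.split(query))]
        approx = [sum(tables[j][code[j]] for j in range(self.m)) for code in codes]
        return sorted(range(len(codes)), key=lambda i: approx[i])[:k]

if __name__ == "__main__":
    random.seed(2)
    data = [[random.gauss(0, 1) for _ in range(16)] for _ in range(500)]
    pq = ProductQuantizer(m=4, nbits=4)   # 4 codes of 4 bits each per vector
    pq.train(data)
    codes = [pq.encode(v) for v in data]
    print("codes for vector 0:", codes[0])
    print("approximate top-5 for vector 0:", pq.search(data[0], codes, k=5))
```

Packed storage is m × nbits / 8 bytes per vector. The 128-dimensional float32 vector above (512 bytes) becomes 4 bytes with m = 4 and nbits = 8. The 16-dimensional demo vectors (64 bytes) would pack into 2 bytes with m = 4 and nbits = 4. This sketch keeps codes as Python ints for readability rather than packing them.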

Flat (Brute Force)

Flat index stores raw vectors and computes exact distances against every vector. No approximation involved.

  • Strengths: 100% recall (exact results), no training needed
  • Weaknesses: O(n) query time, impractical for large datasets

Algorithm Comparison

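| Feature | HNSW | IVF | PQ | Flat |
| --- | --- | --- | --- | --- |
| Query Speed | Very Fast | Fast | Fast | Slow |
| Memory Usage | High | Medium | Very Low | High |
| Build Time | Slow | Medium (requires training) | Slow (requires training) | None |
| Recall @ top-10 (typical) | 95--99% | 85--95% | 70--90% | 100% |
| Incremental Insert | Yes | Yes (retrain if the data distribution drifts) | Yes (retrain if the data distribution drifts) | Yes |
| Best Scale | 1M--100M | 10M--1B | 100M--10B | < 100K |
| GPU Acceleration | Limited | Yes | Yes | Yes |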

Note: In practice, hybrid index strategies like IVF+PQ or IVF+HNSW are commonly used to balance memory, speed, and accuracy. Milvus supports the most index types, while pgvector and Chroma focus primarily on HNSW.

Index Support by Database

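| Index Type | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| HNSW | Yes | Internal | Yes | Yes | Yes | Yes |
| IVFFlat | - | Internal | Yes | - | - | Yes |
| IVF+PQ | - | Internal | Yes | - | - | - |
| Flat | Yes | - | Yes | Yes (exact search) | Yes | Yes |
| DiskANN | - | - | Yes | - | - | - |
| On-disk vectors | - | - | Yes | Yes | - | Yes (PostgreSQL storage) |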

5. Search Capabilities

All six databases support similarity search with a configurable distance metric, but the supported metrics differ:

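| Capability | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| Cosine | Yes | Yes | Yes | Yes | Yes | Yes |
| L2 (Euclidean) | Yes | Yes | Yes | Yes | Yes | Yes |
| Inner Product | Yes | Yes | Yes | Yes | Yes | Yes |
| Hamming | - | - | Yes | - | Yes | Yes |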

Metadata Filtering

Filtering results by metadata (payload) fields is critical for production applications. Each database offers different filtering capabilities:

# Qdrant -- Advanced filtering example (assumes points also carry "year" and "status" payload fields)
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
 
results = client.query_points(
    collection_name="documents",
    query=[0.1, 0.2, 0.3, 0.4],
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="topic", match=MatchValue(value="AI")),
            FieldCondition(key="year", range=Range(gte=2023)),
        ],
        must_not=[
            FieldCondition(key="status", match=MatchValue(value="draft"))
        ]
    )
)
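# Milvus -- Boolean expression filtering
# Assumes the schema also defines "year" and "status" scalar fields (the quick-start schema does not)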
results = collection.search(
    data=[[0.1, 0.2, 0.3, 0.4]],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    expr='topic == "AI" and year >= 2023 and status != "draft"'
)
-- pgvector -- Full SQL WHERE clause power
-- Assumes year, status, and category columns in addition to the quick-start table
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, 0.3, 0.4]') AS similarity
FROM documents
WHERE topic = 'AI'
  AND year >= 2023
  AND status != 'draft'
  AND category IN ('research', 'tutorial')
ORDER BY embedding <=> '[0.1, 0.2, 0.3, 0.4]'
LIMIT 5;

Hybrid Search (Vector + Keyword)

Hybrid search combines dense vector similarity with sparse keyword matching (BM25) for improved retrieval quality. This is particularly effective when queries contain specific terms or acronyms.

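# Weaviate -- Native hybrid search (BM25 + vector)
# The collection above has no vectorizer, so the query vector is passed explicitly
results = documents.query.hybrid(
    query="transformer architecture attention mechanism",
    vector=[0.1, 0.2, 0.3, 0.4],
    alpha=0.5,  # 0 = pure keyword (BM25), 1 = pure vector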
    limit=5,
    return_metadata=wvc.query.MetadataQuery(score=True)
)
for obj in results.objects:
    print(f"Score: {obj.metadata.score:.4f}, Content: {obj.properties['content'][:80]}")
# Qdrant -- Sparse + Dense fusion
from qdrant_client import models
 
# Requires a collection configured with a dense vector named "dense"
# and a sparse vector named "sparse"
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense vector search
        models.Prefetch(query=[0.1, 0.2, 0.3, 0.4], using="dense", limit=20),
        # Sparse vector search (e.g., BM25 or SPLADE term weights)
        models.Prefetch(
            query=models.SparseVector(indices=[1, 42, 100], values=[0.5, 0.8, 0.3]),
            using="sparse",
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),  # Reciprocal Rank Fusion
    limit=5,
)
-- pgvector + tsvector -- Hybrid search with PostgreSQL full-text search
-- ts_rank is not normalized to [0, 1], so this 50/50 weighted sum is a rough heuristic,
-- and the WHERE clause keeps only rows that match the keywords
WITH q AS (
    SELECT '[0.1, 0.2, 0.3, 0.4]'::vector AS query_vec,
           plainto_tsquery('english', 'transformer attention') AS tsq
)
SELECT d.id, d.content,
       ts_rank(to_tsvector('english', d.content), q.tsq) AS text_score,
       1 - (d.embedding <=> q.query_vec) AS vector_score
FROM documents d, q
WHERE to_tsvector('english', d.content) @@ q.tsq
ORDER BY 0.5 * ts_rank(to_tsvector('english', d.content), q.tsq)
       + 0.5 * (1 - (d.embedding <=> q.query_vec)) DESC
LIMIT 5;
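| Hybrid Search Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| Native BM25 | - | - | Yes (2.5+) | - | Yes | Via tsvector |
| Sparse Vectors | - | Yes | Yes | Yes | - | Yes (sparsevec, 0.7+) |
| Reciprocal Rank Fusion | - | - | Yes | Yes | Yes | Manual |
| Configurable Weighting | - | - | Yes | Yes | Yes (alpha) | Manual |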
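"Manual" fusion means the merge step runs in application code. Below is a minimal sketch of two common strategies in plain Python.

- **Reciprocal Rank Fusion.** k = 60 is the constant used in the original RRF paper and a common default.
- **Min-max-normalized weighted sum.** It follows the same alpha convention as the Weaviate example, where alpha = 0 is keyword only and alpha = 1 is vector only. This is an illustration, not Weaviate's exact scoring.

The input rankings and scores would come from separate keyword and vector queries, such as the tsvector and pgvector queries above. The function names are illustrative, not a library API.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def weighted_fusion(keyword_scores, vector_scores, alpha=0.5):
    """Min-max normalize each score dict to [0, 1], then mix:
    alpha = 0 -> keyword only, alpha = 1 -> vector only."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        return {d: (s - lo) / (hi - lo) if hi > lo else 1.0 for d, s in scores.items()}

    kw, vec = normalize(keyword_scores), normalize(vector_scores)
    docs = sorted(kw.keys() | vec.keys())  # sorted for a deterministic tie order
    fused = {d: (1 - alpha) * kw.get(d, 0.0) + alpha * vec.get(d, 0.0) for d in docs}
    return sorted(docs, key=fused.get, reverse=True)

keyword_ranking = ["a", "b", "c"]   # e.g. ids ordered by ts_rank
vector_ranking = ["b", "c", "d"]    # e.g. ids ordered by pgvector distance
print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))  # ['b', 'c', 'a', 'd']
print(weighted_fusion({"a": 3.0, "b": 1.0}, {"b": 0.9, "c": 0.5}, alpha=0.75))  # ['b', 'a', 'c']
```

RRF uses only rank positions, so it needs no score normalization. This is why it is a popular default when the two retrievers produce scores on incompatible scales, as ts_rank and cosine similarity do.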

Multi-Vector Search

Some databases support storing and searching multiple vectors per record, useful for multi-modal data (text + image) or late-interaction retrieval models like ColBERT.

# Qdrant -- Named vectors (multiple vectors per point)
from qdrant_client.models import VectorParams, Distance
 
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    }
)
 
# Search by text vector
results = client.query_points(
    collection_name="multimodal",
    query=[0.1] * 768,
    using="text",
    limit=5
)
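# Milvus -- Multiple vector fields in one collection (Milvus 2.4+)
fields = [
    FieldSchema(name="id", dtype=DataType.VARCHAR, is_primary=True, max_length=64),
    FieldSchema(name="text_vec", dtype=DataType.FLOAT_VECTOR, dim=768),
    FieldSchema(name="image_vec", dtype=DataType.FLOAT_VECTOR, dim=512),
]
schema = CollectionSchema(fields, description="Multimodal collection")
collection = Collection("multimodal", schema)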

6. Scalability and Performance

Benchmark Comparison

The following table shows approximate performance characteristics based on published benchmarks and community testing. Actual results vary significantly with hardware, data distribution, dimensionality, and configuration.

Note: These numbers are rough guidelines, not exact benchmarks. Always run your own tests with representative data and queries on your target hardware.

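| Metric | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| Insert Speed (vec/sec) | ~5K | ~10K | ~50K | ~30K | ~15K | ~8K |
| Query Latency (p99, 1M vectors) | ~15ms | ~10ms | ~5ms | ~5ms | ~10ms | ~20ms |
| Concurrent Queries (QPS) | ~500 | ~1,000+ | ~5,000+ | ~3,000+ | ~1,500 | ~800 |
| Max Vectors (practical) | 1M | 1B+ | 10B+ | 1B+ | 100M+ | 10M |
| Memory per 1M vectors (768d) | ~3.5 GB | Managed | ~3 GB | ~2.5 GB | ~4 GB | ~3 GB |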
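The memory row can be sanity-checked with simple arithmetic. Here is a rough sketch under two assumptions: vectors are stored as float32, and the graph is hnswlib-style, with 2 × M neighbor ids per node on the bottom layer. Upper layers, ids, metadata, and allocator overhead are ignored, so real usage is higher. The function is hypothetical.

```python
def estimate_hnsw_memory_gb(num_vectors, dim, m=16, bytes_per_value=4, bytes_per_link=4):
    """Rough in-RAM size of an HNSW index: raw vectors plus bottom-layer graph links.

    Assumes 2 * m neighbor ids per node on layer 0 (as hnswlib allocates) and ignores
    upper layers, ids, metadata, and allocator overhead, so real usage is higher."""
    raw_bytes = num_vectors * dim * bytes_per_value
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return (raw_bytes + link_bytes) / 1e9

print(f"{estimate_hnsw_memory_gb(1_000_000, 768):.2f} GB")                     # 3.20 GB (float32)
print(f"{estimate_hnsw_memory_gb(1_000_000, 768, bytes_per_value=1):.2f} GB")  # 0.90 GB (int8)
```

For 1M 768-dimensional float32 vectors, the raw data alone is about 3.07 GB. M = 16 links add roughly 0.13 GB on top of that. An entry below about 3 GB therefore implies quantization or memory-mapped/on-disk storage rather than fully in-RAM float32 vectors.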

Horizontal Scaling

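| Scaling Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| Sharding | - | Automatic | Yes | Yes | Yes | Manual (Citus) |
| Replication | - | Built-in | Yes | Yes | Yes | PostgreSQL streaming |
| Auto-scaling | - | Yes (serverless) | Via K8s | Via K8s | Via K8s | - |
| Multi-region | - | Yes | Manual | Manual | Yes | Manual |
| GPU Acceleration | - | - | Yes (Knowhere) | Yes (index build, 1.13+) | - | - |
| Disk-based Vectors | - | - | Yes (DiskANN) | Yes (mmap) | - | On disk by default |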

Scaling Architecture Patterns

For small-scale projects (under 1 million vectors), a single-node deployment of any database will suffice. As you scale beyond that, the architectural choices diverge:

Small Scale (< 1M vectors):
  Any database, single node

Medium Scale (1M - 100M vectors):
  Qdrant single node with on-disk vectors
  Milvus standalone
  pgvector with proper indexing

Large Scale (100M - 1B vectors):
  Milvus cluster (sharded)
  Qdrant distributed mode
  Weaviate cluster
  Pinecone (managed)

Very Large Scale (1B+ vectors):
  Milvus cluster with DiskANN
  Pinecone Enterprise

7. Enterprise Features

Authentication and Authorization

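| Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| API Key Auth | Yes | Yes | Yes | Yes | Yes | N/A (PostgreSQL) |
| RBAC | - | Yes | Yes (2.3+) | Yes (JWT, 1.9+) | Yes | PostgreSQL RBAC |
| TLS/SSL | Yes | Yes | Yes | Yes | Yes | PostgreSQL SSL |
| SSO/OIDC | - | Yes | - | - | Yes | Via proxy |
| Multi-tenancy | Collection-level | Namespace | Database/Partition | Collection/Payload | Tenant API | Schema/RLS |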

Backup and Restore

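| Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| Snapshot/Backup | File copy | Managed | milvus-backup tool | Snapshot API | Backup API | pg_dump |
| Point-in-time Recovery | - | - | - | - | - | PostgreSQL WAL |
| Cross-region Backup | - | Yes | Via S3 | Manual | Yes | Via streaming replication |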

Monitoring and Observability

# Milvus -- Prometheus metrics endpoint
curl http://localhost:9091/metrics
 
# Qdrant -- Built-in metrics
curl http://localhost:6333/metrics
 
# Weaviate -- Prometheus metrics (requires PROMETHEUS_MONITORING_ENABLED=true)
curl http://localhost:2112/metrics
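| Feature | Chroma | Pinecone | Milvus | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- | --- |
| Prometheus Metrics | - | Dashboard | Yes | Yes | Yes | pg_stat extensions |
| Grafana Dashboards | - | - | Official | Community | Official | PostgreSQL dashboards |
| Distributed Tracing | - | - | Jaeger | - | - | - |
| Query Logging | Basic | Dashboard | Yes | Yes | Yes | PostgreSQL pg_stat_statements |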

Cloud Deployment Options

| Database | Managed Service | Kubernetes Helm | Terraform | Major Cloud Marketplace |
| --- | --- | --- | --- | --- |
| Chroma | - | Community | - | - |
| Pinecone | Pinecone.io | N/A (SaaS) | Yes | AWS |
| Milvus | Zilliz Cloud | Official | Yes | AWS, GCP, Azure |
| Qdrant | Qdrant Cloud | Official | Yes | AWS, GCP, Azure |
| Weaviate | Weaviate Cloud | Official | Yes | AWS, GCP |
| pgvector | RDS, Cloud SQL, Supabase | Via PostgreSQL charts | Yes | All major clouds |

8. Integration with AI Frameworks

LangChain Integration

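LangChain exposes all six databases through the same VectorStore interface (from_documents, similarity_search), so switching backends mostly means changing the import and connection arguments: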

from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
 
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
 
docs = [
    Document(page_content="RAG improves LLM accuracy with retrieval", metadata={"topic": "RAG"}),
    Document(page_content="Vector databases store embeddings efficiently", metadata={"topic": "VectorDB"}),
    Document(page_content="HNSW enables fast approximate nearest neighbor search", metadata={"topic": "Indexing"}),
]
# Chroma
from langchain_chroma import Chroma
 
vectorstore = Chroma.from_documents(docs, embeddings, collection_name="langchain_demo")
results = vectorstore.similarity_search("How does RAG work?", k=2)
 
# Pinecone
from langchain_pinecone import PineconeVectorStore
 
vectorstore = PineconeVectorStore.from_documents(docs, embeddings, index_name="langchain-demo")
results = vectorstore.similarity_search("How does RAG work?", k=2)
 
# Milvus
from langchain_milvus import Milvus
 
vectorstore = Milvus.from_documents(docs, embeddings, collection_name="langchain_demo",
                                     connection_args={"uri": "http://localhost:19530"})
results = vectorstore.similarity_search("How does RAG work?", k=2)
 
# Qdrant
from langchain_qdrant import QdrantVectorStore
 
vectorstore = QdrantVectorStore.from_documents(docs, embeddings,
                                                url="http://localhost:6333",
                                                collection_name="langchain_demo")
results = vectorstore.similarity_search("How does RAG work?", k=2)
 
# Weaviate
from langchain_weaviate import WeaviateVectorStore
import weaviate
 
weaviate_client = weaviate.connect_to_local()
vectorstore = WeaviateVectorStore.from_documents(docs, embeddings, client=weaviate_client)
results = vectorstore.similarity_search("How does RAG work?", k=2)
 
# pgvector
from langchain_postgres import PGVector
 
vectorstore = PGVector.from_documents(docs, embeddings,
                                       connection="postgresql+psycopg://user:pass@localhost/db",  # psycopg 3 driver
                                       collection_name="langchain_demo")
results = vectorstore.similarity_search("How does RAG work?", k=2)

LlamaIndex Integration

from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
 
documents = [Document(text="RAG combines retrieval with generation for better AI responses.")]
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
 
# Chroma
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
 
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("llama_demo")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
 
# LlamaIndex attaches an external vector store through a StorageContext;
# without one, the index is built on LlamaIndex's default in-memory store.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)
query_engine = index.as_query_engine()  # answer synthesis uses the default LLM (OpenAI)
response = query_engine.query("What is RAG?")
print(response)
 
# Qdrant
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
 
qdrant_client = QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="llama_demo")
 
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)
query_engine = index.as_query_engine()
response = query_engine.query("What is RAG?")
 
# Milvus
from llama_index.vector_stores.milvus import MilvusVectorStore
 
vector_store = MilvusVectorStore(uri="http://localhost:19530", collection_name="llama_demo", dim=1536)
 
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)
query_engine = index.as_query_engine()
response = query_engine.query("What is RAG?")
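The Milvus example passes `dim=1536` (and the Haystack example below uses `embedding_dim=1536`) because `text-embedding-3-small` returns 1536-dimensional vectors by default; if you change the embedding model, the collection schema has to change with it. Below is a minimal sketch of a fail-fast check before inserting, assuming you have the raw vectors in hand. `EMBEDDING_DIMS`, `expected_dim`, and `check_vectors` are hypothetical helpers, not part of LlamaIndex.

```python
from typing import List

# Default output sizes of common OpenAI embedding models. The text-embedding-3
# models can also return shorter vectors via the API's `dimensions` parameter,
# in which case the collection must be created with that smaller size instead.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def expected_dim(model: str) -> int:
    """Look up the default vector size for a known embedding model."""
    try:
        return EMBEDDING_DIMS[model]
    except KeyError:
        raise ValueError(f"unknown embedding model: {model!r}") from None


def check_vectors(vectors: List[List[float]], collection_dim: int) -> None:
    """Raise before insert if any vector's length differs from the collection's dim."""
    for i, vec in enumerate(vectors):
        if len(vec) != collection_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, collection expects {collection_dim}"
            )


dim = expected_dim("text-embedding-3-small")  # 1536, matching dim=1536 above
check_vectors([[0.0] * 1536], dim)  # passes silently
```

Catching the mismatch in application code gives a clearer error than the one a database raises mid-ingestion.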

Haystack Integration

# Qdrant with Haystack
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack import Pipeline, Document
 
# Set up document store
document_store = QdrantDocumentStore(
    url="http://localhost:6333",
    index="haystack_demo",
    embedding_dim=1536,
    recreate_index=True
)
 
# Index documents
doc_embedder = OpenAIDocumentEmbedder(model="text-embedding-3-small")
docs = [Document(content="RAG improves LLM accuracy with external knowledge retrieval.")]
docs_with_embeddings = doc_embedder.run(documents=docs)
document_store.write_documents(docs_with_embeddings["documents"])
 
# Build retrieval pipeline
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OpenAITextEmbedder(model="text-embedding-3-small"))
query_pipeline.add_component("retriever", QdrantEmbeddingRetriever(document_store=document_store, top_k=3))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
 
results = query_pipeline.run({"text_embedder": {"text": "What is RAG?"}})
for doc in results["retriever"]["documents"]:
    print(f"Score: {doc.score:.4f}, Content: {doc.content[:80]}")
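`QdrantEmbeddingRetriever` sends the query embedding to Qdrant and gets back the `top_k` closest documents, each with a similarity `score`. Assuming the collection uses cosine similarity (which we understand to be the `QdrantDocumentStore` default), a higher score means a closer match. The brute-force sketch below computes the same kind of ranking for a few in-memory vectors; Qdrant itself searches an approximate HNSW index rather than scanning every vector, and `cosine` and `top_k` are illustrative names.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], docs: List[Tuple[str, Sequence[float]]], k: int = 3
) -> List[Tuple[float, str]]:
    """Score every (content, vector) pair exhaustively and keep the k best."""
    scored = ((cosine(query, vec), content) for content, vec in docs)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


docs = [
    ("RAG improves LLM accuracy with external knowledge retrieval.", [1.0, 0.0]),
    ("An unrelated note about bananas.", [0.0, 1.0]),
    ("A partly related note.", [1.0, 1.0]),
]
for score, content in top_k([1.0, 0.0], docs, k=2):
    print(f"Score: {score:.4f}, Content: {content[:80]}")
```

The exhaustive scan is exact but costs one comparison per stored vector, which is why the databases compared here rely on approximate indexes at scale.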

9. Selection Guide

Decision Flowchart

Use the following text-based decision tree to narrow down your choice:

START: What is your primary requirement?
│
├─ "Quick prototype / experimentation"
│   └─→ Chroma (zero config, pip install, runs in-process)
│
├─ "Already using PostgreSQL"
│   └─ How many vectors?
│       ├─ < 5M  → pgvector (no new infra, full SQL power)
│       └─ ≥ 5M  → Consider a dedicated vector DB
│
├─ "Zero operational overhead / managed service"
│   └─→ Pinecone (fully managed, auto-scaling)
│
├─ "Massive scale (billions of vectors)"
│   └─→ Milvus (distributed architecture, GPU support, DiskANN)
│
├─ "High-performance with complex filtering"
│   └─→ Qdrant (Rust performance, payload indexes, on-disk vectors)
│
└─ "Hybrid search (keyword + vector) is critical"
    └─→ Weaviate (native BM25 + vector, built-in vectorization modules)
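The same tree can be written as a small function, which is handy if you want to record the reasoning next to a project's configuration. This is only a sketch: the requirement keys are hypothetical labels for the branches above, and the 5M cut-off is the tree's rule of thumb rather than a hard pgvector limit.

```python
from typing import Optional

# Hypothetical requirement labels, one per branch of the decision tree.
BRANCHES = {
    "prototype": "Chroma",
    "managed_service": "Pinecone",
    "massive_scale": "Milvus",
    "complex_filtering": "Qdrant",
    "hybrid_search": "Weaviate",
}

# Rule-of-thumb cut-off from the PostgreSQL branch, not a hard pgvector limit.
PGVECTOR_THRESHOLD = 5_000_000


def recommend(requirement: str, num_vectors: Optional[int] = None) -> str:
    """Walk the decision tree for one primary requirement."""
    if requirement == "already_postgres":
        if num_vectors is None:
            raise ValueError("the PostgreSQL branch needs num_vectors")
        if num_vectors < PGVECTOR_THRESHOLD:
            return "pgvector"
        return "a dedicated vector DB"
    if requirement not in BRANCHES:
        raise ValueError(f"unknown requirement: {requirement!r}")
    return BRANCHES[requirement]


print(recommend("already_postgres", num_vectors=2_000_000))  # pgvector
print(recommend("hybrid_search"))  # Weaviate
```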

Scenario-Based Recommendations

| Scenario | Recommended DB | Reason |
|---|---|---|
| Prototyping / hackathon | Chroma | Zero setup, pip install, runs in Jupyter |
| Existing PostgreSQL infrastructure | pgvector | No new infra, SQL joins with relational data, ACID |
| Managed service with minimal ops | Pinecone | Fully managed SaaS, auto-scaling, built-in monitoring |
| Large-scale production (100M+ vectors) | Milvus | Distributed architecture, GPU acceleration, rich index types |
| High-performance filtering workloads | Qdrant | Rust-based engine, efficient payload indexing, low memory |
| Keyword + vector hybrid search | Weaviate | Native BM25 + vector fusion, integrated vectorization |
| Multi-modal search (text + image) | Qdrant or Milvus | Named vectors / multi-vector field support |
| Cost-sensitive startup | pgvector or Chroma | Open source, no additional infrastructure costs |
| Enterprise with compliance requirements | Pinecone or Milvus | RBAC, SOC2, enterprise support contracts |
| Edge / mobile deployment | Chroma | Lightweight embedded mode, minimal dependencies |

Cost Comparison Overview

| Database | Self-hosted Cost | Managed Service Starting Price | Free Tier |
|---|---|---|---|
| Chroma | Infrastructure only | N/A | Open source |
| Pinecone | N/A (SaaS only) | Serverless: pay-per-use | 2GB storage |
| Milvus | Infrastructure only | Zilliz: from ~$65/month | Free trial |
| Qdrant | Infrastructure only | Qdrant Cloud: from ~$25/month | 1GB free cluster |
| Weaviate | Infrastructure only | Weaviate Cloud: from ~$25/month | Sandbox available |
| pgvector | PostgreSQL costs | RDS/Supabase pricing | Supabase free tier |
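For the self-hosted options, "infrastructure only" largely means memory and disk for the vectors themselves. A back-of-envelope sketch, assuming uncompressed float32 vectors (4 bytes per dimension); index structures, metadata, and replicas all add to this, so treat the result as a lower bound. `raw_vector_bytes` is a hypothetical helper.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Bytes needed for the vectors alone (float32 by default).

    Excludes index structures (e.g. HNSW graph links), metadata, and replicas.
    """
    return num_vectors * dim * bytes_per_component


GIB = 1024 ** 3

size = raw_vector_bytes(5_000_000, 1536)
print(f"{size:,} bytes = {size / GIB:.1f} GiB")  # 30,720,000,000 bytes = 28.6 GiB
```

At the 5M-vector threshold from the decision tree with 1536-dimensional embeddings, the raw vectors alone need roughly 28.6 GiB before any index overhead, which is a useful first input when sizing self-hosted instances.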

Migration Considerations

When choosing a vector database, consider these factors for long-term success:

  1. Data portability -- Can you export and import vectors easily? Open-source databases generally offer better portability than proprietary ones.
  2. API stability -- How mature is the SDK? Frequent breaking changes increase maintenance burden.
  3. Community and ecosystem -- Larger communities mean more tutorials, integrations, and faster bug fixes.
  4. Operational complexity -- How much DevOps effort does the database require in production?
  5. Lock-in risk -- Can you switch databases without rewriting your entire application? Coding against a framework abstraction (LangChain, LlamaIndex) reduces this risk, although backend-specific features such as metadata filter syntax still need porting.
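One low-tech way to keep data portable, whichever database you pick, is to maintain a neutral export of IDs, vectors, and metadata alongside it. A minimal sketch, assuming each record is a dict with hypothetical `id`, `vector`, and `metadata` keys and JSON Lines as the exchange format; you would still use each database's own client to read records out and write them back in.

```python
import json
from typing import Dict, Iterable, Iterator


def export_jsonl(records: Iterable[Dict], path: str) -> int:
    """Write one {"id", "vector", "metadata"} object per line; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            row = {
                "id": rec["id"],
                "vector": list(rec["vector"]),
                "metadata": rec.get("metadata", {}),
            }
            f.write(json.dumps(row) + "\n")
            count += 1
    return count


def import_jsonl(path: str) -> Iterator[Dict]:
    """Stream records back one at a time, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Because `import_jsonl` is a generator, a large export can be replayed into the target database in batches without loading every vector into memory at once.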


-- Data Dynamics Engineering Team