Argus RAG Studio
An open-source platform for building, operating, evaluating, and serving RAG (Retrieval-Augmented Generation) pipelines in one place. It covers the full RAG lifecycle: document ingestion, hybrid search, citation-grounded answer generation, evaluation, observability, and feedback. Embedding and reranking can run locally inside the backend, so the platform also works in air-gapped and on-premises environments.
Highlights
End-to-end indexing & query pipeline
Ingestion (upload → parse → chunk → embed → index) and query (search → rerank → generate) run in a single backend. Each collection (knowledge base) can be configured with different strategies.
Hybrid search + cited answers
Vector (pgvector) and lexical (tsvector) search results are fused with Reciprocal Rank Fusion (RRF), and answers are generated with [n] grounding citations. Multi-turn chat streams over SSE.
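As an illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. The function name `rrf_fuse`, the sample IDs, and the constant `k=60` (a commonly used default) are assumptions for this example, not values taken from the Argus codebase.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse ranked lists of IDs: each list adds 1 / (k + rank) to an ID's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


# Hypothetical result lists from the vector and lexical searches.
vector_hits = ["c3", "c1", "c7"]
lexical_hits = ["c1", "c9", "c3"]

for doc_id, score in rrf_fuse([vector_hits, lexical_hits]):
    print(doc_id, round(score, 5))
```

Because RRF uses only ranks, it does not need the vector distances and the lexical scores to be on comparable scales.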
Local inference · air-gapped operation
Embedding and reranking can run locally inside the backend via FastEmbed, so an external inference server is optional. The generation LLM is bring-your-own (BYO) and switches between an OpenAI-compatible server and Claude, so pointing it at a self-hosted OpenAI-compatible server keeps the platform usable in closed networks too.
Closed evaluation–ops–feedback loop
Golden-set automated evaluation (Hit Rate, MRR, LLM-as-judge), per-stage latency and token tracking, and promotion of 👍/👎 answer feedback into the golden set together form a loop that measures and improves quality.
Platform Architecture
An end-to-end RAG platform in which the frontend dashboard, RAG backend, inference, and data stores work together.
Core Capabilities
The entire RAG pipeline in a single platform: ingestion, parsing, and chunking, then hybrid search and generation, then evaluation, observability, and feedback.
Ingestion
Uploaded documents are processed asynchronously through parse → chunk → embed → pgvector indexing.
Parse strategies
Swap the parsing stage per collection; changing it re-indexes the collection.
Chunking strategies
Swap the chunking method and unit per collection.
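To make "method and unit" concrete, the sketch below shows one possible chunking strategy: fixed-size character windows with overlap. The function name `chunk_text` and its `size` and `overlap` parameters are hypothetical; the strategies Argus actually ships may differ.

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once a window reaches the end, so no chunk is a pure duplicate tail.
        if start + size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=2))
```

A token-based or sentence-based strategy would keep the same shape and change only how the window boundaries are chosen.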
Hybrid search & generation
Search by combining keywords and meaning, and generate cited answers.
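One way to picture cited answers: retrieved chunks are numbered in the prompt context, and the [n] markers in the generated answer are mapped back to those chunks. The helpers `build_context` and `cited_indices` below are hypothetical names for a rough sketch and do not describe Argus's actual prompt format.

```python
import re


def build_context(chunks):
    """Number each retrieved chunk so the model can cite it as [n]."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))


def cited_indices(answer, n_chunks):
    """Return the sorted, de-duplicated [n] markers that point at real chunks."""
    found = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(i for i in found if 1 <= i <= n_chunks)


chunks = ["pgvector stores embeddings.", "tsvector powers lexical search.", "RRF fuses rankings."]
answer = "Embeddings live in pgvector [1], and rankings are fused with RRF [3]."
print(build_context(chunks))
print(cited_indices(answer, len(chunks)))
```

Filtering out markers that fall outside the chunk range guards against citations the model invents.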
Embedding & inference providers
Switch embedding, reranking, and the generation LLM between local and standalone servers.
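Switching between local and standalone-server providers can be pictured as a small factory behind a common interface. Everything in this sketch is an assumption for illustration: the `Embedder` protocol, the two classes, the `provider` and `base_url` config keys, and the toy hash-based vectors that stand in for a real model.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Protocol


class Embedder(Protocol):
    def embed(self, texts: List[str]) -> List[List[float]]: ...


@dataclass
class LocalEmbedder:
    dim: int = 4

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Toy stand-in for an in-process model: deterministic values from a hash.
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([digest[i] / 255.0 for i in range(self.dim)])
        return vectors


@dataclass
class RemoteEmbedder:
    base_url: str

    def embed(self, texts: List[str]) -> List[List[float]]:
        # The HTTP call to a standalone server is omitted in this sketch.
        raise NotImplementedError("remote call not implemented in this example")


def make_embedder(cfg: Dict[str, str]) -> Embedder:
    """Pick a provider from a per-collection config dict (hypothetical keys)."""
    kind = cfg.get("provider", "local")
    if kind == "local":
        return LocalEmbedder()
    if kind == "server":
        return RemoteEmbedder(base_url=cfg["base_url"])
    raise ValueError(f"unknown embedding provider: {kind}")


embedder = make_embedder({"provider": "local"})
print(embedder.embed(["hello"]))
```

The same pattern would apply to the reranker and the generation LLM, each with its own interface.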
Evaluation
Automatically measure RAG pipeline quality with golden datasets.
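Two standard retrieval metrics for golden-set evaluation are Hit Rate and MRR (Mean Reciprocal Rank). The sketch below computes both at a cutoff `k`. The function name and the `(retrieved_ids, relevant_ids)` input shape are assumptions for this example.

```python
def hit_rate_and_mrr(results, k=5):
    """results: list of (retrieved_ids, relevant_ids) pairs, one per golden question.

    Hit Rate@k = share of questions with a relevant ID in the top k.
    MRR@k      = mean of 1 / rank of the first relevant ID (0 when none is found).
    """
    if not results:
        return 0.0, 0.0
    hits = 0
    reciprocal_rank_sum = 0.0
    for retrieved, relevant in results:
        first_rank = next(
            (rank for rank, doc_id in enumerate(retrieved[:k], start=1) if doc_id in relevant),
            None,
        )
        if first_rank is not None:
            hits += 1
            reciprocal_rank_sum += 1.0 / first_rank
    return hits / len(results), reciprocal_rank_sum / len(results)


# Hypothetical golden set: retrieved chunk IDs and the expected relevant IDs.
golden_runs = [
    (["a", "b", "c"], {"b"}),  # first relevant at rank 2
    (["x", "y"], {"z"}),       # miss
    (["m"], {"m"}),            # first relevant at rank 1
]
print(hit_rate_and_mrr(golden_runs, k=5))
```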
Observability
Track per-stage latency and token usage for each query.
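Per-stage instrumentation can be sketched with a context manager that accumulates wall-clock time per stage, plus a counter for token usage. The `QueryTrace` class and the stage names are hypothetical and only illustrate the idea.

```python
import time
from contextlib import contextmanager


class QueryTrace:
    """Collects latency (ms) per stage and token counts for one query."""

    def __init__(self):
        self.latency_ms = {}
        self.tokens = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            # Accumulate, so a stage that runs more than once is summed.
            self.latency_ms[name] = self.latency_ms.get(name, 0.0) + elapsed_ms

    def add_tokens(self, kind, count):
        self.tokens[kind] = self.tokens.get(kind, 0) + count


trace = QueryTrace()
with trace.stage("search"):
    sum(range(10_000))  # stand-in for the search call
with trace.stage("generate"):
    trace.add_tokens("prompt", 120)
    trace.add_tokens("completion", 45)
print(trace.latency_ms, trace.tokens)
```

Recording the time in a `finally` block means a stage that raises still gets a latency entry.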
Pipeline version management
Manage search, rerank, and generation settings as versionable first-class assets.
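One way to treat pipeline settings as versionable assets is to keep them in an immutable config object and derive a version ID from its content. The field names, default values, and hashing scheme below are assumptions for illustration, not Argus's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical settings covering the search, rerank, and generation stages.
    top_k: int = 20
    rrf_k: int = 60
    rerank_top_n: int = 5
    model: str = "example-model"
    temperature: float = 0.0


def version_id(cfg: PipelineConfig) -> str:
    """Content-derived ID: identical settings always map to the same version."""
    payload = json.dumps(asdict(cfg), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


v1 = PipelineConfig()
v2 = PipelineConfig(rerank_top_n=10)
print(version_id(v1), version_id(v2))
```

With content-derived IDs, evaluation results and query traces can be tagged with the exact settings that produced them.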
Feedback loop
Collect answer ratings and feed them back into the golden set.
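The promotion step might look roughly like the sketch below. It assumes, purely for illustration, that up-voted answers become expected answers and that down-voted questions enter the golden set flagged for human review. The `Feedback` and `GoldenItem` types and this policy are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feedback:
    question: str
    answer: str
    rating: str  # "up" or "down"


@dataclass
class GoldenItem:
    question: str
    expected_answer: Optional[str]
    needs_review: bool


def promote_feedback(feedback: List[Feedback], golden: List[GoldenItem]) -> List[GoldenItem]:
    """Return a new golden set extended with rated questions not already present."""
    known = {item.question for item in golden}
    promoted = list(golden)
    for fb in feedback:
        if fb.question in known:
            continue  # keep the first entry for each question
        if fb.rating == "up":
            promoted.append(GoldenItem(fb.question, fb.answer, needs_review=False))
        elif fb.rating == "down":
            # The rated answer was judged bad, so leave the expected answer empty.
            promoted.append(GoldenItem(fb.question, None, needs_review=True))
        else:
            continue  # ignore unknown ratings
        known.add(fb.question)
    return promoted


golden_set = [GoldenItem("What is RRF?", "Reciprocal Rank Fusion.", False)]
ratings = [Feedback("Which store holds vectors?", "pgvector.", "up")]
print(promote_feedback(ratings, golden_set))
```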
An open-source RAG platform
Argus RAG Studio is published on GitHub under the Apache License 2.0. The entire RAG engine, comprising the backend (FastAPI), the frontend (Next.js), and the standalone embedding/reranker servers, is open, so enterprises can inspect the code directly, extend it to fit their environment, and operate it without sending data outside.
- Apache 2.0 with no commercial-use restrictions
- Verify and extend the code yourself
- Self-host in air-gapped or on-premises environments