Argus RAG Studio

An open-source platform for building, operating, evaluating, and serving RAG (Retrieval-Augmented Generation) pipelines in one place. It covers the full RAG lifecycle: document ingestion, hybrid search, citation-grounded answer generation, evaluation, observability, and feedback. Embedding and reranking can run locally inside the backend, so it also operates in air-gapped and on-premises environments.

Apache License 2.0 · Open Source
GitHub Repository

Highlights

01

End-to-end indexing & query pipeline

Ingestion (upload → parse → chunk → embed → index) and query (search → rerank → generate) run in a single backend. Each collection (knowledge base) can be configured with different strategies.

02

Hybrid search + cited answers

Vector (pgvector) and lexical (tsvector) search results are fused with Reciprocal Rank Fusion (RRF), and answers are generated with [n] grounding citations. Multi-turn chat streams over SSE.

03

Local inference · air-gapped operation

Embedding and reranking can run locally inside the backend via FastEmbed, so an external inference server is optional. The generation LLM is bring-your-own (BYO) and can be switched between an OpenAI-compatible server and Claude, so pointing it at a self-hosted OpenAI-compatible server lets the platform run in closed networks too.

04

Closed evaluation–ops–feedback loop

Automated golden-set evaluation (Hit Rate, MRR, LLM-as-judge), per-stage latency and token tracking, and promotion of 👍/👎 answer feedback into the golden set form a loop that measures and improves quality.

Platform Architecture

An end-to-end RAG platform in which the frontend dashboard, RAG backend, inference layer, and data stores work together.

Frontend Dashboard

Next.js 16 · React 19

Knowledge base & document management
Playground · Chat
Pipeline editor & versions
Evaluation datasets & runs
Observability (traces) · statistics
Feedback · users/permissions

RAG Backend

FastAPI :4700

Ingestion (parse · chunk · embed · index)
Query (search · rerank · generate)
Evaluation · observability · feedback
Pipeline version management
Local / Keycloak authentication
REST · SSE streaming

Inference

Local or standalone server

Embedding: FastEmbed local · server :8080
Reranker: FastEmbed local · server :8081
Generation LLM: OpenAI-compatible · Claude
Vision LLM (BYO): for vlm parsing

Data Stores

PostgreSQL · MinIO

PostgreSQL + pgvector
Chunks · vectors · tsvector · metadata
Query traces · evaluation · feedback
MinIO / S3 source documents

Tech Stack

Python 3.11
FastAPI
SQLAlchemy 2.0
PostgreSQL + pgvector
Pydantic v2
Next.js 16
React 19
TypeScript
Tailwind 4 · shadcn/ui
JWT · Keycloak OIDC
FastEmbed (ONNX)
Anthropic SDK

Core Capabilities

The entire RAG pipeline in a single platform: ingestion, parsing, and chunking; hybrid search and generation; evaluation, observability, and feedback.

Ingestion

Uploaded documents are processed asynchronously through parse → chunk → embed → pgvector indexing.

Loaders for txt/md/csv/json/html/xml/pdf/docx/xlsx/pptx/hwp/hwpx
Metadata extraction for HWP · HWPX · PDF · DOCX · XLSX
content_hash idempotency · reprocessing (reindex)
Async workers · job progress tracking
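
The content_hash idempotency listed above can be pictured with a small sketch. It assumes the hash is a SHA-256 digest of the raw file bytes and uses an in-memory dict in place of the database. Both choices, and the function names `content_hash` and `ingest`, are illustrative assumptions rather than the project's actual implementation.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of the raw document bytes (assumed hash choice)."""
    return hashlib.sha256(data).hexdigest()


def ingest(store: dict[str, str], filename: str, data: bytes) -> bool:
    """Register a document unless identical content was already ingested.

    Returns True when the content is new and should go through the pipeline,
    False when the same bytes were seen before and processing can be skipped.
    """
    digest = content_hash(data)
    if digest in store:
        return False
    store[digest] = filename
    return True


# Hypothetical uploads: the second file has the same bytes as the first.
store: dict[str, str] = {}
first = ingest(store, "report.pdf", b"quarterly numbers")
second = ingest(store, "report-copy.pdf", b"quarterly numbers")
third = ingest(store, "report-v2.pdf", b"quarterly numbers, revised")
```

Keying on content rather than filename means a re-upload under a different name is still recognized as a duplicate, while an edited file gets a new hash and is processed again.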

Parse strategies

Swap the parsing strategy per collection; changing it re-indexes the collection.

text · layout (tables → Markdown) · docai (docling)
vlm — external vision LLM (BYO)
rhwp — preserves merged HWP/HWPX tables
Availability introspection · validated on real models

Chunking strategies

Swap the chunking method and unit per collection.

recursive · sentence (Korean kss) · fixed
markdown (tables · headings) · semantic (meaning boundaries)
char / token (tiktoken) unit · size · overlap
Smart overlap trimming · quality guards
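
To make the size and overlap parameters concrete, here is a minimal sketch of fixed chunking in character units. The function name `fixed_chunks` and its validation rules are assumptions for illustration; the platform's own chunkers (including token units and overlap trimming) are not shown here.

```python
def fixed_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks


chunks = fixed_chunks("abcdefghij", size=4, overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

Each chunk repeats the last `overlap` characters of the previous one, which keeps a sentence that straddles a boundary retrievable from either side.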

Hybrid search & generation

Search by combining keywords and meaning, and generate cited answers.

Vector (pgvector) + lexical (tsvector) + RRF fusion
Answer generation with [n] grounding citations
Multi-turn chat — SSE token & source streaming
Reranking: none / llm / cross_encoder
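
The RRF fusion step can be sketched in a few lines. Reciprocal Rank Fusion scores each result as the sum of 1 / (k + rank) over the rankings it appears in. The constant k = 60 is the commonly used default and is an assumption here, as are the function name `rrf_fuse` and the chunk ids; the platform's actual SQL or Python implementation may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from the two retrievers.
vector_hits = ["c1", "c2", "c3"]   # pgvector similarity order
lexical_hits = ["c3", "c1", "c4"]  # tsvector rank order
fused = rrf_fuse([vector_hits, lexical_hits])
# fused == ["c1", "c3", "c2", "c4"]
```

Because RRF uses only rank positions, it needs no normalization between vector distances and lexical scores, which is why it suits fusing two retrievers with incomparable score scales.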

Embedding & inference providers

Switch embedding, reranking, and the generation LLM between local and standalone servers.

Embedding — OpenAI-compatible · local (FastEmbed) · hash
Per-collection model · dimension · server URL · cache reuse
Standalone embedding (:8080) · reranker (:8081) servers
Generation LLM — OpenAI-compatible · Claude (anthropic SDK)

Evaluation

Automatically measure RAG pipeline quality with golden datasets.

Golden-set (question · expected answer · expected sources) CRUD
Retrieval metrics — Hit Rate · MRR (no LLM needed)
Generation metrics — LLM-as-judge (faithfulness, etc.)
Async evaluation runs · metric tables
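
The two retrieval metrics can be computed without an LLM, as the following sketch shows. It assumes each golden-set item carries a set of expected source ids and that retrieval returns a ranked list of ids; the function names and the cutoff parameter `k` are illustrative assumptions.

```python
def hit_rate(results: list[list[str]], expected: list[set[str]], k: int = 5) -> float:
    """Fraction of questions with at least one expected source in the top k."""
    if not results:
        return 0.0
    hits = sum(
        1 for retrieved, gold in zip(results, expected) if gold & set(retrieved[:k])
    )
    return hits / len(results)


def mrr(results: list[list[str]], expected: list[set[str]]) -> float:
    """Mean reciprocal rank of the first expected source (0 when none is found)."""
    if not results:
        return 0.0
    total = 0.0
    for retrieved, gold in zip(results, expected):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break
    return total / len(results)


# Hypothetical run over three golden-set questions.
results = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
expected = [{"a"}, {"f"}, {"z"}]
hr = hit_rate(results, expected, k=3)  # 2 of 3 questions hit
score = mrr(results, expected)         # (1 + 1/3 + 0) / 3
```

Hit Rate only asks whether an expected source was retrieved at all, while MRR also rewards ranking it near the top, so the two are usually read together.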

Observability

Measure per-stage latency and token usage for each query.

Query Trace — retrieval/rerank/generation latency
Token usage capture (OpenAI-compatible · Claude)
Statistics — success rate · latency p50/p95 · top queries
Best-effort instrumentation (never blocks requests)
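
As an illustration of the latency statistics, the sketch below computes p50 and p95 with the nearest-rank method. The method, the function name `percentile`, and the sample latencies are assumptions; the platform may compute percentiles differently, for example in SQL.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value covering pct% of the samples."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical end-to-end query latencies in milliseconds.
latencies_ms = [120, 95, 310, 150, 101, 99, 480, 130, 110, 105]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

With these samples the median is 110 ms while p95 is 480 ms, which shows why a tail percentile is tracked alongside the median: a few slow queries barely move p50.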

Pipeline version management

Manage search, rerank, and generation settings as versionable first-class assets.

Append-only versions · rollback · field-level diff
Compare two versions on the same query (experiment)
Distance metric override
Apply pipeline_id to search/query/chat
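
A field-level diff between two pipeline versions can be sketched as a comparison of two settings dicts. The field names `top_k` and `temperature` are hypothetical, and modeling rollback as appending a copy of an earlier version is one assumed way to keep the history append-only, not a description of the actual schema.

```python
def field_diff(old: dict, new: dict) -> dict[str, tuple]:
    """Return {field: (old_value, new_value)} for every field whose value differs.

    A field missing on one side is reported with None on that side.
    """
    changes: dict[str, tuple] = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


# Append-only history: new versions are added, existing ones never edited.
versions = [{"top_k": 5, "rerank": "none", "temperature": 0.2}]
versions.append({"top_k": 8, "rerank": "cross_encoder", "temperature": 0.2})
diff = field_diff(versions[0], versions[1])
# diff == {"rerank": ("none", "cross_encoder"), "top_k": (5, 8)}

# Rollback (assumed model): append a copy of version 1 as version 3.
versions.append(dict(versions[0]))
```

Keeping every version immutable means a trace that recorded a pipeline version can always be replayed against the exact settings that produced it.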

Feedback loop

Collect answer ratings and feed them back into the golden set.

👍/👎 widget on Playground · Chat answers
Attributed to a specific answer via trace_id
Promote feedback into golden-set items
Evaluation/status filters · statistics · admin screen

An open-source RAG platform

Argus RAG Studio is published on GitHub under the Apache License 2.0. The entire RAG engine — backend (FastAPI), frontend (Next.js), and the standalone embedding/reranker servers — is open, so enterprises can verify the code directly, extend it to fit their environment, and operate it without sending data outside.

  • Apache 2.0 with no commercial-use restrictions
  • Verify and extend the code yourself
  • Self-host in air-gapped / on-premises environments