Argus RAG Studio

An open-source, self-hosted platform that covers the full RAG lifecycle (Build, Retrieve & Generate, Evaluate, and Operate & Deploy) in one place. It is not a "works-once RAG demo": it ships with an evaluation harness that measures quality in numbers, config sweeps that automatically search for the best settings, and a feedback loop. Agent-based remote deployment lets it run on-premises and in air-gapped (closed) networks.

Apache License 2.0 · Open Source · GitHub Repository · Product Brochure

Highlights

Measure → optimize → improve loop

Quality is measured in numbers with golden sets, Hit Rate/MRR, and a 3-axis LLM-as-Judge. Config sweeps automatically explore combinations of chunking, search mode, and reranker, and 👍/👎 user feedback is promoted back into the golden set. This loop, which most self-built RAG stacks never reach, is built in.

Hybrid search + cited answers

Vector (pgvector) and lexical (tsvector) results are fused with RRF and reordered by reranking (LLM or cross-encoder), and answers stream over SSE with [n] citations. Federated queries search knowledge bases with heterogeneous embeddings at once.

Built for Korean documents

A dedicated Rust parser for HWP/HWPX (rhwp), kss Korean sentence splitting, a VLM/OCR (PaddleOCR) pipeline for scanned documents, and AI-Hub-compatible annotation: an area that general-purpose open-source RAG frameworks do not cover.

Air-gap · agent-based remote deployment

Models are brought in as packs and auto-installed at deploy time, while per-host agents remotely deploy workers, embedding, reranker, and VLM servers. With the zot registry even containers stay fully offline — meeting network-separation requirements in finance, government, and defense.

Platform Architecture

The frontend dashboard, RAG backend, inference servers, and data stores/registry work together, and inference and workers can be deployed separately via agents to scale in stages.

Frontend Dashboard

Next.js 16 · React 19

Knowledge bases · Playground · Chat

Pipelines · evaluation · observability

Feedback · document routing · fine-tuning

Annotation · image explorer

Model & server management · source watch

Jobs · users/permissions · API keys · PII rules

RAG Backend

FastAPI :4700

Ingestion — parse·chunk·embed·index (async workers)

Query — hybrid search · rerank · generate

Evaluation · traces · feedback · pipeline versions

RAG document routing · source watch

servermgr — agent deploy · proxy · heartbeat

REST · SSE streaming · local JWT/Keycloak

Inference

Local or separately deployed

Embedding :8080 — FastEmbed local · OpenAI-compatible

Reranker :8081 — cross-encoder

Detection (OCR) :8082 — PaddleOCR · EasyOCR

Generation LLM — Claude · OpenAI-compatible · Ollama · vLLM

VLM (vLLM) — scanned docs & image parsing

GPU variants — cpu · gpu(onnx) · gpu-torch

Data Stores

PostgreSQL · MinIO · zot

PostgreSQL + pgvector — chunks·vectors·tsvector

Traces · evaluation · feedback in the same plane

MinIO / S3 — source docs · images · model packs

Model Repository (argus-models)

zot OCI registry — air-gapped images

buildx bake — amd64+arm64 multi-arch

Tech Stack

Python 3.11+

FastAPI (async)

SQLAlchemy 2.0

Pydantic v2

PostgreSQL + pgvector

MinIO / S3

Next.js 16

React 19

TypeScript

Tailwind 4 · shadcn/ui

FastEmbed(ONNX) · torch(cu128)

Docker/Podman · zot · buildx

JWT · Keycloak OIDC · API keys

Core Capabilities

From ingestion, parsing, and chunking to hybrid search & generation, evaluation, config sweeps, retrieval fine-tuning, versions & observability, agent deployment & air-gap, and annotation & images — twelve pillars covering the full RAG lifecycle in a single platform.

Ingestion pipeline

Multi-format documents are processed by async workers through upload → parse → chunk → embed → index.

Multi-format loaders — txt/pdf/docx/xlsx/pptx/hwp/hwpx and more

Source watch — periodic drop-zone scans · unattended intake

content_hash idempotency · reprocessing (reindex)

Job progress tracking · workers deployable on separate hosts
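
The content_hash idempotency listed above can be sketched as follows. This is a minimal illustration, not the project's implementation: the SHA-256 choice, the `IngestionLedger` class, and the `doc-N` identifiers are assumptions made for the example, and the in-memory dict stands in for whatever table the backend actually uses.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of the raw document bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class IngestionLedger:
    """Hypothetical in-memory stand-in for the record of indexed documents."""

    def __init__(self) -> None:
        # (collection_id, content_hash) -> document id
        self._docs: dict[tuple[str, str], str] = {}

    def ingest(self, collection_id: str, data: bytes, *, reindex: bool = False) -> tuple[str, bool]:
        """Return (document_id, processed). Identical bytes are skipped unless reindex=True."""
        key = (collection_id, content_hash(data))
        existing = self._docs.get(key)
        if existing is not None and not reindex:
            return existing, False  # same bytes already indexed in this collection
        doc_id = existing or f"doc-{len(self._docs) + 1}"
        # A real pipeline would run parse, chunk, embed, and index here.
        self._docs[key] = doc_id
        return doc_id, True
```

Keying on the pair (collection, hash) rather than the hash alone keeps the same file ingestible into two different collections, which matters when collections are isolation boundaries.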

5 parse strategies

Pick the parse strategy per collection to match the document (with automatic fallback when a strategy's parser is not installed).

text · layout (pdfplumber) · docai (docling)

vlm — vision LLM (scans · complex layouts)

rhwp — dedicated Rust parser for HWP/HWPX (preserves merged tables)

Availability introspection · validated on real models
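
One way to picture availability introspection with automatic fallback is the sketch below. It is only an illustration under stated assumptions: the `REQUIRED_MODULE` table, the fallback order, and both function names are hypothetical, and the `vlm` and `rhwp` strategies are left out because their availability depends on servers and binaries this page does not describe.

```python
from importlib.util import find_spec

# Hypothetical mapping from strategy name to the Python module it needs.
# "text" needs nothing extra, so it is always available.
REQUIRED_MODULE = {"text": None, "layout": "pdfplumber", "docai": "docling"}


def is_available(strategy: str, probe=find_spec) -> bool:
    """True when the module behind a strategy can be located."""
    if strategy not in REQUIRED_MODULE:
        raise ValueError(f"unknown strategy: {strategy}")
    module = REQUIRED_MODULE[strategy]
    return module is None or probe(module) is not None


def resolve_strategy(requested: str, fallback_order=("docai", "layout", "text"), probe=find_spec) -> str:
    """Use the requested strategy if installed, otherwise the first available fallback."""
    if is_available(requested, probe):
        return requested
    for candidate in fallback_order:
        if is_available(candidate, probe):
            return candidate
    return "text"
```

The `probe` parameter exists only so the lookup can be replaced in tests; by default it is the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that is not installed.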

8 chunking strategies

Chunking largely determines retrieval quality, so the strategies handle details such as table preservation and semantic boundaries.

recursive · fixed · sentence (Korean kss) · paragraph · section

markdown (preserves tables/code blocks · heading breadcrumbs) · semantic · auto

char / token (tiktoken) units · smart overlap

Quality guards — small-chunk merging · chunk budget caps
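
To make overlap and small-chunk merging concrete, here is a minimal character-based sketch in the spirit of the `fixed` strategy. The function name, the default sizes, and the exact merge rule are assumptions for illustration; the platform's own chunkers may behave differently.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 40, min_size: int = 50) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters.

    Quality guard: a trailing chunk shorter than `min_size` (or one that only
    repeats the overlap) is folded into the previous chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = [text[start:start + size] for start in range(0, len(text), step)]
    if len(chunks) > 1 and len(chunks[-1]) < max(min_size, overlap + 1):
        tail = chunks.pop()
        # The first `overlap` characters of the tail already end the previous chunk.
        chunks[-1] += tail[overlap:]
    return chunks
```

Dropping the first `overlap` characters of every chunk after the first and concatenating the rest reproduces the input, which is a convenient sanity check when tuning the sizes.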

Knowledge base design — fail-closed isolation

Collections are designed as security isolation boundaries, not just topic buckets.

Every query physically filtered by collection_id (fail-closed)

Embedding model · dimension · distance metric frozen — vector-space integrity

Deterministic document routing — priority · first-match-wins

Uncertain security grade → assigned to the highest grade
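
Deterministic routing and the fail-closed grade rule can be sketched as below. The `Rule` dataclass, the grade names, and the convention that a lower `priority` number is evaluated first are all assumptions for this example; the source states only that routing is priority-based, first-match-wins, and that an uncertain grade is assigned the highest grade.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    priority: int                    # assumption: lower number is checked first
    collection_id: str
    matches: Callable[[dict], bool]  # predicate over document metadata


# Hypothetical grade ladder, ordered from least to most restricted.
GRADES = ["public", "internal", "confidential"]


def route(doc: dict, rules: list[Rule]) -> Optional[str]:
    """Return the collection of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(doc):
            return rule.collection_id
    return None  # no rule matched; the caller decides what happens next


def resolve_grade(grade: Optional[str]) -> str:
    """Fail closed: anything missing or unrecognised gets the highest grade."""
    return grade if grade in GRADES else GRADES[-1]
```

Because `sorted` is stable, two rules with the same priority keep their declared order, so the outcome stays deterministic.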

Hybrid search & generation

Search meaning and keywords in parallel, fuse the results, and generate cited answers.

Vector (pgvector) + lexical (tsvector) + RRF fusion

Reranking none / llm / cross_encoder

[n] grounded cited answers · multi-turn chat (SSE)

Federated queries — RRF merge across heterogeneous-embedding collections
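
Reciprocal Rank Fusion (RRF), used above to merge vector and lexical results, scores each item by summing 1 / (k + rank) over the lists it appears in. The sketch below is a generic version of that formula; the function name, the chunk IDs, and the constant k = 60 (a commonly used default) are assumptions rather than the platform's actual settings.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse ranked lists of chunk IDs; returns [(chunk_id, score)] best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


vector_hits = ["c3", "c1", "c7"]   # stand-in for pgvector nearest neighbours
lexical_hits = ["c1", "c9", "c3"]  # stand-in for tsvector full-text matches
fused = rrf_fuse([vector_hits, lexical_hits])
```

RRF uses only rank positions, never raw similarity scores, which is why it can also merge results from collections whose embeddings are not comparable with each other.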

Model flexibility

Swap embedding, reranker, generation LLM, VLM, and OCR per workload.

Embedding — local (FastEmbed) · OpenAI-compatible (TEI/vLLM/Ollama) · default bge-m3

Generation LLM — Claude · OpenAI-compatible · Ollama · vLLM

VLM (vLLM) · OCR detection (PaddleOCR/EasyOCR)

Per-collection model · dimension · distance · auto dimension detection

Evaluation harness

Measure quality in numbers with golden datasets and an LLM judge.

Golden-set (question · answer docs) management · feedback promotion

Retrieval metrics — Hit Rate · MRR

Generation metrics — 3-axis LLM-as-Judge (Faithfulness·Relevance·Correctness)

Holdout · overfitting flags · judge gating
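
The two retrieval metrics can be computed from a golden set in a few lines. In this sketch, Hit Rate is the share of questions with at least one relevant document in the top k results, and MRR is the mean of 1 / rank of the first relevant document (0 when none is found). The data shapes and the cutoff `k` are assumptions for illustration.

```python
def hit_rate_and_mrr(results, golden, k=5):
    """results: {question: [doc_id, ...]} in ranked order.
    golden:  {question: set of relevant doc_ids}.
    Returns (hit_rate, mrr) over the questions in `results`."""
    if not results:
        return 0.0, 0.0
    hits = 0
    reciprocal_sum = 0.0
    for question, retrieved in results.items():
        relevant = golden.get(question, set())
        for rank, doc_id in enumerate(retrieved[:k], start=1):
            if doc_id in relevant:
                hits += 1
                reciprocal_sum += 1.0 / rank
                break  # only the first relevant hit counts
    n = len(results)
    return hits / n, reciprocal_sum / n
```

Hit Rate shows whether the right document is found at all, while MRR also rewards ranking it near the top, so the two are usually read together.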

Config sweeps & improvement loop

Auto-explore combinations of chunking, search mode, top-k, and reranker, compared on a leaderboard.

Sweeps across query axes + index axes (temporary collections)

Leaderboard — sorted by Hit Rate · MRR · judge scores

Promote the winning config as a new pipeline version · rollback

Traces → 👍/👎 feedback → golden-set promotion loop
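
A config sweep is essentially a grid search followed by a sorted leaderboard. The sketch below assumes a caller-supplied `evaluate` function that returns `hit_rate` and `mrr` for a configuration; the function name `run_sweep`, the grid keys, and the sort order are illustrative assumptions, not the product's API.

```python
from itertools import product


def run_sweep(grid, evaluate):
    """Evaluate every combination in `grid` and return rows sorted best first.

    grid:     {"axis_name": [value, ...], ...}
    evaluate: callable(config) -> {"hit_rate": float, "mrr": float}
    """
    axes = list(grid)
    leaderboard = []
    for values in product(*(grid[axis] for axis in axes)):
        config = dict(zip(axes, values))
        metrics = evaluate(config)
        leaderboard.append({"config": config, **metrics})
    # Rank by Hit Rate first, then MRR, highest first.
    leaderboard.sort(key=lambda row: (row["hit_rate"], row["mrr"]), reverse=True)
    return leaderboard
```

The first row of the returned list is the candidate that would be promoted as a new pipeline version. The number of runs is the product of the axis sizes, so each added axis multiplies the cost of the sweep.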

Retrieval fine-tuning

Tune embeddings and rerankers to your domain terms and acronyms.

Glossary → synthetic query generation · labeling UI review

(query · positive · negative) triplet training datasets

JSONL export · external trainer (M2M callback)
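
A (query · positive · negative) triplet dataset maps naturally onto JSONL, with one JSON object per line. The sketch below shows one plausible shape; the field names follow the triplet wording above, but the exact export schema is an assumption, and the sample strings are invented.

```python
import json


def triplets_to_jsonl(triplets) -> str:
    """Serialise (query, positive, negative) tuples, one JSON object per line."""
    lines = []
    for query, positive, negative in triplets:
        record = {"query": query, "positive": positive, "negative": negative}
        # ensure_ascii=False keeps Korean text readable instead of \u escapes.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


def jsonl_to_triplets(payload: str):
    """Inverse of triplets_to_jsonl, useful for a round-trip check."""
    records = (json.loads(line) for line in payload.splitlines() if line.strip())
    return [(r["query"], r["positive"], r["negative"]) for r in records]
```

Because `json.dumps` escapes newlines inside strings, each record stays on a single line even when a passage spans several lines.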

Pipeline versions & observability

Treat search, rerank, and generation settings as versionable assets, and instrument every query.

Append-only versions · stages · rollback · field-level diff

Evaluation linked per version — block regressions upfront

Query Trace — per-stage latency · token capture

Statistics — success rate · p50/p95 · top queries · API keys (M2M)
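
Append-only versions, rollback, and field-level diff fit together as in the sketch below. The class and method names are hypothetical, and modelling rollback as "append a copy of the old version" is an assumption chosen because it keeps the history append-only; the source does not say how rollback is stored.

```python
from copy import deepcopy

_MISSING = object()


def field_diff(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    changed = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key, _MISSING), new.get(key, _MISSING)
        if before != after:
            changed[key] = (None if before is _MISSING else before,
                            None if after is _MISSING else after)
    return changed


class PipelineVersions:
    """Hypothetical append-only store; version numbers start at 1."""

    def __init__(self) -> None:
        self._versions: list[dict] = []

    def publish(self, config: dict) -> int:
        self._versions.append(deepcopy(config))  # never mutate earlier entries
        return len(self._versions)

    def get(self, version: int) -> dict:
        return deepcopy(self._versions[version - 1])

    def rollback(self, version: int) -> int:
        """Re-publish an old config as the newest version."""
        return self.publish(self.get(version))

    def diff(self, a: int, b: int) -> dict:
        return field_diff(self.get(a), self.get(b))
```

With this shape, an evaluation run can be attached to a version number and compared with the previous version's scores before that version is promoted.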

Agent-based deployment & air-gap

Per-host Argus Agents deploy workers and inference servers; closed networks import models as packs.

servermgr — agent registration · remote deploy · proxy · heartbeat

Automatic GPU variant selection — amd64 gpu(onnx) · arm64 gpu-torch

Model pack import · Model Repository auto-install · offline serving

zot OCI registry · buildx multi-arch images
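
The variant selection listed above (amd64 → gpu(onnx), arm64 → gpu-torch) can be expressed as a small lookup. The function name, the architecture aliases, and the CPU fallback for hosts without a GPU or with an unrecognised architecture are assumptions for this sketch; only the two architecture-to-variant pairs come from the list above.

```python
# Common aliases reported by different tools for the same architecture.
_ARCH_ALIASES = {"x86_64": "amd64", "aarch64": "arm64"}


def select_variant(arch: str, has_gpu: bool) -> str:
    """Pick an image variant name: "cpu", "gpu" (ONNX), or "gpu-torch"."""
    if not has_gpu:
        return "cpu"
    normalized = _ARCH_ALIASES.get(arch.lower(), arch.lower())
    if normalized == "amd64":
        return "gpu"        # the ONNX-based GPU variant
    if normalized == "arm64":
        return "gpu-torch"  # the torch-based GPU variant
    return "cpu"            # assumption: unknown architectures fall back to CPU
```

An agent could obtain the architecture string from `platform.machine()` in the standard library, which is why the alias table accepts both naming styles.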

Annotation & image pipeline

Turn in-document images and scans into knowledge via OCR and VLM.

Image OCR labeling — AI-Hub JSON compatible

Detection server proposes draft labels (PaddleOCR/EasyOCR)

Image explorer · VLM content analysis indexing

HWP preview — Chromium rendering (@rhwp/core)

An open-source RAG platform

Argus RAG Studio is published on GitHub under the Apache License 2.0. The entire RAG engine — backend (FastAPI), frontend (Next.js), and the standalone embedding/reranker servers — is open, so enterprises can verify the code directly, extend it to fit their environment, and operate it without sending data outside.

Apache 2.0 with no commercial-use restrictions
Verify and extend the code yourself
Self-host in air-gapped or on-premises environments

GitHub Repository · Contact us