
Argus Catalog

An integrated AI·Data·API metadata platform that governs data, models, APIs, and AI agents in a single catalog. With strong support for air-gapped and on-premises environments, it preserves enterprise-wide data sovereignty by keeping data inside the organization's own network.

Apache License 2.0 · Open Source · GitHub Repository

[Concept diagram: Argus Catalog Platform Architecture]

Highlights

01

Unified governance of data, models, APIs & AI

Brings the data catalog, ML model registry, API catalog, and AI Agent catalog together to deliver an enterprise-wide single source of truth (SSOT).

02

Auto-sync across 11 data sources

Automatically collects metadata from Hive, Impala, Kudu, Trino, StarRocks, Greenplum, Iceberg REST, PostgreSQL, MySQL, Oracle, and MSSQL, keeping schemas, statistics, and lineage up to date.

03

Column-level cross-platform lineage

Automatically traces end-to-end lineage at the dataset and column level via SQL parsing, and generates ER diagrams from DDL parsing.

04

Air-gapped / on-prem + local LLMs

Integrates with OpenAI and Anthropic as well as local LLM runtimes such as Ollama, so AI governance features remain available in closed networks where data never leaves the premises.

Platform Architecture

An end-to-end metadata platform in which the Catalog UI, Server, Extensions, and SDK work together.

Catalog UI

Next.js · React

Dataset discovery & management
Lineage & ERD visualization
Model registry dashboard
Quality dashboard
API & AI Agent catalog
Semantic search & AI assistant

Catalog Server

FastAPI · PostgreSQL

REST API (v1)
pgvector hybrid search
S3/MinIO model store
MLflow & OCI compatible
Data quality engine
AI metadata generation

Extensions

Sync · Plugins · Analyzer

Metadata Sync (11 sources)
Impala Query Agent
Trino Query Listener
StarRocks Audit Plugin
Source code analysis (Java/Python)
LDAP user sync

SDK & CLI

Python SDK

argus-model CLI
OCI-based model Push/Pull
HuggingFace import
Air-gapped transfer workflow
Presigned URL upload
Manifest management

Supported Data Sources (11)

Hive · Impala · Kudu · Trino · StarRocks · Greenplum · Iceberg REST · PostgreSQL · MySQL · Oracle · MSSQL

Core Capabilities

Data catalog, data quality, metadata governance, ML model registry, and AI: the five pillars of enterprise metadata management in a single platform.

Data Catalog

The core for discovering, trusting, and governing datasets.

URN-based dataset registration, search, tags & ownership
Column-level lineage & DDL-based ERD
Data standards dictionary & glossary (morphological analysis)
pgvector keyword + semantic hybrid search
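
To make the idea of keyword + semantic hybrid search concrete, here is a minimal sketch of one common way to merge two ranked result lists: reciprocal rank fusion. This page does not say how Argus Catalog combines its rankings, so the fusion method, the function name, the constant k=60, and the dataset names are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of dataset IDs into one ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, dataset_id in enumerate(ranking, start=1):
            scores[dataset_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the query "customer orders".
keyword_hits = ["sales.orders", "sales.order_items", "crm.customers"]
semantic_hits = ["crm.customers", "sales.orders", "mart.customer_ltv"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Datasets that appear in both lists rise to the top, while a dataset found by only one method is still returned rather than dropped.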

Data Quality

Profiles source databases directly and validates with rules.

Profiling (incl. mode) & 10 validation rule types
CUSTOM_SQL / CUSTOM_PYTHON user-defined rules
Auto-synced quality scores (GOOD/WARN/BAD) & trends
Upstream quality-propagation warnings via lineage
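
Upstream quality propagation can be pictured as a walk over the lineage graph: every dataset downstream of a degraded source inherits a warning. The sketch below assumes a simple in-memory edge map and uses the GOOD/WARN/BAD scores named above; the data structures, function name, and dataset names are hypothetical and do not reflect the actual Argus Catalog implementation.

```python
from collections import deque

# Hypothetical lineage edges: upstream dataset -> direct downstream datasets.
LINEAGE = {
    "raw.orders": ["stg.orders"],
    "raw.customers": ["stg.customers"],
    "stg.orders": ["mart.revenue"],
    "stg.customers": ["mart.revenue", "mart.churn"],
}

# Hypothetical latest quality score per dataset.
QUALITY = {
    "raw.orders": "GOOD",
    "raw.customers": "BAD",
    "stg.orders": "GOOD",
    "stg.customers": "GOOD",
    "mart.revenue": "GOOD",
    "mart.churn": "WARN",
}


def upstream_warnings(lineage, quality):
    """Map each dataset to the sorted upstream datasets whose score is not GOOD."""
    found = {}
    for source, score in quality.items():
        if score == "GOOD":
            continue
        # Breadth-first walk over everything downstream of the degraded source.
        seen = set()
        queue = deque(lineage.get(source, []))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            found.setdefault(node, set()).add(source)
            queue.extend(lineage.get(node, []))
    return {node: sorted(sources) for node, sources in found.items()}


propagated = upstream_warnings(LINEAGE, QUALITY)
for dataset, sources in sorted(propagated.items()):
    print(f"{dataset}: degraded upstream {sources}")
```

Here mart.revenue scores GOOD on its own checks but is still flagged, because raw.customers two hops upstream is BAD.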

Metadata Governance

Catalogs not just data but APIs and AI agents too.

API catalog — OpenAPI spec registration, version diff & lint
AI Agent catalog — tools/MCP, evaluation & metering
URN-based unified metadata management
Schema-change impact analysis & webhook alerts
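
Schema-change impact analysis boils down to two steps: diff the old and new schema, then look up which downstream columns depend on anything that was removed or retyped. The following is a minimal sketch under assumed data shapes (plain dictionaries, one hop of column lineage); the function names and sample columns are hypothetical, and the real product presumably follows lineage transitively and emits webhook alerts.

```python
def diff_schema(old, new):
    """Compare two {column: type} mappings and list the changes."""
    changes = []
    for column in sorted(old.keys() | new.keys()):
        if column not in new:
            changes.append(("removed", column))
        elif column not in old:
            changes.append(("added", column))
        elif old[column] != new[column]:
            changes.append(("type_changed", column))
    return changes


def impacted_columns(changes, column_lineage):
    """Return direct downstream columns fed by removed or retyped columns."""
    impacted = set()
    for kind, column in changes:
        if kind in ("removed", "type_changed"):
            impacted.update(column_lineage.get(column, []))
    return sorted(impacted)


# Hypothetical before/after schemas of one source table.
old_schema = {"id": "bigint", "amount": "decimal(10,2)", "region": "varchar"}
new_schema = {"id": "bigint", "amount": "double", "created_at": "timestamp"}

# Hypothetical column-level lineage: source column -> downstream columns.
column_lineage = {
    "amount": ["mart.revenue.total_amount"],
    "region": ["mart.revenue.region", "mart.churn.region"],
}

changes = diff_schema(old_schema, new_schema)
impacted = impacted_columns(changes, column_lineage)
print(changes)
print(impacted)
```

Added columns are reported in the diff but do not count as impact, since nothing downstream can depend on them yet.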

ML Model Registry

MLflow/OCI-compatible model governance with air-gapped import.

MLflow integration & version/stage management (STAGING/PRODUCTION)
Metric comparison & model cards
OCI model hub (HuggingFace-style browser)
argus-model CLI & air-gapped import
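
An air-gapped import typically depends on a manifest that lets the receiving side confirm that every file arrived intact. The sketch below shows that general idea with SHA-256 checksums; it is not the argus-model CLI's actual manifest format, which this page does not describe, and every function and field name here is an assumption.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large model weights fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir):
    """List every file under model_dir with its relative path, size, and hash."""
    model_dir = Path(model_dir)
    files = sorted(p for p in model_dir.rglob("*") if p.is_file())
    return {
        "files": [
            {
                "path": p.relative_to(model_dir).as_posix(),
                "size": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ]
    }


def verify_manifest(model_dir, manifest):
    """Return the relative paths that are missing or whose hash differs."""
    model_dir = Path(model_dir)
    bad = []
    for entry in manifest["files"]:
        target = model_dir / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            bad.append(entry["path"])
    return bad


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "weights.bin").write_bytes(b"\x00" * 16)
    (Path(tmp) / "config.json").write_text(json.dumps({"layers": 2}))
    manifest = build_manifest(tmp)
    clean = verify_manifest(tmp, manifest)      # files unchanged since export
    (Path(tmp) / "weights.bin").write_bytes(b"\x01" * 16)
    tampered = verify_manifest(tmp, manifest)   # simulate corruption in transit

print(clean, tampered)
```

In practice the manifest would travel with the model files on the transfer medium and be checked before the model is registered on the closed network.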

AI

Auto-generates metadata with LLMs and queries the catalog.

AI metadata generation (descriptions, tags, PII detection; approval-based)
Tool-use AI assistant (catalog/schema/quality/lineage tools)
Answers grounded in real data
OpenAI, Anthropic & Ollama (local LLM) integration
Apache License 2.0 · Open Source

An open-source metadata platform

Argus Catalog is fully open-sourced on GitHub under the Apache License 2.0. Apart from the metadata ingestion connectors, the entire core engine — backend, frontend, SDK, AI agent, and quality batch — is public, so enterprises can verify the code directly, extend it to fit their environment, and operate it without any external data leakage.

  • Apache 2.0 with no commercial-use restrictions
  • Verify and extend the code yourself
  • Self-host in air-gapped / on-premises