
Data/Model Catalog

A unified catalog that governs data and AI models together, giving every team a single source of truth they can trust.

Concept Diagram

[Figure: Argus Catalog Platform Architecture]

Key strengths

01

Unified data + model governance

Data catalog and ML model registry in one. Manage data assets and AI models in a single catalog and build a true single source of truth (SSOT) for the entire organization.

02

Auto-sync from 10 data sources

Automatically harvest metadata from Hive, Impala, Kudu, Trino, StarRocks, Greenplum, PostgreSQL, MySQL, Oracle, and MSSQL — keeping schemas, stats, and lineage up to date.

03

Column-level cross-platform lineage

Automatic end-to-end lineage tracking at dataset and column level via SQL parsing (sqlglot), supporting query-based, pipeline-based, and manual lineage.

04

MLflow Unity Catalog compatible

Full compatibility with the MLflow Unity Catalog API — use your existing MLflow workflows without changes. Comes with an OCI-based model store on S3/MinIO.

Platform architecture

An end-to-end catalog platform built from four components: Catalog UI, Catalog Server, Extensions, and SDK & CLI.

Catalog UI
Next.js · React
Dataset exploration & management
Lineage graph visualization
Model registry dashboard
Quality dashboard
Standards dictionary management
Semantic search

Catalog Server
FastAPI · PostgreSQL
REST API (v1)
pgvector semantic search
S3/MinIO model store
MLflow Unity Catalog compatible
Data quality engine
Standards & glossary management

Extensions
Sync · Plugins · Analyzer
Metadata Sync (10 DBs)
Impala Query Agent
Trino Query Listener
StarRocks Audit Plugin
Source code analysis (Java/Python)
sqlglot Impala extension

SDK & CLI
Python SDK
argus-model CLI
OCI-based model push/pull
HuggingFace import
Airgap transfer workflow
Presigned URL uploads
Manifest management

Supported data sources

Hive · Impala · Kudu · Trino · StarRocks · Greenplum · PostgreSQL · MySQL · Oracle · MSSQL

Key features

Everything an enterprise catalog needs — from data governance and ML model management to air-gapped deployment support.

Multi-platform data catalog

URN-based dataset identity, schema change history tracking via snapshots, and unified tag, glossary, and ownership management in a single catalog.
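
To make the idea of URN-based identity and snapshot-based schema history concrete, here is a minimal sketch. The URN layout (`urn:dataset:<platform>:<path>`), the `DatasetUrn` class, and the `diff_snapshots` helper are all hypothetical names chosen for illustration; the catalog's actual URN scheme and snapshot format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetUrn:
    """Hypothetical dataset identifier; not the catalog's real URN scheme."""
    platform: str  # e.g. "hive", "postgresql"
    path: str      # e.g. "sales.orders"

    def __str__(self) -> str:
        return f"urn:dataset:{self.platform}:{self.path}"


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two {column: type} snapshots and report what changed."""
    added = sorted(c for c in new if c not in old)
    removed = sorted(c for c in old if c not in new)
    retyped = sorted(
        (c, old[c], new[c]) for c in old if c in new and old[c] != new[c]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


urn = DatasetUrn("hive", "sales.orders")
snapshot_v1 = {"id": "bigint", "amount": "int", "note": "string"}
snapshot_v2 = {"id": "bigint", "amount": "decimal(18,2)", "created_at": "timestamp"}

changes = diff_snapshots(snapshot_v1, snapshot_v2)
print(urn, changes)
```

Keeping the identifier independent of any one platform's naming is what lets tags, glossary terms, and owners attach to the same dataset regardless of where it lives.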

Cross-platform lineage

End-to-end lineage tracking at both dataset and column level, supporting query-based, pipeline-based, and manual lineage sources.
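
The relationship between column-level and dataset-level lineage can be sketched with a small in-memory graph. This is an illustrative data structure only, assuming columns are named `platform:schema.table.column`; the `LineageGraph` class and that naming convention are hypothetical and say nothing about how the catalog stores lineage internally.

```python
from collections import defaultdict

LINEAGE_KINDS = {"query", "pipeline", "manual"}  # the three sources named above


class LineageGraph:
    """Toy column-level lineage store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # target column -> set of (source column, lineage kind)
        self._upstream = defaultdict(set)

    def add_edge(self, source: str, target: str, kind: str = "query") -> None:
        if kind not in LINEAGE_KINDS:
            raise ValueError(f"unknown lineage kind: {kind}")
        self._upstream[target].add((source, kind))

    def upstream_columns(self, column: str) -> set[str]:
        """Return every column that feeds `column`, directly or transitively."""
        seen: set[str] = set()
        stack = [column]
        while stack:
            current = stack.pop()
            for source, _kind in self._upstream.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen

    def upstream_datasets(self, column: str) -> set[str]:
        """Roll column lineage up to dataset level by dropping the column name."""
        return {c.rsplit(".", 1)[0] for c in self.upstream_columns(column)}


graph = LineageGraph()
graph.add_edge("hive:raw.orders.amount", "trino:mart.daily_sales.revenue", "query")
graph.add_edge("postgresql:crm.customers.region", "trino:mart.daily_sales.region", "pipeline")
graph.add_edge("trino:mart.daily_sales.revenue", "starrocks:bi.kpi.revenue", "manual")

print(graph.upstream_columns("starrocks:bi.kpi.revenue"))
print(graph.upstream_datasets("starrocks:bi.kpi.revenue"))
```

Because edges are recorded per column, the dataset-level view falls out as a projection, and a lineage path can cross platforms (here Hive to Trino to StarRocks) without special handling.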

ML model registry

MLflow Unity Catalog API-compatible model registry with OCI-based artifact storage on S3/MinIO, plus version and stage management.

Data quality engine

Define and execute quality rules — NOT_NULL, UNIQUE, MIN/MAX, REGEX, FRESHNESS, CUSTOM_SQL — and compute aggregated quality scores.
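
As a rough sketch of how rule evaluation and score aggregation might fit together, the example below checks four of the rule types against in-memory rows and averages the pass ratios. The rule dictionary layout, the `MIN_MAX` type name, and the unweighted average are assumptions made for illustration; the engine's real rule schema and scoring formula are not described here.

```python
import re
from collections import Counter


def passing_ratio(rows: list[dict], rule: dict) -> float:
    """Return the fraction of rows (0.0 to 1.0) that satisfy one rule."""
    values = [row.get(rule["column"]) for row in rows]
    if not values:
        return 1.0
    kind = rule["type"]
    if kind == "NOT_NULL":
        ok = [v is not None for v in values]
    elif kind == "UNIQUE":
        counts = Counter(values)
        ok = [counts[v] == 1 for v in values]
    elif kind == "MIN_MAX":  # hypothetical name for the MIN/MAX rule
        ok = [v is not None and rule["min"] <= v <= rule["max"] for v in values]
    elif kind == "REGEX":
        pattern = re.compile(rule["pattern"])
        ok = [isinstance(v, str) and pattern.fullmatch(v) is not None for v in values]
    else:
        raise ValueError(f"unsupported rule type: {kind}")
    return sum(ok) / len(ok)


def quality_score(rows: list[dict], rules: list[dict]) -> float:
    """Aggregate score: unweighted mean of per-rule pass ratios (an assumption)."""
    if not rules:
        return 1.0
    return sum(passing_ratio(rows, r) for r in rules) / len(rules)


rows = [
    {"id": 1, "email": "a@example.com", "age": 31},
    {"id": 2, "email": None, "age": 17},
    {"id": 2, "email": "bad-address", "age": 45},
    {"id": 4, "email": "d@example.com", "age": 230},
]
rules = [
    {"type": "NOT_NULL", "column": "email"},
    {"type": "UNIQUE", "column": "id"},
    {"type": "MIN_MAX", "column": "age", "min": 0, "max": 120},
    {"type": "REGEX", "column": "email", "pattern": r"[^@\s]+@[^@\s]+\.[a-z]+"},
]

score = quality_score(rows, rules)
print(score)
```

FRESHNESS and CUSTOM_SQL are left out because they depend on timestamps and a live database connection, which a self-contained sketch cannot represent faithfully.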

Data standards management

Manage standard dictionaries, domains, terms (with morpheme analysis), code groups and values, term-to-column mapping, and change audit logs.

Semantic & hybrid search

Combine pgvector embeddings with keyword search for natural-language discovery of datasets and models.
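
Hybrid ranking can be pictured as a weighted blend of a vector-similarity score and a keyword-match score. The sketch below does this in plain Python with toy three-dimensional embeddings; in the platform the vector comparison would run inside PostgreSQL via pgvector. The blending formula, the `alpha` weight, and all function names are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear as whole words in the text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def hybrid_search(query: str, query_vec: list[float], docs: list[dict],
                  alpha: float = 0.5) -> list[str]:
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = []
    for doc in docs:
        semantic = cosine(query_vec, doc["embedding"])
        lexical = keyword_score(query, doc["description"])
        scored.append((alpha * semantic + (1 - alpha) * lexical, doc["name"]))
    return [name for _score, name in sorted(scored, reverse=True)]


# Toy catalog entries; real embeddings would come from an embedding model.
docs = [
    {"name": "hr.employees", "description": "employee master data",
     "embedding": [0.0, 0.2, 0.9]},
    {"name": "churn_model", "description": "model predicting customer churn",
     "embedding": [0.6, 0.6, 0.1]},
    {"name": "sales.orders", "description": "daily customer orders and revenue",
     "embedding": [0.9, 0.1, 0.0]},
]

ranking = hybrid_search("customer revenue", [1.0, 0.0, 0.0], docs)
print(ranking)
```

The blend lets an exact term match rescue a result whose embedding is only loosely related, and lets a semantically close result surface even when the wording differs.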

Source code analysis

Automatically discover table and column access patterns from Java (JPA, MyBatis, JDBC) and Python (SQLAlchemy, Django) source code.
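
One simple way to picture source code analysis is to parse a Python file, collect its string literals, and look for table names after SQL keywords. The sketch below is a deliberately naive heuristic built on the standard `ast` and `re` modules; it is not the analyzer's actual approach, and it does not understand ORM mappings such as SQLAlchemy models or Django querysets. `tables_in_source` and `TABLE_PATTERN` are hypothetical names.

```python
import ast
import re

# Table name following FROM / JOIN / INTO / UPDATE (naive heuristic).
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def tables_in_source(source: str) -> set[str]:
    """Return table names referenced in SQL string literals of Python code."""
    tables: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        # Only inspect string constants; identifiers and comments are ignored.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            tables.update(name.lower() for name in TABLE_PATTERN.findall(node.value))
    return tables


sample = '''
def load(conn):
    conn.execute("SELECT o.id, c.name FROM sales.orders o JOIN crm.customers c ON o.cid = c.id")
    conn.execute("INSERT INTO mart.daily_sales (day, revenue) VALUES (?, ?)")
'''

found = tables_in_source(sample)
print(sorted(found))
```

A keyword heuristic like this will produce false positives on prose strings that happen to contain words such as "from", which is one reason framework-aware analysis is needed in practice.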

Airgap model transfer

Online pull → USB transfer → offline import workflow designed for air-gapped, secure environments.
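
A transfer over removable media is usually paired with an integrity check on the offline side. The sketch below writes a checksum manifest next to the pulled files and verifies it after the copy. The `manifest.json` name and layout, and the `write_manifest` and `verify_bundle` helpers, are hypothetical; they do not describe the `argus-model` CLI's actual manifest format.

```python
import hashlib
import json
import tempfile
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large model files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bundle_dir: Path) -> Path:
    """Online side: record a checksum for every file in the bundle."""
    files = {
        p.relative_to(bundle_dir).as_posix(): sha256_of(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    }
    manifest = bundle_dir / MANIFEST_NAME
    manifest.write_text(json.dumps({"files": files}, indent=2))
    return manifest


def verify_bundle(bundle_dir: Path) -> list[str]:
    """Offline side: return files that are missing or whose checksum differs."""
    expected = json.loads((bundle_dir / MANIFEST_NAME).read_text())["files"]
    return [
        name for name, digest in expected.items()
        if not (bundle_dir / name).is_file() or sha256_of(bundle_dir / name) != digest
    ]


with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "model.bin").write_bytes(b"weights")
    (bundle / "config.json").write_text("{}")
    write_manifest(bundle)
    clean = verify_bundle(bundle)          # nothing changed yet
    (bundle / "model.bin").write_bytes(b"corrupted in transit")
    damaged = verify_bundle(bundle)        # simulated corruption is detected

print(clean, damaged)
```

Verifying before import means a damaged or tampered file is rejected at the boundary of the secure environment rather than discovered later at model load time.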