Data/Model Catalog
A unified catalog that governs data and AI models together, giving every team a single source of truth they can trust.
Concept Diagram

Key strengths
Unified data + model governance
Data catalog and ML model registry in one. Manage data assets and AI models in a single catalog and build a true single source of truth (SSOT) for the entire organization.
Auto-sync from 10 data sources
Automatically harvest metadata from Hive, Impala, Kudu, Trino, StarRocks, Greenplum, PostgreSQL, MySQL, Oracle, and MSSQL — keeping schemas, stats, and lineage up to date.
Column-level cross-platform lineage
Automatic end-to-end lineage tracking at both the dataset and column level via SQL parsing (sqlglot), with support for query-based, pipeline-based, and manual lineage sources.
MLflow Unity Catalog compatible
Full compatibility with the MLflow Unity Catalog API — use your existing MLflow workflows without changes. Comes with an OCI-based model store on S3/MinIO.
Platform architecture
An end-to-end catalog platform where Catalog UI, Server, Extensions, and SDK work together seamlessly.
Key features
Everything an enterprise catalog needs — from data governance and ML model management to air-gapped deployment support.
Multi-platform data catalog
URN-based dataset identity, schema change history tracking via snapshots, and unified tag, glossary, and ownership management in a single catalog.
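As a rough illustration of URN-based identity and snapshot-based schema history, the sketch below builds a dataset URN and compares two schema snapshots. The `urn:dataset:<platform>:<name>` layout, the function names, and the `{column: type}` snapshot shape are assumptions made for this example, not the catalog's actual format.

```python
from dataclasses import dataclass


def make_dataset_urn(platform: str, qualified_name: str) -> str:
    """Build a URN in a hypothetical urn:dataset:<platform>:<name> layout."""
    return f"urn:dataset:{platform.lower()}:{qualified_name}"


@dataclass(frozen=True)
class SchemaChange:
    kind: str    # "added", "removed", or "type_changed"
    column: str
    detail: str


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> list[SchemaChange]:
    """Compare two {column: type} snapshots and list the differences."""
    changes = []
    for col in sorted(new.keys() - old.keys()):
        changes.append(SchemaChange("added", col, new[col]))
    for col in sorted(old.keys() - new.keys()):
        changes.append(SchemaChange("removed", col, old[col]))
    for col in sorted(old.keys() & new.keys()):
        if old[col] != new[col]:
            changes.append(SchemaChange("type_changed", col, f"{old[col]} -> {new[col]}"))
    return changes


urn = make_dataset_urn("PostgreSQL", "sales.public.orders")
history = diff_snapshots(
    {"id": "int", "amount": "float"},   # earlier snapshot
    {"id": "bigint", "note": "text"},   # later snapshot
)
```

Keying every snapshot by the same URN is what lets a change history follow a dataset regardless of which platform it lives on.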
Cross-platform lineage
End-to-end lineage tracking at both dataset and column level, supporting query-based, pipeline-based, and manual lineage sources.
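One way to picture column-level lineage is as a directed graph whose nodes are (dataset, column) pairs and whose edges record where they came from. The sketch below is a minimal in-memory model under that assumption; the `LineageGraph` class and its methods are hypothetical and are not the product's API. In the platform itself the query-based edges are described as coming from SQL parsing, which this example replaces with hand-written edges.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph whose nodes are (dataset, column) pairs."""

    def __init__(self):
        # target node -> set of (source node, origin) pairs
        self._upstream = defaultdict(set)

    def add_edge(self, source, target, origin="manual"):
        # origin records how the edge was discovered: "query", "pipeline", or "manual"
        self._upstream[target].add((source, origin))

    def upstream_columns(self, node):
        """Return every column that feeds `node`, directly or transitively."""
        seen, queue = set(), deque([node])
        while queue:
            current = queue.popleft()
            for source, _origin in self._upstream.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    queue.append(source)
        return seen


graph = LineageGraph()
graph.add_edge(("raw.orders", "amount"), ("stg.orders", "amount_usd"), origin="query")
graph.add_edge(("raw.fx", "rate"), ("stg.orders", "amount_usd"), origin="query")
graph.add_edge(("stg.orders", "amount_usd"), ("mart.revenue", "total"), origin="pipeline")

sources = graph.upstream_columns(("mart.revenue", "total"))
```

Here `sources` contains the staging column and both raw columns, which is the end-to-end view across a query-derived hop and a pipeline-derived hop.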
ML model registry
A model registry compatible with the MLflow Unity Catalog API, with OCI-based artifact storage on S3/MinIO plus version and stage management.
Data quality engine
Define and execute quality rules — NOT_NULL, UNIQUE, MIN/MAX, REGEX, FRESHNESS, CUSTOM_SQL — and compute aggregated quality scores.
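To make the rule types concrete, the sketch below evaluates NOT_NULL, UNIQUE, MIN/MAX, and REGEX checks over small in-memory columns and averages the pass rates into one score. The function names and the unweighted-mean aggregation are assumptions for illustration; the engine's actual scoring formula is not stated in this document.

```python
import re
from collections import Counter


def not_null(values):
    return [v is not None for v in values]


def unique(values):
    counts = Counter(values)
    return [counts[v] == 1 for v in values]


def in_range(values, minimum, maximum):
    return [v is not None and minimum <= v <= maximum for v in values]


def matches(values, pattern):
    compiled = re.compile(pattern)
    return [isinstance(v, str) and compiled.fullmatch(v) is not None for v in values]


def pass_rate(flags):
    """Fraction of rows that satisfied a rule."""
    return sum(flags) / len(flags) if flags else 1.0


def quality_score(rates):
    """Assumed aggregation: unweighted mean of the per-rule pass rates."""
    rates = list(rates)
    return sum(rates) / len(rates) if rates else 1.0


ids = [1, 2, 2, 4]
emails = ["a@x.com", None, "bad", "b@y.org"]
amounts = [10, 250, -5, 40]

rates = {
    "ids UNIQUE": pass_rate(unique(ids)),
    "emails NOT_NULL": pass_rate(not_null(emails)),
    "emails REGEX": pass_rate(matches(emails, r"[^@\s]+@[^@\s]+\.[a-z]+")),
    "amounts MIN/MAX": pass_rate(in_range(amounts, 0, 100)),
}
score = quality_score(rates.values())
```

A real deployment would likely weight rules by severity or by dataset importance; the plain mean is used here only to keep the arithmetic easy to follow.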
Data standards management
Manage standard dictionaries, domains, terms (with morpheme analysis), code groups and values, term-to-column mapping, and change audit logs.
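Term-to-column mapping can be thought of as splitting a physical column name into tokens and looking each token up in a standard dictionary. The sketch below assumes snake_case column names and a tiny hand-made dictionary; both `STANDARD_TERMS` and `map_column_to_terms` are hypothetical, and the simple underscore split stands in for the morpheme analysis mentioned above.

```python
# Hypothetical standard dictionary: abbreviation -> standard term
STANDARD_TERMS = {
    "cust": "customer",
    "nm": "name",
    "amt": "amount",
    "dt": "date",
}


def map_column_to_terms(column, dictionary):
    """Split a snake_case column name and map each token to a standard term.

    Returns (mapped_terms, unmapped_tokens) so that gaps in the dictionary
    are visible instead of being silently dropped.
    """
    tokens = [t for t in column.lower().split("_") if t]
    mapped = [dictionary[t] for t in tokens if t in dictionary]
    unmapped = [t for t in tokens if t not in dictionary]
    return mapped, unmapped


full_match = map_column_to_terms("CUST_NM", STANDARD_TERMS)
partial_match = map_column_to_terms("ord_amt", STANDARD_TERMS)
```

Returning the unmapped tokens separately gives reviewers a list of candidate terms to add to the dictionary.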
Semantic & hybrid search
Combine pgvector embeddings with keyword search for natural-language discovery of datasets and models.
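Hybrid search blends a vector-similarity score with a keyword score. The sketch below shows the idea with in-memory lists and hand-made two-dimensional vectors; in a pgvector deployment the similarity would typically be computed inside PostgreSQL instead. The `alpha` weight, the word-overlap keyword score, and all names are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    scored = []
    for name, text, vec in docs:
        score = alpha * cosine_similarity(query_vec, vec)
        score += (1 - alpha) * keyword_score(query, text)
        scored.append((score, name))
    return [name for _score, name in sorted(scored, reverse=True)]


# Toy catalog entries: (name, description, made-up embedding)
docs = [
    ("orders", "daily customer orders", [1.0, 0.0]),
    ("churn_model", "customer churn prediction model", [0.6, 0.8]),
    ("weblogs", "raw web server logs", [0.0, 1.0]),
]
ranking = hybrid_search("customer churn", [0.6, 0.8], docs)
```

In this toy data the `orders` dataset outranks `weblogs` even though its embedding is further from the query, because the keyword component rewards the shared word "customer".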
Source code analysis
Automatically discover table and column access patterns from Java (JPA, MyBatis, JDBC) and Python (SQLAlchemy, Django) source code.
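For the Python side, static analysis of this kind can be approximated with the standard `ast` module. The sketch below scans source text for classes that set `__tablename__` and collects attributes assigned from a `Column(...)` call, which is the usual SQLAlchemy declarative pattern. It is a simplified stand-in, not the platform's analyzer, and `find_sqlalchemy_tables` is a hypothetical name.

```python
import ast


def find_sqlalchemy_tables(source: str) -> dict[str, list[str]]:
    """Map each __tablename__ to the attributes assigned from Column(...)."""
    tables = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        table_name, columns = None, []
        for stmt in node.body:
            # Only simple `name = value` assignments are considered.
            if not (isinstance(stmt, ast.Assign)
                    and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)):
                continue
            target, value = stmt.targets[0].id, stmt.value
            if target == "__tablename__" and isinstance(value, ast.Constant):
                table_name = value.value
            elif isinstance(value, ast.Call):
                # Handles both `Column(...)` and `sa.Column(...)`.
                func = value.func
                callee = getattr(func, "id", getattr(func, "attr", None))
                if callee == "Column":
                    columns.append(target)
        if table_name:
            tables[table_name] = columns
    return tables


SAMPLE = '''
class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    amount = sa.Column(sa.Numeric)
    helper = 42
'''

discovered = find_sqlalchemy_tables(SAMPLE)
```

Because the code is parsed rather than imported, the scan works on repositories whose dependencies are not installed. Annotated assignments such as `id: Mapped[int] = mapped_column()` are not covered by this sketch.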
Air-gapped model transfer
Online pull → USB transfer → offline import workflow designed for air-gapped, secure environments.
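The pull, transfer, and import steps can be sketched as packaging a model directory with a checksum manifest on the connected side and verifying it before unpacking on the isolated side. The bundle layout, the `.manifest.json` suffix, and the function names below are assumptions for illustration; the platform's actual bundle format is not described here.

```python
import hashlib
import json
import tarfile
import tempfile
from pathlib import Path


def sha256_of_stream(fh):
    digest = hashlib.sha256()
    for chunk in iter(lambda: fh.read(1 << 20), b""):
        digest.update(chunk)
    return digest.hexdigest()


def export_bundle(model_dir, bundle_path):
    """Online side: write a tar archive plus a manifest of SHA-256 digests."""
    model_dir = Path(model_dir)
    files = sorted(p for p in model_dir.rglob("*") if p.is_file())
    manifest = {}
    with tarfile.open(str(bundle_path), "w") as tar:
        for path in files:
            rel = path.relative_to(model_dir).as_posix()
            with open(path, "rb") as fh:
                manifest[rel] = sha256_of_stream(fh)
            tar.add(str(path), arcname=rel)
    Path(str(bundle_path) + ".manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def import_bundle(bundle_path, target_dir):
    """Offline side: verify every digest first, then extract."""
    manifest = json.loads(Path(str(bundle_path) + ".manifest.json").read_text())
    with tarfile.open(str(bundle_path), "r") as tar:
        for rel, expected in manifest.items():
            if sha256_of_stream(tar.extractfile(rel)) != expected:
                raise ValueError(f"checksum mismatch for {rel}")
        tar.extractall(str(target_dir))
    return sorted(manifest)


# Demo: a temporary directory stands in for the USB drive.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "model"
    src.mkdir()
    (src / "weights.bin").write_bytes(b"\x00\x01")
    (src / "MLmodel").write_text("flavor: demo")
    bundle = Path(tmp) / "bundle.tar"
    export_bundle(src, bundle)
    imported = import_bundle(bundle, Path(tmp) / "offline")
    restored = (Path(tmp) / "offline" / "weights.bin").read_bytes()
```

Verifying digests before extraction means a file corrupted in transit is rejected without leaving a partial import behind. A production workflow would also want the manifest itself to be signed, which this sketch leaves out.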