
Apache Iceberg Whitepaper — Structure and Adoption Strategy for the Next-Generation Lakehouse Table Format

A whitepaper for data architects and platform leaders covering Apache Iceberg's metadata structure, operating model, multi-engine compatibility, and adoption strategy. Spans comparisons with Hive, Delta Lake, and Hudi; migration patterns; operational automation; and the 2026 outlook.

Data Dynamics · May 20, 2026 · 30 min read

Scope of this whitepaper

This paper covers Apache Iceberg from two angles: "why adopt it" and "how to operate it." Aimed at data architects, platform leads, and CDOs choosing and adopting table formats in multi-engine Lakehouse environments, it goes beyond the spec to cover operational automation, catalog topology, and migration patterns from a practitioner perspective.

1. Executive Summary

1.1 One-line summary

Apache Iceberg is an open table format that brings database-level transactions and evolution semantics on top of object storage, and as of 2026 it has become the de facto standard for multi-engine Lakehouses.

1.2 Key conclusions

As the Lakehouse has moved from "an ACID layer for a single engine" to "shared tables for many engines," the value of Iceberg's engine-neutral specification and the REST Catalog standard has come into sharp focus.
Iceberg is not a "storage format" but a "metadata structure." It is a specification that adds snapshot-based isolation, hidden partitioning, schema/partition evolution, time travel, and branches/tags on top of file formats like Parquet, ORC, and Avro.
In a single-engine environment (e.g., Databricks-only), Delta Lake is still a reasonable choice, but for workloads with two or more engines in play, or where long-term retention, legal correction, and experiment isolation matter, Iceberg has a clear advantage.
Operating costs are not negligible. Without automating compaction, snapshot expiration, orphan file cleanup, and catalog operations, Iceberg quickly turns into a "metadata swamp." This paper presents the standard operational-automation patterns alongside the spec.
With Iceberg V3 and the REST Catalog standard consolidating, the catalog layer will become the true control plane of the Lakehouse over the next three years.

1.3 Who benefits, and from which decisions

Reader	What this paper provides
Data platform architects	Metadata layering, catalog topology, engine compatibility matrix
Data engineering leads	Operational automation patterns, compaction and cleanup standards, monitoring metrics
CDO / data executives	Adoption decision tree, migration risk and timeline estimates, cost structure
ML / analytics leads	Experimentation and reproducibility patterns using time travel and branches/tags

2. Background: Why Yet Another Table Format

2.1 Structural limits of the Hive era

Through the mid-2010s, the de facto standard across the Hadoop and Spark ecosystem was Hive Metastore (HMS) + directory-based partitioning. It was simple, but its structural limits accumulated as scale grew.

Directory = partition = query predicate. Partition pruning only worked when a predicate like WHERE event_date = '2026-05-20' matched the directory path (/event_date=2026-05-20/) exactly. Once a column was renamed or a user wrote an expression like WHERE ts > ..., pruning collapsed.
No partition evolution. A decision like "switch from daily to hourly partitions" effectively meant rewriting the table. For a petabyte-scale table in production, that was nearly impossible.
Schema evolution by external convention. Hive supported column add/rename to some degree, but position-based mapping without column IDs made column reordering and rename risky.
No atomicity. Directory rename was used to fake "atomic publish," but on object stores like S3, rename is implemented as non-atomic copying — yielding partial visibility and duplication.
No read-write concurrency. Reading the same partition while another job wrote to it would expose partial files or inconsistent ListBucket results.

2.2 The cloud + object storage shift was the breaking point

S3, ADLS, and GCS differed from HDFS in two decisive ways.

List is expensive and, at the time, weakly consistent (S3 only gained strong read-after-write consistency in December 2020). Hive's layout made tens of thousands of S3 LIST calls to identify files, and an S3 listing can take minutes once a prefix carries hundreds of thousands of keys.
Rename is effectively copy + delete. HDFS directory rename was a metadata operation; in S3 it is per-object copying. A job failure mid-run leaves a "half-written" state visible.

In short, Hive was designed on the premise that "the filesystem provides strong consistency and fast renames," and the cloud broke that premise.

2.3 New formats arrive — Iceberg, Delta Lake, Hudi

Three formats arrived nearly simultaneously in this era.

Format	Started by	Design starting point
Apache Iceberg	2017, Netflix	"You must be able to know which files belong to a table without listing directories."
Delta Lake	2017, Databricks	"Extract Spark's transaction log so we can layer ACID on top."
Apache Hudi	2016, Uber	"Incremental processing optimized for streaming/upsert workloads."

All three solve the problem with a metadata layer, but the structure of that metadata and the operating model differ — which in turn drives the differences in which workloads each format handles best.

2.4 How Iceberg's reception evolved

2018–2020: "Netflix's internal project." Interesting use cases, limited adoption.
2020–2022: Promoted to Apache Top-Level (2020). AWS Athena/EMR, Snowflake, Trino, and Flink added support in turn.
2023–2024: The REST Catalog spec v1 was agreed; Snowflake Polaris and Databricks Unity began handling Iceberg directly, cementing the role of "a shared format across engines."
2025–2026: Iceberg V3 (Variant type, Geospatial, default values, deletion vectors) lands, the spec grows richer, and the REST Catalog becomes the de facto catalog standard.

3. Iceberg Architecture in Depth

3.1 The three-layer model

To guarantee the same answer regardless of which engine reads the table, Iceberg separates table state into three layers.

Figure: Iceberg's three-layer model. The catalog holds the metadata.json pointer, the metadata layer owns the snapshot/manifest tree, and the data layer holds the actual Parquet/ORC files.

The consequences of this separation are clear:

Reading = no directory listing. You always walk the metadata.json → manifest list → manifest → data file tree. You are free from list cost and consistency problems.
Writing = create a new metadata.json and atomically swap the pointer in the catalog. Old data files and the old metadata.json remain in place, so other readers don't break.
The data files themselves carry no notion of "what shape the table should be." Column names, types, partitions — all of it lives in the metadata layer. That's why schema and partition evolution happen without data rewrite.
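
The write path above (prepare a new metadata.json, then swap the catalog pointer atomically) can be sketched as a compare-and-swap. This is a toy in-memory model, not Iceberg's API: `ToyCatalog`, `swap_pointer`, and the file names are hypothetical, and a real catalog would perform the swap inside a database or a catalog service.

```python
import threading


class ToyCatalog:
    """Hypothetical in-memory catalog: table name -> current metadata.json path."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table):
        return self._pointers.get(table)

    def swap_pointer(self, table, expected, new):
        """Atomically replace the pointer only if it still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # another writer committed first; the caller must retry
            self._pointers[table] = new
            return True


catalog = ToyCatalog()
catalog.swap_pointer("db.events", None, "v1.metadata.json")

# Two writers both start from v1 and each prepares its own new metadata file.
base = catalog.current("db.events")
writer_a_ok = catalog.swap_pointer("db.events", base, "v2-a.metadata.json")
writer_b_ok = catalog.swap_pointer("db.events", base, "v2-b.metadata.json")

print(writer_a_ok, writer_b_ok, catalog.current("db.events"))
```

Only the first writer wins. The second sees a failed swap, which in this model is its signal to reload the current metadata and try again. Readers that already resolved v1 are unaffected because nothing they reference was modified.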

3.2 What's actually inside metadata.json

Roughly the following structure (V2-based, simplified).

{
  "format-version": 2,
  "table-uuid": "5f8a...e9",
  "location": "s3://bucket/warehouse/db/events",
  "last-updated-ms": 1747804800000,
  "last-column-id": 12,
  "schemas": [
    {
      "schema-id": 0,
      "fields": [
        { "id": 1, "name": "event_id",   "required": true,  "type": "long" },
        { "id": 2, "name": "user_id",    "required": false, "type": "long" },
        { "id": 3, "name": "event_ts",   "required": true,  "type": "timestamptz" },
        { "id": 4, "name": "event_type", "required": true,  "type": "string" }
      ]
    }
  ],
  "current-schema-id": 0,
  "partition-specs": [
    {
      "spec-id": 0,
      "fields": [
        { "name": "event_day", "source-id": 3, "transform": "day", "field-id": 1000 }
      ]
    }
  ],
  "default-spec-id": 0,
  "sort-orders": [ ... ],
  "current-snapshot-id": 8123412345678901234,
  "snapshots": [
    {
      "snapshot-id": 8123412345678901234,
      "timestamp-ms": 1747804790000,
      "summary": {
        "operation": "append",
        "added-data-files": "3",
        "added-records": "1450000"
      },
      "manifest-list": "s3://.../snap-8123-1-abc.avro",
      "schema-id": 0
    }
  ],
  "refs": {
    "main":  { "snapshot-id": 8123..., "type": "branch" },
    "wap-2026-05-20": { "snapshot-id": 8123..., "type": "branch" }
  }
}

Key observations:

The id inside fields is the real identifier of a column. The name can change while the ID stays the same.
partition-specs is an array. That is, a table can carry different partition specs over its lifetime.
snapshots are cumulative. All of them are kept (until expired) for time travel.
refs are Git-like branch/tag references. Beyond main, user-defined branches and tags are treated as first-class.
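
As a rough illustration of how a reader might navigate these fields, the sketch below resolves a ref, picks a snapshot for time travel, and looks up a column by field ID. The metadata values and the helper names (`snapshot_for_ref`, `snapshot_as_of`, `column_name`) are made up for this example, and the time-travel helper is a simplification: it scans every snapshot instead of walking one branch's history.

```python
import json

# Trimmed-down metadata in the shape shown above; all values are illustrative.
METADATA = json.loads("""
{
  "current-schema-id": 1,
  "schemas": [
    {"schema-id": 0, "fields": [{"id": 1, "name": "event_id"}, {"id": 3, "name": "event_ts"}]},
    {"schema-id": 1, "fields": [{"id": 1, "name": "event_id"}, {"id": 3, "name": "occurred_at"}]}
  ],
  "snapshots": [
    {"snapshot-id": 100, "timestamp-ms": 1000, "schema-id": 0},
    {"snapshot-id": 200, "timestamp-ms": 2000, "schema-id": 1}
  ],
  "refs": {
    "main": {"snapshot-id": 200, "type": "branch"},
    "audit-point": {"snapshot-id": 100, "type": "tag"}
  }
}
""")


def snapshot_for_ref(metadata, ref_name):
    """Resolve a branch or tag name to its snapshot entry."""
    snapshot_id = metadata["refs"][ref_name]["snapshot-id"]
    return next(s for s in metadata["snapshots"] if s["snapshot-id"] == snapshot_id)


def snapshot_as_of(metadata, timestamp_ms):
    """Time travel: the latest snapshot committed at or before the given time."""
    candidates = [s for s in metadata["snapshots"] if s["timestamp-ms"] <= timestamp_ms]
    return max(candidates, key=lambda s: s["timestamp-ms"]) if candidates else None


def column_name(metadata, schema_id, field_id):
    """Look a column up by its permanent field ID within one schema version."""
    schema = next(s for s in metadata["schemas"] if s["schema-id"] == schema_id)
    return next(f["name"] for f in schema["fields"] if f["id"] == field_id)


print(column_name(METADATA, 0, 3), column_name(METADATA, 1, 3))
```

Field 3 is the same column in both schema versions. Only its name differs, which is the property the rename-safety claim rests on.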

3.3 The division of labor between manifest list and manifest

To prune files quickly at query time, data file information is summarized in two stages.

Figure: The roles of manifest list and manifest. The manifest list holds per-manifest summaries, while each manifest carries statistics for individual data and delete files.

Query pruning flow (WHERE event_day = '2026-05-20' AND user_id = 42)

Get the current metadata.json location from the catalog.
metadata.json → read the manifest list of the current snapshot.
For each row of the manifest list (= one manifest file), first-pass pruning using the summarized partition ranges. Skip manifests that don't cover event_day = '2026-05-20'.
Read only the surviving manifests. Use each data file's user_id lower/upper bound for second-pass pruning.
Open only the remaining data files.

What this gives you:

Even tables with hundreds of thousands of data files only need to examine tens to hundreds per query.
No directory listing required, so you are not dependent on S3 list cost or consistency.
Statistics (lower/upper/null) are condensed into the manifests, so you don't need to open every Parquet footer.
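
The two pruning passes can be sketched with plain data structures. This is a minimal model under stated assumptions: `Manifest`, `DataFile`, and `plan_scan` are hypothetical names, partition summaries are reduced to a single day range, and file statistics to a single user_id range.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    user_id_min: int
    user_id_max: int


@dataclass
class Manifest:
    path: str
    day_min: str  # lowest event_day partition value covered (ISO date)
    day_max: str  # highest event_day partition value covered (ISO date)
    files: list = field(default_factory=list)


def plan_scan(manifest_list, day, user_id):
    """Return (data file paths to open, number of manifests actually read)."""
    selected, manifests_read = [], 0
    for manifest in manifest_list:
        # First pass: skip whole manifests using the partition summary.
        if not (manifest.day_min <= day <= manifest.day_max):
            continue
        manifests_read += 1
        # Second pass: skip files whose column bounds exclude the value.
        for data_file in manifest.files:
            if data_file.user_id_min <= user_id <= data_file.user_id_max:
                selected.append(data_file.path)
    return selected, manifests_read


manifest_list = [
    Manifest("m1.avro", "2026-05-18", "2026-05-19", [DataFile("a.parquet", 1, 90)]),
    Manifest("m2.avro", "2026-05-20", "2026-05-20", [
        DataFile("b.parquet", 1, 40),
        DataFile("c.parquet", 41, 99),
    ]),
]

files, manifests_read = plan_scan(manifest_list, "2026-05-20", 42)
print(files, manifests_read)
```

Of three data files, one is opened, and one of the two manifests is never read at all. Bounds-based pruning can only rule files out: a file whose range contains 42 may still hold no matching row.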

3.4 Hidden partitioning

One of the biggest pain points of the Hive era was "the partition expression and the query predicate must match exactly." Iceberg solves this by recording a transform in the table metadata.

-- DDL: partition event_ts by day (data only stores event_ts)
CREATE TABLE events (
  event_id BIGINT,
  user_id  BIGINT,
  event_ts TIMESTAMP,
  event_type STRING
)
USING iceberg
PARTITIONED BY (days(event_ts));
 
-- Query: you don't need to know the partition column
SELECT count(*)
FROM events
WHERE event_ts >= TIMESTAMP '2026-05-20 00:00:00'
  AND event_ts <  TIMESTAMP '2026-05-21 00:00:00';

Only event_ts is stored in the data; the partition key event_day only exists in the metadata.
Even when the predicate is on event_ts, the engine understands the partition spec's transform (days(event_ts)) and performs partition pruning automatically.
The user does not have to write a condition like event_day = .... The "partition key" is hidden from the user.

Supported transforms: identity, bucket(N, col), truncate(W, col), year, month, day, hour, void.
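
To make the pruning step concrete, here is a small sketch of the day and hour transforms and of how a range predicate on event_ts maps to partition values. It assumes UTC timestamps with microsecond precision. The function names are illustrative, and bucket and truncate are left out.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts):
    """Whole days since 1970-01-01 UTC, the value the day transform produces."""
    return (ts - EPOCH).days


def hour_transform(ts):
    """Whole hours since 1970-01-01T00:00 UTC, the value the hour transform produces."""
    delta = ts - EPOCH
    return delta.days * 24 + delta.seconds // 3600


def day_partitions_for(start, end_exclusive):
    """Inclusive range of day-partition values that can hold rows with
    start <= event_ts < end_exclusive."""
    last_instant = end_exclusive - timedelta(microseconds=1)
    return day_transform(start), day_transform(last_instant)


start = datetime(2026, 5, 20, tzinfo=timezone.utc)
end = datetime(2026, 5, 21, tzinfo=timezone.utc)
first_day, last_day = day_partitions_for(start, end)
print(first_day, last_day)  # equal values: the query touches a single daily partition
```

The query author only ever writes the event_ts range. The engine derives the partition range from the transform recorded in the partition spec.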

3.5 Schema evolution

Because Iceberg assigns columns permanent IDs, the following changes happen without rewriting data:

Operation	Safety	Notes
Add column	Safe	Existing rows are NULL or default
Drop column	Safe	The ID disappears from metadata only; data files are untouched
Rename column	Safe	ID is preserved; metadata.json updates only the name
Reorder column	Safe	Only the field order in metadata changes
Type widening	Partially safe	int → long, float → double, decimal precision increases, etc.
Type narrowing	Not allowed	Lossy conversions like long → int are forbidden
nullable → required	Conditional	Must verify that every existing row is non-null

What makes this possible is Iceberg's field-ID-based column mapping, rather than the name- or position-based mapping that Hive-style readers apply to Parquet and ORC files. Field IDs are written into each data file's schema, so even if the name changes in metadata, the reader still finds the correct column.
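
A toy reader shows why ID-based resolution makes rename, drop, and add safe without rewriting files. The structures below are invented for the example (real files carry field IDs in their Parquet/ORC/Avro schema), and `read_with_schema` is a hypothetical helper.

```python
# A data file written when column 3 was still called "event_ts".
OLD_FILE = {
    "field_ids": {1: "event_id", 3: "event_ts", 4: "event_type"},  # id -> name at write time
    "rows": [{"event_id": 7, "event_ts": "2026-05-20T09:00:00Z", "event_type": "click"}],
}

# Current table schema: id 3 was renamed, id 4 was dropped, id 5 was added later.
CURRENT_SCHEMA = [
    {"id": 1, "name": "event_id"},
    {"id": 3, "name": "occurred_at"},
    {"id": 5, "name": "country"},
]


def read_with_schema(data_file, schema):
    """Project each stored row onto the current schema by field ID."""
    result = []
    for row in data_file["rows"]:
        projected = {}
        for column in schema:
            stored_name = data_file["field_ids"].get(column["id"])
            # Columns added after the file was written read as None (NULL).
            projected[column["name"]] = row.get(stored_name) if stored_name else None
        result.append(projected)
    return result


print(read_with_schema(OLD_FILE, CURRENT_SCHEMA))
```

The renamed column is found under its new name, the dropped column is never surfaced, and the added column reads as NULL, all without touching the file.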

3.6 Partition evolution

Iceberg allows partition specs to be added over time. Patterns like "start daily, switch to hourly once traffic grows" are operationally feasible.

-- Initially: daily partitions, set at creation with PARTITIONED BY (days(event_ts))
-- partition spec id = 0 : (day(event_ts))
 
-- Switch to hourly mid-flight (needs the Iceberg Spark SQL extensions);
-- the field being replaced is identified by its existing transform
ALTER TABLE events
  REPLACE PARTITION FIELD days(event_ts) WITH hours(event_ts);
-- partition spec id = 1 : (hour(event_ts))

Operational notes:

Historical data remains under the old partition spec, and only new writes use the new spec.
Two partition specs therefore coexist within one table. Queries work correctly against both, but pruning efficiency differs by spec.
If needed, rewrite_data_files can rewrite old data to the new spec (with data-movement cost).

3.7 V1 vs V2 vs V3 — spec evolution

Item	V1	V2	V3 (2025+)
Standardized	2018+	2021+	2025+
Row-level delete	Impossible (CoW only)	Position / Equality delete file	Deletion vectors (Puffin)
Sequence number	None	Yes (essential for consistency)	Retained
Column default value	No	No	Yes
Variant type	No	No	Yes (semi-structured)
Geospatial type	No	No	Yes
Row Lineage	No	No	Yes (CDC-friendly)

What V2 brought — row-level deletes:

In V1, deleting one row meant rewriting the whole file it lived in (Copy-on-Write, CoW).
V2 introduced two kinds of delete file that allow expressing the change as "old file + delete marker" (Merge-on-Read, MoR).
- Position delete: (file path, row position). Best suited to CDC and MERGE workloads.
- Equality delete: (column = value). Best for key-based deletes.
At read time the engine applies the deletes in memory. Without frequent compaction, reads slow down.
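
What "applies the deletes in memory" means can be sketched as a filter over one data file's rows. This is a simplification for illustration: `apply_deletes` is a hypothetical helper, and it ignores the sequence numbers the spec uses to decide which delete files apply to which data files.

```python
def apply_deletes(file_path, rows, position_deletes, equality_deletes):
    """Return the live rows of one data file after merge-on-read.

    position_deletes: set of (file_path, row_position) pairs
    equality_deletes: list of {column: value} dicts; a row matching every pair is deleted
    """
    live = []
    for position, row in enumerate(rows):
        if (file_path, position) in position_deletes:
            continue  # removed by a position delete
        if any(all(row.get(col) == val for col, val in d.items()) for d in equality_deletes):
            continue  # removed by an equality delete
        live.append(row)
    return live


rows = [
    {"user_id": 1, "state": "active"},
    {"user_id": 2, "state": "active"},
    {"user_id": 3, "state": "churned"},
]
position_deletes = {("data/f1.parquet", 0)}
equality_deletes = [{"user_id": 3}]

live = apply_deletes("data/f1.parquet", rows, position_deletes, equality_deletes)
print(live)
```

Every read repeats this work until compaction rewrites the file without the deleted rows, which is why delete-file accumulation shows up as read latency.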

What V3 brings:

Deletion Vectors (Puffin format) — A more efficient representation of V2 position deletes. A Roaring bitmap reduces memory and disk usage.
Variant type — Stores semi-structured data like JSON with a consistent encoding; engines interpret it the same way.
Row Lineage — Assigns a stable ID per row, useful for CDC and ML reproducibility.

3.8 Copy-on-Write vs Merge-on-Read

After V2 the most important operational decision is whether to use CoW or MoR per table.

Aspect	Copy-on-Write (CoW)	Merge-on-Read (MoR)
UPDATE/DELETE behavior	Rewrite affected files	Keep old files + add delete files
Write cost	High (large file rewrite)	Low (only small delete files)
Read cost	Low (no delete to apply)	High (deletes applied in memory)
Compaction/sort state	Maintained immediately	Degrades over time, needs compaction
Best for	Analytics-heavy, infrequent corrections	CDC, GDPR corrections, frequent upserts

Operational recommendations:

Analytics-heavy tables (e.g., daily aggregates, star-schema facts) — CoW recommended.
CDC/MERGE patterns (e.g., real-time user-state tables) — MoR + periodic compaction recommended.

Set the mode explicitly with table properties:

ALTER TABLE events SET TBLPROPERTIES (
  'write.delete.mode'='merge-on-read',
  'write.update.mode'='merge-on-read',
  'write.merge.mode' ='merge-on-read'
);

3.9 Time travel, branches, tags

Iceberg's snapshot model naturally extends to Git-like data version control.

-- Time travel (specific timestamp)
SELECT * FROM events FOR SYSTEM_TIME AS OF '2026-05-20 09:00:00';
 
-- By snapshot ID directly
SELECT * FROM events VERSION AS OF 8123412345678901234;
 
-- Create a branch (Write-Audit-Publish pattern)
ALTER TABLE events CREATE BRANCH `wap-2026-05-20`;
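-- Write changes onto the branch (Spark addresses it with a branch_ prefix) and verify
INSERT INTO events.`branch_wap-2026-05-20` SELECT ...;
-- On success, fast-forward main (a stored procedure, not ALTER TABLE syntax)
CALL system.fast_forward('db.events', 'main', 'wap-2026-05-20');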
 
-- Tag (permanent retention point)
ALTER TABLE events CREATE TAG `q1-2026-close`
  AS OF VERSION 8123412345678901234
  RETAIN 365 DAYS;

Patterns:

Write-Audit-Publish (WAP) — Write new data to a branch first, merge to main only after quality checks (dbt test, Great Expectations) pass. If validation fails, simply discard the branch — preventing the risk of "even briefly exposing bad data on main."
ML experiment isolation — Pin training snapshots with tags (e.g., model-v3-train). Six months later you can retrain on identical data.
Legal correction and audit — Tag the pre-correction state so auditors can see "before and after" the correction.
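
The Write-Audit-Publish flow depends on one rule: main may only move forward to a branch head that descends from it. A toy model of snapshot lineage and refs makes that rule explicit. `ToyTable` and its methods are hypothetical and deliberately minimal; they are not Iceberg's implementation.

```python
class ToyTable:
    """Hypothetical model of snapshot lineage plus named refs."""

    def __init__(self):
        self.parent = {}   # snapshot_id -> parent snapshot_id (None for the first)
        self.refs = {}     # ref name -> snapshot_id
        self._next_id = 1

    def commit(self, branch):
        """Append a new snapshot on top of the branch head and advance the branch."""
        snapshot_id = self._next_id
        self._next_id += 1
        self.parent[snapshot_id] = self.refs.get(branch)
        self.refs[branch] = snapshot_id
        return snapshot_id

    def create_branch(self, name, from_ref="main"):
        self.refs[name] = self.refs[from_ref]

    def is_ancestor(self, ancestor, descendant):
        current = descendant
        while current is not None:
            if current == ancestor:
                return True
            current = self.parent[current]
        return False

    def fast_forward(self, branch, to):
        """Move `branch` to the head of `to`, but only if no history would be lost."""
        if not self.is_ancestor(self.refs[branch], self.refs[to]):
            raise ValueError(f"{branch} has diverged from {to}; cannot fast-forward")
        self.refs[branch] = self.refs[to]


table = ToyTable()
table.commit("main")                      # snapshot 1
table.create_branch("wap-2026-05-20")
table.commit("wap-2026-05-20")            # snapshot 2, invisible to readers of main
main_before = table.refs["main"]
table.fast_forward("main", "wap-2026-05-20")
print(main_before, table.refs["main"])
```

If validation fails, dropping the branch ref is all that is needed: main never pointed at the bad snapshot.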

4. Iceberg from an Operations Perspective

4.1 The standard maintenance job set

Operating Iceberg means automating these four maintenance jobs.

Job	What it does	Recommended frequency
`rewrite_data_files`	Merges small files into large ones; applies a sort order	Daily–weekly
`rewrite_manifests`	Reorganizes manifests to restore pruning efficiency	Weekly–monthly
`expire_snapshots`	Removes old snapshots and the files that only those snapshots reference	Daily
`remove_orphan_files`	Removes data/metadata files not referenced by any metadata	Weekly–monthly

What happens if you skip them:

Small file explosion — A streaming job committing every 10 seconds creates 8,640 files/day. After a month that's 260,000. Each query must read thousands of manifests.
Metadata explosion — Tens of thousands of accumulated snapshots make metadata.json grow into tens of MBs, and every write rewrites it in full — commits slow down.
Storage cost explosion — Without expiration, a table that updated one row a hundred million times ends up storing tens of times the source volume.
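
The small-file figures above follow from simple arithmetic. The sketch assumes one data file per commit, which is a lower bound: a writer that fans out across partitions or tasks produces more.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_created(commit_interval_s, days, files_per_commit=1):
    """Data files produced by a writer that commits at a fixed interval."""
    commits_per_day = SECONDS_PER_DAY // commit_interval_s
    return commits_per_day * files_per_commit * days


print(files_created(10, 1))   # 8640
print(files_created(10, 30))  # 259200
```

Raising the commit interval from 10 seconds to 1 minute cuts the count by a factor of six, which is often the cheapest fix to apply before tuning compaction.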

4.2 Compaction design

-- Spark SQL: basic compaction
CALL system.rewrite_data_files(
  table => 'db.events',
  options => map(
    'min-input-files',     '5',
    'target-file-size-bytes','536870912', -- 512 MiB
    'rewrite-all',         'false'
  )
);
 
-- Add a sort order
CALL system.rewrite_data_files(
  table => 'db.events',
  strategy => 'sort',
  sort_order => 'event_ts ASC, user_id ASC'
);

Design principles:

target-file-size between 256 and 1024 MiB. Too small and per-file open and planning overhead dominates; too large and shuffle and memory pressure grow.
Sort by the columns most often used for pruning — maximizes lower/upper bound efficiency.
MoR tables also apply delete files in the same pass during compaction, so periodic compaction is the key to maintaining query performance.
Job-size control — Don't rewrite every file at once. Compact incrementally by partition or time range. Use the where option to scope.
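
The grouping idea behind small-file compaction can be illustrated with a first-fit bin-packing pass. This is a simplified stand-in, not the planner rewrite_data_files actually uses. `plan_compaction` is a hypothetical function, and its two parameters only mirror the meaning of the target-file-size-bytes and min-input-files options.

```python
MIB = 1024 * 1024


def plan_compaction(file_sizes, target_bytes=512 * MIB, min_input_files=5):
    """Group small files into rewrite tasks of roughly target size (first fit, largest first)."""
    small = sorted((s for s in file_sizes if s < target_bytes), reverse=True)
    groups = []  # each group is a list of input file sizes
    for size in small:
        for group in groups:
            if sum(group) + size <= target_bytes:
                group.append(size)
                break
        else:
            groups.append([size])
    # A group with too few inputs costs rewrite I/O for little gain, so skip it.
    return [g for g in groups if len(g) >= min_input_files]


sizes = [64 * MIB] * 12 + [600 * MIB]
plan = plan_compaction(sizes)
print([len(group) for group in plan])
```

Twelve 64 MiB files yield one rewrite group of eight inputs. The remaining four fall below the minimum input count and wait for a later run, and the 600 MiB file is already large enough to be left alone.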

4.3 Snapshot expiration and cleanup

-- Expire snapshots older than 7 days, keep at least 5
CALL system.expire_snapshots(
  table => 'db.events',
  older_than => TIMESTAMP '2026-05-13 00:00:00',
  retain_last => 5
);
 
-- Remove orphan files not referenced by any snapshot (older than 3 days)
CALL system.remove_orphan_files(
  table => 'db.events',
  older_than => TIMESTAMP '2026-05-17 00:00:00'
);

Operational recommendations:

Be conservative with older_than — Long-running jobs (e.g., 6-hour backfills) might still reference old snapshots.
remove_orphan_files requires care — A wrong invocation can delete files another job just wrote. Validate with dry_run => true first.
Define a time-travel SLA — A policy like "we will not restore data older than 30 days" makes expiration thresholds clear.
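
How older_than and retain_last interact can be sketched as a selection over (snapshot ID, timestamp) pairs. This is a simplified model: `snapshots_to_expire` is a hypothetical function, and real expiration also considers snapshot ancestry and per-ref retention settings.

```python
DAY_MS = 86_400_000


def snapshots_to_expire(snapshots, older_than_ms, retain_last, protected_ids=frozenset()):
    """snapshots: list of (snapshot_id, timestamp_ms). Returns IDs selected for expiration."""
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    keep = {snapshot_id for snapshot_id, _ in newest_first[:retain_last]}
    keep |= set(protected_ids)  # for example, snapshots pinned by tags or branch heads
    return [sid for sid, ts in newest_first if ts < older_than_ms and sid not in keep]


# Eight daily snapshots with IDs 1..8, committed on days 1..8.
snapshots = [(i, i * DAY_MS) for i in range(1, 9)]

print(snapshots_to_expire(snapshots, older_than_ms=6 * DAY_MS, retain_last=5))
print(snapshots_to_expire(snapshots, older_than_ms=6 * DAY_MS, retain_last=5, protected_ids={2}))
```

Snapshots 4 and 5 are older than the cutoff but survive because they are among the last five. Snapshot 2 survives in the second call because a ref pins it.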

4.4 Catalog selection

Iceberg abstracts catalogs, but the catalog you actually pick drives the operating model.

Catalog	Best for	Limitations
Hive Metastore (HMS)	Gradual adoption on top of existing Hive assets	Weak consistency and permission model in multi-engine settings
AWS Glue	Single-cloud AWS with Athena/EMR/Redshift integration	Awkward to use outside AWS
REST Catalog	Multi-engine, multi-cloud on a standardized spec	Self-hosting / operations burden; need to pick a backend implementation
Project Nessie	Git-like data versioning (branch/merge)	Limited permissioning and SaaS options
Snowflake Polaris	Multi-engine sharing in Snowflake-centric environments	Some Snowflake coupling remains
Databricks Unity Catalog	Databricks-centric environments where UC handles Iceberg as first-class	Engines outside UC need separate REST adapters

Recommended pattern: For a new multi-engine environment, put a backend that implements the REST Catalog spec (Apache Polaris, Tabular OSS, Lakekeeper, Apache Gravitino, Unity Catalog OSS) in front, and let Spark, Trino, Flink, BigQuery, and Snowflake all access tables through it.

Figure: REST Catalog topology. Spark, Trino, Flink, Snowflake, and BigQuery all access the same tables on the same object storage via a single REST catalog.

4.5 Write modes and distribution tuning

ALTER TABLE events SET TBLPROPERTIES (
  'write.distribution-mode'  = 'hash',          -- 'none' | 'hash' | 'range'
  'write.target-file-size-bytes' = '536870912', -- 512 MiB
  'write.parquet.compression-codec' = 'zstd',
  'write.parquet.row-group-size-bytes' = '134217728',
  'commit.retry.num-retries' = '8',
  'commit.retry.min-wait-ms' = '500'
);

Key parameters:

write.distribution-mode
- none — Input distribution preserved. Lots of small files appear.
- hash — Hash on the partition columns. Generally recommended. File count per partition becomes uniform.
- range — Sorted distribution. Useful for time-ordered log workloads.
commit.retry.* — Retry policy on optimistic concurrency control failures with many writers. Raise the values when conflicts are frequent.
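
The effect of the retry settings can be estimated with a short sketch. It assumes exponential backoff that doubles from the minimum wait. The 60-second cap, the absence of jitter, and the helper names are assumptions of this example, not a description of Iceberg's exact retry loop.

```python
def backoff_schedule(num_retries=8, min_wait_ms=500, max_wait_ms=60_000):
    """Wait before each retry: doubles from min_wait_ms, capped at max_wait_ms."""
    return [min(min_wait_ms * 2 ** attempt, max_wait_ms) for attempt in range(num_retries)]


def commit_with_retries(try_commit, num_retries=8, min_wait_ms=500, sleep=lambda ms: None):
    """Call try_commit() until it returns True; return how many retries were needed.

    try_commit is expected to reload the latest metadata and re-apply the change
    on every call. The default sleep is a no-op so the sketch runs instantly.
    """
    if try_commit():
        return 0
    for attempt, wait_ms in enumerate(backoff_schedule(num_retries, min_wait_ms), start=1):
        sleep(wait_ms)
        if try_commit():
            return attempt
    raise RuntimeError("commit failed after all retries")


schedule = backoff_schedule()
print(schedule, sum(schedule))
```

With the values from the table properties above, a writer that loses every race waits about two minutes in total before giving up. That number is worth comparing against the job's own timeout.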

4.6 Monitoring metrics

To know whether an Iceberg table is healthy, watch these metrics regularly.

Metric	Meaning	Red flag
Average file count per partition	Compaction effect	Over 100
Average file size	Compaction / write distribution	Under 32 MiB
Cumulative snapshot count	Expiration policy working	Over 1,000
metadata.json size	Metadata bloat signal	Above 8 MiB
Average manifest size and count	Pruning efficiency	Over 5,000 manifests
Average commit latency	Catalog / concurrency issue	p95 above 5 s
delete file / data file ratio	Need for MoR compaction	Above 5% → compact

Standard operations practice is to extract these values from the catalog and metadata on a schedule and put them on a dashboard. Iceberg's metadata tables (db.events.files, db.events.snapshots, db.events.manifests) can be queried directly.

-- File statistics
SELECT
  partition,
  count(*)              AS file_count,
  avg(file_size_in_bytes) AS avg_size,
  sum(file_size_in_bytes) AS total_size
FROM db.events.files
GROUP BY partition
ORDER BY file_count DESC;
 
-- Snapshot accumulation
SELECT count(*) FROM db.events.snapshots;
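
Once the metrics are collected, the red-flag column of the table in this section can be encoded as a small rule set. The metric key names below are hypothetical. The thresholds are the ones listed in that table.

```python
# Thresholds taken from the red-flag column of the monitoring table.
RED_FLAGS = {
    "avg_files_per_partition": lambda v: v > 100,
    "avg_file_size_mib": lambda v: v < 32,
    "snapshot_count": lambda v: v > 1000,
    "metadata_json_mib": lambda v: v > 8,
    "manifest_count": lambda v: v > 5000,
    "commit_latency_p95_s": lambda v: v > 5,
    "delete_to_data_file_ratio": lambda v: v > 0.05,
}


def red_flags(metrics):
    """Return the sorted names of metrics that cross their red-flag threshold."""
    return sorted(
        name for name, is_bad in RED_FLAGS.items()
        if name in metrics and is_bad(metrics[name])
    )


metrics = {"avg_file_size_mib": 12, "snapshot_count": 240, "delete_to_data_file_ratio": 0.08}
print(red_flags(metrics))
```

Metrics that were not collected are skipped instead of being treated as healthy or unhealthy, so a partial collection run does not raise false alarms.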

4.7 The shape of operational automation

A mature Iceberg operations team needs this set of automated jobs:

Daily compaction job — rewrite_data_files on yesterday's partitions (only partitions whose small-file count crosses the threshold).
Daily expiration job — Expire snapshots older than N days; always keep a fixed number.
Weekly manifest rewrite job — rewrite_manifests.
Monthly orphan cleanup job — remove_orphan_files (dry run → validate → execute).
Table-health report job — Extract the monitoring metrics above and emit dashboards and alerts.

All these jobs must be idempotent and safely retryable on failure. In large environments, it is standard practice to extract this automation into a dedicated "table management service" and operate it alongside the catalog.

5. Engine Compatibility

5.1 Core engine support (as of 2026)

Engine	Read	Write	DML	Time travel	Branch/Tag	V2 (MoR)	V3	REST Catalog
Apache Spark	✓	✓	✓	✓	✓	✓	In progress	✓
Trino	✓	✓	✓	✓	✓	✓	Partial	✓
Apache Flink	✓	✓ (streaming)	Partial	✓	Partial	✓	In progress	✓
Snowflake	✓	✓	✓	✓	Limited	✓	In progress	✓ (Polaris)
Databricks (Unity)	✓	✓	✓	✓	✓	✓	In progress	✓ (UC OSS)
BigQuery	✓	Partial	Partial	✓	Partial	✓	In progress	✓ (BigLake)
AWS Athena	✓	✓	✓	✓	Partial	✓	In progress	✓ (Glue)
ClickHouse	✓	Experimental	✗	✓	✗	Partial	✗	✓
DuckDB	✓	Experimental	✗	✓	✗	Partial	✗	✓
PyIceberg	✓	✓	Partial	✓	✓	✓	In progress	✓

The table is a generalized snapshot of typical support as of May 2026; check the latest release notes for each engine and Iceberg's compatibility tables at the time of adoption.

5.2 Recommended role per engine

Spark — The standard for backfill, large ETL, and table maintenance. The system.* procedures are the most complete.
Trino — Interactive analytics and the BI back end. Strong on short queries; MoR application is stable.
Flink — Streaming ingestion. Strong on exactly-once commits and V2 delete writes.
Snowflake / Databricks — Self-service and BI for in-house users. The shared-table pattern through the catalog.
BigQuery / Athena — Reporting and ad-hoc analysis. When you only want queries with no infrastructure to operate.
PyIceberg — Lightweight ETL, ML training pipelines, and validation in notebooks or locally.

5.3 What the REST Catalog means

Since the REST Catalog standard took hold in 2024–2025, "decoupling engine from catalog so they can be combined freely" has become real.

Figure: Multiple engines access the same single source of truth (tables) through the REST Catalog.

The implications are decisive:

Break engine lock-in. Moving from one engine to another does not require data migration.
Concentrate permissions, audits, and policies in one place. The catalog becomes the true control plane.
The cost of adopting a new engine drops. Every engine implementing the REST spec can immediately work against the same tables.

6. Comparison with Other Formats

6.1 Iceberg vs Delta Lake vs Hudi — the core differences

Item	Iceberg	Delta Lake	Hudi
Starting point	Multi-engine, metadata-centric	Spark/Databricks-centric, transaction log	Streaming upsert / incremental processing
Metadata model	Snapshot + manifest tree	Transaction log (JSON) + checkpoints	Timeline (.hoodie) + metadata
Catalog abstraction	First-class (REST spec)	Secondary (Unity is filling the gap)	External dependency
Hidden partitioning	✓	Limited (generated column)	Partial
Partition evolution	✓	Limited	Partial
Schema evolution	Safe (ID-based)	Safe (name-based)	Safe
Row-level delete	V2 delete files / V3 vectors	Deletion vectors	Soft delete + compaction
Time travel	✓	✓	✓
Branch / tag	✓ (Git-like)	Limited (time travel only)	Limited
Multi-engine maturity	Highest	Databricks-centric, improving externally	Spark/Flink-centric
Streaming workloads	Possible and improving	Possible	Most mature
Public standard spec	Open and agreed through v3	Spec public but Databricks-led	Open

6.2 Decision tree

Figure: Iceberg vs Delta Lake vs Hudi decision tree. Pick a format based on five questions: single engine vs not, multi-engine, CDC share, long-term retention / experiment isolation needs, and Hive-asset migration.

6.3 Delta-Iceberg interop options

Since 2024, both camps have pursued interop, producing the following options.

Delta UniForm — Generate Iceberg metadata alongside a Delta table so Iceberg readers can read the Delta table. One-way (Delta → Iceberg read).
Apache XTable (formerly OneTable) — Translates metadata between Iceberg, Delta, and Hudi. Data is shared; only metadata is expressed in each format.
Delta exposed via Iceberg REST — Unity Catalog OSS is moving toward exposing Delta tables through the Iceberg REST spec.

Interop options are convenient, but "native spec as-is" is always the most stable. Known limitations of interop modes (e.g., gaps in V2 delete support) must be reviewed carefully before adoption.

7. Adoption Strategy and Migration Patterns

7.1 Which workloads to adopt first

Priority recommendations:

Long-term retention / legal-correction data: the value of time travel and branches/tags materializes immediately.
Core fact tables that need multi-engine sharing: once standardized, the impact ripples across the entire in-house analytics infrastructure.
Marts for new domains: the lowest-risk way to accumulate operations experience.
Existing Hive core tables: the biggest payoff, but also the biggest migration burden. Approach only after building operations know-how on the first three.

7.2 Hive → Iceberg migration options

Three standard patterns.

(a) `migrate` — in-place replacement

CALL system.migrate('hive_db.events');
-- Replace the Hive table's metadata with Iceberg metadata.
-- Data files stay put. The fastest option.

Pros: No data movement, completes in minutes.
Cons: The old Hive directory structure (partition-key encoding) remains, so you lose some of the hidden-partitioning benefit. The existing files were written without Iceberg field IDs, so engines fall back to a name mapping for them, and some engines pay extra cost on the first query.

(b) `snapshot` — shadow table

CALL system.snapshot('hive_db.events', 'iceberg_db.events_v2');
-- Leave the Hive table in place and create an Iceberg table that references the same data files.
-- Both tables stay writable during a validation/comparison window, but writes to one are not visible in the other.

Pros: Safe comparison and rollback during operations.
Cons: You must maintain both sets of metadata in parallel for a while.

(c) `CTAS` — full rewrite

CREATE TABLE iceberg_db.events
USING iceberg
PARTITIONED BY (days(event_ts))
TBLPROPERTIES ('write.distribution-mode'='hash')
AS SELECT * FROM hive_db.events;

Pros: Apply a fresh partition spec, sort order, compression codec, and file-size policy from the start. Cleanest state.
Cons: Data is rewritten. Petabyte scale costs time and money.

Recommendation: (c) for core tables intended for long-term operation, (b) when short-term comparison matters, (a) when fast adoption is the priority.

7.3 Delta → Iceberg

Options:

Use UniForm — Keep Delta as is and additionally generate Iceberg metadata. Cheapest when only reads are needed.
XTable for two-way metadata translation — Data is shared; metadata is exposed in both formats.
CTAS, full rewrite — Recommended when you are ready to operate natively on Iceberg.

Operational recommendation: Do not move critical workloads off of Delta immediately. Build 6–12 months of Iceberg operations experience on a new domain or a shadow table, then migrate in stages.

7.4 Phase-by-phase adoption checklist

Phase 0 — Pre-assessment (2–4 weeks)

Inventory in-house engines, catalogs, and storage
Pick three candidate workloads (per the priority criteria)
Decide on a catalog (REST / Glue / Unity / Polaris, etc.)
Plan the operational-automation jobs

Phase 1 — PoC (4–8 weeks)

Create an Iceberg version of one candidate table via snapshot or CTAS
Verify identical results with two engines (e.g., Spark + Trino)
Run compaction, expiration, and orphan cleanup; collect monitoring metrics
Apply the WAP pattern and at least one time-travel use case in production

Phase 2 — Operational automation (4–8 weeks)

Standardize and roll out the five automation jobs from §4.7 organization-wide
Stabilize the catalog, permission, and audit models
Agree on monitoring dashboards and alert thresholds
Document operational runbooks (including failure scenarios)

Phase 3 — Expansion (3–6 months)

Migrate core tables per the priority list
Connect the in-house data catalog, BI, and ML pipelines to the new catalog
Decide whether to adopt Iceberg V3 (assess volatility and maturity)

7.5 Common mistakes in migration

Putting off the catalog decision — The "let's move the data first" approach turns the catalog into an operations bottleneck. Decide on it first.
Deferring operational automation — Something that worked nicely in PoC stops six months later under metadata explosion. Build the automation alongside the PoC.
Picking CoW or MoR uniformly — Ignoring per-table workload characteristics and forcing one mode makes CDC tables slow or analytics tables lose their sort. Decide per table.
No time-travel SLA — Without a "how far back must we be able to restore" policy, expiration becomes overly conservative and storage cost climbs forever.
Compaction job blowing up shuffle — One job compacting too wide a range can stall the cluster. Slice the range and time finely and cap per-job resources explicitly.

8. Outlook as of 2026

8.1 Spec evolution

V3 going mainstream — Variant, Geospatial, and Deletion Vectors should reach GA across major engines by late 2026. The largest gains are in CDC and real-time analytics workloads.
The rise of Row Lineage — Stable row-level IDs directly serve CDC, feature stores, and reproducible ML training. Combined with data-governance and lineage tooling, this will birth new operational patterns.
Standardization of materialized views — Spec-level agreement on MVs and aggregation caches over Iceberg will likely reshape the cost structure of analytics workloads again.

8.2 Realignment in the catalog camp

Apache Polaris, Unity OSS, Lakekeeper, and Apache Gravitino compete on a shared REST baseline. With a standardized spec, users see the same interface regardless of backend.
Commercial vs OSS balance — The next three years will hinge on the choice between "use the catalog as SaaS" and "self-host." The depth of permission, audit, and metadata management will drive cost.

8.3 Changes on the engine side

Integration with AI/ML workloads — Iceberg's branches/tags and time travel are used to reproduce training data and to synchronize model and data versions. From 2026 on, feature stores and MLOps tooling will more frequently handle Iceberg as first-class.
Broader native support from OLAP engines — Direct Iceberg-write support in ClickHouse, StarRocks, and DuckDB is maturing fast.

8.4 The control point of data governance shifts

The combination of Iceberg + REST Catalog moves the governance control point from "the engine" to "the catalog." Once data masking, row-level filters, and audit logs are decided at the catalog level, the same policy applies regardless of which engine the user comes through. This makes multi-engine compliance consistent — practically for the first time.

9. Conclusions and Recommendations

9.1 Key messages

Iceberg is not a mere table format; it is a metadata specification that brings database semantics on top of object storage.
Its value is only partially visible in a single-engine environment, but becomes decisive in multi-engine, long-term, correction, and experiment-isolation workloads.
As of 2026, with the combination of the REST Catalog standard + V3 spec, Iceberg is effectively the standard format for multi-engine Lakehouses.
However, adopting Iceberg without operational automation quickly makes metadata operations cancel out the adoption benefit. Compaction, expiration, orphan cleanup, and monitoring are as important as understanding the spec.

9.2 Adoption recommendations

Scenario	Recommendation
Building a new multi-engine data platform	Adopt Iceberg + REST Catalog as the standard from day one
Existing Databricks-only environment considering external engines	Delta UniForm or gradual Iceberg adoption
Migrating a Hive-based legacy	Phased adoption starting with priority tables; CTAS recommended
Real-time CDC / upsert-heavy	Compare Iceberg V2/V3 (MoR) against Hudi
ML / experiment reproducibility matters	Use Iceberg branches, tags, and time travel

9.3 How Data Dynamics can help

Data Dynamics, the author of this whitepaper, supports Iceberg adoption across the following areas.

Lakehouse architecture design and review — Recommendations on catalog, engine, and storage topology
Migration execution — Staged migration from Hive and Delta, from PoC through operational automation
Operational-automation standardization — Idempotent job design for compaction, expiration, and cleanup; monitoring dashboards
Catalog operations — Selecting and operating REST Catalog backends (Polaris, Unity OSS, Lakekeeper)
Engine integration — Consistency validation across Spark, Trino, Flink, Snowflake, Databricks, and BigQuery

If you need a pre-adoption assessment or a technical workshop, contact us and we'll put together an adoption roadmap tailored to your environment.

This whitepaper was written based on information as of May 2026. Iceberg's spec and engine compatibility evolve rapidly; check the latest release notes and compatibility tables at the time of adoption.