trinoicebergmaintenancecompactionlakehousedata-platform

Iceberg Table Maintenance with Trino — OPTIMIZE, Snapshot Expiration, and Orphan File Removal

Iceberg tables get slower the longer they run in production. Small-file accumulation, snapshot explosion, and orphan files are the culprits. This post covers how to automate compaction, snapshot expiration, and orphan file cleanup with Trino's ALTER TABLE ... EXECUTE procedures, and how to diagnose table health with metadata tables.

Data DynamicsJune 5, 20267 min read

Iceberg tables are fast when freshly created, but they slowly degrade in production. As streaming ingestion or frequent INSERT/MERGE operations pile up, small files explode in number, every write creates a new snapshot, and expired data files linger in storage as orphans. Left unchecked, query planning stretches to tens of seconds and storage costs quietly leak away.

Fortunately, Trino provides procedures that run these maintenance tasks with a single line of SQL. This post walks through what makes a table slow and how to clean it up with OPTIMIZE / expire_snapshots / remove_orphan_files — from diagnosis to automation.

1. Three Reasons Iceberg Tables Slow Down

Loading diagram…

Cause	Symptom	Procedure
Small files	Little data, yet query planning is slow	`EXECUTE optimize`
Snapshot accumulation	Bloated metadata.json, planning delays	`EXECUTE expire_snapshots`
Orphan files	Storage usage larger than actual data	`EXECUTE remove_orphan_files`

2. Diagnose First — Metadata Tables

Trino exposes an Iceberg table's internal state through the "table$metatable" syntax. Assess the current state before cleaning up.

-- Data file count and total size (small-file diagnosis)
SELECT count(*) AS file_count,
       sum(file_size_in_bytes) / (1024*1024*1024) AS total_gb,
       avg(file_size_in_bytes) / (1024*1024) AS avg_file_mb
FROM iceberg.analytics."events$files";
 
-- Snapshot count and history (snapshot accumulation diagnosis)
SELECT count(*) AS snapshot_count,
       min(committed_at) AS oldest,
       max(committed_at) AS newest
FROM iceberg.analytics."events$snapshots";
 
-- File distribution per partition (are small files concentrated in specific partitions?)
SELECT partition, file_count, total_size
FROM iceberg.analytics."events$partitions"
ORDER BY file_count DESC
LIMIT 20;

Key metadata tables:

Table	Contents
`$files`	Data file list, sizes, statistics
`$snapshots`	Snapshot history and commit timestamps
`$partitions`	File count, row count, and size per partition
`$manifests`	Manifest file list
`$history`	Table state change history

Rule of thumb: if avg_file_mb is down in the single-digit MB range and file_count is in the tens or hundreds of thousands, you have a small-file problem; if snapshot_count is in the hundreds to thousands, you have snapshot accumulation.

3. OPTIMIZE — Small File Compaction

OPTIMIZE merges small files into larger files of an appropriate size (default target around 512MB).

-- Compact the entire table
ALTER TABLE iceberg.analytics.events EXECUTE optimize;
 
-- Specific partitions only (recommended: just the recently ingested range)
ALTER TABLE iceberg.analytics.events EXECUTE optimize
  WHERE event_time >= TIMESTAMP '2026-06-01 00:00:00 UTC'
    AND event_time <  TIMESTAMP '2026-06-05 00:00:00 UTC';
 
-- Only target files below a given size (file_size_threshold)
ALTER TABLE iceberg.analytics.events EXECUTE optimize(file_size_threshold => '128MB');

Key points:

Narrow the scope with WHERE. Full-table compaction is expensive and slow. For streaming ingestion, compacting only the "yesterday-to-today" partitions every day is far more efficient.
Targeting only files smaller than file_size_threshold leaves already-large files untouched, cutting cost.
Compaction creates a new snapshot. The old small files are not deleted immediately — they become orphan candidates — so the standard practice is to follow compaction with snapshot expiration and orphan file removal.

Compaction with Sorting

If the table has sorted_by configured, OPTIMIZE rewrites the data in sorted order. Sorting amplifies the effect of Parquet min/max statistics and improves pruning.

ALTER TABLE iceberg.analytics.events SET PROPERTIES sorted_by = ARRAY['user_id'];
ALTER TABLE iceberg.analytics.events EXECUTE optimize;

4. expire_snapshots — Snapshot Expiration

Iceberg records a snapshot on every write (for time travel and rollback). As they accumulate, metadata files grow, and old data files cannot be deleted until their snapshots expire.

-- Expire snapshots older than 7 days
ALTER TABLE iceberg.analytics.events EXECUTE expire_snapshots(retention_threshold => '7d');

Behavior:

Removes snapshots older than retention_threshold and physically deletes data files that were referenced only by those snapshots.
In other words, expire_snapshots is the step that actually reclaims the old small files left behind by OPTIMIZE.
Setting it too short eliminates your time travel and rollback headroom, so align it with your operational policy (typically 3–14 days).

-- Time travel: past versions remain queryable until expired
SELECT * FROM iceberg.analytics.events
FOR TIMESTAMP AS OF TIMESTAMP '2026-06-03 00:00:00 UTC';

Caution: there is usually a safety guard preventing retention_threshold from being lowered below the cluster's default minimum (e.g. 7 days). Forcing a shorter value requires changing table/catalog settings, which is not recommended from a data-protection standpoint.

5. remove_orphan_files — Orphan File Removal

This deletes files that exist in storage but are not referenced by any snapshot (orphans). Causes include abnormally terminated writes, failed commits, and leftovers from external tools.

ALTER TABLE iceberg.analytics.events EXECUTE remove_orphan_files(retention_threshold => '7d');

Only orphan files older than retention_threshold are removed. If this value is too short, files from a commit currently in flight (not yet reflected in metadata) can be misidentified as orphans and deleted, so always keep it generous (default 7 days recommended).
Don't run it too aggressively while concurrent writes are in progress. It's safest to execute it during quiet write periods.

6. Recommended Maintenance Order

The order of these three tasks matters.

Loading diagram…

-- Daily maintenance bundle (compact recent partitions + cleanup)
ALTER TABLE iceberg.analytics.events EXECUTE optimize
  WHERE event_time >= current_timestamp - INTERVAL '2' DAY;
ALTER TABLE iceberg.analytics.events EXECUTE expire_snapshots(retention_threshold => '7d');
ALTER TABLE iceberg.analytics.events EXECUTE remove_orphan_files(retention_threshold => '7d');

7. Automation — What and How Often

Recommended cadence by workload:

Task	Streaming / frequent ingestion	Daily batch	Mostly static
OPTIMIZE	Daily (recent partitions)	Daily/weekly	Monthly
expire_snapshots	Daily	Weekly	Monthly
remove_orphan_files	Weekly	Weekly	Monthly

How to automate:

Run the SQL bundle above per table from a scheduler (Airflow/cron).
Assign compaction to a dedicated maintenance Resource Group (low concurrency) so it doesn't eat into resources for analytical queries.
For large tables, don't run the whole thing at once — split the partition range and proceed incrementally.

8. Common Pitfalls

Pitfall	Consequence	Avoidance
Running OPTIMIZE without expiring snapshots	Old small files never deleted, storage stays bloated	Always pair with expire_snapshots
retention_threshold too short	In-flight files wrongly deleted / time travel lost	7 days or more recommended
Full-table OPTIMIZE every day	Cost and runtime explosion	Use WHERE to target recent partitions only
Maintenance during write peaks	Commit conflicts, resource contention	Off-peak hours + dedicated Resource Group
Aggressive remove_orphan_files on the analytics cluster	I/O pressure	Weekly, during low-load windows

9. Summary

Procedure	Solves	Key options
`EXECUTE optimize`	Small file merging (+ sorting)	`WHERE`, `file_size_threshold`
`EXECUTE expire_snapshots`	Snapshot and old file cleanup	`retention_threshold`
`EXECUTE remove_orphan_files`	Reclaiming unreferenced leftover files	`retention_threshold`

The essence of Iceberg table maintenance is running these three tasks in order, on a schedule, with a narrowed scope. Diagnose the current state first with the $files and $snapshots metadata tables, then automate daily recent-partition compaction for streaming tables and periodic snapshot/orphan cleanup. With just this routine in place, you can structurally prevent Iceberg's chronic "slower over time" disease.

This article was written against Trino 440-series and Iceberg spec v2. If you need help automating lakehouse table maintenance or diagnosing performance degradation, feel free to reach out.

— Data Dynamics Engineering Team