Iceberg Table Maintenance with Trino — OPTIMIZE, Snapshot Expiration, and Orphan File Removal
Iceberg tables get slower the longer they run in production. Small-file accumulation, snapshot explosion, and orphan files are the culprits. This post covers how to automate compaction, snapshot expiration, and orphan file cleanup with Trino's ALTER TABLE ... EXECUTE procedures, and how to diagnose table health with metadata tables.
Iceberg tables are fast when freshly created, but they slowly degrade in production. As streaming ingestion or frequent INSERT/MERGE operations pile up, small files explode in number, every write creates a new snapshot, and expired data files linger in storage as orphans. Left unchecked, query planning stretches to tens of seconds and storage costs quietly leak away.
Fortunately, Trino provides procedures that run these maintenance tasks with a single line of SQL. This post walks through what makes a table slow and how to clean it up with OPTIMIZE / expire_snapshots / remove_orphan_files — from diagnosis to automation.
1. Three Reasons Iceberg Tables Slow Down
① Small files → per-file metadata/footer I/O, planning blowup
② Snapshot accumulation → bloated metadata files, excessive time travel history
③ Orphan files → data no longer referenced still sitting in storage| Cause | Symptom | Procedure |
|---|---|---|
| Small files | Little data, yet query planning is slow | EXECUTE optimize |
| Snapshot accumulation | Bloated metadata.json, planning delays | EXECUTE expire_snapshots |
| Orphan files | Storage usage larger than actual data | EXECUTE remove_orphan_files |
2. Diagnose First — Metadata Tables
Trino exposes an Iceberg table's internal state through the "table$metatable" syntax. Assess the current state before cleaning up.
-- Data file count and total size (small-file diagnosis)
SELECT count(*) AS file_count,
sum(file_size_in_bytes) / (1024*1024*1024) AS total_gb,
avg(file_size_in_bytes) / (1024*1024) AS avg_file_mb
FROM iceberg.analytics."events$files";
-- Snapshot count and history (snapshot accumulation diagnosis)
SELECT count(*) AS snapshot_count,
min(committed_at) AS oldest,
max(committed_at) AS newest
FROM iceberg.analytics."events$snapshots";
-- File distribution per partition (are small files concentrated in specific partitions?)
SELECT partition, file_count, total_size
FROM iceberg.analytics."events$partitions"
ORDER BY file_count DESC
LIMIT 20;Key metadata tables:
| Table | Contents |
|---|---|
$files | Data file list, sizes, statistics |
$snapshots | Snapshot history and commit timestamps |
$partitions | File count, row count, and size per partition |
$manifests | Manifest file list |
$history | Table state change history |
Rule of thumb: if
avg_file_mbis down in the single-digit MB range andfile_countis in the tens or hundreds of thousands, you have a small-file problem; ifsnapshot_countis in the hundreds to thousands, you have snapshot accumulation.
3. OPTIMIZE — Small File Compaction
OPTIMIZE merges small files into larger files of an appropriate size (default target around 512MB).
-- Compact the entire table
ALTER TABLE iceberg.analytics.events EXECUTE optimize;
-- Specific partitions only (recommended: just the recently ingested range)
ALTER TABLE iceberg.analytics.events EXECUTE optimize
WHERE event_time >= TIMESTAMP '2026-06-01 00:00:00 UTC'
AND event_time < TIMESTAMP '2026-06-05 00:00:00 UTC';
-- Only target files below a given size (file_size_threshold)
ALTER TABLE iceberg.analytics.events EXECUTE optimize(file_size_threshold => '128MB');Key points:
- Narrow the scope with WHERE. Full-table compaction is expensive and slow. For streaming ingestion, compacting only the "yesterday-to-today" partitions every day is far more efficient.
- Targeting only files smaller than
file_size_thresholdleaves already-large files untouched, cutting cost. - Compaction creates a new snapshot. The old small files are not deleted immediately — they become orphan candidates — so the standard practice is to follow compaction with snapshot expiration and orphan file removal.
Compaction with Sorting
If the table has sorted_by configured, OPTIMIZE rewrites the data in sorted order. Sorting amplifies the effect of Parquet min/max statistics and improves pruning.
ALTER TABLE iceberg.analytics.events SET PROPERTIES sorted_by = ARRAY['user_id'];
ALTER TABLE iceberg.analytics.events EXECUTE optimize;4. expire_snapshots — Snapshot Expiration
Iceberg records a snapshot on every write (for time travel and rollback). As they accumulate, metadata files grow, and old data files cannot be deleted until their snapshots expire.
-- Expire snapshots older than 7 days
ALTER TABLE iceberg.analytics.events EXECUTE expire_snapshots(retention_threshold => '7d');Behavior:
- Removes snapshots older than
retention_thresholdand physically deletes data files that were referenced only by those snapshots. - In other words, expire_snapshots is the step that actually reclaims the old small files left behind by OPTIMIZE.
- Setting it too short eliminates your time travel and rollback headroom, so align it with your operational policy (typically 3–14 days).
-- Time travel: past versions remain queryable until expired
SELECT * FROM iceberg.analytics.events
FOR TIMESTAMP AS OF TIMESTAMP '2026-06-03 00:00:00 UTC';Caution: there is usually a safety guard preventing
retention_thresholdfrom being lowered below the cluster's default minimum (e.g. 7 days). Forcing a shorter value requires changing table/catalog settings, which is not recommended from a data-protection standpoint.
5. remove_orphan_files — Orphan File Removal
This deletes files that exist in storage but are not referenced by any snapshot (orphans). Causes include abnormally terminated writes, failed commits, and leftovers from external tools.
ALTER TABLE iceberg.analytics.events EXECUTE remove_orphan_files(retention_threshold => '7d');- Only orphan files older than
retention_thresholdare removed. If this value is too short, files from a commit currently in flight (not yet reflected in metadata) can be misidentified as orphans and deleted, so always keep it generous (default 7 days recommended). - Don't run it too aggressively while concurrent writes are in progress. It's safest to execute it during quiet write periods.
6. Recommended Maintenance Order
The order of these three tasks matters.
① OPTIMIZE small files → large files (new snapshot created, old files become orphan candidates)
↓
② expire_snapshots remove old snapshots → physically delete data files no longer referenced
↓
③ remove_orphan_files reclaim leftover files even the metadata doesn't know about-- Daily maintenance bundle (compact recent partitions + cleanup)
ALTER TABLE iceberg.analytics.events EXECUTE optimize
WHERE event_time >= current_timestamp - INTERVAL '2' DAY;
ALTER TABLE iceberg.analytics.events EXECUTE expire_snapshots(retention_threshold => '7d');
ALTER TABLE iceberg.analytics.events EXECUTE remove_orphan_files(retention_threshold => '7d');7. Automation — What and How Often
Recommended cadence by workload:
| Task | Streaming / frequent ingestion | Daily batch | Mostly static |
|---|---|---|---|
| OPTIMIZE | Daily (recent partitions) | Daily/weekly | Monthly |
| expire_snapshots | Daily | Weekly | Monthly |
| remove_orphan_files | Weekly | Weekly | Monthly |
How to automate:
- Run the SQL bundle above per table from a scheduler (Airflow/cron).
- Assign compaction to a dedicated maintenance Resource Group (low concurrency) so it doesn't eat into resources for analytical queries.
- For large tables, don't run the whole thing at once — split the partition range and proceed incrementally.
8. Common Pitfalls
| Pitfall | Consequence | Avoidance |
|---|---|---|
| Running OPTIMIZE without expiring snapshots | Old small files never deleted, storage stays bloated | Always pair with expire_snapshots |
| retention_threshold too short | In-flight files wrongly deleted / time travel lost | 7 days or more recommended |
| Full-table OPTIMIZE every day | Cost and runtime explosion | Use WHERE to target recent partitions only |
| Maintenance during write peaks | Commit conflicts, resource contention | Off-peak hours + dedicated Resource Group |
| Aggressive remove_orphan_files on the analytics cluster | I/O pressure | Weekly, during low-load windows |
9. Summary
| Procedure | Solves | Key options |
|---|---|---|
EXECUTE optimize | Small file merging (+ sorting) | WHERE, file_size_threshold |
EXECUTE expire_snapshots | Snapshot and old file cleanup | retention_threshold |
EXECUTE remove_orphan_files | Reclaiming unreferenced leftover files | retention_threshold |
The essence of Iceberg table maintenance is running these three tasks in order, on a schedule, with a narrowed scope. Diagnose the current state first with the $files and $snapshots metadata tables, then automate daily recent-partition compaction for streaming tables and periodic snapshot/orphan cleanup. With just this routine in place, you can structurally prevent Iceberg's chronic "slower over time" disease.
This article was written against Trino 440-series and Iceberg spec v2. If you need help automating lakehouse table maintenance or diagnosing performance degradation, feel free to reach out.
— Data Dynamics Engineering Team