Blog
trinoicebergmaintenancecompactionlakehousedata-platform

Iceberg Table Maintenance with Trino — OPTIMIZE, Snapshot Expiration, and Orphan File Removal

Iceberg tables get slower the longer they run in production. Small-file accumulation, snapshot explosion, and orphan files are the culprits. This post covers how to automate compaction, snapshot expiration, and orphan file cleanup with Trino's ALTER TABLE ... EXECUTE procedures, and how to diagnose table health with metadata tables.

Data DynamicsJune 5, 20267 min read

Iceberg tables are fast when freshly created, but they slowly degrade in production. As streaming ingestion or frequent INSERT/MERGE operations pile up, small files explode in number, every write creates a new snapshot, and expired data files linger in storage as orphans. Left unchecked, query planning stretches to tens of seconds and storage costs quietly leak away.

Fortunately, Trino provides procedures that run these maintenance tasks with a single line of SQL. This post walks through what makes a table slow and how to clean it up with OPTIMIZE / expire_snapshots / remove_orphan_files — from diagnosis to automation.

1. Three Reasons Iceberg Tables Slow Down

①  Small files            → per-file metadata/footer I/O, planning blowup
②  Snapshot accumulation  → bloated metadata files, excessive time travel history
③  Orphan files           → data no longer referenced still sitting in storage
CauseSymptomProcedure
Small filesLittle data, yet query planning is slowEXECUTE optimize
Snapshot accumulationBloated metadata.json, planning delaysEXECUTE expire_snapshots
Orphan filesStorage usage larger than actual dataEXECUTE remove_orphan_files

2. Diagnose First — Metadata Tables

Trino exposes an Iceberg table's internal state through the "table$metatable" syntax. Assess the current state before cleaning up.

-- Data file count and total size (small-file diagnosis)
SELECT count(*) AS file_count,
       sum(file_size_in_bytes) / (1024*1024*1024) AS total_gb,
       avg(file_size_in_bytes) / (1024*1024) AS avg_file_mb
FROM iceberg.analytics."events$files";
 
-- Snapshot count and history (snapshot accumulation diagnosis)
SELECT count(*) AS snapshot_count,
       min(committed_at) AS oldest,
       max(committed_at) AS newest
FROM iceberg.analytics."events$snapshots";
 
-- File distribution per partition (are small files concentrated in specific partitions?)
SELECT partition, file_count, total_size
FROM iceberg.analytics."events$partitions"
ORDER BY file_count DESC
LIMIT 20;

Key metadata tables:

TableContents
$filesData file list, sizes, statistics
$snapshotsSnapshot history and commit timestamps
$partitionsFile count, row count, and size per partition
$manifestsManifest file list
$historyTable state change history

Rule of thumb: if avg_file_mb is down in the single-digit MB range and file_count is in the tens or hundreds of thousands, you have a small-file problem; if snapshot_count is in the hundreds to thousands, you have snapshot accumulation.

3. OPTIMIZE — Small File Compaction

OPTIMIZE merges small files into larger files of an appropriate size (default target around 512MB).

-- Compact the entire table
ALTER TABLE iceberg.analytics.events EXECUTE optimize;
 
-- Specific partitions only (recommended: just the recently ingested range)
ALTER TABLE iceberg.analytics.events EXECUTE optimize
  WHERE event_time >= TIMESTAMP '2026-06-01 00:00:00 UTC'
    AND event_time <  TIMESTAMP '2026-06-05 00:00:00 UTC';
 
-- Only target files below a given size (file_size_threshold)
ALTER TABLE iceberg.analytics.events EXECUTE optimize(file_size_threshold => '128MB');

Key points:

  • Narrow the scope with WHERE. Full-table compaction is expensive and slow. For streaming ingestion, compacting only the "yesterday-to-today" partitions every day is far more efficient.
  • Targeting only files smaller than file_size_threshold leaves already-large files untouched, cutting cost.
  • Compaction creates a new snapshot. The old small files are not deleted immediately — they become orphan candidates — so the standard practice is to follow compaction with snapshot expiration and orphan file removal.

Compaction with Sorting

If the table has sorted_by configured, OPTIMIZE rewrites the data in sorted order. Sorting amplifies the effect of Parquet min/max statistics and improves pruning.

ALTER TABLE iceberg.analytics.events SET PROPERTIES sorted_by = ARRAY['user_id'];
ALTER TABLE iceberg.analytics.events EXECUTE optimize;

4. expire_snapshots — Snapshot Expiration

Iceberg records a snapshot on every write (for time travel and rollback). As they accumulate, metadata files grow, and old data files cannot be deleted until their snapshots expire.

-- Expire snapshots older than 7 days
ALTER TABLE iceberg.analytics.events EXECUTE expire_snapshots(retention_threshold => '7d');

Behavior:

  • Removes snapshots older than retention_threshold and physically deletes data files that were referenced only by those snapshots.
  • In other words, expire_snapshots is the step that actually reclaims the old small files left behind by OPTIMIZE.
  • Setting it too short eliminates your time travel and rollback headroom, so align it with your operational policy (typically 3–14 days).
-- Time travel: past versions remain queryable until expired
SELECT * FROM iceberg.analytics.events
FOR TIMESTAMP AS OF TIMESTAMP '2026-06-03 00:00:00 UTC';

Caution: there is usually a safety guard preventing retention_threshold from being lowered below the cluster's default minimum (e.g. 7 days). Forcing a shorter value requires changing table/catalog settings, which is not recommended from a data-protection standpoint.

5. remove_orphan_files — Orphan File Removal

This deletes files that exist in storage but are not referenced by any snapshot (orphans). Causes include abnormally terminated writes, failed commits, and leftovers from external tools.

ALTER TABLE iceberg.analytics.events EXECUTE remove_orphan_files(retention_threshold => '7d');
  • Only orphan files older than retention_threshold are removed. If this value is too short, files from a commit currently in flight (not yet reflected in metadata) can be misidentified as orphans and deleted, so always keep it generous (default 7 days recommended).
  • Don't run it too aggressively while concurrent writes are in progress. It's safest to execute it during quiet write periods.

The order of these three tasks matters.

① OPTIMIZE              small files → large files (new snapshot created, old files become orphan candidates)

② expire_snapshots      remove old snapshots → physically delete data files no longer referenced

③ remove_orphan_files   reclaim leftover files even the metadata doesn't know about
-- Daily maintenance bundle (compact recent partitions + cleanup)
ALTER TABLE iceberg.analytics.events EXECUTE optimize
  WHERE event_time >= current_timestamp - INTERVAL '2' DAY;
ALTER TABLE iceberg.analytics.events EXECUTE expire_snapshots(retention_threshold => '7d');
ALTER TABLE iceberg.analytics.events EXECUTE remove_orphan_files(retention_threshold => '7d');

7. Automation — What and How Often

Recommended cadence by workload:

TaskStreaming / frequent ingestionDaily batchMostly static
OPTIMIZEDaily (recent partitions)Daily/weeklyMonthly
expire_snapshotsDailyWeeklyMonthly
remove_orphan_filesWeeklyWeeklyMonthly

How to automate:

  • Run the SQL bundle above per table from a scheduler (Airflow/cron).
  • Assign compaction to a dedicated maintenance Resource Group (low concurrency) so it doesn't eat into resources for analytical queries.
  • For large tables, don't run the whole thing at once — split the partition range and proceed incrementally.

8. Common Pitfalls

PitfallConsequenceAvoidance
Running OPTIMIZE without expiring snapshotsOld small files never deleted, storage stays bloatedAlways pair with expire_snapshots
retention_threshold too shortIn-flight files wrongly deleted / time travel lost7 days or more recommended
Full-table OPTIMIZE every dayCost and runtime explosionUse WHERE to target recent partitions only
Maintenance during write peaksCommit conflicts, resource contentionOff-peak hours + dedicated Resource Group
Aggressive remove_orphan_files on the analytics clusterI/O pressureWeekly, during low-load windows

9. Summary

ProcedureSolvesKey options
EXECUTE optimizeSmall file merging (+ sorting)WHERE, file_size_threshold
EXECUTE expire_snapshotsSnapshot and old file cleanupretention_threshold
EXECUTE remove_orphan_filesReclaiming unreferenced leftover filesretention_threshold

The essence of Iceberg table maintenance is running these three tasks in order, on a schedule, with a narrowed scope. Diagnose the current state first with the $files and $snapshots metadata tables, then automate daily recent-partition compaction for streaming tables and periodic snapshot/orphan cleanup. With just this routine in place, you can structurally prevent Iceberg's chronic "slower over time" disease.


This article was written against Trino 440-series and Iceberg spec v2. If you need help automating lakehouse table maintenance or diagnosing performance degradation, feel free to reach out.

— Data Dynamics Engineering Team