Trino Caching Strategies — Filesystem Cache, Metadata Cache, and Their Limits
In Trino deployments on object storage, repeated I/O and metastore round trips are the main sources of latency. This post covers the filesystem (data) cache, the Hive metadata cache, and why Trino has no result cache — along with the alternatives.
Trino does not own its data — it reads from object storage (S3/GCS/Azure) or external systems on every query. This architecture is flexible, but as the cost of repeatedly pulling the same data over the network and metastore round-trip latency accumulate, interactive query responsiveness suffers. Caching is the key lever for reducing both costs.
This post organizes Trino caching along two axes — the data (filesystem) cache and the metadata cache — and also answers the frequently asked question, "why doesn't Trino have a result cache?"
1. Where the Time Goes
[One query]
Metadata lookup ──> metastore/catalog round trips (RTT)
│
File list & stats ──> read manifest / HMS
│
Data scan ──> read Parquet/ORC from object storage (network bandwidth)| Cost | Cause | Caching remedy |
|---|---|---|
| Metadata RTT | Metastore lookups on every query | Metadata cache |
| Repeated data I/O | Hot data read remotely every time | Filesystem (data) cache |
| Re-running identical queries | Results recomputed every time | (No result cache → alternatives needed) |
2. Filesystem Cache — Caching Data on Worker-Local Disks
Trino's filesystem cache stores data file blocks read from object storage on the workers' local disks (SSDs). When the same data is read again, it comes from local disk instead of the remote store, cutting latency and network cost.
# etc/config.properties (or per catalog)
fs.cache.enabled=true
fs.cache.directories=/data1/trino-cache,/data2/trino-cache
fs.cache.max-sizes=200GB,200GB| Item | Description |
|---|---|
| What gets cached | Data file blocks from object storage |
| Storage location | Worker-local SSD (spreading across multiple paths recommended) |
| Effect | Eliminates remote I/O for repeatedly scanned hot data |
| Consistency | Iceberg's immutable data file model makes cache invalidation trivial |
Why It Pairs So Well with Iceberg
Iceberg data files are immutable. Once written, a file never changes; modifications are expressed as new files plus a new snapshot. So you never have to worry about "has the cached file diverged from the original?" — same file path, same contents. This immutability makes the filesystem cache both safe and effective.
Node Affinity (Soft Affinity)
For the cache to pay off, the same file should be read by the same worker whenever possible. Otherwise the same data gets cached redundantly on every worker, killing efficiency. Trino uses soft affinity scheduling, which assigns splits to consistent workers based on the file, to maximize cache hit rates.
3. Metadata Cache — Cutting Metastore Round Trips
3.1 Hive Connector Metadata Cache
The Hive connector queries the Hive Metastore (HMS) for schemas, partitions, and statistics on every query. With many partitions or a distant HMS, this RTT adds up. The cache mitigates it.
# etc/catalog/hive.properties
hive.metastore-cache-ttl=10m
hive.metastore-cache-maximum-size=10000
hive.metastore-refresh-interval=1m| Parameter | Meaning |
|---|---|
hive.metastore-cache-ttl | How long a cache entry stays valid |
hive.metastore-refresh-interval | Background refresh interval |
hive.metastore-cache-maximum-size | Maximum number of cache entries |
Trade-off: during the TTL window, Trino may see stale information even after an external tool changes the metadata. To flush it on demand:
CALL system.flush_metadata_cache();A common misconception among folks coming from Cloudera/Impala: this cache is a "human-managed cache" similar to Impala's
catalogdcache. The Iceberg connector, by contrast, uses a snapshot model, so this kind of TTL cache is fundamentally unnecessary — a commit is visibility. That's why moving to Iceberg eliminates much of the metadata-cache headache altogether.
3.2 Iceberg's Metadata Model
Iceberg reads partition and file statistics from manifest files (in storage) rather than via HMS RPCs. With no HMS round trips, the metadata bottleneck is structurally smaller. For Iceberg tables, the filesystem cache (which also caches manifest files) is therefore often all you need.
4. Result Cache — Why Trino Doesn't Have One
"If I run the same query again, why can't it just return the result from a cache?" is a frequent question. Trino does not provide a query result cache out of the box. The reasons:
- Trino is a federation engine. Source data in external systems can change at any time, so it is generally impossible to guarantee that a cached result is still valid.
- Rather than taking on result consistency (stale read) risk at the engine level, the design delegates consistency guarantees to the data layer.
Instead, the following alternatives exist.
| Alternative | Approach | Best for |
|---|---|---|
| Materialized View | Precompute frequently used aggregations into a table, refresh periodically | Recurring heavy aggregations |
| Pre-aggregated table (CTAS) | Build summary tables via daily batch | Dashboard backends |
| BI tool cache | Superset/Tableau's own result caches | Dashboard responsiveness |
| Filesystem cache | Reduce input data I/O rather than caching results | Repeated scans |
Materialized View
CREATE MATERIALIZED VIEW iceberg.analytics.daily_active_users AS
SELECT date(event_time) AS d, count(DISTINCT user_id) AS dau
FROM iceberg.analytics.events
GROUP BY date(event_time);
-- Periodic refresh (from a scheduler)
REFRESH MATERIALIZED VIEW iceberg.analytics.daily_active_users;The Iceberg connector's materialized views store the result as a real table and track staleness when the source is updated. For patterns like dashboards that "repeat the same heavy aggregation," they are a practical substitute for a result cache.
5. When Caching Doesn't Help — or Hurts
| Situation | Why |
|---|---|
| Every query scans different data (little overlap) | Filesystem cache hit rate near zero, only wasting disk |
| Worker-local disks are slow/small | Cache I/O ends up worse than remote reads |
| External tools change metadata frequently | Metadata TTL cache causes stale reads |
| One-off large batch jobs | No reuse, so caching is pointless |
| Faking a result cache on frequently changing tables | Consistency risk |
Caching shines when there is "hot data/metadata that is accessed repeatedly." If your workload sweeps fresh data every time, a cache only adds cost.
6. Configuration Guide — Recommendations by Workload
| Workload | Filesystem cache | Metadata cache | Result alternative |
|---|---|---|---|
| Interactive BI (repeated hot data) | On (large local SSD) | Unnecessary for Iceberg / short TTL for Hive | Materialized View, BI cache |
| Dashboard backend | On | — | Pre-aggregated tables |
| Large ETL (one-off scans) | Off | — | Not needed |
| Hive + many partitions | On | TTL + refresh + flush procedure | — |
7. Summary
| Cache type | Cost it reduces | Key settings | Watch out for |
|---|---|---|---|
| Filesystem (data) | Repeated remote I/O | fs.cache.* + soft affinity | Needs local SSD; only worth it for hot data |
| Metadata (Hive) | Metastore RTT | hive.metastore-cache-ttl | Stale reads, flush_metadata_cache() |
| Result cache | Recomputation | (none) | Replace with Materialized Views / pre-aggregation |
Two principles govern Trino caching. First, cut data I/O with the filesystem cache and metadata RTT with the metadata cache — but only enable them for workloads with repeated access. Second, accept that there is no result cache, and replace recurring heavy aggregations with materialized views or pre-aggregated tables. Iceberg's immutable-file and snapshot model makes this caching strategy simpler and safer.
This article is based on Trino in the 440s. If you need help improving interactive query responsiveness or designing caching and pre-aggregation, feel free to reach out.
— Data Dynamics Engineering Team