trinocachingperformanceobject-storagedata-platform

Trino Caching Strategies — Filesystem Cache, Metadata Cache, and Their Limits

In Trino deployments on object storage, repeated I/O and metastore round trips are the main sources of latency. This post covers the filesystem (data) cache, the Hive metadata cache, and why Trino has no result cache — along with the alternatives.

Data DynamicsJune 5, 20267 min read

Trino does not own its data — it reads from object storage (S3/GCS/Azure) or external systems on every query. This architecture is flexible, but as the cost of repeatedly pulling the same data over the network and metastore round-trip latency accumulate, interactive query responsiveness suffers. Caching is the key lever for reducing both costs.

This post organizes Trino caching along two axes — the data (filesystem) cache and the metadata cache — and also answers the frequently asked question, "why doesn't Trino have a result cache?"

1. Where the Time Goes

Loading diagram…

Cost	Cause	Caching remedy
Metadata RTT	Metastore lookups on every query	Metadata cache
Repeated data I/O	Hot data read remotely every time	Filesystem (data) cache
Re-running identical queries	Results recomputed every time	(No result cache → alternatives needed)

2. Filesystem Cache — Caching Data on Worker-Local Disks

Trino's filesystem cache stores data file blocks read from object storage on the workers' local disks (SSDs). When the same data is read again, it comes from local disk instead of the remote store, cutting latency and network cost.

# etc/config.properties (or per catalog)
fs.cache.enabled=true
fs.cache.directories=/data1/trino-cache,/data2/trino-cache
fs.cache.max-sizes=200GB,200GB

Item	Description
What gets cached	Data file blocks from object storage
Storage location	Worker-local SSD (spreading across multiple paths recommended)
Effect	Eliminates remote I/O for repeatedly scanned hot data
Consistency	Iceberg's immutable data file model makes cache invalidation trivial

Why It Pairs So Well with Iceberg

Iceberg data files are immutable. Once written, a file never changes; modifications are expressed as new files plus a new snapshot. So you never have to worry about "has the cached file diverged from the original?" — same file path, same contents. This immutability makes the filesystem cache both safe and effective.

Node Affinity (Soft Affinity)

For the cache to pay off, the same file should be read by the same worker whenever possible. Otherwise the same data gets cached redundantly on every worker, killing efficiency. Trino uses soft affinity scheduling, which assigns splits to consistent workers based on the file, to maximize cache hit rates.

3. Metadata Cache — Cutting Metastore Round Trips

3.1 Hive Connector Metadata Cache

The Hive connector queries the Hive Metastore (HMS) for schemas, partitions, and statistics on every query. With many partitions or a distant HMS, this RTT adds up. The cache mitigates it.

# etc/catalog/hive.properties
hive.metastore-cache-ttl=10m
hive.metastore-cache-maximum-size=10000
hive.metastore-refresh-interval=1m

Parameter	Meaning
`hive.metastore-cache-ttl`	How long a cache entry stays valid
`hive.metastore-refresh-interval`	Background refresh interval
`hive.metastore-cache-maximum-size`	Maximum number of cache entries

Trade-off: during the TTL window, Trino may see stale information even after an external tool changes the metadata. To flush it on demand:

CALL system.flush_metadata_cache();

A common misconception among folks coming from Cloudera/Impala: this cache is a "human-managed cache" similar to Impala's catalogd cache. The Iceberg connector, by contrast, uses a snapshot model, so this kind of TTL cache is fundamentally unnecessary — a commit is visibility. That's why moving to Iceberg eliminates much of the metadata-cache headache altogether.

3.2 Iceberg's Metadata Model

Iceberg reads partition and file statistics from manifest files (in storage) rather than via HMS RPCs. With no HMS round trips, the metadata bottleneck is structurally smaller. For Iceberg tables, the filesystem cache (which also caches manifest files) is therefore often all you need.

4. Result Cache — Why Trino Doesn't Have One

"If I run the same query again, why can't it just return the result from a cache?" is a frequent question. Trino does not provide a query result cache out of the box. The reasons:

Trino is a federation engine. Source data in external systems can change at any time, so it is generally impossible to guarantee that a cached result is still valid.
Rather than taking on result consistency (stale read) risk at the engine level, the design delegates consistency guarantees to the data layer.

Instead, the following alternatives exist.

Alternative	Approach	Best for
Materialized View	Precompute frequently used aggregations into a table, refresh periodically	Recurring heavy aggregations
Pre-aggregated table (CTAS)	Build summary tables via daily batch	Dashboard backends
BI tool cache	Superset/Tableau's own result caches	Dashboard responsiveness
Filesystem cache	Reduce input data I/O rather than caching results	Repeated scans

Materialized View

CREATE MATERIALIZED VIEW iceberg.analytics.daily_active_users AS
SELECT date(event_time) AS d, count(DISTINCT user_id) AS dau
FROM iceberg.analytics.events
GROUP BY date(event_time);
 
-- Periodic refresh (from a scheduler)
REFRESH MATERIALIZED VIEW iceberg.analytics.daily_active_users;

The Iceberg connector's materialized views store the result as a real table and track staleness when the source is updated. For patterns like dashboards that "repeat the same heavy aggregation," they are a practical substitute for a result cache.

5. When Caching Doesn't Help — or Hurts

Situation	Why
Every query scans different data (little overlap)	Filesystem cache hit rate near zero, only wasting disk
Worker-local disks are slow/small	Cache I/O ends up worse than remote reads
External tools change metadata frequently	Metadata TTL cache causes stale reads
One-off large batch jobs	No reuse, so caching is pointless
Faking a result cache on frequently changing tables	Consistency risk

Caching shines when there is "hot data/metadata that is accessed repeatedly." If your workload sweeps fresh data every time, a cache only adds cost.

6. Configuration Guide — Recommendations by Workload

Workload	Filesystem cache	Metadata cache	Result alternative
Interactive BI (repeated hot data)	On (large local SSD)	Unnecessary for Iceberg / short TTL for Hive	Materialized View, BI cache
Dashboard backend	On	—	Pre-aggregated tables
Large ETL (one-off scans)	Off	—	Not needed
Hive + many partitions	On	TTL + refresh + flush procedure	—

7. Summary

Cache type	Cost it reduces	Key settings	Watch out for
Filesystem (data)	Repeated remote I/O	`fs.cache.*` + soft affinity	Needs local SSD; only worth it for hot data
Metadata (Hive)	Metastore RTT	`hive.metastore-cache-ttl`	Stale reads, `flush_metadata_cache()`
Result cache	Recomputation	(none)	Replace with Materialized Views / pre-aggregation

Two principles govern Trino caching. First, cut data I/O with the filesystem cache and metadata RTT with the metadata cache — but only enable them for workloads with repeated access. Second, accept that there is no result cache, and replace recurring heavy aggregations with materialized views or pre-aggregated tables. Iceberg's immutable-file and snapshot model makes this caching strategy simpler and safer.

This article is based on Trino in the 440s. If you need help improving interactive query responsiveness or designing caching and pre-aggregation, feel free to reach out.

— Data Dynamics Engineering Team